
Nuance Source stuff - Twitter Public Stream and Twitter User Stream
Needs Revision · Public

Authored by btrahan on Dec 5 2013, 11:49 PM.
Tags
None
Referenced Files
F14449558: D7723.diff
Thu, Dec 26, 10:32 PM
Tokens
"Love" token, awarded by hach-que. "Baby Tequila" token, awarded by chad.

Details

Reviewers
epriestley
Summary

This lets you create Nuance Sources for Twitter Public Streams ("all tweets with the word 'phabricator' in them") and Twitter User Streams ("all tweets that mention @phabricator; all tweets by @phabricator; all direct messages to @phabricator; all new followers of @phabricator; etc."). Creation and editing of these is all nice and whatnot.

Once created, they don't do anything, though. For that, see the included TwitterSourceDaemon. I cannot for the life of me figure out how to use HTTPFuture et al. to pull data off these long-lived connections. I've gotten real output once or twice, but it's an error about how the return value isn't actually JSON (and it looks like a JSON string...). Is there something like a "give me the data that's hit this connection so far" method?

Test Plan

Created sources - yay. Swapped source types - got an error message on an edit and no message on a new one. Tried out all error paths on edit and got appropriate errors.

Diff Detail

Branch
n4
Lint
Lint Passed
Unit
Tests Passed

Event Timeline

Let me send you something for streaming reads.

Just saw your update. Was typing this missive, which is obsoleted by handing me streaming stuff --

On the daemon bit, there should architecturally be two pieces to keep Twitter happy:

  • reading as fast as possible, while reacting nicely to timeouts, keeping only a single connection per account open, generally not opening a bunch of connections, etc
  • actually creating issues, running herald rules, whatever

The provided daemon was just going for reading...

Given this breakdown, I find it wildly intriguing to use something open source for the "reading" bit that would then just call into Conduit to write a lightweight version of what was just read (https://dev.twitter.com/docs/open-source-examples). The PHP versions all seem broken.

Then some heavier, full-on Phabricator bot could run against this lightweight queue.

src/applications/nuance/source/bot/NuanceTwitterSourceBot.php
24

Rather than resolveJSON, I think I need something that explodes on newlines, then JSON-decodes each element in the resulting array.

https://dev.twitter.com/docs/streaming-apis/processing#Parsing_responses

That said, I couldn't get data to flow reliably when it was supposed to (i.e., I set my "track" word to something like "boob", which has tons of hits, and got no data).
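For illustration, the "explode on newlines, then JSON-decode each element" idea might look roughly like the sketch below. The function name decode_stream_lines is hypothetical, and the sketch assumes messages are delimited by "\r\n" with blank lines acting as keep-alives, as the linked parsing page describes. It also assumes the input ends on a message boundary, which a live stream does not guarantee.

```php
<?php

/**
 * Split a chunk of complete, "\r\n"-delimited stream output into decoded
 * messages. Hypothetical helper for illustration; not part of libphutil
 * or Phabricator.
 *
 * @param string $raw Body text containing zero or more complete lines.
 * @return array List of decoded messages (associative arrays).
 */
function decode_stream_lines($raw) {
  $messages = array();
  foreach (explode("\r\n", $raw) as $line) {
    $line = trim($line);
    if ($line === '') {
      // Assumption: blank lines are keep-alives, so skip them.
      continue;
    }
    $decoded = json_decode($line, true);
    if ($decoded === null) {
      // Not valid JSON (or a literal "null"); fail loudly rather than guess.
      throw new Exception("Undecodable stream line: {$line}");
    }
    $messages[] = $decoded;
  }
  return $messages;
}
```

Calling decode_stream_lines() on two messages separated by a keep-alive line returns two arrays. A partial trailing line would throw, which is why a stateful buffer is the more robust choice for a long-lived connection.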

D7724 probably adds streaming.

For architecture, I think it can work something like this:

  • One FirehoseDaemon or StreamDaemon or whatever, which holds all of the connections we need open.
  • When it gets a message, it just inserts a task into the task queue with the data.
  • A Taskmaster picks it up and does whatever processing needs to be done.

Seem reasonable?

(Where "all of the connections we need" is theoretically "tons and tons of different services", but in practice is probably just "twitter".)
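As a rough sketch of the three-piece split described above (stream daemon, task queue, taskmaster), something like the following captures the shape. Every class and function name here is hypothetical, and the in-memory SplQueue is only a stand-in for the real task queue so that the example runs on its own.

```php
<?php

// Stand-in for the real task queue. An in-memory SplQueue is an assumption
// made purely so this sketch is self-contained.
final class ExampleTaskQueue {
  private $tasks;

  public function __construct() {
    $this->tasks = new SplQueue();
  }

  public function enqueue($worker_class, array $data) {
    $this->tasks->enqueue(array($worker_class, $data));
  }

  public function dequeue() {
    return $this->tasks->isEmpty() ? null : $this->tasks->dequeue();
  }
}

// The stream daemon does no processing: one message in, one task out.
final class ExampleStreamDaemon {
  private $queue;

  public function __construct(ExampleTaskQueue $queue) {
    $this->queue = $queue;
  }

  public function didReceiveMessage($source_id, array $message) {
    $this->queue->enqueue(
      'ExampleItemWorker',
      array('source' => $source_id, 'message' => $message));
  }
}

// The worker, run by a taskmaster, does the actual processing.
final class ExampleItemWorker {
  public function doWork(array $data) {
    return sprintf(
      'item from source %s: %s',
      $data['source'],
      $data['message']['text']);
  }
}

// Drains the queue, instantiating the named worker class for each task.
function example_run_taskmaster(ExampleTaskQueue $queue) {
  $results = array();
  while (($task = $queue->dequeue()) !== null) {
    list($class, $data) = $task;
    $worker = new $class();
    $results[] = $worker->doWork($data);
  }
  return $results;
}
```

The point of the split is that the daemon holding the connection never blocks on slow work; it only serializes what it read and moves on.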

For parsing the thing, I think you're going to need something like PhutilJSONProtocolChannel which holds a buffer and emits messages. Roughly like this:

$new_data = $future->read();

$messages = $protocol_buffer_object->write($new_data);

foreach ($messages as $message) {
  // queue a new task
}

The idea is the object returns some messages if $new_data completes any message frames, and otherwise it just keeps those bytes inside itself and returns nothing ("no complete messages yet").
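A minimal sketch of such a buffering object, under stated assumptions, is below. ExampleLineBuffer is a hypothetical name rather than an existing libphutil class, and the "\r\n" frame delimiter and blank keep-alive lines are assumptions about the stream format.

```php
<?php

/**
 * Hypothetical framing buffer: accumulates bytes and emits only complete,
 * "\r\n"-terminated JSON messages. Incomplete trailing bytes stay buffered
 * until a later write() completes them.
 */
final class ExampleLineBuffer {
  private $buffer = '';

  /**
   * @param string $bytes Newly read bytes, possibly ending mid-message.
   * @return array Messages completed by this write (possibly empty).
   */
  public function write($bytes) {
    $this->buffer .= $bytes;
    $messages = array();

    while (($pos = strpos($this->buffer, "\r\n")) !== false) {
      $frame = substr($this->buffer, 0, $pos);
      // Cast guards against substr() returning false on old PHP versions.
      $this->buffer = (string)substr($this->buffer, $pos + 2);

      if (trim($frame) === '') {
        // Assumption: an empty frame is a keep-alive, not a message.
        continue;
      }
      $messages[] = json_decode($frame, true);
    }

    return $messages;
  }
}
```

Writing half a message returns an empty array; writing the rest (plus the delimiter) returns the decoded message. A production version would also want to handle frames that fail to decode.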

I'll play with the new stuff and get the Bot working.

The biggest ugly in what's here -- aside from Bot -- is the ability to edit source type after you've made it. I could axe that feature, add a delete, and the code would look a lot cleaner. It feels a little funky as is to use.

Axing it seems fine to me. We don't let you, e.g., change a Git repository into a Subversion one either.

btrahan updated this revision to Unknown Object (????). Dec 12 2013, 12:12 AM

not quite done, but many updates

  • twitter bot works mostly
    • TODO - need to enqueue tasks
    • TODO - need to write a task worker to then create items
    • QUESTION - as far as I can tell, the existing infrastructure handles keeping one connection open, and each "future" is just a different request on that connection. Correct? (This makes things waaaay easy for me if so as I was thinking of each future as a connection...)
    • TODO - error handling
      • fill out the switch statement for messages for various ways things can go awry on start
      • react to error objects - i.e., errors that happen once we've got a good connection - and use new messages from above
      • retry strategy for when things go awry, whether on connect or mid-stream
  • TODO - need to make deleting a source clean up transactions from that source
    • Note I plan to have items say something like "created manually or source(s) deleted" as opposed to mess with item data
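For the retry-strategy TODO above, one common approach is capped exponential backoff; a sketch follows. The function name and the default base and cap values are placeholders chosen for illustration, not values taken from this revision, and Twitter's own reconnect guidance should decide the real numbers.

```php
<?php

/**
 * Compute a capped exponential backoff delay, in seconds, for the Nth
 * consecutive failed attempt. Hypothetical helper; $base and $cap are
 * placeholder defaults, not values from this revision.
 *
 * @param int $attempt 1 for the first retry, 2 for the second, and so on.
 * @param int $base Delay for the first retry, in seconds.
 * @param int $cap Maximum delay, in seconds.
 * @return int Seconds to wait before reconnecting.
 */
function example_backoff_delay($attempt, $base = 5, $cap = 320) {
  // Double the delay on each consecutive failure, starting from $base.
  $delay = $base * pow(2, max(0, $attempt - 1));
  return (int)min($cap, $delay);
}
```

The attempt counter would be reset to zero once a connection has been healthy for a while, so a single mid-stream drop does not inherit a long delay from an earlier outage.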

...i figure this is substantive enough for any major feedback (or minor feedback like file layout) while I finish the above.

some inlines on stuff i need to fix and a Q or two

src/applications/nuance/controller/NuanceSourceDeleteController.php
3 ↗(On Diff #17547)

will make extend source controller

src/applications/nuance/controller/NuanceSourceListController.php
69 ↗(On Diff #17547)

will fix spacing

src/applications/nuance/editor/NuanceSourceEditor.php
10

do you think i should just kill this source type transaction altogether? it can't be changed once initially set now

src/applications/nuance/source/bot/buffer/NuanceTwitterProtocolBuffer.php
25–46 ↗(On Diff #17547)

i sort of stole this from BaseHTTPFuture. it didn't seem easy to split this into some separate HTTP header parsing class, but I can do that if you want. :D

src/applications/nuance/source/bot/buffer/__tests__/NuanceTwitterProtocolBufferTestCase.php
39 ↗(On Diff #17547)

i used a search for "christmas" to test this.

src/applications/nuance/source/definition/NuanceTwitterSourceDefinition.php
14–15

...I also need to make this happen in my new list controller. i wanted twitter types (for example) to get to render a little extra info

QUESTION - as far as I can tell, the existing infrastructure handles keeping one connection open, and each "future" is just a different request on that connection. Correct? (This makes things waaaay easy for me if so as I was thinking of each future as a connection...)

It should work like this, yes, assuming the connections are sequential, although we generally can't tell the difference so I'm not 100% certain this is actually what cURL will do in all cases.

Generally, this all looks great to me. It would be nice to figure out how to share that HTTP parsing code but I agree it's not very easy/obvious. We can sort that out later. (Yell if I missed anything.)

src/applications/nuance/controller/NuanceSourceDeleteController.php
31 ↗(On Diff #17547)

(At some point, this should be deactivate rather than delete.)

src/applications/nuance/editor/NuanceSourceEditor.php
10

I'd probably toss it, yeah. I don't think there are any real advantages either way, but less code is usually better.

src/applications/nuance/source/bot/buffer/__tests__/NuanceTwitterProtocolBufferTestCase.php
39 ↗(On Diff #17547)

haha maybe beautify this :P

Cool, I think I'll go the deactivate route sooner rather than later, then, as that obviates some of my other problems.