Page MenuHomePhabricator

Allow transaction publishers to pass binary data to workers
ClosedPublic

Authored by epriestley on Aug 19 2015, 1:10 PM.
Tags
None
Referenced Files
F12806524: D13939.id33700.diff
Wed, Mar 27, 6:21 PM
F12806265: D13939.id33647.diff
Wed, Mar 27, 6:04 PM
F12804388: D13939.id.diff
Wed, Mar 27, 4:09 PM
F12803888: D13939.id33700.diff
Wed, Mar 27, 3:36 PM
Unknown Object (File)
Sun, Mar 24, 8:45 PM
Unknown Object (File)
Tue, Mar 5, 1:59 AM
Unknown Object (File)
Sun, Mar 3, 7:36 PM
Unknown Object (File)
Sun, Mar 3, 7:36 PM
Subscribers

Details

Summary

Ref T8672. Ref T9187. Root issue in at least one case is:

  • User makes a commit including a file with some non-UTF8 text (say, a Japanese file full of Shift-JIS).
  • We pass the file to the TransactionEditor so it can inline or attach the patch if the server is configured for these things.
    • When inlining patches, we convert them to UTF8 before inlining. We must do this since the rest of the mail is UTF8.
    • When attaching patches, we send them in the original encoding (as file attachments). This is correct, and means we need to give the worker the raw patch in whatever encoding it was originally in: we can't just convert it to utf8 earlier, or we'd attach the wrong patch in some cases.
  • TransactionEditor does its thing (e.g., creates the commit), then gets ready to send mail about whatever it did.
  • The publishing work now happens in the daemon queue, so we prepare to queue a PublishWorker and pass it the patch (with some other data).
  • When we queue workers, we serialize the state data with JSON.

So far, so good. But this is where things go wrong:

  • JSON can't encode binary data, and can't encode Shift-JIS. The encoding silently fails and we ignore it.

Then we get to the worker, and things go wrong-er:

  • Since the data is bad, we fatal. This isn't a permanent failure, so we continue retrying the task indefinitely.

This applies several fixes:

  1. When queueing tasks, fail loudly when JSON encoding fails.
  2. In the worker, fail permanently when data can't be decoded.
  3. Allow Editors to specify that some of their data is binary and needs special handling.

This is fairly messy, but some simpler alternatives don't seem like good ways forward:

  • We can't convert to UTF8 earlier, because we need the original raw patch when adding it as an attachment.
  • We could encode only this field, but I suspect some other fields will also need attention, so that adding a mechanism will be worthwhile. In particular, I suspect filenames may be causing a similar problem in some cases.
  • We could convert task data to always use a serialize()-based binary safe encoding, but this is a larger change and I think it's correct that things are UTF8 by default, even if it makes a bit of a mess. I'd rather have an explicit mess like this than a lot of binary data floating around.

The change to make LiskDAO will almost certainly catch some other problems too, so I'm going to hold this until after stable is cut. These problems were existing problems (i.e., the code was previously breaking or destroying data) so it's definitely correct to catch them, but this will make the problems much more obvious/urgent than they previously were.

Test Plan
  • Created a commit with a bunch of Shift-JIS stuff in a file.
  • Tried to import it.

Prior to patch:

  • Broken PublishWorker with distant, irrelevant error message.

With patch partially applied (only new error checking):

  • Explicit, local error message about bad key in serialized data.

With patch fully applied:

  • Import went fine and mail generated.

Diff Detail

Repository
rP Phabricator
Branch
utf1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 7686
Build 8409: [Placeholder Plan] Wait for 30 Seconds
Build 8408: arc lint + arc unit

Event Timeline

epriestley retitled this revision from to Allow transaction publishers to pass binary data to workers.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: chad.
chad edited edge metadata.
chad added inline comments.
src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php
2642–2656

Intended to be in this diff?

This revision is now accepted and ready to land.Aug 19 2015, 3:11 PM
src/applications/transactions/editor/PhabricatorApplicationTransactionEditor.php
2642–2656

Yeah, this was just lower down in the file and I moved it up to be with similar code. (I added it not-the-best place a few diffs ago.)

src/infrastructure/storage/lisk/LiskDAO.php
1654

This is the change which will make errors more obvious: the old call returns false for bad data, the new one throws.

The exception is pretty detailed so I think issues will be easy to fix, and these are all real (and potentially severe) bugs so we should definitely identify and fix them, but they may not occur in convenient places.

This revision was automatically updated to reflect the committed changes.