
Asynchronously calculate or pre-calculate patches for Phragment files
Closed, Wontfix · Public

Description

One of the major problems with phragment.getpatch is that it calculates the patch list at the time the method is called. When a fragment has even a reasonable number of fragments underneath it, the call quickly hits the execution time limit (and in some cases MySQL's max_allowed_packet limit).

So this method either needs to use a pre-calculated list of patches, or work asynchronously (the caller requests patch calculation, and daemons pick the request up and handle it). A pre-calculated list of patches has the problem that we don't know which patch combinations to pre-calculate; it won't always be the latest patch. We could, however, combine the two approaches by caching the results of asynchronous calculations.
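
Roughly, combining the two would look like this (a TypeScript sketch of the control flow only; PatchKey, the cache, and queueJob are placeholders, not existing Phabricator APIs):

  // Hypothetical server-side flow: serve a cached patch if a daemon
  // already calculated it, otherwise enqueue work and return a work ID.
  interface PatchKey {
    fragmentPHID: string;
    fromHash: string; // version the caller already has
    toHash: string;   // version the caller wants
  }

  async function getPatch(
    key: PatchKey,
    cache: Map<string, string>,          // patches keyed by version pair
    queueJob: (key: PatchKey) => string, // enqueues daemon work, returns a work ID
  ): Promise<{ patch: string } | { workId: string }> {
    const cacheKey = `${key.fragmentPHID}:${key.fromHash}:${key.toHash}`;
    const cached = cache.get(cacheKey);
    if (cached !== undefined) {
      return { patch: cached }; // a daemon already did this combination
    }
    // Otherwise hand back a work ID; once the daemon populates the
    // cache, later calls for the same combination are synchronous.
    return { workId: queueJob(key) };
  }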

For asynchronous I'm thinking something like:

phragment.startpatchcalculation with the same input values as phragment.getpatch. This returns immediately with a work ID identifying the patch calculation.

phragment.getpatchcalculationstatus with the patch calculation work ID returns which files patches have been calculated for, and which are remaining. Callers can use this to check on the progress of the patch calculation (since if it's taking longer than 30 seconds, callers probably want to inform the user what's going on).

phragment.downloadpatchcalculations with the patch calculation work ID. When phragment.getpatchcalculationstatus indicates that it is complete, the caller uses this to get a list of patch URIs and hashes to apply.
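
From the caller's side, the flow would look something like this (a TypeScript sketch; the three method names are the proposal above and don't exist yet, and the parameter names (path, state, workId, done, total, complete) are made up; the HTTP shape just follows the usual Conduit convention of POSTing form-encoded parameters, including api.token, to /api/<method>):

  const HOST = 'https://phabricator.example.com'; // hypothetical install

  async function conduit(method: string, params: Record<string, string>) {
    const body = new URLSearchParams({ 'api.token': 'api-xxxx', ...params });
    const response = await fetch(`${HOST}/api/${method}`, { method: 'POST', body });
    const envelope = await response.json();
    if (envelope.error_code) throw new Error(envelope.error_info);
    return envelope.result;
  }

  async function getPatchAsync(path: string, state: string) {
    // 1. Kick off the calculation; returns immediately with a work ID.
    const { workId } = await conduit('phragment.startpatchcalculation', { path, state });

    // 2. Poll until the daemons finish, surfacing progress to the user.
    for (;;) {
      const status = await conduit('phragment.getpatchcalculationstatus', { workId });
      console.log(`calculated patches for ${status.done} of ${status.total} files`);
      if (status.complete) break;
      await new Promise((resolve) => setTimeout(resolve, 2000));
    }

    // 3. Fetch the list of patch URIs and hashes to apply.
    return conduit('phragment.downloadpatchcalculations', { workId });
  }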

@epriestley What are your thoughts?

Event Timeline

hach-que claimed this task.
hach-que raised the priority of this task from to Needs Triage.
hach-que updated the task description.
hach-que added a project: Phabricator.
hach-que added subscribers: hach-que, epriestley.

For the execution limit, maybe a simpler fix is to use set_time_limit(0) to disable it for now? We probably have a similar issue if a file takes more than 30 seconds to download, too.

In the long run, we have a broad array of issues with large file handling (primarily, lack of streaming at multiple levels of the infrastructure). Once we outgrow set_time_limit(), I think handing things off to Node is probably the right way to go -- then phragment.getpatch would look something like:

  • Generate a unique ID.
  • Queue a daemon to build the patch.
  • Hand back a files.phabricator.com/resource/$secret/$unique-id/ sort of URL.
  • That is handled by Node, which holds the connection, polls for resource generation, and streams the result efficiently (sketched below).
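
Something like this on the Node side (a sketch only; the route shape, the staging directory the daemon writes to, and the polling interval are all made up):

  import * as http from 'http';
  import * as fs from 'fs';

  const STAGING_DIR = '/var/tmp/phragment-patches'; // hypothetical daemon output dir

  http.createServer((req, res) => {
    // Matches /resource/$secret/$unique-id/ style URLs.
    const match = /^\/resource\/([^/]+)\/([^/]+)\/?$/.exec(req.url ?? '');
    if (!match) {
      res.statusCode = 404;
      res.end();
      return;
    }
    // (A real implementation would validate match[1], the secret,
    // against Phabricator before serving anything.)
    const path = `${STAGING_DIR}/${match[2]}`;

    // Hold the connection and poll until the daemon has written the patch.
    const poll = setInterval(() => {
      if (fs.existsSync(path)) {
        clearInterval(poll);
        res.writeHead(200, { 'Content-Type': 'application/octet-stream' });
        // Stream from disk instead of buffering the whole patch in memory.
        fs.createReadStream(path).pipe(res);
      }
    }, 1000);
    req.on('close', () => clearInterval(poll));
  }).listen(8090);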

For max_allowed_packet, just bump that up to a gazillion?
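
(For reference, that's a server-side my.cnf setting; an example only, with the caveat that 1G is MySQL's hard maximum and the right value depends on the install:)

  [mysqld]
  max_allowed_packet = 1G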

hach-que triaged this task as Wishlist priority. May 16 2014, 5:27 AM

This should be considered a really low priority. We ended up just re-downloading the latest file when the hash changes, rather than using differencing.

Given that differencing often takes longer than 30 seconds, the patch calculation can take longer than it would to just download the new file in the first place. Thus even with asynchronous calculation of patches, what this really needs to be useful is offline pre-calculation of patches, but that raises issues around which patches should be calculated (should version 2 -> 3 be calculated? should version 2 -> 4 be calculated? with n versions there are n(n-1)/2 possible pairs).
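
For completeness, the hash-check approach we settled on amounts to something like this (a TypeScript sketch; fetchManifest and the manifest shape are stand-ins for however the caller obtains the latest per-file hashes):

  interface ManifestEntry { path: string; hash: string; uri: string; }

  async function syncFragment(
    fetchManifest: () => Promise<ManifestEntry[]>,
    localHashes: Map<string, string>,         // hashes of files we already have
    download: (uri: string) => Promise<void>, // fetches a whole file
  ) {
    for (const entry of await fetchManifest()) {
      if (localHashes.get(entry.path) !== entry.hash) {
        // Hash changed (or the file is new): re-download the whole file
        // rather than computing and applying a patch.
        await download(entry.uri);
        localHashes.set(entry.path, entry.hash);
      }
    }
  }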