Page MenuHomePhabricator

Provide `phragment.getstate` and `phragment.getpatch` Conduit methods
ClosedPublic

Authored by hach-que on Dec 7 2013, 12:58 PM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 19, 4:47 PM
Unknown Object (File)
Wed, Apr 17, 12:43 PM
Unknown Object (File)
Mon, Apr 15, 4:23 PM
Unknown Object (File)
Thu, Apr 11, 5:30 PM
Unknown Object (File)
Mon, Apr 8, 10:49 AM
Unknown Object (File)
Sun, Apr 7, 7:53 PM
Unknown Object (File)
Thu, Apr 4, 3:23 AM
Unknown Object (File)
Wed, Apr 3, 7:19 PM

Details

Summary

This provides a phragment.getstate and a phragment.getpatch Conduit method.

phragment.getstate - This returns the current state of the fragment and all of it's children.

phragment.getpatch - This accepts a base path and a mapping of paths to hashes. The mapping is for the caller to specify the current state of the files it has. This returns a list of patches that the caller needs to apply to it's files to get to the latest version.

Test Plan

Ran the following script in a folder which had content matching a fragment and it's children:

#!/bin/bash

STATE=""
for i in $(find ./ -type f); do
    HASH=$(cat $i | sha1sum | awk '{ print $1 }')
    BASE=${i:2}
    STATE="$STATE,\"$BASE\":\"$HASH\""
done
STATE=${STATE:1}
STATE="{$STATE}"

echo '{"path":"tychaia3.zip","state":'$STATE'}' | arc --conduit-uri=http://phabricator.local/ call-conduit phragment.getpatch

and I got:

{"error":null,"errorMessage":null,"response":[]}

I updated one of the child fragments with a new file and ran the script again (patch has been omitted due to it's size):

{"error":null,"errorMessage":null,"response":[{"path":"Content\/TitleFont.xnb","hash_old":"4a927d7b90582e50cdd330de9f4b59b0cc5eb5c7","hash_new":"25867504642a3a403102274c68fbb9b430c1980f","patch":"..."}]}

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

src/applications/phragment/conduit/ConduitAPI_phragment_getpatch_Method.php
60

interate -> iterate

src/applications/phragment/conduit/ConduitAPI_phragment_getstate_Method.php
10 ↗(On Diff #17486)

it's -> its

hach-que updated this revision to Unknown Object (????).Dec 8 2013, 1:23 AM

Spelling

I think there are two high-level issues here:

Client Recovery of Errors: If the client passes a sha1 for x/y.exe, but the server does not have the corresponding binary, it looks like the whole thing will die right now (I think). We could fix that and just generate a null -> modern y.exe patch for y.exe, but the client won't know how to apply it, because it can't tell whether a patch applies against its y.exe or whether the server failed to find the original binary and the patch applies against emptiness.

One approach would be to return both a list of patches and a map of which sha1 each patch is based on. The client can then delete all the files for which the patches are against null.

Binary Data and JSON: We can't encode arbitrary binary data in JSON -- it isn't valid and PHP isn't OK with it, at least on deserialization. So I would expect this will not actually work end-to-end for most binaries.

I think the easiest approach is for this method to return URIs, not patches. So maybe it returns a map like this:

{
  "a.exe" : {
    "action" : "update",
    "originalSHA1" : "98sdba9yb9d8g8abb",
    "newSHA1" : "as9bga9s8hb9a8sb",
    "patchURI" : "http://.../path/to/patch"
  },
  "b.jpg" : {
    "action" : "create",
    "originalSHA1" : null,
    "newSHA1" : "zxc98b7zcx98b7zx",
    "patchURI" : "http://.../path/to/file"
  },
  "c.bin" : {
    "action" : "delete",
    "originalSHA1" : "298y893r2f3",
    "newSHA1" : null,
    "patchURI" : null
  }
}

The client then examines the return record:

  • If a file has been removed, it doesn't need to download anything. Particularly, it doesn't need to download a potentially huge negative-patch just to delete a file. It can just remove the file.
  • If a file has been added, the URL can just point at the download URI. The client can download it and drop it into the directory, without dealing with patch application.
  • If a file has been updated, the client can download and apply the patch for it.
  • The "action" field is optional, I've included it only for clarity. We could omit it, since you can figure it out by examining the sha1s.

This will probably perform a lot better, and centralizes all of the large-file performance issues discussed in https://secure.phabricator.com/T4205#6 into Files, which will let us deal with them in one place. And it works around all the JSON/base64 issues in a simple way.

It's safe to return the getBestURI() URI, clients can always access it without authentication. For new files, you can just return the URI of the File directly. For not-new files, we could compute the patch inline today and save it to a File. Eventually, we could move this to caching, etc.

src/applications/phragment/conduit/ConduitAPI_phragment_getpatch_Method.php
143–146

We should do this through PhabricatorFileQuery. You shouldn't be able to get a patch from a file you can't see, even if you know its hash, since the patch reveals file contents.

150

There's no guarantee this exists, right?

167–195

This is pure code duplication, right? We can just put this stuff on the base. See the other copy for a likely simplification.

src/applications/phragment/conduit/ConduitAPI_phragment_getstate_Method.php
6 ↗(On Diff #17493)

Let's call this phragment.queryfragments, and change path to paths (and then eventually add ids, phids, etc., as necessary). Basically, make it consistent with other modern app.query methods.

82 ↗(On Diff #17493)

Can't this use a minimum depth and get everything in one go? This will execute one query per subdirectory but if we just say depth >= N we can just do one big call, right?

10 ↗(On Diff #17486)

and pht()

Diff-Match-Patch encodes all binary characters as %XX. This isn't something I'm doing explicitly in code; this is actually how the Diff-Match-Patch algorithm works. So there's never raw binary data being sent via JSON or out to the client.

We should be able to use "hash_old": "00000..." to indicate that the patch is based on the empty file.

Oh -- are these patches only applyable by DMP on the other side? They're not standard patches?

Correct; these aren't standard diffs since standard diffs don't handle binary data (and DMP uses different algorithms that allow it to send only the differences in a line vs. replacing an entire line).

hach-que updated this revision to Unknown Object (????).Dec 9 2013, 4:04 AM

Return download URIs instead of patches inline

hach-que updated this revision to Unknown Object (????).Dec 9 2013, 10:15 PM

Change phragment.getstate to phragment.queryfragments and support multiple paths

hach-que updated this revision to Unknown Object (????).Dec 9 2013, 10:35 PM

Unify getFragmentMappings

hach-que updated this revision to Unknown Object (????).Dec 9 2013, 10:38 PM

pht

epriestley added inline comments.
src/applications/phragment/conduit/ConduitAPI_phragment_getpatch_Method.php
169–172

Oh -- for consistency, we should use camelCase here. Older methods aren't consistent about this, but newer methods should almost all be using camelCase -- e.g. fileOld, hashOld, etc. The theory is that we're returning "objects", so we're using the same casing convention as objects do elsewhere in the codebase.

hach-que updated this revision to Unknown Object (????).Dec 11 2013, 12:18 AM

Use camel case