Page MenuHomePhabricator

Conduit can't transfer diffs that have invalid UTF8 characters, because json_encode returns FALSE
Closed, DuplicatePublic

Description

Initial screenshot:

pasted_file (668×817 px, 61 KB)

As per the comments below, I traced this down to json_encode inside AphrontResponse returning false (and the code not performing any error checking to make sure json_encode succeeded). This is because the diff contains invalid UTF8 characters, so it can't be JSON encoded.

Event Timeline

hach-que assigned this task to epriestley.
hach-que raised the priority of this task from to Needs Triage.
hach-que updated the task description. (Show Details)
hach-que added projects: Conduit, Diffusion.
hach-que added a subscriber: hach-que.

I modified diffusion.rawdiffquery to do the following before returning the result:

$result = $raw_query->loadRawDiff();
if ($result === null) {
  file_put_contents('/tmp/diff', 'null');
} else {
  file_put_contents('/tmp/diff', $result);
}
return $result;

and there's definitely data being returned here, because I can cat /tmp/diff and see the response that would have been returned.

I ran the query directly against the Conduit API on the storage machine, from my local machine, and it returns an empty response (so it's got nothing to do with the requesting machine):

$ echo '{
  "commit": "762468e4bfcee38c202ad24bc6b0761fa02a3ce6",
  "callsign": "P",
  "branch": "master"
}' | arc call-conduit --conduit-uri http://10.42.2.180/ --conduit-token ...... diffusion.rawdiffquery
Waiting for JSON parameters on stdin...
{"error":"ERR-CONDUIT-CORE","errorMessage":"ERR-CONDUIT-CORE: Host returned HTTP\/200, but invalid JSON data in response to a Conduit method call.","response":null}

For some reason, the storage tier is attempting to forward the request on to another machine inside the cluster (presumably itself?) and is then recursing. This makes no sense given that the code in src/applications/diffusion/conduit/DiffusionQueryConduitAPIMethod.php at line 123 is designed to prevent recursion.

Something is totally whack with the cluster configuration in the latest HEAD. For some reason, the storage tier is now constructing a Conduit client to code.pageuppeople.com, even though it should just run the command locally. Even worse, the $is_cluster_request variable is set to true, yet it still constructs a new Conduit client.

I'm like 95% sure this has something to do with the content of the actual diff returned, like UTF-8 encoding or something like that...

When I change line 119 of DiffusionQueryConduitAPIMethod.php to:

if ($client && false) {

Then I just get the blank response and invalid JSON from the Arcanist call itself.

Putting:

$result = $this->getResult($request);
file_put_contents('/tmp/diff2', $result);
return $result;

at line 136 of DiffusionQueryConduitAPIMethod on the storage tier results in the file having the diff content, so the issue must be further on than this.

Adding file_put_contents('/tmp/diff3', $result); on line 92 of PhabricatorConduitAPIController.php shows that the result is still correctly stored at this point.

Adding:

file_put_contents('/tmp/diff4.'.Filesystem::readRandomCharacters(20), print_r($response, true));

to line 147 of PhabricatorConduitAPIController shows that the diff content does pass through this line at least once.

Adding:

phutil_json_decode($response);

to line 27 of AphrontJSONResponse (right after it constructs the JSON), shows:

$ echo '{
  "commit": "762468e4bfcee38c202ad24bc6b0761fa02a3ce6",
  "callsign": "P",
  "branch": "master"
}' | arc call-conduit --conduit-uri http://10.42.2.180/ --conduit-token api-4wnywf22wulqcojnwaaiozy46mol diffusion.rawdiffquery
Waiting for JSON parameters on stdin...
{"error":"ERR-CONDUIT-CORE","errorMessage":"ERR-CONDUIT-CORE: [HTTP\/500] Internal Server Error\nPhutilJSONParserException: Parse error on line 1 at column 0: Expected one of: 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['","response":null}

which basically means that Phabricator is not constructing the JSON as a valid JSON string.

Well, I think we're in a bad position, because this code:

public static function encodeJSONForHTTPResponse(array $object) {

  array_walk_recursive(
    $object,
    array(__CLASS__, 'processValueForJSONEncoding'));

  $response = json_encode($object);
  phutil_json_decode($response);

  // Prevent content sniffing attacks by encoding "<" and ">", so browsers
  // won't try to execute the document as HTML even if they ignore
  // Content-Type and X-Content-Type-Options. See T865.
  $response = str_replace(
    array('<', '>'),
    array('\u003c', '\u003e'),
    $response);

  return $response;
}

throws an exception on phutil_json_decode.. :/

When I do:

public static function encodeJSONForHTTPResponse(array $object) {

  array_walk_recursive(
    $object,
    array(__CLASS__, 'processValueForJSONEncoding'));

  $response = json_encode($object);
  if ($response === false) {
    throw new Exception('Unable to encode object as JSON');
  }

  // Prevent content sniffing attacks by encoding "<" and ">", so browsers
  // won't try to execute the document as HTML even if they ignore
  // Content-Type and X-Content-Type-Options. See T865.
  $response = str_replace(
    array('<', '>'),
    array('\u003c', '\u003e'),
    $response);

  return $response;
}

I get:

$ echo '{
  "commit": "762468e4bfcee38c202ad24bc6b0761fa02a3ce6",
  "callsign": "P",
  "branch": "master"
}' | arc call-conduit --conduit-uri http://10.42.2.180/ --conduit-token api-4wnywf22wulqcojnwaaiozy46mol diffusion.rawdiffquery
Waiting for JSON parameters on stdin...
{"error":"ERR-CONDUIT-CORE","errorMessage":"ERR-CONDUIT-CORE: [HTTP\/500] Internal Server Error\nException: Unable to encode object as JSON","response":null}

so json_encode can't encode the string as JSON.

Yeah, the diff contains invalid UTF8:

ip-.....:/srv/phabricator/phabricator # cat /tmp/diff | python -c 'import json,sys; print json.dumps(sys.stdin.read())'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/json/__init__.py", line 243, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib64/python2.7/json/encoder.py", line 201, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf1 in position 7576: invalid continuation byte

We do need some way of transferring this data between cluster machines, and obviously we can't use JSON (or we need to strip the invalid UTF8 before encoding? that seems bad). @epriestley, any thoughts?

This "works":

php -r '$content = file_get_contents("/tmp/diff"); $content = iconv("UTF-8", "UTF-8//IGNORE", $content); echo $content;'  | python -c 'import json,sys; print json.dumps(sys.stdin.read())'

but is obviously all kinds of bad because it's modifying the result of git diff.

hach-que renamed this task from Empty response from Conduit when diffusion.rawdiffquery is called to Conduit can't transfer diffs that have invalid UTF8 characters, because json_encode returns FALSE.Jun 25 2015, 4:57 AM
hach-que updated the task description. (Show Details)