
Resolve structural problems with Conduit API methods related to large input/output sizes and binary data

Description

Previously, see T5955.

Some API methods need to read or write data which is not a good match for JSON. In particular:

  • JSON is not naturally stream-oriented, and isn't an ideal format for transmitting very large blocks of data (for example, a 1GB git diff output).
  • JSON cannot naturally encode binary data, and isn't an ideal format for transmitting it (for example, repositories may include path names which cannot be represented naturally in JSON; see the sketch below).
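
A minimal illustration of the binary problem (Python here; the path is a made-up example of a non-UTF-8 byte string):

```python
import json

# Repository paths are arbitrary byte strings. This one is Latin-1
# encoded and is not valid UTF-8.
path = b"docs/r\xe9sum\xe9.txt"

# JSON strings must be Unicode text, so the bytes have to be decoded
# first -- and decoding fails for non-UTF-8 data.
try:
    json.dumps({"path": path.decode("utf-8")})
except UnicodeDecodeError as err:
    print(err)  # 'utf-8' codec can't decode byte 0xe9 in position 6
```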

Earlier, I imagined navigating the binary issue by having the client say things like Content-Type: application/bson and Accept: application/json, application/bson and having the server fall back to BSON/protobuf/messagepack/whatever. However, I'm now generally less excited about this approach (see T5955#247571 for more details). It also doesn't help with the "large data size" issue at all.
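
For concreteness, that rejected negotiation would have looked roughly like this (a sketch only: the host is a placeholder, diffusion.rawdiffquery is just an example method, and Conduit does not actually accept BSON):

```python
import requests  # assumes the third-party 'requests' package

# The client would POST BSON-encoded parameters and advertise that it
# accepts a BSON response, letting the server pick the encoding.
response = requests.post(
    "https://phabricator.example.com/api/diffusion.rawdiffquery",
    headers={
        "Content-Type": "application/bson",
        "Accept": "application/json, application/bson",
    },
    data=b"...",  # BSON-encoded call parameters would go here
)
```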

Instead, I'm inclined to pursue these approaches:

  • Uploading large blocks of data: the client uploads the data as a file, then submits the file PHID (see the sketch after this list). Here, "large" generally means anything bigger than the File chunking block size (4MB).
  • Downloading large blocks of data: the server stores a temporary file and gives the client a file PHID / URI.
  • Uploading binary data: case-by-case? I'd ideally like the answer to be "not allowed", but that would mean the API does not support certain operations, like using hg grep or git grep to search for binary sequences in files. Good riddance?
  • Downloading binary data: we provide a "readable" encoding and a lossless base64 raw encoding in some kind of standard type-format (also sketched below).
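
A sketch of what the first and last bullets might look like from a client, assuming Python with the requests package. The method name differential.hypothetical.creatediff and the field names in the type-format are invented, and a real client would use the chunked-upload flow (file.allocate / file.uploadchunk) for payloads over 4MB rather than a single file.upload call:

```python
import base64
import json
import requests

HOST = "https://phabricator.example.com"    # placeholder install
TOKEN = "api-xxxxxxxxxxxxxxxxxxxxxxxxxxxx"  # placeholder API token

def conduit(method, params):
    """Minimal Conduit call: POST form-encoded parameters to /api/<method>."""
    fields = dict(params)
    fields["api.token"] = TOKEN
    body = requests.post("%s/api/%s" % (HOST, method), data=fields).json()
    if body["error_code"]:
        raise RuntimeError(body["error_info"])
    return body["result"]

# Uploading a large block of data: push it as a file first, then hand
# the resulting PHID to the actual method instead of inlining the data.
raw_diff = b"diff --git a/x b/x\n..."  # pretend this is a 1GB diff
file_phid = conduit("file.upload", {
    "name": "huge.diff",
    "data_base64": base64.b64encode(raw_diff),
})
conduit("differential.hypothetical.creatediff", {  # invented method name
    "diffFilePHID": file_phid,
})

# Downloading binary data: one possible "standard type-format" pairs a
# lossy human-readable decode with a lossless base64 raw form.
binary_path = b"docs/r\xe9sum\xe9.txt"
print(json.dumps({
    "readable": binary_path.decode("utf-8", errors="replace"),
    "raw.base64": base64.b64encode(binary_path).decode("ascii"),
}))
```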

This isn't entirely exhaustive. Some open questions:

  • What do we do about arbitrarily long data in unusual places? An example is 2GB path names (see T10832). There are likely many weird/abusive variants of this where things like author names, branch names, tag names, etc., might be specifiable or corruptible to be arbitrarily long. (Offhand, hg bookmark happily accepts bookmark names up to the point where xargs fails.) Ideally we just reject these use cases, but inevitably someone wants arbitrarily long unit test names or whatever.
  • What do we do about framing calls over SSH?
  • In cases where a call may return an arbitrary amount of data, we'd generally like to apply time and/or byte limits and communicate them to callers. How do we do this? The existing tooHuge / tooSlow support feels a little silly as a general pattern (a strawman is sketched below).
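
For the last question, one strawman is a response envelope that reports the limits that were applied, so callers can detect truncation and fetch the remainder via the temporary-file mechanism above. All field names here are invented:

```python
import json

# Hypothetical envelope: explicit limit metadata instead of ad-hoc
# tooHuge / tooSlow flags. Nothing here is an existing Conduit format.
response = {
    "result": "...first 8MB of diff text...",
    "limits": {
        "byteLimit": 8388608,   # bytes of output the server will inline
        "timeLimit": 30,        # seconds the server will spend
        "truncated": True,      # caller should fetch the full result
        "fullResultFilePHID": "PHID-FILE-placeholder",
    },
}
print(json.dumps(response, indent=2))
```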