
Implemented support for build logs
Closed, Public

Authored by hach-que on Nov 7 2013, 11:53 AM.

Details

Reviewers
epriestley
Group Reviewers
Blessed Reviewers
Maniphest Tasks
T1049: Implement Harbormaster
Commits
Restricted Diffusion Commit
rP0ac1be709434: Implemented support for build logs
Summary

Depends on D7519.

This implements support for build logs in Harbormaster, including the ability to append to a log from the "Run Remote Command" build step.

It also adds the ability to cancel builds.

Currently the build view page doesn't update the logs live; I'm sure this can be achieved with Javelin, but I don't have enough experience with Javelin to make it poll for content updates in the background.

{F79151}

{F79153}

{F79150}

{F79152}

Test Plan

Tested this by setting up SSH on a Windows machine and using a Remote Command configured with:

C:\Windows\system32\cmd.exe /C cd C:\Build && mkdir Build_${timestamp} && cd Build_${timestamp} && git clone --recursive https://github.com/hach-que/Tychaia.git && cd Tychaia && Protobuild.exe && C:\Windows\Microsoft.NET\Framework\v4.0.30319\MSBuild.exe Tychaia.Windows.sln

and observed the output of the build streaming from the Windows machine into Phabricator.

Diff Detail

Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

src/applications/harbormaster/step/VariableBuildStepImplementation.php
12

This provides users with a way to get something unique into the build; e.g., I use it to create a unique build folder on the remote machine.
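
As a rough illustration (not the actual implementation, just a sketch of what the variable substitution amounts to; the variable names here are hypothetical):

  // Expand ${timestamp}-style variables in the remote command.
  $variables = array(
    'timestamp' => time(),
  );
  foreach ($variables as $name => $value) {
    $command = str_replace('${'.$name.'}', $value, $command);
  }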

src/applications/harbormaster/storage/build/HarbormasterBuildLog.php
62–68

I've observed some strange behaviour from MySQL around this. It seems that after around 3,000 lines, the log is truncated and concatenation starts again.

The alternative of adding a new row for each log entry seems unscalable since we'd quickly end up with a very large number of records, and retrieving them from the database for a particular build would be very slow.

I'm not sure what else we can do here, other than perhaps store the log on disk as a file instead?
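
(For reference, the append pattern under discussion is an in-place concatenation, roughly like the following; the table and column names are illustrative:)

  UPDATE harbormaster_buildlog
    SET logData = CONCAT(logData, '...newly appended output...')
    WHERE id = 42;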

Can this be un-depended on D7519? Probably only part of that can drop into the upstream until Drydock, since I think it has all the same "run arbitrary commands" issues that the local flavor did. Looking at the code now...

hach-que updated this revision. Nov 7 2013, 12:20 PM

Rebase on master, remove dependency on D7519.

This looks good to me, although there are a bunch of features I think we should consider planning for now, before anyone has a gig of data in this table. Some of this is a bit YAGNI, but I think reasonable use cases exist for almost all of it:

  • I think we should assume each build step can have more than one thing which generates logs, and add logSource (or similar), which is a short-ish human-readable string describing where the log came from (for example, the command which was run, the API endpoint which was requested, etc.). For build steps which run several commands in sequence or in parallel, this will let us distinguish between the different log sources.
  • I think we should assume each log source can generate multiple logs, and add a logType (or similar), which is a short-ish human-readable-but-kind-of-conventional string describing exactly what the log captures. Examples would be "return" (return code), "stdout", "stderr", "file:side_effect.log", "full" (full command / full request?), "http-headers", "http-body", etc.

I think we should also consider these fields (there's a combined schema sketch after this list):

  • duration: Nullable field recording the command duration. This wouldn't always be populated, but could be populated on well-known logs by convention. For example, every step could add a "full" log which populates "duration", and the convention could be that the UI pulls just these logs to provide an overview. I think having this information in detail could be helpful for complex builds. It will also let us determine how much overhead Harbormaster itself is adding, and give us a heads up if there's a problem.
  • encoding: Field allowing us to compress the actual log data for the row and set a flag here so we know to decompress it later. This is sort of a weak optimization since it's maybe like 4x in the best case, but logs should generally compress well and I don't anticipate needing to search through them except maybe in very local scopes ("in this build"). This is probably premature, too, but if we wait until the logs are unwieldy it will be more painful because everyone will have to migrate.
  • live: This could be overlaid on encoding since a live log will always be plain text, but it might be good to have a flag to tell the UI that the log is live (so it can show an indicator, and Ajax-refresh it) vs complete. I'd lean toward having live, text, and gzipped values in encoding. This is a little fishy but I can't come up with any case where a live log would not be plain text.
  • Store text as binary, not UTF-8: If we're going to compress logs, we have to do this. Even if we don't, this might be a good idea? It makes it easier to compress them later, and logs aren't necessarily UTF-8 and we might run into weird issues with terminal junk and ascii art if we store them as utf-8. Since the field is LONGTEXT, there's no risk of truncating in the middle of a character. This does mean we need to be a little more careful when displaying the logs.
  • size: I don't think this is really that useful, but it's easy and it might be nice to have a field recording the byte size of the log. Particularly, a case I anticipate is being able to show "This log is very large. Only the first 1MB of 615MB is shown." without pulling all the data over the wire, although this gets trickier with compression. I'm not sure if we can easily store a large compressed value such that SELECT SUBSTR(0, 1024 * 1024) will be decompressible, although it should be possible in theory. There are various ways we could work around this to get good behavior. Generally, writing size is easy and gives us more options for dealing with edge cases involving unreasonably large data.
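
Taken together, and purely as an illustration rather than the actual patch, a schema carrying the fields from both lists might look something like this (table and column names are assumptions):

  CREATE TABLE harbormaster_buildlog (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    buildPHID VARCHAR(64) NOT NULL,
    logSource VARCHAR(255),         -- human-readable origin, e.g. the command run
    logType VARCHAR(255),           -- "stdout", "stderr", "return", "full", ...
    duration INT UNSIGNED,          -- NULL unless the source reports it
    encoding VARCHAR(32) NOT NULL,  -- "text" for now; later "live", "gzip"
    size INT UNSIGNED,              -- byte size, for "log too large" UIs
    logData LONGBLOB,               -- binary, so we can compress it later
    dateCreated INT UNSIGNED NOT NULL,
    dateModified INT UNSIGNED NOT NULL
  ) ENGINE=InnoDB;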

For the special case of stdout and stderr, I think they should be stored separately: for example, some commands produce a machine-readable stdout, and it's believable that this will eventually be available as a build artifact, or that some output will produce a variable for some other build step. To interleave them for display, we could write a third log with sequence information, which would just look like "OEOEOEOOOOEEOE" (indicating which buffer to take each line from in order to produce a combined result). (Or it could just have all of the data in the right sequence, although that's a little less efficient, since the data would exist in both the "combined" log and the "stdout" or "stderr" logs.) This would be optional, and the UI could use it to interleave the logs if it was available; otherwise, it could show them separately (if a build step can't produce this information for some reason). This is kind of messy, but the whole stdout/stderr thing is kind of messy in general. I'm loath to just mush them together in the database, since I think being able to separate them is sometimes valuable.
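
To make the sequence-log idea concrete, here is a minimal sketch of the interleaving step (plain PHP; the function and its inputs are hypothetical):

  // $stdout and $stderr are arrays of lines; $sequence is the third log,
  // e.g. "OEOEOEOOOOEEOE", naming the buffer each combined line comes from.
  function interleave_build_logs(array $stdout, array $stderr, $sequence) {
    $combined = array();
    $next_out = 0;
    $next_err = 0;
    for ($ii = 0; $ii < strlen($sequence); $ii++) {
      if ($sequence[$ii] === 'O') {
        $combined[] = $stdout[$next_out++];
      } else {
        $combined[] = $stderr[$next_err++];
      }
    }
    return $combined;
  }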

For the append issue, I haven't seen the CONCAT problem before. I'll see if I can reproduce it once it's more practical to write logs from build steps. One approach might be to add a finalize() method, which would let append() start buffering and doing fewer writes. I'm not sure if this would fix CONCAT, but it would probably reduce overhead by a fair bit. finalize() could flush the buffer and, in the future, compress the log itself and mark it as compressed in the encoding field and clear the live flag.
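
A minimal sketch of that buffering scheme, assuming hypothetical method names on the log object (none of this is the actual implementation):

  private $buffer = '';

  public function append($content) {
    $this->buffer .= $content;
    // Only write once a reasonable chunk has accumulated, so a chatty
    // command doesn't issue one UPDATE per line of output.
    if (strlen($this->buffer) >= 4096) {
      $this->flush();
    }
  }

  public function finalize() {
    $this->flush();
    // Later: compress the data here, set `encoding`, and clear `live`.
  }

  private function flush() {
    if (strlen($this->buffer)) {
      $this->writeLogData($this->buffer); // hypothetical storage write
      $this->buffer = '';
    }
  }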

(I do generally agree that one row per log transcript is a better approach than one row per log line.)

Here's how I'm leaning for this diff:

  • Add logSource.
  • Add logType.
  • Add duration.
  • Add encoding, for now always "text".
  • Make logData field binary, and utf8-sanitize it on output.
  • Separate stdout and stderr.
  • Add size and populate it.

Then, somewhat soonish:

  • Add finalize() for writing.
  • Add return logs.
  • Add full command logs.
  • Populate duration on full command logs.
  • Add the sequence log for stdout/stderr.
  • Figure out garbage collection policies for these.

Eventually:

  • Add "live" encoding flag.
  • Add Ajax update.
  • Add UI logic for enormous logs.
  • Add gzipping encoding on finalize.

Does that sound reasonableish? It's kind of a lot of architecture for log files, but this table is probably going to get big, which will make changing it harder in the future without making users do long migrations. I'm not completely convinced we need most of this, but if we think we probably will we likely save a bit of effort by adding the fields now, even if we don't get around to using them for a while.

Cancel

For "Cancel", I have a mild leaning toward making it a separate field -- see inlines.

UI View

I'd like to maybe unify the UI a bit (we have some logs in Daemons too, and SourceCodeView or whatever might be generalizable) at some point, but ShellLogView is entirely reasonable for the moment.

resources/sql/patches/20131107.buildlog.sql
7

If this is quoted because it's a MySQL keyword, can we name the field something else? All of the code should be safe with respect to handling keywords in column names, but I think the smartest plan is not to use them.

Like, there's no technical reason not to do this, but it's sort of playing with fire and getLogText() is probably more clear anyway.

src/applications/harbormaster/controller/HarbormasterBuildCancelController.php
19

I wonder if we should introduce CAN_EDIT. Not necessary for now, but maybe something to think about. We could also drive this off the MANAGE_PLANS capability, perhaps.

24

For consistency, just return new Aphront404Response().

30

Maybe this should be a separate flag (roughly, "user has requested build cancel"). I'm not sure we can guarantee the daemons will be able to examine the status before they next update it without adding a bunch of careful locking. If it's a different field, we don't have to worry about that.

The logic in the daemon is somewhat straightforward for now, but I wonder if it will remain as straightforward once we have builds with parallelism.

src/applications/harbormaster/controller/HarbormasterBuildViewController.php
43

Prefer pht('Build %d', $id).

72

Prefer phutil_split_lines(), although note the default mode retains newlines.

src/applications/harbormaster/query/HarbormasterBuildLogQuery.php
66

This can be removed, I think -- it looks like it's identical to willFilterPage().

The difference between willFilterPage() and didFilterPage() is a bit subtle, but basically the code goes:

$page = willFilterPage($page);
$page = doPolicyChecks($page);
// (a little more magic here)
$page = didFilterPage($page);

So:

  • related loads which are required for policy checks must go in willFilterPage().
  • related loads which are not required for policy checks and which will not filter the objects out when they fail should go in didFilterPage(), so we don't have to load objects just to throw them away in a policy filter. didFilterPage() can also take advantage of the "query workspace", a local cache, since we know that the $page is now visible to the user.
  • other related loads -- generally those which would filter the object out (e.g., the related object is required, even if not for policy checks) don't need to go in willFilterPage() but usually should, since we save the cost of policy checking them. Moving them to didFilterPage() may be appropriate if they're unusually expensive to load.

Since this is a cheap, filtering load, willFilterPage() is the right place for it.
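
For concreteness, a cheap filtering load in willFilterPage() might look roughly like this (a sketch only; the query and attach method names are assumptions):

  protected function willFilterPage(array $page) {
    $build_phids = mpull($page, 'getBuildPHID');
    $builds = id(new HarbormasterBuildQuery())
      ->setViewer($this->getViewer())
      ->withPHIDs($build_phids)
      ->execute();
    $builds = mpull($builds, null, 'getPHID');

    foreach ($page as $key => $log) {
      $build = idx($builds, $log->getBuildPHID());
      if (!$build) {
        // The related object is required, so filter this log out.
        unset($page[$key]);
        continue;
      }
      $log->attachBuild($build);
    }

    return $page;
  }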

src/applications/harbormaster/worker/HarbormasterBuildWorker.php
67

This will wipe out the status change on line 44.

For lock-sensitive fields like this, we might be best off writing some writeBuildStatus() which does an UPDATE ... SET buildStatus = %s WHERE ... explicitly, and then using that instead of setX().

(The general difficulty of reasoning about concurrency like this motivates "intent to cancel" as a separate field.)
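
A hedged sketch of that helper, using the standard queryfx() conversions (assuming the usual LiskDAO plumbing; writeBuildStatus() doesn't exist yet):

  public function writeBuildStatus($status) {
    queryfx(
      $this->establishConnection('w'),
      'UPDATE %T SET buildStatus = %s WHERE id = %d',
      $this->getTableName(),
      $status,
      $this->getID());
    return $this;
  }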

For posterity, the issue with CONCAT() is that MySQL artificially limits this approach. I'd figured it was OK because we actually do it elsewhere already (see D3392) but that doesn't work either. See that revision for a detailed breakdown if you are some far-future individual arriving here long after the fact.

Instead, maybe we add a separate logchunk table which looks like:

<id, logID, encoding, size, chunk>

...and put an explicit live flag on the parent.
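
Roughly, as a sketch (the real patch may differ):

  CREATE TABLE harbormaster_buildlogchunk (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    logID INT UNSIGNED NOT NULL,
    encoding VARCHAR(32) NOT NULL,
    size INT UNSIGNED,
    chunk LONGBLOB NOT NULL,
    KEY (logID)
  ) ENGINE=InnoDB;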

hach-que updated this revision. Nov 9 2013, 1:59 AM

Update to support chunking.

epriestley added inline comments.
src/applications/harbormaster/storage/build/HarbormasterBuildLog.php
150

We should probably store microseconds or something here at some point, although maybe that's optimistic.

Closed by commit rP0ac1be709434 (authored by @hach-que, committed by @epriestley).