Change Details

See PHI351. An install has 256+ character unit test names. I think this is essentially always a problem before the data hits Phabricator (i.e., these names aren't useful to humans, either) but this is probably a case where we should give installs enough rope to shoot themselves in the foot.

See also T11402, which discusses the total size of the test table.

I expect that extracting these strings to a dimension table likely makes sense, with some absurd soft limit (like 1MB) to forestall disaster when an install uploads a unit test with a 750MB name and every interface becomes mysteriously slow.

General questions about "which specific tests are failing" which storage changes should support:

- T9365 is probably related but needs triage.
- Same with T12029.
- This gets a mention in PHI383.
- T9951 wants this storage to have a basket of random properties. Maybe? This tends to make archival/storage more difficult but doesn't seem unreasonable.
- T10123 is probably some aggregation of other stuff here.
- See also T11763.

T10635 is perhaps adjacent although I'm not sure what the current state of it is.

Harbormaster build logs are mostly okay up to the storage part and then garbage from there forward. In particular:

- (T5822) They should support archiving and GC.
- (T9124) They should be accessible via the API -- but note issues with T5955 for logs with ANSI character codes.
- (T9516, T8656, PHI382, T10179) The UI is generally clumsy and managing large logs is a headache.
- (T6139) Build logs should support summarization.
- (T11810) Live log output.
- (T10868) Line-anchors.

From PHI383:

> What unit tests have recently failed? ... On which buildables?
> What build plans have failed recently? ... On which buildables?
> What build steps have failed recently? ... On which buildables?
> What builds are running right now against buildables of this type?

You can get //some// of this in the UI but it's very scattered. These are all largely reasonable questions to ask.
Harbormaster doesn't do a good job of showing an overall "state of the world in builds" status dashboard or giving you the components to build one today.

> What builds are running right now? What builds are running of a given plan?

These can be answered at `/harbormaster/build/` but it (or some other interface) could be better at answering questions instead of just presenting information.

> [ Which Drydock resources have had unit test, plan and step failures recently? ]
> What builds are running within a given [Drydock] resource pool?
> What builds has this drydock resource/resource pool had run on it? What build plans? What tests have failed on it?

These are likely reasonable but the path is currently a little muddier since Harbormaster/Drydock have little explicit bridging today.

See PHI405. This task discusses some reasonable improvements to the Harbormaster/Differential integration.

See PHI446. This task discusses providing ways to access older builds and build logs instead of vanishing them from the UI completely. (This works now, but it would be nice to make the UI a little richer.)

See PHI430. Policy exceptions raised while rendering "Build Status" are currently not handled as well as they could be.

The source code CSS changes have made `{P123}` render a little weirdly, and it should get a pass before this finishes up.

See PHI507. An install would like better support for richer build artifacts, particularly screenshots.

See T10568. The checkmark icon tooltip in Diffusion for builds could be more useful on failures, e.g. "3/5 builds passed" or, more likely, break out which builds failed.

---

In T13124 / PHI531, I'm introducing a "magic" step ("Abort Older Builds") which applies before the build actually starts. If this sticks:

- It should probably be called out in the UI.
- It shouldn't be able to generate artifacts or depend on other steps.
- It should either skip target generation entirely, or generate an empty target, and then immediately re-enter the build update. Currently, we'll return to the queue to execute an empty step which does nothing.
- T13072 should happen, and BuildCommand needs to stop being transactional; it can currently race.
- Also, this whole thing might be a bad idea.
- It should possibly issue a special build command like "Render Obsolete", not "Abort".
- Long-running steps, like "Sleep" and "Drydock: Run Command", should test for build aborts while sitting in their local equivalent of a `select()` loop.

----

See PHI766, which is roughly "allow builds to fail when they produce too much output".

See PHI533, which is roughly "allow builds to fail when a 'wait' after an HTTP request takes too long".

I'm pretty sure there's also an "ssh exec should have a timeout" task somewhere. It clearly should.

See T12701, about improving URL validation for "Make HTTP Request" steps. This is a relatively easy thing to tighten up.

See T11350. This is a minor but reasonable UI improvement.

See PHI859. If you pass a poorly-named variable to Harbormaster, it should complain immediately.

See PHI875, which is roughly T10510. cURL timeouts are not configurable, and are reported to Harbormaster in a confusing way.

See PHI877, which is roughly T5936: when a build is aborted, in-flight steps should be able to terminate or clean up.

See PHI901, which is roughly T10260: Harbormaster notification rules aren't currently very flexible, and rules like "Notify on build failure: ..." and/or Herald support for something like "When build status changes to failed" would help address some reasonable use cases.

See PHI919, which discusses improvements to failed/aborted messaging.
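
For the point above about long-running steps testing for build aborts while they wait, here is a rough sketch of the shape this could take. This is not Harbormaster code: it is written in Python purely for illustration, and `sleep_step`, `is_aborted`, `BuildAborted`, and the one-second poll interval are all hypothetical names and values. The idea is just to cut one long wait into short slices and test for an abort between slices.

```python
import time
from typing import Callable


class BuildAborted(Exception):
    """Hypothetical signal that the build was aborted while a step waited."""


def sleep_step(
    total_seconds: float,
    is_aborted: Callable[[], bool],
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Wait for total_seconds, testing for an abort before each short sleep.

    is_aborted is an assumed callback that reports whether an abort has
    been issued for the build; how that is looked up is left open here.
    """
    remaining = total_seconds
    while remaining > 0:
        if is_aborted():
            # Stop early so the step can terminate or clean up.
            raise BuildAborted("build aborted while step was waiting")
        chunk = min(poll_interval, remaining)
        sleep(chunk)
        remaining -= chunk
```

With this shape, an abort is noticed within roughly one poll interval instead of only after the full wait. A "Drydock: Run Command" equivalent would presumably make the same test inside whatever loop it uses to wait on the remote command.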