Page MenuHomePhabricator

Unify intracluster sync and Drydock working copy construction timeouts as a repository "copy time limit"
ClosedPublic

Authored by epriestley on Nov 16 2018, 5:19 PM.
Tags
None
Referenced Files
Unknown Object (File)
Tue, Mar 19, 10:42 PM
Unknown Object (File)
Tue, Mar 19, 10:42 PM
Unknown Object (File)
Tue, Mar 19, 10:42 PM
Unknown Object (File)
Tue, Mar 19, 10:41 PM
Unknown Object (File)
Feb 10 2024, 5:05 PM
Unknown Object (File)
Feb 10 2024, 7:07 AM
Unknown Object (File)
Feb 5 2024, 9:27 PM
Unknown Object (File)
Feb 5 2024, 9:11 PM
Subscribers
Restricted Owners Package

Details

Summary

Depends on D19814. Ref T13216. See PHI885. For various eldritch reasons, git fetch can hang. Although we'd probably like to fix this with git fetch --require-sustained-network-transfer-rate=512KB/5s or similar, that flag doesn't exist and we don't have a reasonable way to build it.

Short of that, move toward formalizing a repository "copy time limit": the longest amount of time anything may spend trying to make a copy of this repository.

This grows out of the existing intracluster sync limit, which is effectively the same thing. Here, apply it to git clone and git fetch in Drydock working copy construction, too. A future change may make it configurable.

Test Plan
  • Set the limit to 0.001.
  • Tried to build and lease working copies, got sensible timeout errors (see D19815).
<Activation Failed> Lease activation failed: [CommandException] Command killed by timeout after running for more than 0.001 seconds.
COMMAND
ssh '-o' 'LogLevel=quiet' '-o' 'StrictHostKeyChecking=no' '-o' 'UserKnownHostsFile=/dev/null' '-o' 'BatchMode=yes' -l '********' -p '2222' -i '********' '127.0.0.1' -- '(cd '\''/var/drydock/workingcopy-163/repo/spellbook/'\'' && git clean -d --force && git fetch && git reset --hard)'

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

Owners added a subscriber: Restricted Owners Package.Nov 16 2018, 5:19 PM
amckinley added inline comments.
src/applications/diffusion/protocol/DiffusionCommandEngine.php
138

Maybe make this "See T13108 and PHI885."?

src/applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php
192

Could this result in a collision? Or are these directories already guaranteed to be unique?

286–289

We don't want to also do these in parallel like activateResource() now does?

src/applications/drydock/interface/command/DrydockSSHCommandInterface.php
33–34 ↗(On Diff #47324)

Just checking that this was intentional, and not a temporary logging change.

src/applications/repository/storage/PhabricatorRepository.php
1900–1902

Presumably this will eventually become configurable on a per-repository basis, right?

This revision is now accepted and ready to land.Nov 16 2018, 8:13 PM
epriestley added inline comments.
src/applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php
192

The directories are "sort of guaranteed unique for our purposes" if we get this far since they're already keys in $map.

They're originally derived elsewhere from $repository->getCloneName(), which is not always guaranteed to be unique, but I think that's more of an issue for a higher-level layer to care about. We aren't creating any new problems by doing this, at least.

286–289

It would probably be slightly preferable, but since the two commands have to go in sequence it would be a bigger rework than the clone rework above. In practice, although we use this stuff (multiple working copies: libphutil/ + arcanist/ + phabricator/) other projects normally don't so the benefit is likely minimal. I reworked above but not here just out of guesses about patch size and comfort with how thoroughly I was testing.

I think this stuff is likely to get more significant work in some future iteration, e.g. see T9492.

src/applications/drydock/interface/command/DrydockSSHCommandInterface.php
33–34 ↗(On Diff #47324)

This got backed out in master already, see T13179.

src/applications/repository/storage/PhabricatorRepository.php
1900–1902

Yeah, see T13216#242393 for my current thinking.

This revision was automatically updated to reflect the committed changes.