Large Git repositories often benefit from regularly (say, daily or weekly) running some maintenance commands, from the general family of git prune, git gc, git repack, git reflog expire, or similar. The particular problems which occur (and the best commands to run to remedy them) can vary from repository to repository.
PHI1996 reports an issue where writing to a repository while it is running one of these three commands:
$ git reflog expire --expire-unreachable=now --all
$ git gc --prune=now
$ git prune
...caused a ref to go missing. I'm currently unsure about the exact mechanism here, but Phabricator should support maintenance windows which guarantee:
- the node will process no writes during the maintenance window; and
- the node is not the only cluster leader, unless it is also the only cluster node; and
- ideally, reads are routed to nodes under maintenance at reduced precedence. It's still better to serve a read from a node under maintenance than to fail to serve it. (If problems arise with reads during maintenance commands, these reads could block once read routing is precedence-aware.)
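The three guarantees above could be modeled roughly as follows. This is only a sketch in Python, not Phabricator code: `NodeState`, `Node`, `can_enter_maintenance`, `pick_write_node`, and `pick_read_node` are hypothetical names invented for illustration, and the sketch assumes that "another leader" means another leader which is not itself under maintenance.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class NodeState(IntEnum):
    """Hypothetical node states; lower values are preferred for reads."""
    NORMAL = 0
    MAINTENANCE = 1  # running git gc / git prune / etc.; must not take writes


@dataclass
class Node:
    name: str
    is_leader: bool = True
    state: NodeState = NodeState.NORMAL


def can_enter_maintenance(node: Node, cluster: List[Node]) -> bool:
    """The node must not be the only leader, unless it is the only node."""
    others = [n for n in cluster if n is not node]
    if not others or not node.is_leader:
        return True
    # Assumption: a leader that is itself under maintenance does not count.
    return any(
        n.is_leader and n.state != NodeState.MAINTENANCE for n in others
    )


def pick_write_node(cluster: List[Node]) -> Optional[Node]:
    """Writes never go to a node under maintenance; None means "must wait"."""
    for n in cluster:
        if n.is_leader and n.state != NodeState.MAINTENANCE:
            return n
    return None


def pick_read_node(cluster: List[Node]) -> Optional[Node]:
    """Reads prefer normal nodes but fall back to nodes under maintenance."""
    if not cluster:
        return None
    # min() returns the first node with the lowest (most preferred) state.
    return min(cluster, key=lambda n: n.state)
```

With a two-node cluster, putting one node under maintenance routes both reads and writes to the other node. With a single-node cluster, reads are still served during maintenance while writes have no eligible node and must wait for the window to end.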
Note that repositories already have a bin/repository maintenance mode, but this is aimed at Phacility SaaS migrations, is repository-level rather than node-level, and just stops new writes without guaranteeing that in-flight writes have aborted. So this mechanism isn't really appropriate here, and its existence is mostly a reason to call the new mechanism something other than "maintenance" mode, to avoid overloading the term.
See PHI2004. When a repository node is writing backups, we don't need a lock, but it would be nice to be able to provide a hint to Phabricator that the node is temporarily less desirable for routing purposes.