Fix bad counting in SQL when enforcing Drydock allocator soft limits
ClosedPublic
Actions

Authored by epriestley on Oct 14 2015, 12:26 PM.

Details

Reviewers

hach-que
chad

Maniphest Tasks

T9252: Unprototype Drydock (v1)

Commits

Restricted Diffusion Commit
rPac7edf54afe4: Fix bad counting in SQL when enforcing Drydock allocator soft limits

Summary

Ref T9252. This fixes a bug from D14236. D14272 discusses the observable effects of the bug, primarily that the window for racing is widened from ~a few milliseconds to several minutes under our configuration.

This SQL query is missing a GROUP BY clause, so all of the resources get counted as having the same status (specifically, the alphabetically earliest status any resource had, I think). For test cases this often gets the right result since the number of resources may be small and they may all have the same status, but in production this isn't true. In particular, the allocator would sometimes see "35 destroyed resources" (or whatever), when the real counts were "32 destroyed resources + 3 pending resources".

Since this allocator behavior is soft/advisory this didn't cause any actual problems, per se (we do expect races here occasionally), it just made the race very very easy to hit. For example, Drydock in production currently has three pending working copy resources. Although we do expect this to be possible, getting 4 resources when the configured limit is 1 should be hard (not lightning strike / cosmic radiaion hard, but "happens once a year" hard).

Also exclude destroyed resources since we never care about them.

Test Plan

Followed the plan from D14272 and restarted two Harbormaster workers at the same time.

After this patch was applied, they no longer created two different resources (we expect it to be possible for this to happen, just very hard).

We should still be able to force this race by putting something like sleep(10) right before the query, then sleep(10) right after it. That would prevent the allocators from seeing one another (so they would both think there were no other resources) and push us down the pathway where we exceed the soft limit.