
Support renaming Phacility instances
Open, LowPublic

Description

Companies occasionally get married and change their names, then want to change their instance names to match.

Some approaches:

  1. Create a new instance, export the first instance, import it into a second instance. This is doable but a mess for us and for users.
  2. Keep the instance, but rename all the stuff we put the instance name in. This is the nicest approach in some ways and most stuff is easy to rename (mostly databases and directories) but some is not (like S3).
  3. Put an indirection layer into instances so that all the database/directory stuff stays the same but the instance name can change. Downsides are: stuff gets tougher to administrate because instance "abc" may have a database named "definitelynotabc_user"; we may need to do more queries/lookups in more places to get the mapping from instance name to underlying resource locations.
  4. Combine (2) and (3) to do some renaming and some indirection. Maybe fine to rename the easy stuff and put indirection on S3.

Since this is currently very rare (one request ever) I'm inclined to try to pursue (2) and then maybe compromise and do something like (4) if moving resources in S3 turns out to be more than I want to deal with.

Details

Commits
Restricted Differential Revision / Restricted Diffusion Commit

Event Timeline

It would also be nice to recognize that "oldname" became "newname" so we could redirect, continue delivering mail, etc. This is a little more involved.

Thinking about the S3 issue, it's probably better to just migrate all the file data to the blob store, rename the database (which we have to do anyway), then migrate it back out of the blob store.

This is slower (maybe a lot slower), but can reuse processes we already have in place and use on other workflows instead of needing a weird one-off process. The less custom code we have on this the better, since it's probably only going to run a few times a year for a while.

I think that makes things fairly easy technically:

  • Migrate files to blob storage.
  • Stop daemons.
  • Rename databases.
  • Rename database backup and log directories on disk.
  • Rename repository directories on disk.
  • Rename repository backups and log directories on disk.
  • Use move-paths to rename repository directories in storage.
  • Start daemons.
  • Migrate large files out of blob storage.
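The runbook above can be sketched as an ordered command plan. The command names and paths here are illustrative stand-ins, not the real Phacility tooling:

```python
# Sketch of the rename runbook; commands and paths are hypothetical.

def build_rename_plan(old, new):
    """Return shell steps to rename instance `old` to `new`, in run order."""
    return [
        "bin/files migrate --engine blob --all",         # files into blob storage
        "bin/phd stop",                                  # stop daemons
        f"rename-databases {old} {new}",                 # per-instance databases
        f"mv /backups/{old} /backups/{new}",             # db backup/log directories
        f"mv /repos/{old} /repos/{new}",                 # repository directories
        f"mv /repo-backups/{old} /repo-backups/{new}",   # repo backup/log dirs
        f"bin/repository move-paths --from /repos/{old} --to /repos/{new}",
        "bin/phd start",                                 # restart daemons
        "bin/files migrate --engine s3 --all",           # large files back out
    ]
```

The ordering is the important part: daemons must be down for every rename step, and the blob migration brackets the whole operation.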

The actual old backups and log files will still have the old names, but that's fine. Maybe we don't even bother renaming them.

There's still a big mess with the instance record itself on admin, though. It's possible that treating this as an export + import is actually cleaner?

epriestley lowered the priority of this task from Normal to Low. Aug 17 2016, 4:01 PM

I don't really have a pathway forward on this that I like, so I'm just planning to do the one outstanding request for this manually as a copy operation and kick the can down the road.

Another possibility is to not rename any of the resources and just do the rename at the display level. This is a little messy, and occasionally a little confusing (instance newname will still have databases, backups, etc., named oldname) but we can probably do a fairly good job of covering that up.

I'm inclined to think this approach may be more promising.

After more thinking, I'm leaning heavily toward just doing a "display rename":

  • Add a table mapping host names to instances, and populate it with <instance>.phacility.com for each instance. We need this for private clusters anyway (T11230) and for "this instance has moved elsewhere", and for stuff like hosting blogs if we ever do that.
  • Populate this table, but don't provide a way to interact with it yet.
  • Swap the Services and Instances code over to read from this table instead of assuming that <instance>.phacility.com is the one true instance hostname.
  • Mangle all of the Instances UI to reasonably reflect the idea that an instance has an internal name ("originalname") but may have a different canonical name ("newname"). This is kind of tricky but the goal is just to make stuff obvious to human users, so it's probably not too bad. Staff still need to see the internal name (since all the underlying resources will still use it). Maybe the easiest thing to do is to introduce a "Display Name", which defaults to "instance.phacility.com" but can be renamed to anything, and is used to refer to the instance in email, billing, etc. This won't be unique but that's probably a surmountable issue. Basically, we get to keep a nice simple setup process but then get more flexible/complicated as instances need it.
  • Manually rename an internal instance, if that seems OK then we can add a tool for doing renames.
  • Probably rename our meta instance so that we can catch issues when the rename code doesn't do sensible things.

This is a bit involved, but I think it generally puts us on the path to T11230, and at the end we'll be able to do renames in a few seconds at the cost of slightly higher operational overhead when manually mangling things (since you may need to look up / remember the internal name, which may differ from the display name), but I think that's generally a reasonable tradeoff.

FWIW, I'm heavily in favor of the "display rename" approach, and allocating logical ids like ErrantAnalyticalGiraffe. The coupling between an instance's display name and its backing resources is a band-aid that will only get more painful to remove after we launch private instances.

If we stop generating internal IDs from instance names, I think we should pick a generation scheme such that they can never collide -- backups/ErrantAnalyticalGiraffe/ on disk should never exist and be a different instance than ErrantAnalyticalGiraffe.phacility.com on the web (technically these namespaces are different today because you cannot use uppercase letters in an instance name, but that feels very flimsy as a dividing line).

I also don't anticipate ever letting x.phacility.com rename itself to y.phacility.com and then letting z.phacility.com rename to x.phacility.com -- if you ever had a name, it's your name forever. So the backups for z.phacility.com might not be in backups/z/, but they'll never be in backups/<some other valid instance name>/.

Once we did the work to support "Display Rename", we could reasonably change the algorithm to generate random internal IDs instead of using the instance names (or use their PHIDs, or whatever). If we don't, 99.99% of instances will have the same subdomain and disk/database/S3 name (well, maybe fewer if we give "Rename Instance" to users, but I'd expect to keep it "mail us" only). This is legitimately pretty convenient and it's nice to not have to run everything through an extra layer of lookups, but it could definitely also cause problems where it isn't obvious that we used the wrong identifier in some context since they're almost always the same. I don't know if the risk of a mistake outweighs the value of having a simpler relationship or not.
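One way to guarantee the two namespaces can never collide is to give internal IDs a shape that is illegal for display names, and blacklist that shape at registration. A minimal sketch, assuming lowercase-only subdomains and an "INST-" prefix (the alphabet, lengths, and names here are illustrative, not the real scheme):

```python
import re
import secrets

# Internal IDs use an uppercase "INST-" prefix, which is already impossible
# in a subdomain; the lowercase variant is blacklisted for display names.

DISPLAY_NAME = re.compile(r'^[a-z][a-z0-9-]*$')   # valid instance subdomains
RESERVED = re.compile(r'^inst-', re.IGNORECASE)   # blacklisted display pattern

def allocate_internal_id():
    """Allocate a random internal name: 'INST-' plus 8 hex digits."""
    return 'INST-' + secrets.token_hex(4)

def is_valid_display_name(name):
    """Display names must match subdomain rules and avoid the internal pattern."""
    return bool(DISPLAY_NAME.match(name)) and not RESERVED.match(name)
```

With this shape, `backups/INST-…/` on disk can never be confused with some other instance's subdomain.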

Renaming meta could hedge this risk a bit, but maybe we really should fully separate things. Stuff I use disk names for right now is things like this:

  • du -sh * to see which instances are huge (e.g., D17640).
  • show full processlist in MySQL to see which instances are running some weird workload.
  • ps auxwww | grep -i instancename to check on daemons (e.g., T12720).

None of these would be crippled by needing to do some kind of lookup.

We don't need to make a decision on this until everything else is done:

  • Implement all this stuff.
  • Rename meta.
  • Depending on how we feel about the risks after that, maybe change the algorithm to select a random internal name instead of defaulting to the display name.

If we stop generating internal IDs from instance names, I think we should pick a generation scheme such that they can never collide

Sure, I just picked that naming scheme as an example. Auto-incrementing IDs would be fine too; it's mostly a question of which disambiguation takes more effort, which I think is mostly a matter of taste:

  • instance-ab011Oll vs instance-abOll011
  • instance-ErrantAnalyticalGiraffe vs instance-ErraticAnalyticalGopher

backups/ErrantAnalyticalGiraffe/ on disk should never exist and be a different instance than ErrantAnalyticalGiraffe.phacility.com on the web

I think I disagree with this but I want to be sure about what you're saying. I'm proposing that on instance creation (for, say, conglomo.phacility.com), we allocate some new logical identifier (Giraffe-related or otherwise) that has nothing to do with the string conglomo. That logical ID is used for S3 bucket naming, backups, DB names, etc. We track the mapping between the instance display name and the logical ID elsewhere. Display names and logical IDs can only be allocated, never freed. The display name can be changed arbitrarily, but the logical ID never changes.

The biggest point of this is to force an audit of every callsite that relies on the mapping between logical and display names, and to force them all through the same bit of code. Then it doesn't matter how we might decide down the road to alter the relationship between logical and display names; as long as we provide the same API for going back and forth between display and logical names, we can change it at our leisure. v0 of this system would be the status quo: logical_id == instance_name. Breaking the coupling between logical and display names gives us way more flexibility to shard things in all kinds of wacky ways (one huge instance gets its own DB, free tier instances all get crammed together on a Raspberry Pi on my desk, next-gen EC2 instance types get more instances per host than the previous-gen), and encourages building out tooling that minimizes the importance of littering an instance's resources with display names.

I also don't anticipate ever letting x.phacility.com rename itself to y.phacility.com and then letting z.phacility.com rename to x.phacility.com -- if you ever had a name, it's your name forever.

Agreed with this. Display names, once claimed, are never freed, even after renames.

Renaming meta could hedge this risk a bit, but maybe we really should fully separate things. Stuff I use disk names for right now is things like this:

I'm an adherent of the "SSH'ing to a host and running a command is an anti-pattern that should be replaced with better tooling" school of thought. Obviously SSH is always there as a last resort, but the three examples you provided are all great examples of ops work that should be automated away.

I think I disagree with this but I want to be sure about what you're saying.

I'm concerned about this situation:

  • User A registers a.phacility.com, which gets allocated ErrantAnalyticalGiraffe as an internal name, so it has (for example) stuff at backups/ErrantAnalyticalGiraffe/.
  • User B registers ErrantAnalyticalGiraffe.phacility.com, which gets allocated FrigidMonkeyComedian as an internal name, so it has (for example) stuff at backups/FrigidMonkeyComedian/.

One day, I'm troubleshooting a problem with ErrantAnalyticalGiraffe.phacility.com. I see ErrantAnalyticalGiraffe in the output of ls in backups/ and cd ErrantAnalyticalGiraffe into it without thinking carefully. This isn't the right directory, and I'm now operating on the backups for a.phacility.com. The correct command was cd FrigidMonkeyComedian.

This is possible if we use an ID allocation algorithm where the internal name may be the same as another instance's external name. If we allocate those instances with internal names INST-a9f82h3/ and INST-981fh13/ there's no way I can ever see ErrantAnalyticalGiraffe.phacility.com in my browser and ErrantAnalyticalGiraffe/ on disk and think they're the same by mistake when they're really two totally different instances (except that inst-a9f82h3.phacility.com is currently a valid instance name, but we would blacklist this pattern before adopting this scheme).

Of course, I can still make other errors, like not noticing that I'm in INST-a9f82h39 when I should be in INST-a9f62h39. We have this problem today with test instances, though.

Obviously SSH is always there as a last resort

I do think there's at least some merit to designing to de-risk manual SSH. We should definitely be doing as little of it as possible, but it's also soooo much riskier than everything else.

Oh, I see what you're saying. Yeah, whatever logical naming scheme we come up with, we should block that pattern for display names.

I take your point about making cd errors, but that's another reason to start with lightweight tooling/aliases like cd_backups conglomo that uses the mapping service to automatically land you in the correct place. That eventually becomes internal tools pages with plentiful buttons and soothing colors for all operations on instances, which are logged and announced in Conpherence and automatically trigger maintenance notifications to customers, etc etc.
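A minimal sketch of that cd_backups-style tooling: resolve the display name through the mapping service so nobody types internal directory names by hand. The dict stands in for the real lookup service, and all names here are hypothetical:

```python
import os

# Stand-in for the real display-name -> internal-name mapping service.
INSTANCE_MAP = {'conglomo': 'FrigidMonkeyComedian'}

def backups_path(display_name, root='/backups'):
    """Return the backup directory for an instance, keyed by display name."""
    internal = INSTANCE_MAP[display_name]   # never guess; always look it up
    return os.path.join(root, internal)
```

A shell wrapper around this lookup lands the operator in the correct directory even when display and internal names have diverged.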

epriestley added a revision: Restricted Differential Revision. Jun 5 2017, 9:05 PM
epriestley added a commit: Restricted Diffusion Commit. Jun 6 2017, 11:31 AM

We now have three of these in Support (PHI63, PHI107, PHI128). I think the pathway forward is something like:

  • Add a new instance_name table with <domainName, instancePHID, isCanonical> and a unique key on domainName.
  • Write into this table on instance creation.
  • Copy every instance's existing name into the name table.
  • Load and attach records in InstanceQuery, and make getInstanceDomain() use the InstanceName records.
  • Then, update the API and SiteSource to query by domain and redirect to canonicalize if necessary.

Also, there are really three separate names here:

  • The immutable internal name, used to namespace databases, backups, etc. Today this is turtle but could be PHID-INST-abcdef123456 or any other unique value in the future.
  • The display name, used to identify the instance to humans. Today this is turtle, and it should remain turtle, but should become mutable. For example, if an instance renames from turtle.phacility.com to tortoise.phacility.com, they'd presumably prefer to identify as tortoise going forward, even though they will still own turtle.phacility.com and tortoise.phacility.com.
  • The list of domains which map to the instance. Today this is explicitly turtle.phacility.com only, but should become flexible, per above.

To support the second case, instance records should also get a new unique displayName field, which is initially set to the value of the name field.
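The proposed lookup can be sketched directly from that schema: rows of (domainName, instancePHID, isCanonical) with domainName unique. The sample data and PHIDs below are hypothetical:

```python
# Stand-in for the instance_name table; domainName is the unique key.
ROWS = [
    ('turtle.phacility.com',   'PHID-INST-abc', False),
    ('tortoise.phacility.com', 'PHID-INST-abc', True),
]

def resolve(domain):
    """Return (instancePHID, canonical_domain) for a request domain, or None."""
    by_domain = {d: phid for d, phid, _ in ROWS}
    if domain not in by_domain:
        return None
    phid = by_domain[domain]
    canonical = next(d for d, p, c in ROWS if p == phid and c)
    # The caller (e.g., SiteSource) redirects when domain != canonical.
    return (phid, canonical)
```

Old domains keep resolving to the instance forever; only the canonical flag moves on rename.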

We can likely make all of these changes in instances/ very safely (they basically only add new stuff), and they're easy to test locally. The API changes are the only ones that carry any real risk of breaking (and are harder to test), but I think this risk is also quite small. The most likely source of issues is probably stuff using the wrong name (e.g., display where it should use internal) but we can flush those out by renaming some test instances.

epriestley added a revision: Restricted Differential Revision. Nov 1 2017, 5:37 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 1:02 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 1:06 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 1:28 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 2:31 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 2:43 PM
epriestley added a revision: Restricted Differential Revision. Nov 28 2017, 3:51 PM
epriestley added a commit: Restricted Diffusion Commit. Nov 28 2017, 7:13 PM
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a commit: Restricted Diffusion Commit.
epriestley added a revision: Restricted Differential Revision. Nov 30 2017, 12:34 PM
epriestley added a commit: Restricted Diffusion Commit. Nov 30 2017, 7:04 PM
epriestley added a revision: Restricted Differential Revision. Thu, Jan 31, 5:39 PM
epriestley added a commit: Restricted Diffusion Commit. Sat, Feb 2, 5:31 PM

Deploying the "instances.state" cache hit one issue on the repo tier:

[repo024.phacility.net] [2019-02-10 18:03:30] EXCEPTION: (CommandException) Command failed with error #255!
[repo024.phacility.net] COMMAND
[repo024.phacility.net] /core/lib/phabricator/bin/phd restart --gently --autoscale-reserve 0.05
[repo024.phacility.net] 
[repo024.phacility.net] STDOUT
[repo024.phacility.net] (empty)
[repo024.phacility.net] 
[repo024.phacility.net] STDERR
[repo024.phacility.net] [2019-02-10 18:03:30] EXCEPTION: (PhutilLockException) file:/core/tmp/cache/cluster/ref.ubuntu.almanac.cache.lock at [<phutil>/src/filesystem/PhutilFileLock.php:90]
[repo024.phacility.net] arcanist(head=stable, ref.master=3b6b523c2b23, ref.stable=97ddb9d5a1be), libcore(), phabricator(head=stable, ref.stable=4974995c3008), phutil(head=stable, ref.master=13a200ca7621, ref.stable=9adb7f5dbc3c), services(head=stable, ref.master=772620edd80c, ref.stable=b1530afaaf1b)
[repo024.phacility.net]   #0 PhutilFileLock::doLock(double) called at [<phutil>/src/filesystem/PhutilLock.php:168]
[repo024.phacility.net]   #1 PhutilLock::lock(integer) called at [<phutil>/src/cache/PhutilOnDiskKeyValueCache.php:144]
[repo024.phacility.net]   #2 PhutilOnDiskKeyValueCache::loadCache(boolean) called at [<phutil>/src/cache/PhutilOnDiskKeyValueCache.php:92]
[repo024.phacility.net]   #3 PhutilOnDiskKeyValueCache::setKeys(array, NULL) called at [<phutil>/src/cache/PhutilKeyValueCacheProxy.php:26]
[repo024.phacility.net]   #4 PhutilKeyValueCacheProxy::setKeys(array, NULL) called at [<phutil>/src/cache/PhutilKeyValueCacheProfiler.php:78]
[repo024.phacility.net]   #5 PhutilKeyValueCa... (2,249 more byte(s)) ... at [<phutil>/src/future/exec/ExecFuture.php:380]

Basically:

  • The disk cache isn't super high-performance (it doesn't need to be in any role where we currently use it). In particular, a read or a write reads or writes the entire cache file (not just the keys it is affecting), and can take some real amount of time (hopefully, not more than a generous handful of milliseconds).
  • During reads and writes, we lock the cache file to prevent reads and writes from racing against one another.
  • By default, the disk cache will not wait for this lock, and raises PhutilLockException if it is unable to acquire the lock when it attempts a write (if it can't lock a read, it just returns an empty cache value).
  • When we start the daemons on the repo tier, we launch a whole bunch of processes at once.
  • These things conspire to make it fairly likely that some process will be unable to grab the lock during a write to fill the cache with an "instances.search" result, and will raise a PhutilLockException, which currently escapes to top level.

I landed a likely fix for this (f2786f58cbd4df5ef25c238e9137f9025b5ce798) and am deploying the repo tier again. The fix does this:

  • Reconfigures the disk cache to wait 1 second for the lock.
  • Catches PhutilLockException and continues if the updates fail.

The design of the "instances.state" cache (mostly-intentionally) makes it robust to writes failing. The cache may become older than we'd like, but the write rate against this cache is extremely low and reading old caches almost never has any effect, and when it does the effect isn't destructive (an instance might not turn on or turn off when it's supposed to, but that's about it). If writes ever succeed, the cache will eventually converge to the correct state.
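The fix can be sketched as follows, assuming a flock()-style on-disk cache. The helper names are hypothetical, not the actual Phutil API:

```python
import fcntl
import time

class LockTimeout(Exception):
    pass

def acquire_lock(fh, wait=1.0, poll=0.05):
    """Poll for an exclusive lock until `wait` seconds have elapsed."""
    deadline = time.monotonic() + wait
    while True:
        try:
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return
        except BlockingIOError:
            if time.monotonic() >= deadline:
                raise LockTimeout()
            time.sleep(poll)

def write_cache(path, update):
    """Apply `update` to the cache file; swallow lock timeouts."""
    try:
        with open(path, 'a+') as fh:
            acquire_lock(fh, wait=1.0)   # wait up to 1 second for the lock
            update(fh)                   # read-modify-write while locked
    except LockTimeout:
        pass  # an old cache is acceptable; a later write will converge
```

Swallowing the timeout is safe precisely because of the convergence property described above: any later successful write brings the cache up to date.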

(Deploy with the fix just finished, without any more lock issues.)

... and can take some real amount of time ...

The cache is also about 750KB today. So not huge, but not trivially small, either.

We also serialize(), write, and rename() while holding the lock. We can trivially serialize() before grabbing the lock to reduce the lock window. We could also write before grabbing the lock. But I'm not sure any of this is too worthwhile since "read and write the entire cache" won't scale very far no matter how clever we are about minimizing the lock window.

We can trivially serialize() before grabbing the lock to reduce the lock window. We could also write before grabbing the lock.

Oh, no, we can't do this easily. We lock, read, update, write, rename, unlock -- so our write is never "some old version of the cache plus our updates". We could read, update, write, lock, rename, unlock, but then our read might be out of date. We can have a separate optimistic lock clock and check if something else beat us, etc., but this isn't just shuffling a little code around. I think this is generally not worth it since "a giant blob of cache data in a single file on disk" is just never going to get us to the moon (which is fine, we aren't asking it to).
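The sequence described above -- lock, read, update, write, rename, unlock -- looks like this for a JSON cache file. Serializing before taking the lock would mean the written blob could miss updates that raced in first, which is why the reordering isn't a trivial win. All names here are illustrative:

```python
import fcntl
import json
import os
import tempfile

def update_cache(path, lock_path, updates):
    """Merge `updates` into the cache at `path` under an exclusive lock."""
    with open(lock_path, 'a') as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)                  # lock
        try:
            with open(path) as fh:
                data = json.load(fh)                      # read (whole file)
        except FileNotFoundError:
            data = {}
        data.update(updates)                              # update
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
        with os.fdopen(fd, 'w') as fh:
            json.dump(data, fh)                           # write a fresh copy
        os.replace(tmp, path)                             # rename atomically
        fcntl.flock(lock, fcntl.LOCK_UN)                  # unlock
```

Because the read happens inside the lock, every write is "current cache plus our updates"; moving the serialize or write outside the lock reintroduces the lost-update race.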

I don't want to speak too soon, but the instances.state cache doesn't seem to have any other glaring issues. This load chart for admin001, where steady state CPU load has dropped from ~20% to ~10%, suggests that the new scheme is legitimately more efficient and the deploy actually changed something:

I was able to briefly race the cache on instance startup and hit a "this instance is still starting..." page, but that's a minor issue in the scheme of things and relatively easy to smooth over.

This load chart for admin001, where steady state CPU load has dropped from ~20% to ~10%, suggests that the new scheme is legitimately more efficient and the deploy actually changed something:

hackerman