Page MenuHomePhabricator

Provide a standalone script entry point for resolving a repository identity
Open, WishlistPublic

Description

See PHI1526. An install ran into an issue where a user's email was not correctly associated with their repository identity.

Manually re-executing the PhabricatorRepositoryIdentityChangeWorker resolved the link, leaving things a bit of a mystery.

A proper bin/repository resolve-identity ... sort of script to provide a real entrypoint into this workflow would make this kind of thing easier to debug in the future.

Event Timeline

epriestley triaged this task as Wishlist priority.Nov 4 2019, 8:23 PM
epriestley created this task.

We currently have bin/repository rebuild-identities, which takes a list of repository identifiers or --all. This has a lot of history in T12164 and we currently queue an "activity" for it during migrations in 20180809.repo_identities.activity.php.

Still, this can be adjusted mostly-freely as long as the activity text is also adjusted.

The core behavior ("extract identities from commits") needs to continue to exist. However, since there seem to be quite a few bugs with this and it's not observable, I suspect these additional modes are desirable:

  • rebuild all identities associated with a commit;
  • rebuild all identities associated with a user;
  • rebuild all identities associated with a raw identity string;
  • rebuild all identities associated with an email address.

Most of these feed into eachother. "Repository" generates a commit list; commits generate an identity string list; users generate an email list and a PHID list. Then, emails and user PHIDs generate actual identities; identity strings do too, but might be in "create if does not exist" mode or not.

So I think I'm going to start by changing the flagless arguments to bin/repository rebuild-identities into repeatable --repository arguments, then add --commit, --user, --raw, and --email, then make all of this work in an observable way (perhaps also with --dry-run), then actually start fixing behaviors.

Here's a list of bugs I expect exist, although I haven't made it far as reproducing them yet:

  • Looking up identities by email address is improperly case-sensitive, because the query is a LIKE query against a binary column.
  • PhabricatorUserEditor->createNewUser() does not associate identities.
  • Destroying a user does not properly disassociate identities. This is arguably okayish if repair tools are okay, but it's probably better to just wipe the data out. Elsewhere, in PHI1544, an install masked a user with a mailing list, and just automatically fixing things would have made my life easier. (Disassociating guesses is fairly unambiguous, disassociating explicit manual selections is a little more questionable.)
  • Removing an email does not properly disassociate identities. This unambiguously should.

PHI1526 has several other cases which are less clear. These may include:

  • "Author" and "Committer" in Herald rules may be incorrectly returning identity PHIDs; they should never return anything except a real user PHID.

Looking up identities by email address is improperly case-sensitive, because the query is a LIKE query against a binary column.

This is a real bug which reproduces easily.

lookup-identity.php
<?php

require_once 'scripts/init/init-script.php';

$args = new PhutilArgumentParser($argv);
$args->parseStandardArguments();

$viewer = PhabricatorUser::getOmnipotentUser();

$identities = id(new PhabricatorRepositoryIdentityQuery())
  ->setViewer($viewer)
  ->withEmailAddress($argv[1])
  ->execute();

echo pht(
  "Found %s identities.\n",
  count($identities));
epriestley@orbital ~/dev/phabricator $ php -f lookup-identity.php git@epriestley.com
Found 2 identities.
epriestley@orbital ~/dev/phabricator $ php -f lookup-identity.php GIT@epriestley.com
Found 0 identities.

The internal construction with LIKE '%...' is also not great:

$ php -f lookup-identity.php GIT@epriestley.com --trace
...
>>> [3] (+1) <query> SELECT `repository_identity`.* FROM `repository_identity` `repository_identity` WHERE (repository_identity.identityNameRaw LIKE '%<GIT@epriestley.com>') ORDER BY `repository_identity`.`id` DESC 
...
Found 0 identities.
  • When you set an identity to "Unassigned", we also set the effective PHID to "Unassigned". This isn't strictly incorrect, but probably makes everything more complicated than it needs to be.

When you set an identity to "Unassigned", we also set the effective PHID to "Unassigned". This isn't strictly incorrect, but probably makes everything more complicated than it needs to be.

I'm going to declare this is a bug; PhabricatorRepositoryIdentityQuery->withHasEffectivePHID() is broken at a minimum and I'm guessing a bunch of other stuff is too. This is a very niche feature so I'm not too worried about adjusting semantics slightly.

Removing an email does not properly disassociate identities. This unambiguously should.

This is a reproducible bug: add an unclaimed identity email to your account, then remove it. You stick to the identity, but should not.

Destroying a user does not properly disassociate identities.

This is a reproducible bug: destroy a user automatically associated with an identity. The identity doesn't update and we end up with a mess:

Screen Shot 2019-11-14 at 9.35.15 AM.png (262×490 px, 32 KB)

PhabricatorUserEditor->createNewUser() does not associate identities.

This is a reproducible bug: register a new account with an email address that already exists in an identity. The identity should update but does not.

Currently, the flow here is that changes queue a daemon task.

When this task is queued by a DestructionEngine, it is possible that the task may execute before the corresponding objects are destroyed. In this case, it would resolve the identity to the existing object, not make any changes, and end up in a bad state.

I think the only real way to avoid this is to add a didDestroyObject(...) flow to DestructionEngine so the task queueing is strictly sequenced after the object destruction.

Another likely bug is:

  • Add unverified address to user A.
  • Steal the address as user B through a verification workflow.
  • Identity doesn't update.

This is very rare/unusual and probably a mess to test, but I'm probably fixing it.

Uhhhh, absolutely none of this works because PhabricatorUserEmail does not have a PHID.