Page MenuHomePhabricator

Provide a standalone script entry point for resolving a repository identity
Open, WishlistPublic

Description

See PHI1526. An install ran into an issue where a user's email was not correctly associated with their repository identity.

Manually re-executing the PhabricatorRepositoryIdentityChangeWorker resolved the link, leaving things a bit of a mystery.

A proper bin/repository resolve-identity ... sort of script to provide a real entrypoint into this workflow would make this kind of thing easier to debug in the future.

Event Timeline

epriestley triaged this task as Wishlist priority.Mon, Nov 4, 8:23 PM
epriestley created this task.

We currently have bin/repository rebuild-identities, which takes a list of repository identifiers or --all. This has a lot of history in T12164 and we currently queue an "activity" for it during migrations in 20180809.repo_identities.activity.php.

Still, this can be adjusted mostly-freely as long as the activity text is also adjusted.

The core behavior ("extract identities from commits") needs to continue to exist. However, since there seem to be quite a few bugs with this and it's not observable, I suspect these additional modes are desirable:

  • rebuild all identities associated with a commit;
  • rebuild all identities associated with a user;
  • rebuild all identities associated with a raw identity string;
  • rebuild all identities associated with an email address.

Most of these feed into eachother. "Repository" generates a commit list; commits generate an identity string list; users generate an email list and a PHID list. Then, emails and user PHIDs generate actual identities; identity strings do too, but might be in "create if does not exist" mode or not.

So I think I'm going to start by changing the flagless arguments to bin/repository rebuild-identities into repeatable --repository arguments, then add --commit, --user, --raw, and --email, then make all of this work in an observable way (perhaps also with --dry-run), then actually start fixing behaviors.

Here's a list of bugs I expect exist, although I haven't made it far as reproducing them yet:

  • Looking up identities by email address is improperly case-sensitive, because the query is a LIKE query against a binary column.
  • PhabricatorUserEditor->createNewUser() does not associate identities.
  • Destroying a user does not properly disassociate identities. This is arguably okayish if repair tools are okay, but it's probably better to just wipe the data out. Elsewhere, in PHI1544, an install masked a user with a mailing list, and just automatically fixing things would have made my life easier. (Disassociating guesses is fairly unambiguous, disassociating explicit manual selections is a little more questionable.)
  • Removing an email does not properly disassociate identities. This unambiguously should.

PHI1526 has several other cases which are less clear. These may include:

  • "Author" and "Committer" in Herald rules may be incorrectly returning identity PHIDs; they should never return anything except a real user PHID.

Looking up identities by email address is improperly case-sensitive, because the query is a LIKE query against a binary column.

This is a real bug which reproduces easily.

lookup-identity.php
<?php

require_once 'scripts/init/init-script.php';

$args = new PhutilArgumentParser($argv);
$args->parseStandardArguments();

$viewer = PhabricatorUser::getOmnipotentUser();

$identities = id(new PhabricatorRepositoryIdentityQuery())
  ->setViewer($viewer)
  ->withEmailAddress($argv[1])
  ->execute();

echo pht(
  "Found %s identities.\n",
  count($identities));
epriestley@orbital ~/dev/phabricator $ php -f lookup-identity.php git@epriestley.com
Found 2 identities.
epriestley@orbital ~/dev/phabricator $ php -f lookup-identity.php GIT@epriestley.com
Found 0 identities.

The internal construction with LIKE '%...' is also not great:

$ php -f lookup-identity.php GIT@epriestley.com --trace
...
>>> [3] (+1) <query> SELECT `repository_identity`.* FROM `repository_identity` `repository_identity` WHERE (repository_identity.identityNameRaw LIKE '%<GIT@epriestley.com>') ORDER BY `repository_identity`.`id` DESC 
...
Found 0 identities.
  • When you set an identity to "Unassigned", we also set the effective PHID to "Unassigned". This isn't strictly incorrect, but probably makes everything more complicated than it needs to be.

When you set an identity to "Unassigned", we also set the effective PHID to "Unassigned". This isn't strictly incorrect, but probably makes everything more complicated than it needs to be.

I'm going to declare this is a bug; PhabricatorRepositoryIdentityQuery->withHasEffectivePHID() is broken at a minimum and I'm guessing a bunch of other stuff is too. This is a very niche feature so I'm not too worried about adjusting semantics slightly.