Depends on D19429. Depends on D19423. Ref T12164. This creates new columns authorIdentityPHID and committerIdentityPHID on commit objects and starts populating them. Also adds the ability to explicitly set an Identity's assignee to "unassigned()" to null out an incorrect auto-assign. Adds more search functionality to identities. Also creates a daemon task for handling users adding new email address and attempts to associate unclaimed identities.
Details
Imported some repos, watched new columns get populated. Added a new email address for a previous commit, saw daemon job run and assign the identity to the new user. Searched for identities in various and sundry ways.
Diff Detail
- Repository
- rP Phabricator
- Branch
- repo-id-3a
- Lint
- Lint Passed 
- Unit
- Tests Passed 
- Build Status
- Buildable 20319 - Build 27591: Run Core Tests - Build 27590: arc lint + arc unit 
Event Timeline
| resources/sql/autopatches/20180509.repo_identity.commits.sql | ||
|---|---|---|
| 2–4 | I have mixed feelings about DEFAULT NULL. Right now it's required to get the unit tests to pass, because PhabricatorRepositoryDiscoveryEngine doesn't set these fields during discovery (I guess PhabricatorRepositoryCommitMessageParserWorker->updateCommitData() runs in a later phase?) Aside from that, I was developing with NOT NULL and it works fine, since updateCommitData() will definitely have a handle to the identity. | |
| src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php | ||
| 112 | Not sure if we need to put this in the metadata too, but it seemed reasonable. | |
For committer (vs author), I think DEFAULT NULL is definitely right because Mercurial and Subversion commits don't have a committer as separate from the author, so the only sensible value for this field is null.
For author, the discovery/parse phases are basically:
- Discovery: create objects as quickly as possible. Can not be parallelized.
- Various parsing phases: can be parallelized. Actually parse the commit.
Discovery creates a bare-bones placeholder because it must run serially, so it's theoretically the slow part if you have enough daemons. If we don't run it serially, we'd need to do a lot of work to make sure that one or more workers exiting at some random time can't leave any "holes" in the repository where some commit is forgotten about. The current discovery algorithm resists disruption, lightning strikes, etc., and converges toward getting exactly one Commit object into the database for every commit in the repository.
In practice, Discovery is currently very fast and parsing is relatively slower, and we're rarely actually blocked on discovery. We could reasonably do more work in discovery, and pulling identities might make sense. We do already pull epoch during discovery, even though it isn't strictly required, but it's "almost free" to do in every VCS.
But, at least for now, I think DEFAULT NULL ends up simpler and it's sort of nice to have it be consistent with comitterIdentity, so I'd just keep it like this rather than trying to do more work in discovery. The UI is generally good about identifying these commits as "still importing" / "parsing and incomplete", etc., so users shouldn't be surprised by the behavior the null creates.
| src/applications/diffusion/controller/DiffusionIdentityViewController.php | ||
|---|---|---|
| 127 | I think there is, but it's super tricky. Something like you instantiate a copy of the Datasource, then set the value, render wire tokens from it, rebuild handles out of those wire tokens, then render tags out of the handles? There's a comment somewhere like "make this easier". The approach you've taken here is quite reasonable. | |
| 129 | Just for UI consistency, do something like: phutil_tag('em', array(), pht('Explicitly Unassigned')); | |
| src/applications/diffusion/typeahead/DiffusionIdentityAssigneeDatasource.php | ||
| 17 | viewer() probably doesn't work well here. :) | |
| src/applications/people/editor/PhabricatorUserEditor.php | ||
| 426 | Ideally, pass the user PHID as the objectPHID in the third parameter too -- I think like this: array( 'objectPHID' => $user->getPHID(), ) That makes it a little easier to hunt down tasks by querying the table in some unusual cases, since we can find tasks "about object X" by querying on this column, but it tends to be a mess to try to find tasks by LIKE'ing the JSON blob. | |
| src/applications/repository/worker/commitmessageparser/PhabricatorRepositoryCommitMessageParserWorker.php | ||
| 112 | I think it won't be necessary, since I believe we always have a Commit whenever we have CommitData. But it also can't really hurt anything. | |
| src/applications/repository/xaction/PhabricatorRepositoryIdentityAssignTransaction.php | ||
| 76 | I think you can just do if ($new === ThingeySource::FUNCTION) { continue; } a little above. | |
Requested changes, fix formatting for transactions involving the "Explicitly Unassigned" state.