Page MenuHomePhabricator

Add migration to encourage rebuilding repository identities
ClosedPublic

Authored by amckinley on Jun 18 2018, 10:39 PM.
Tags
None
Referenced Files
F12841598: D19497.id46783.diff
Thu, Mar 28, 9:19 PM
Unknown Object (File)
Fri, Mar 22, 4:12 PM
Unknown Object (File)
Thu, Mar 14, 4:52 PM
Unknown Object (File)
Feb 14 2024, 7:08 AM
Unknown Object (File)
Feb 4 2024, 2:45 PM
Unknown Object (File)
Feb 3 2024, 7:03 PM
Unknown Object (File)
Feb 1 2024, 4:15 AM
Unknown Object (File)
Jan 28 2024, 10:21 AM
Subscribers
Restricted Owners Package
Tokens
"Hungry Hippo" token, awarded by epriestley.

Details

Summary

Ref T12164. Defines a new manual activity that suggests rebuilding repository identities before Phabricator begins to rely on them.

Test Plan
  • Ran migration, observed expected setup issue:
    Screen Shot 2018-08-09 at 12.21.26 PM.png (1×1 px, 263 KB)
  • Ran bin/config done identities and observed setup issue get marked as done.
  • Ran /bin/storage upgrade --apply phabricator:20170912.ferret.01.activity.php to make sure I didn't break the reindex migration; observed reindex setup issue appear as expected.
  • Ran ./bin/config done reindex and observed reindex issue cleared as expected.

Diff Detail

Repository
rP Phabricator
Branch
repo-activity
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 20589
Build 27969: Run Core Tests
Build 27968: arc lint + arc unit

Event Timeline

Owners added a subscriber: Restricted Owners Package.Jun 18 2018, 10:39 PM

We could do something like this instead:

$workflow = new PhabricatorRepositoryManagementRebuildIdentitiesWorkflow();

$args = new PhutilArgumentParser(
  array(
    'bin/repository',
    'rebuild-identities',
    '--all',
  ));
$args->parseWorkflows(array($workflow));

....which seems a little cleaner even if it does encode "rebuild-identities", but that exits the script with the workflow's error code.

I don't love just copy-pasting the whole thing, but it seems generally fine.

One possible alternative we could do is to generate a "manual activity" instead that just says "run rebuild-identities" (see T11932). However, there's currently only support for one type of manual activity (search index rebuilds) and it might be more work than it's worth. I could also fix ArgumentParser to have better support for this kind of execution mode (and might have other reasons to do so after T13098) but I'm hesitant about that since it would otherwise be totally unused.

Let's hold this until after the release cut in any case so I can issue some "run rebuild-identities" changelog guidance, but I expect we can just land this as-is after that. I'll maybe look at "manual activity" and ArgumentParser-as-related-to-T13098 and see if either of those looks cleaner, but if nothing particularly enticing turns up this seems reasonable.

@epriestley any updates on this, or should I land it before this week's release cut as-is?

Sorry for the very long delay on this.

After some thought and poking at the code a bit, I'm leaning toward doing this as a "manual activity" instead of a forced migration since I think it's actually pretty straightforward to hook up, avoids copy/pasting all this code, I don't think we'll end up with too much cruft in the codebase, it's less scary for large installs, and it lets us fix mistakes at significantly lower cost.

To do this, I think the path looks like this:

  1. Add a TYPE_IDENTITIES or similar to PhabricatorConfigManualActivity.
  2. Write a migration similar to 20170912.ferret.01.activity.php which checks if there are any any commits (or: any commits with null author identities?) and schedules a "rebuild-identities" activity if there are.
  3. Change PhabricatorManualActivitySetupCheck to show two different messages, one for this activity and one for the existing reindex activity (maybe just pull them into separate methods?)

I think that's about it, and most of it's just writing instructions.

I think the major downside is that we'll need to keep some if ($identity === null) { ... } cases in the frontend code for a while, but I think there isn't / won't be too much of this. We can always force a migration some time in the future if we want to get rid of this.

This revision now requires changes to proceed.Aug 2 2018, 11:15 PM

I think the major downside is that we'll need to keep some if ($identity === null) { ... } cases in the frontend code for a while, but I think there isn't / won't be too much of this.

As a very rough cut at the number of callers:

~/src/phacility/phabricator (repo-migrate) $ grep -r getAuthorName . | wc -l
      23

I guess that's not the end of the world, but that's a lot of soon-to-be-pointless checks to add (and 2x as much testing that also involves rolling DB state forward/back). I'd rather write those front-end changes assuming that repo identities exist for every commit, and then hold them a few weeks until The Future™ arrives.

I guess this is really targeted at "installs that update frequently, don't read the changelog, but do respond to setup issues (which I assume is how these manual activities get raised)"? That does describe the way I've admin'ed previous Phabricator installs at other orgs, but I wonder how large that group of users is.

Overall I'm on board with making this a manual activity, but I think we should definitely plan on forcing the migration a few weeks(?) after this lands.

I'll also change PhabricatorRepositoryManagementRebuildIdentitiesWorkflow to check for commits with null author identities to speed up the case where the task gets killed and restarted. Also, unlike the search index rebuild, rebuilding identities can detect when it's finished (no commits with null identities). So should I copy/paste some of the logic from PhabricatorConfigManagementDoneWorkflow at the end of the rebuild workflow to automate this and mark the activity as done? Or should the guidance say "run ./bin/config done rebuild-identities after the rebuild completes"?

I think we don't need that much === null checking and can just stick some methods on Commit and end up with everything simplified. For example, CommitController has this mess:

$author_phid = $data->getCommitDetail('authorPHID');
$author_name = $data->getAuthorName();
$author_epoch = $data->getCommitDetail('authorEpoch');

$committed_info = id(new PHUIStatusItemView())
  ->setNote(phabricator_datetime($commit->getEpoch(), $viewer));

$committer_phid = $data->getCommitDetail('committerPHID');
$committer_name = $data->getCommitDetail('committer');
if ($committer_phid) {
  $committed_info->setTarget($handles[$committer_phid]->renderLink());
} else if (strlen($committer_name)) {
  $committed_info->setTarget($committer_name);
} else if ($author_phid) {
  $committed_info->setTarget($handles[$author_phid]->renderLink());
} else if (strlen($author_name)) {
  $committed_info->setTarget($author_name);
}

$committed_list = new PHUIStatusListView();
$committed_list->addItem($committed_info);
$view->addProperty(
  pht('Committed'),
  $committed_list);

That would be uncomfortable to split into identity and non-identity halves and maintain indefinitely -- but it's also just bad, period, and I think the real problem is that it's a mess already and we should fix the mess.

That can probably be something like:

$commit->renderAnyCommitter();

...or maybe it has to be something like:

$commit->renderAnyCommitter($viewer, $handles);

And I bet that decomposes into a couple of other simple methods:

public function renderAnyCommitter(...) {
  if ($this->hasCommitter()) {
    return $this->renderCommitter($viewer, $handles);
  } else {
    return $this->renderAuthor($viewer, $handles);
  }
}

...and we probably end up with a couple of === null checks here and there but not like 20ish to deal with.

But maybe I'm underestimating how whacky this will be once we start converting callsites.

Perhaps one approach would be to try to consolidate those 20 callsites into a small number of methods first. If we can't get away with it, we could re-evaluate the "activity" plan. If we can, it should make the "activity + maintain" plan more appealing by reducing the downside.

but I wonder how large that group of [users who don't read changelogs but do read setup issues] is.

I think this is probably quite a lot of installs.

I'll also change PhabricatorRepositoryManagementRebuildIdentitiesWorkflow to check for commits with null author identities to speed up the case where the task gets killed and restarted.

I suspect the speedup is trivial-ish, and this might require us to introduce a new --really-seriously-all flag if we fix a bug with this, since otherwise we won't fix the bugged commits. Maybe add this as --only-unattributed (or some better name, that's maybe not terribly clear) if you want to add it rather than requiring a --seriously-everything flag to disable this behavior later.

Or should the guidance say "run ./bin/config done rebuild-identities after the rebuild completes"?

I think this is desirable, since gives installs more control over running rebuild-identities in pieces and not requiring rebuild-identities --all to actually complete.

Rewrite migration to raise setup issue via manual activity.

amckinley retitled this revision from Add migration for creating repository identities to Add migration to encourage rebuilding repository identities.Aug 9 2018, 7:28 PM
amckinley edited the summary of this revision. (Show Details)
amckinley edited the test plan for this revision. (Show Details)

The setup issue message might be a little verbose? Let me know what you think.

That messaging looks great to me. One inline about possibly tweaking the bailout condition, but looks good otherwise.

resources/sql/autopatches/20180809.repo_identities.activity.php
8–11

I'd expect this test to look for any commit with authorIdentityPHID = null instead, so we skip this guidance if you: (a) have no commits; or (b) already ran rebuild-identities.

We actually should run rebuild-identities if you somehow have no users but do have unattributed commits, right (this would be bizarre, though)?

This revision is now accepted and ready to land.Aug 9 2018, 8:57 PM
amckinley edited the test plan for this revision. (Show Details)

Switch bailout condition to look for commits, not users.

Oh, we should add a LIMIT 1 to that so we don't load a trillion commits. Good otherwise.

This revision was automatically updated to reflect the committed changes.