Page MenuHomePhabricator

Populate the blame cache when parsing commits
Closed, WontfixPublic

Description

After T2450, it would be nice to pre-populate the blame cache during the commit parsing pipeline. This should make blame feel a lot faster.

Revisions and Commits

Event Timeline

epriestley added a project: Diffusion.
epriestley added a project: OS Students 2013.
epriestley added a subscriber: epriestley.

When we parse commits, they're run through a series of small steps which each parse one part of the commit. The current steps are:

  1. Discovery
  2. Message
  3. Change
  4. Herald/Feed
  5. Owners

We've broken the pipeline into a lot of small steps because it makes it easier to test each step individually, and gives us better-isolated failures when things do go wrong. To populate the blame cache, we should add a new "cache" step after the "change" step. The change step is responsible for figuring out which paths a commit touches, and is generally the largest parsing step.

The first thing you should do is create a test repository and have Phabricator track it, using the "Repositories" tool. You can just "git init" a new local repository, or you can import an existing one (like Phabricator itself). Making your own may be helpful so you can create commits. You can find some instructions in this document -- the links are all borked right now, I'm working on getting that fixed.

After you've imported a repository, we'll first add a new step which doesn't do anything, and let you run it on its own (it won't be part of the pipeline yet):

  • Define a new PhabricatorRepositoryCommitCacheParserWorker class, which extends PhabricatorRepositoryCommitParserWorker.
  • (Run arc liberate after defining a new class.)
  • Implement parseCommit() to do something like echo "hi";.
  • Edit scripts/repository/reparse.php to add a --cache flag. When the --cache flag is set, queue up a new PhabricatorRepositoryCommitCacheParserWorker near the bottom of the file.
  • Now, run scripts/repository/reparse.php --cache rXnnnnnn, where rXnnnnnnn is some commit identifier.

You should see the script run, and it should run your worker, echo "hi", and exit. Send me a diff for this, and then we can move on to making it do something.

All the stuff above you can do without actually needing the blame cache to exist, but we'll hit some cases that do need it fairly soon afterward. If you get through that and it isn't ready yet, {T2534} and {T879} might be good tasks to look at, or I can help you find something else.

I ran into a problem when trying to run arc liberate. I have the name of the main file with space in it and apparently arcanist can't figure it out and parses it, thus is not able to run liberate. If I change the name of the file then phabricator local is not working anymore. How should I try to fix this issue?

Can you copy/paste all the output / error text you get into this task?

I have the name of the file course "Facebook course" and when I try to run arc liberate it only parses the first half until the space and says "No such file or directory". It doesn't take into account spaces as well in file names and considers the name ended when it finds a space.

It works OK for me when I create a file called "facebook course". Am I doing something differently than you are?

$ touch src/'facebook course'
$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#	src/facebook course
nothing added to commit but untracked files present (use "git add" to track)
$ arc liberate src/
Finding source files...
Found 1,534 files.
Loading symbol cache...
Found 1,532 files in cache.
Analyzing 2 files with 8 subprocesses...
..
Done.
Building library map...
Writing map...
Done.
$

@irinav, which code were you unsure about where it should go? The --cache flag stuff?

yes, I wasn't sure where to put it in reparse.php

@epriestley, yes, I don't really know where to put the --cache flag.

To make the script recognize --cached: Look at this - https://secure.phabricator.com/diffusion/P/browse/master/scripts/celerity_mapper.php;2497e5b5ed07e27a$181 (also look at the usage help one line above, pick the right example for your needs)

To fetch the flag in the script, look at this: https://secure.phabricator.com/diffusion/P/browse/master/scripts/celerity_mapper.php;2497e5b5ed07e27a$200

chad changed the visibility from "All Users" to "Public (No Login Required)".Jul 3 2015, 5:21 AM

I'm just going to close this out, since I'm not sure we really need it -- D14963, D14962, and other work adjacent to T2450 found some general performance improvements in rendering. It's possible we might pursue this (and/or a highlighting cache) in the future, but I think we're fast enough now that the huge amount of upfront work the daemons would have to do to populate these caches probably doesn't make a ton of sense.