Diffusion User Guide

Guide to Diffusion, the Phabricator repository browser.

Overview

Diffusion is a repository browser that lets you explore source code in a Git, Mercurial, or SVN repository, similar to software like Trac and GitWeb.

Diffusion provides a very high-performance SVN browser, a moderately high-performance Git browser, and a relatively slow Mercurial browser. It achieves this performance by denormalizing large amounts of data about repository history into a database and using that data as a cache, so it can avoid querying the repository directly. The data is generated by daemons which track repositories, discover new commits, and parse and import them.
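
As a rough illustration of the caching idea, the sketch below uses Python's built-in sqlite3 module. It is not Diffusion's actual schema: the table name, column names, helper functions, and sample rows are all hypothetical. It only shows how an import step can write one row per changed path so that a later browse query never touches the repository.

```python
import sqlite3

# Hypothetical denormalized table: one row per (commit, path) pair.
# This is an illustrative schema, not the one Diffusion uses.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE path_change (
        callsign   TEXT NOT NULL,
        commit_id  TEXT NOT NULL,
        path       TEXT NOT NULL,
        epoch      INTEGER NOT NULL
    )
    """
)
conn.execute("CREATE INDEX path_history ON path_change (callsign, path, epoch)")


def import_commit(callsign, commit_id, epoch, paths):
    """Stand-in for the import daemons: record every path a commit touched."""
    conn.executemany(
        "INSERT INTO path_change VALUES (?, ?, ?, ?)",
        [(callsign, commit_id, path, epoch) for path in paths],
    )


def history_for_path(callsign, path, limit=10):
    """Answer a browse query from the database alone, newest commit first."""
    rows = conn.execute(
        "SELECT commit_id FROM path_change "
        "WHERE callsign = ? AND path = ? "
        "ORDER BY epoch DESC LIMIT ?",
        (callsign, path, limit),
    )
    return [row[0] for row in rows]


# Made-up sample data.
import_commit("E", "101", 1000, ["src/a.c", "README"])
import_commit("E", "102", 2000, ["src/a.c"])
import_commit("E", "103", 3000, ["README"])

print(history_for_path("E", "src/a.c"))  # ['102', '101']
```

The trade-off is the one described above: reads become cheap index lookups, at the cost of an import step that must run before new commits become visible.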

Diffusion is integrated with the other tools in the Phabricator suite. For instance:

  • when you commit Differential revisions to a tracked repository, they are automatically updated and linked to the corresponding commits;
  • you can add Herald rules to notify you about commits that match certain conditions;
  • the Owners tool uses Diffusion to map repositories; and
  • in all the tools, commit names are automatically linked.

Repository Callsigns and Commit Names

Each repository is identified by a "callsign", which is a short uppercase string like "P" (for Phabricator) or "ARC" (for Arcanist).

Callsigns must be unique within an install but do not need to be globally unique, so you are free to use single-letter callsigns for brevity. For example, Facebook uses "E" for the Engineering repository, "O" for the Ops repository, "Y" for a Yum package repository, and so on, while Phabricator uses "P", "ARC", "PHU" for libphutil, and "J" for Javelin. Brief callsigns are easier to use, and one-character callsigns are recommended if they are reasonably evocative and you have no more than 26 tracked repositories.

The primary goal of callsigns is to namespace commits to SVN repositories: if you use multiple SVN repositories, each repository has a revision 1, revision 2, etc., so referring to them by number alone is ambiguous. However, even for Git they impart additional information to human readers and allow parsers to detect that something is a commit name with high probability (and allow distinguishing between multiple copies of a repository).

Diffusion uses this callsign and information about the commit itself to generate a commit name, like "rE12345" or "rP28146171ce1278f2375e3646a1e1ea3fd56fc5a3". The "r" stands for "revision". It is followed by the repository callsign, and then a VCS-specific commit identifier (for SVN, the commit number; for Git and Mercurial, the commit hash). When writing the name of a Git commit you may abbreviate the hash, but note that hash collisions are probable for short prefix lengths. See this post on the LKML for a historical explanation of Git's occasional internal use of 7-character hashes:

https://lkml.org/lkml/2010/10/28/287

Because 7-character hashes are likely to collide for even moderately large repositories, Diffusion generally uses either a 16-character prefix (which makes collisions very unlikely) or the full 40-character hash (which makes collisions astronomically unlikely).
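
The naming scheme and the collision argument can both be sketched in a few lines of Python. This is an illustration under stated assumptions, not Diffusion's own parser: the regular expression assumes callsigns are uppercase letters only and identifiers are decimal numbers or lowercase hex, and the helper names are made up. The collision estimate uses the standard birthday approximation, treats hashes as uniformly random, and counts only commits (a real Git repository also hashes trees and blobs, which makes short-prefix collisions more likely still). The 350,000 figure is just an example repository size.

```python
import math
import re

# Assumed grammar: "r", an uppercase-letter callsign, then a decimal commit
# number (SVN) or a lowercase hexadecimal hash prefix (Git, Mercurial).
COMMIT_NAME = re.compile(r"r([A-Z]+)([0-9a-f]+)")


def parse_commit_name(name):
    """Split a commit name into (callsign, identifier), or return None."""
    match = COMMIT_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group(1), match.group(2)


def collision_probability(object_count, prefix_length):
    """Birthday-bound estimate that any two hashes share a hex prefix."""
    slots = 16 ** prefix_length
    pairs = object_count * (object_count - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / slots)


print(parse_commit_name("rE12345"))  # ('E', '12345')
print(parse_commit_name("rP28146171ce1278f2375e3646a1e1ea3fd56fc5a3"))

for length in (7, 16, 40):
    print(length, collision_probability(350_000, length))
```

Under these assumptions a 7-character prefix is almost certain to collide somewhere in a repository of that size, a 16-character prefix collides with probability on the order of one in a few hundred million, and the full 40-character hash is far below that.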

Adding Repositories

Repository administration is accomplished through the "Repository" tool, which is primarily a set of administrative interfaces for Diffusion. To add a repository to Diffusion, you need to:

  • create a new repository in the Repository tool; and
  • start the daemons that will track and import the repository.

To create a new repository (or edit or delete an existing repository), you must be an administrator (see Configuring Accounts and Registration for instructions on making an existing account an administrator account). As an administrator, go to the Repository tool and you'll have the options to create or edit repositories.

When you create a new repository, you need to specify a human-readable name, a permanent "Callsign" (see previous section), and the underlying VCS type. Once you have created a repository, you can go to the "Tracking" tab and set up tracking in Diffusion.

Most of the options in the Tracking tab should be self-explanatory or are safe to leave at their defaults. In broad strokes, Diffusion tracks SVN repositories by issuing an "svn log" command periodically against the remote to look for new commits. It tracks Git and Mercurial repositories by cloning a local copy and issuing git fetch or hg pull periodically.

Once you've configured everything (and made sure Tracking is set to "Enabled"), you can launch the daemons to begin actually tracking the repository.

Running Diffusion Daemons

In most cases, it is sufficient to run:

phabricator/bin $ ./phd start

...to start the daemons. For a more in-depth explanation of phd and daemons, see Managing Daemons with phd.

NOTE: If you have an unusually large install with multiple web frontends, see notes in Managing Daemons with phd.

You can use the Daemon Console to monitor the daemons and their progress importing the repository. Small repositories should import quickly, while larger repositories may take some time. For example, it takes about 10 minutes to begin discovering commits in Facebook's 350,000-commit primary repository, and about 18 hours to import it all with 64 taskmasters on modern hardware. Commits should begin appearing in Diffusion within a few minutes for all but the largest repositories.

Tuning Daemons

By default, Phabricator launches one daemon to pull and discover all of the tracked repositories. This works well for a small number of repositories or a large number of relatively inactive repositories, but some installs may benefit from tuning. The daemon makes a rough effort to respect the pull frequencies defined in repository configuration, but it may not be able to import new commits very quickly if you have a large number of repositories, because it can be blocked waiting on I/O from other repositories.

If you want to provide lower commit import latency for some repositories, you can either launch more daemons (which will generally lower latency for all repositories) or launch additional dedicated daemons (which will give you very fine-grained control over import latency).

More Daemons

The coarse approach to reducing import latency is to simply launch more daemons, using phd:

phabricator/bin $ ./phd launch RepositoryPullLocal

This will launch another copy of the daemon. The daemons acquire a global lock before pulling a repository, so you can launch additional daemons without causing contention or race conditions.
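
To make the locking idea concrete, here is a minimal Python model of several pull daemons sharing a lock. It is a sketch only, not how phd implements it: the PullLocks class and daemon_pass function are hypothetical, the lock is held in process memory rather than anywhere truly global, and it assumes one lock per callsign and that a daemon skips a repository another daemon currently holds (the text above does not say whether a real daemon skips or waits).

```python
import threading


class PullLocks:
    """Hypothetical stand-in for the global lock: one holder per callsign."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = set()

    def try_acquire(self, callsign):
        """Take the lock for a callsign; return False if it is already held."""
        with self._guard:
            if callsign in self._held:
                return False
            self._held.add(callsign)
            return True

    def release(self, callsign):
        with self._guard:
            self._held.discard(callsign)


def daemon_pass(locks, callsigns, pull):
    """One pass of a pull daemon: pull each repository nobody else holds."""
    pulled = []
    for callsign in callsigns:
        if not locks.try_acquire(callsign):
            continue  # Another daemon is already pulling this repository.
        try:
            pull(callsign)
            pulled.append(callsign)
        finally:
            locks.release(callsign)
    return pulled


locks = PullLocks()
# Pretend another daemon is in the middle of pulling "ARC".
locks.try_acquire("ARC")
print(daemon_pass(locks, ["P", "ARC", "PHU"], pull=lambda callsign: None))
# ['P', 'PHU']
```

Because every pull is bracketed by acquire and release, adding daemons increases throughput without two of them ever working on the same repository at once.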

Dedicated Daemons

You can take a more fine-grained approach and launch dedicated daemons for specific repositories or groups of repositories. For example, if you want low latency on the repositories with callsigns A and B, but don't care about latency for the other repositories, you could launch two daemons like this:

phabricator/bin $ ./phd launch RepositoryPullLocal -- A B
phabricator/bin $ ./phd launch RepositoryPullLocal -- --not A --not B

The first one will work only on A and B, and should be able to import commits with low latency more reliably. The second one will work on all other repositories.
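
The effect of the two argument lists can be modeled in Python. This is not phd's argument parser; select_repositories is a hypothetical helper that only mirrors the behavior described above, assuming that bare callsigns restrict a daemon to those repositories and that each --not excludes the callsign that follows it.

```python
def select_repositories(all_callsigns, args):
    """Model which callsigns a daemon launched with `args` would work on."""
    include, exclude = [], set()
    remaining = iter(args)
    for arg in remaining:
        if arg == "--not":
            callsign = next(remaining, None)
            if callsign is None:
                raise ValueError("--not must be followed by a callsign")
            exclude.add(callsign)
        else:
            include.append(arg)
    # With no explicit callsigns, the daemon considers every repository.
    candidates = include if include else list(all_callsigns)
    return [c for c in candidates if c in all_callsigns and c not in exclude]


tracked = ["A", "B", "C", "D"]  # made-up install with four repositories
print(select_repositories(tracked, ["A", "B"]))                    # ['A', 'B']
print(select_repositories(tracked, ["--not", "A", "--not", "B"]))  # ['C', 'D']
```

Together the two daemons cover every tracked repository exactly once, which is why the second command negates the same callsigns the first one names.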

Next Steps