Page MenuHomePhabricator

In Differential, expose "featurename (branched from master)" via Conduit
Closed, WontfixPublic

Description

Currently, we record the branch name of a diff when "arc diff" is run, and show it in Differential like:

Branch: examplebranch

It would be nice to also record the most recent remote-tracking branch name in history, if it's available (maybe with some reasonable cutoff for performance, so we only examine the N most recent commits to try to find a remote-tracking branch). When a remote-tracking branch is found, display something like this:

Branch: examplebranch (branched from master)

Event Timeline

epriestley triaged this task as Normal priority.Jun 27 2013, 6:12 PM
epriestley added subscribers: epriestley, csilvers.

This also relates to T3216, as it gives us a hint for which branch to jump to in Diffusion.

From IRC, we can often get the right answer by checking which branch the current branch tracks, but can't in two cases:

  • If the branch was forked without --track, which is common (since it's an extra / non-default flag).
  • If the branch is two or more levels deep, e.g. master -> newfeature -> newerfeature.

fwiw, I don't think the branch depth is a problem. When I run git branch -vv, I get the following output:

auto-inject  c26373c [master: ahead 2, behind 1630] description...
csilvers     1a68a53 [origin/csilvers] description...
master       01cea52 [origin/master] description...
  • sort-plurals 8e1a3e0 [master: ahead 1] description... sort-plurals-subbranch 8e1a3e0 [sort-plurals] description... upload-html dc79310 [master] description...

(All these were created using --track.) It should be easy to parse this output to go from sort-plurals-subbrance -> sort-plurals -> master -> origin/master, where we stop because of the /.

There's no guarantee that "sort-plurals" exists in the working copy (for example, it may already have been landed and then deleted).

This would also be helpful in making sure that, for instance, you're developing on the right branch for the right project.

avivey changed the visibility from "All Users" to "Public (No Login Required)".Oct 13 2015, 10:03 PM

This would be really helpful when reviewing code (having context about where its going especially which multiple release branch scenario). It would also be really helpful when building revisions in a build system like Jenkins where different branches need to be built differently (different SDK's etc) and getting from revision to what branch its going into is non trivial at the moment.

After D14737, we will now show this information in the UI if it's available. Here are the requirements:

  • the revision must have been made with arc diff (we can't know/guess it if you copy/paste a random diff into the web UI, for example);
  • arc must be D14736 or newer (and must have been up to date when arc diff was run; older versions do not upload this data);
  • Phabricator must be D14737 or newer (but only needs to be up to date when the revision is viewed; older versions to not display this data);
  • the repository must be a git repository;
  • the diff must have come from a branch in that repository (not a tag or a detached HEAD or --raw).

If these conditions are met, we identify a target branch by trying these rules:

  • if the branch tracks a remote branch (or tracks a local branch which tracks a remote branch, etc), we use that branch;
  • otherwise, if arc.land.onto.default is set, we use that branch.

You can use arc feature, git checkout --track, git branch --track, git branch --set-upstream-to, or branch.autoSetupMerge in Git config to make sure tracking is getting set up.

In particular, we do not currently try to guess which branch you mean by examining repository history. We may do this in the future, but it will bump into a lot of ambiguous cases and would likely give us much greater exposure to the issues in T9898, and I think almost everyone probably wants tracking, it's just not on by default.

See "Branch" in this screenshot for an example:

Screen Shot 2015-12-10 at 3.03.02 PM.png (171×420 px, 28 KB)

@epriestley can this information (branched from master) be exposed via conduit as well? Seems like a simple addition to differential.query

avivey renamed this task from Show "Branch: featurename (branched from master)" in Differential to Show "Branch: featurename (branched from master)" in Differential UI, conduit.Dec 1 2016, 9:29 PM
avivey added a project: Differential.
epriestley renamed this task from Show "Branch: featurename (branched from master)" in Differential UI, conduit to In Differential, expose "featurename (branched from master)" via Conduit.Apr 21 2017, 11:43 AM
epriestley lowered the priority of this task from Normal to Wishlist.
epriestley claimed this task.

Actually, I'm just going to WONTFIX this. We may actually fix it, but I don't currently see a legitimate use case for it described anywehre.

Particularly, this task doesn't describe a use case, and T10272 describes a very odd use case, where a build system depends on branch names. My expectation is that build systems should not need to know branch names to function, and that building commit abcdef1234 should always produce the same build result regardless of which branch refs abcdef1234 is an ancestor of at the time of the build. Parameterizing builds on branch names strikes me as sufficiently bizarre that I think it is insufficient to motivate this feature in the upstream.

If you have a use case for this, feel free to file a new task clearly describing that use case, or explaining why parameterizing builds on branch names is actually a great idea that we should heartily endorse.

@epriestley calling it a odd use case is a little rough. It's quiet common in the systems programming world. Especially when you are building operating systems. The main reason for the branch naming scheme is that we have multiple sdks being produced (one for each version of the OS). There are multiple versions being developed for release (master may have moved on to far future work and it's sdk is no longer compatible when building another branch that branched because a release is coming soon). Even the sdk is being used is built iteratively from the results of other projects completing their build.

It's a nice convention that evey project going into the sdk follows so we know what branches to pull and build for that release. Sometimes we have to go to a very old branch from a year ago to patch a new dot release to fix an urgent issue on that OS version and that branch then needs to build against he much older sdk. I hope this helps clear up the need and I understand it's definetly different from what you may be used to building web apps but building and operating system is its own beast.

We adopted this convention (instead of a file containing the mappings or settings etc) because we have many many major versions of the OS we have to support with fixes and it's for all developers to just create branches with a specific scheme instead of constantly updating a file everywhere or having to look in a file to know where this code will go.

So when you cut a new minor release of your OS (say, v9.9.7), you go to every single library/project you depend on, cut a minor release there with the exact same version number, and create a branch with exactly the same name?

(If not, how does OSv977 know that it depends on HTTP-v224, Scheduler-v416, Documentation-v84, etc?)

So happy that this is now being discussed again. I have almost the same use case is jcarrillo7.

There's two parts to the process (being transparent here). For official builds we submit tags to the "real" build team for a given release line. The submission info contains any new dependency information etc. Teams tend to generate these tags from branches with names matching the release code name.

Our problems stems from attempting to build reviews before they land. Since we don't know what branch they are associated with really (intended to be landed into), we don't know which sdk to pick. If we pick the wrong sdk we can fail to build. This causes noise and people end up trusting the build system less. Currently we try to guess by looking at the ancestors and guess where you branched from. Sometimes we guess wrong or pick to broad of a target (if you match multiple sdks then you generate multiple concurrent builds). Master branch currently submits to 3 different release trains for us so any commits there get built 3 times for 3 different platforms. Our central build team has information for submissions that will "forward" into other platforms as well so we can query them to determine some of this data.

The real problem is really just building reviews.

We already build reviews after they land since it's now obvious to our build system what it needs to do.

So branch names are basically a key into a big map somewhere else which stores the actual complicated dependency information in a normal way and is probably backed by a file on disk somewhere?

Can you determine this yourself by doing this?

  • Walk backward from HEAD until you find a commit which is also a branch head.
  • If the commit is the head of exactly one branch, you're done: you know which version you should be building against.
  • If the commit is the head of two or more branches, two different versions of your software are exactly the same, which is presumably an error.

This makes it impossible to build after branch heads move forward, but presumably you already have a solution for that or you would be unable to repeat builds in the general case.

Dependency information between projects in the real build team is kept for the general project a needs b and c and when that new dependency is introduced. They can update this before a build by looking for any new dylib/static lib/header dependencies that were not there before and looking up what project produces that file. They do not have a version level dependency since teams don't always coordinate like that unless an API breakage is coming. Even then the build team may roll back a projects submission if it causes other teams to fail to build in the official nightlies.

Our issue is with building when heads move forward. The case you listed is the simple case to detect and we already do this.

What is bad though is that this is assuming that the branch point somewhere in the past is where you intend to land which is mostly true. But really what we care about is where you intend to land (not trying to guess it by proxy) and phabricator captures this information at arc diff time.

Is exposing this information in conduit which Phabricator already stores that bad?

Today, how do you build any commit which isn't a branch head (for example, to bisect a bug to a commit which introduced it)?

For someone debugging on their desk they will normally build with the latest sdk for that release or if they suspect some issue related to build then they will use the sdk generated from when that commit was built in the official (internal) release.

Commits may be on multiple releases but issues are reported from a platform that is associated with a particular sdk/release. Engineers work backwards from there. In addition, for us every commit message has a tracking ticket in it as well and the official build system updates those tickets with information on where it got incorporated when etc.

For reproduceable builds we only need to be able to reproduce official builds and for that the central build team stores all the tags and projects submitted to each build for a given train.

Is exposing this information in conduit which Phabricator already stores that bad?

Phabricator is built on the assumption that a build is only a function of a particular repository tree, build = f(tree). I don't want to bring features upstream which violate that assumption or give different parts of the software different models of how things work.

It sounds like the only use case for this feature is defining builds in terms of build = f(tree, branch name). Even if this model is the right model for your use case, I think this model is not a good model in the general case. It's also not how any general-purpose dependency management system I'm aware of works, across a variety of languages. But maybe the systems I'm aware of are all just toys for building web apps and all serious systems programmers use branch names to manage dependencies, but this is a very hard sell for me. Perhaps I'll have a change of heart eventually.

You can add this feature yourself locally with a few lines of code, but Phabricator fundamentally has a different model of what a build is a function of, so I'd expect you will continue to encounter mismatches between Phabricator's model and your model of builds.

I have a related question regarding this quote:

My expectation is that build systems should not need to know branch names to function ...... Parameterizing builds on branch names strikes me as sufficiently bizarre that I think it is insufficient to motivate this feature in the upstream.

My assumption is that branches and refs are relatively interchangeable and a lightweight tag (used for PUSH STAGING in arc) is functionally the same as a branch.

Given all of the above, why is repository.staging.ref provided as a Harbormaster build variable? We consume DifferentialDiffs as our HarbormasterBuildableInterface but that means that buildable.commit is null. We don't even have the option to parameterize on anything except the ref. Are we doing something wrong by using DifferentialDiff as our interface?

That was unclear: we can parameterize on lots of things, just not the commit hash, which is conceivably what you want us to parameterize on.

Given all of the above, why is repository.staging.ref provided as a Harbormaster build variable?

Maybe I'm misrembering/misunderstanding, but it's so you can fetch just the ref you need? If git fetch <commit hash> worked we'd just give you that, but git fetch only accepts refs. If we gave you a hash you'd have to fetch all refs. This is how we do the fetch in Drydock (fetch exactly the ref):

https://secure.phabricator.com/source/phabricator/browse/master/src/applications/drydock/blueprint/DrydockWorkingCopyBlueprintImplementation.php;3a608bae7b7127b937b143cd1b0ab27eefe009a9$288-291

But maybe the systems I'm aware of are all just toys for building web apps and all serious systems programmers use branch names to manage dependencies, but this is a very hard sell for me. Perhaps I'll have a change of heart eventually.

Did not mean to imply that web apps are not serious business. Sorry if I came off that way (just re read my comments in hindsight). I *think* the difference comes mainly from the fact that in a lot of scripting languages and modern dependency management tools each repo is responsible for building itself and listing its dependencies so the manager can fulfill them from some central package repository or what not.

The difference when building something like an operating system (general systems programming I *think*) is that you explicitly cannot trust the repo to build itself correctly (due to platform specific settings shared among all projects going into the build or other "rules" that need to be enforced in order to produce a useable OS). This now ends up being enforced by the central build team. Hence the repos lack any builtin way of knowing how to build themselves and must then rely on querying the external build system. This is all fine most of the time except for building reviews before they are submitted anywhere since then have to figure out how to query the build system for the right build parameters.

Is this making a bit more sense? I think this is why we are viewing the world a little differently and I really do believe this isn't a model Phabricator should shy away from since it can alienate a lot of potential use cases for systems programming.

We could reasonably give you the hash we expect the ref to contain, and should do this anyway as a correctness check to make sure someone hasn't --force pushed something else over it to do something tricky.

(Although that gets into a whole other mess.)

Also, projects in this world are not allowed to explicitly manage their dependencies since other teams need to submit newer and newer versions and you NEED to build against whatever code they submit to the build. I think Phabricator is approaching build with the assumption that you can use a dependency manager like pip/composer/name-your-favorite-here and that a project can explicitly manage its dependencies and their versions which is just not true when building any OS.