Page MenuHomePhabricator

Link Diviner books to repositories
Closed, ResolvedPublic

Description

I have like half a diff for this but figured I should ask for clarification before proceeding... I want to add repositories to Diviner books. Specifically, I want to add a repositoryPHID field to the DivinerLiveBook class... mainly for the purpose of providing more context.

My understanding of Diviner is limited, but I think that documentation for a book should always come from a single repository? If this is not the case then we would need to move this field to DivinerLiveAtom or DivinerLiveSymbol (I don't really understand the differences between these classes).

Revisions and Commits

Event Timeline

joshuaspence raised the priority of this task from to Needs Triage.
joshuaspence updated the task description. (Show Details)
joshuaspence added a project: Diviner.
joshuaspence added a subscriber: joshuaspence.

Adding the PHID: I think this is quite reasonable, since the vast majority of books have a meaningful value for the field, but it should be nullable (i.e., you should be able to generate a book from a random collection of text files on your local disk if you want) and that it's likely not the best primary organization mechanism for finding/listing books.

Multiple/No Source: If atoms in a book come from multiple sources, I think we should just use whatever repository the actual .book file itself is in (or whatever repository it explicitly configures, if we support that), or null if that's not available or not meaningful. For example, I could imagine a "huge_server.book" including some atoms from Git submodules, but it would make sense for it to still primarily belong to the parent repository where the .book file lived.

Listing Books in Diffusion: This isn't actually discussed here, but I wouldn't necessarily favor automatically listing all books for a repository in Diffusion. This might make sense, but I'd like to pull us in the direction of organizing information as "hubs with spokes", not "a cloud where all objects are totally connected to all other objects".

For example, I think it's good that tasks serve as a "hub" for items of work, and you go "up" from a revision/mock/commit to the task, then it lists all the spokes (other tasks, revisions, commits, mocks, etc), and you follow a link "outward/downward" from there. I generally think this is a better way to organize information than connecting everything to everything else.

Obviously, there are plenty of exceptions which are quite valuable and I don't plan to remove them or stop adding them (e.g., revision dependencies, commits attaching to revisions) but those exceptions generally have a pretty strong/obvious reason to exist. In cases where there's no strong reason, it's at-least-mostly-intentional that you can't create relationships: for example, you can't attach mocks to revisions, and I don't plan to ever support that.

A practical technical distinction is that with hub/spoke, N types of objects have about O(N) types of relationships (each object type has a relationship to the hub object type). With "cloud of connections", N types of objects have about O(N^2) types of relationships (each object type has a relationship to every other object type). This doesn't necessarily mean anything in practice since those relationships may be totally undifferentiated, but as the relationships become more differentiated the amount of work to build and maintain them potentially increases greatly in a "cloud" model. At a bare minimum, relationships in Phabricator tend to be differentiated at least as far as having unique translations for each relationship, which mounts up fast for even small values of "N". At least some desirable features depend on differentiating relationships more, too.

In GitHub, repositories are a "hub" object, but they currently aren't in Phabricator and I think that's good and the direction we want to continue forward in. Our hub objects are Tasks (which serve as a local/granular hub) and Projects (which serve as a global/coarse hub) instead.

Projects are fairly bad as hubs right now for objects other than tasks, although we've been making a small amount of progress forward there (e.g., T5558), but I anticipate us turning them into better hubs for other object types in the future.

Upshot being that I think objects should attach primarily to hub objects for organizational purposes, and only attach to other objects organizationally if a strong reason exists for the relationship. The Book/Repository relationship is somewhat-strong but doesn't necessarily feel strong enough to me to turn into a primary organizational relationship.

In concrete terms, I'd favor adding a "Documentation about this project" tab to Projects over adding a "documentation generated from this repository" UI to Diffusion.

Would it be valuable to record the repositoryPHID for DivinerLiveSymbol / DivinerLiveAtom as well? Furthermore, are you able to explain the distinction between these two classes.

I think that's reasonable.

The distinction is basically that some of Diviner is not Diviner-the-web-app, and LiveAtom is trying to just be a persist-able version of that object, i.e. Atom and LiveAtom are approximately the same object.

LiveSymbol is the web-app layer on top of that with a bunch of derived data, indexes, and a persistent PHID.

I think the distinction was mostly philosophical, but some of the ideas behind it are along the vein of:

  • If you delete a class, the non-web-app part expects to completely destroy the Atom, but the web-app part may want to keep the fact that the class previously existed around (e.g., maybe to preserve comments that we allow you to add later).
  • Specifically, I think some of the "keep PHIDs stable from update to update" code ended up being easier like this, especially in the presence of deletes.
  • If you want to load all the functions defined in some book, it's probably desirable to not have to load all of the actual text in their documentation, which is often hundreds of times larger, although maybe this rarely matters in practice.

(There's also a uri.source in Books which should probably get an implicit value if the repository is set but uri.source is not.)