Page MenuHomePhabricator

Support diffs between abstract block lists in the UI
Closed, ResolvedPublic

Description

See PHI1453. See PHI1459. See PHI456. See PHI452. See T13105. See T13414.

Various interests are served by abstracting diffs and diff presentation so that the unit of change is a "block" rather than a "line of text". An immediate application is sensible UI diffing of Jupyter notebooks. A related application is more scalable prose diffs.

This is not conceptually hard because we can transform a "list of blocks" into a "list of lines of text" by hashing each block (and this is what diff does internally anyway, more or less). There's just a lot of mapping and separation of concepts ("block content" vs "block hash" vs "text line"; "line number" vs "block identifier") that are currently conflated.

There may be some technical limits later -- for example, inlines are associated with a numeric "line number", not a freeform "block identifier". There may be a path out of this with clever mapping. In the case where the block is a 3D model and we want to attach an inline to a vertex or something we probably can't just do clever mapping, but this is a different flavor of abstraction; for now, we're only going as far as block lists with a well-defined one-dimensional order.

There's a support issue somewhere about inline on specific offsets within a line which might arise here, but hopefully that can be scoped separately. The ideal abstraction is that inlines have a "block identifier" (normally, a "line number") and then a separate "location within the block", which might be "characters 9-13" for a line of text or "<12, 34, 29>" for a 3D model.

My planned path is:

  • Allow DocumentEngine to elect into diff rendering.
  • Elect the image engine into diff rendering.
  • Make the image engine emit a generic block list. Make the ChangesetParser render a block list though the Renderer.
    • The "image stage" code moves to the image engine (or the block list if reuse is interesting).
    • The "scaffolding" code moves to "renderer->renderBlockList()".

Then, in whatever order is easiest -- these steps don't have obvious dependencies:

  • Between emission of the block list and rendering, diff the block list.
  • Make inlines work on blocks emitted as part of a block list.
  • Elect the Jupyter engine into diff rendering and emit Jupyter books as a block list.

Revisions and Commits

rPHU libphutil
D20837
rP Phabricator
D20959
D20897
D20866
D20866
D20865
D20851
D20848
D20845
D20844
D20843
D20840
D20838
D20836
D20835
D20834
D20833
D20832
D20831
D20830

Event Timeline

epriestley triaged this task as Normal priority.Sep 25 2019, 4:19 PM
epriestley created this task.
  • TwoUp image comments aren't triggering (also in master).
  • OneUp image comments need some cell span adjustments.

Screen Shot 2019-09-25 at 10.08.10 AM.png (671×272 px, 27 KB)

In an effort to "do no harm", I'm planning to add a "Render with Document Engine..." option to the View Options dropdown next. This will let you (for example) view a Jupyter notebook as a raw source diff if you want, if the "fancy" diff is broken or unhelpful for some reason, so you always have an escape hatch back to a lower level representation.

Porting inline comments across representations seems hopeless. In the future, the preferred representation may be a 3D model viewer, and the lower level representation may be a hexdump. Porting a comment at <13, 14, 15> in three-dimensional space into an offset in the hexdump of a model file is meaningless.

For now, I'm just going to let the comments do whatever they want (at least, as long as it isn't "fatal"). In the future, I'll probably hide comments on the non-preferred representation with a note ("6 inline comments on the "Jupyter Notebook" view of this file are hidden because you're viewing it as "Source".") or maybe stick them at the bottom or in a popup or something. In the far future, we probably need to store which representation an inline appears on, in case the preferred representation changes because of configuration changes. We could then make the non-preferred configurations read-only, or, I guess, let users hide comments on different representations to create a fun puzzle for reviewers.

  • The options in the "View As..." dropdown are exhaustive, and most do not work, because they aren't based on the changeset being rendered (so we'll give you an option to render a Jupyter notebook as audio, for example). This isn't trivial to fix and it isn't terribly important for this to function as an escape hatch back to old behavior.
  • Since we expect most documents to have a relatively small number of options here, a list of clickable options might be better than a <select /> dropdown.
  • There's also no "render as native source" option, but there is a "View as Source" option, which doesn't work. Gotcha!
  • Differential shows a "this file is big, so syntax highlighting is disabled by default" warning even when a document engine which does not use syntax highlighting renders the document.

Jupyter as blocks, no diffing or inlines yet:

Screen Shot 2019-09-25 at 12.02.00 PM.png (1×1 px, 435 KB)

I think the major remaining issues are:

  • OneUp does not perform block layout properly (we want to try to group sequences of "-" and "+" lines together to make things more readable).
  • OneUp does not render blocks properly.
  • I'm not sure what inline comments do in mail. I'm aiming for "not broken" but not sure if we're hitting that bar yet.
  • When lines of context are folded, you can't expand them. This is currently "not broken" but would be nice to fix.

Oh, and maybe lack of source code intraline diffs, although that's likely not too much work.

Context isn't automatically unfolding around inlines, but should.

The bulk of this work is done and I think there's nothing unique and actionable left here. This is survived by T13642 and other issues.