Page MenuHomePhabricator

Plans: Rich presentation and diff rendering pipelines for various file types
Open, NormalPublic

Description

Some file types (like Markdown and SVG) are text (or binary) files which can be rendered into some sort of document for presentation. For example, an SVG file is a text file but it can be rendered into an image.

In particular, we can render these files when:

  • viewing a file in Diffusion;
  • viewing a file in Files;
  • viewing an attachment on any object, maybe with {Fxxx, ...} options;
  • perhaps in Pholio, although this is maybe not v1; and
  • we could automatically expose these to the inline {{{ ... }}} remarkup renderers.

Additionally, some files can be rendered into a visual diff format in a richer way than just showing two renderings side-by-side. We can potentially render a rich diff when:

  • viewing a commit in Diffusion; or
  • viewing a revision in Differential.

A good example of a rich diff is GitHub's image diffing feature, which has modes to overlay two images on top of one another:

https://help.github.com/articles/rendering-and-diffing-images/

Another good example is GitHub's STL model support:

Even when we can't render a dedicated diff of an file, rendering the old file and the new file may still be better than rendering the raw text diff, although perhaps not always.

For example, one document type where a rendered old vs new may be worse than a text diff is Markdown: if a large section was deleted, added, or moved, the two documents won't line up at all and it may be very difficult to tell what was changed, while a plain text diff may actually be more clear.

A possible related concern is that some files might diff better in Diffusion/Differential using the prose diff algorithm than using the source diff algorithm.


The first step here is to add some sort of modular DocumentRenderingEngine (or something) which can elect to render a given document or a diff between two documents. Files, attachments, the lightbox, Diffusion file views, Diffusion commit diffs, and Differential revisions should all engage this engine, test for support, and provide the user some way to select between rendering options.

In particular, multiple engines should be able to support a given document and the user should be able to choose between rendering engines. One example where this might be useful is is formats like SVG: SVG can be rendered as an image, or it could be rendered by a generic XML renderer. A binary file format could be rendered as an actual document (say, an image or 3D model) or a hexdump -C style listing, or a decoded format-specific text variant.

For sophisticated rendered diffs, we will probably need to sacrifice inline comment support in most cases, at least for now. It's possible that we could eventually support different commenting modes on different renderings but this seems like a substantial undertaking. Comments can still work on the raw text version, we just can't reasonably map a comment on line 32 of an XML file into a point in 3D space when the file is rendered as a model. We could put comments on other views of the file above/below the rendering or just show an indicator that they exist.


Once the engine infrastructure works, we can add support for different file types.

A general concern here is that many file types are complicated to render and may not have secure rendering engines. SVG is a good example here: although browsers can all render SVG, rendering SVG safely is very hard (see T6445). LaTeX is similar (T6609) with major security concerns in the most popular rendering engines. We've previously rewritten figlet and cowsay with in-process/first-party code (see T7785) and removed Graphviz support because we were unable to secure it or port it (T9408).

Many rendering tools were written assuming that input documents are safe and were written by a trusted local user. Often, these tools can be exploited when that assumption is violated with a malicious input. (From T9408, a researcher found a buffer overflow accessible from the command line in figlet, a toy program for drawing big letter using keyboard symbols.)

Thus, upstreamable rendering implementations for some file types may be substantial undertakings where we either need to carefully audit a large external codebase or write a first-party version of the rendering pipeline. For specialized diffs, it's likely that we need first-party pipelines in most cases.

The existence of this engine and pipeline will open up opportunities for third-party code to provide insecure engines, but approaches based largely on faith that a 200,000 line external has no attack surface isn't upstreamable, so just keep it to yourself. 🙈 🙉 🙊


In the past, we've seen interest in:

I don't currently plan to implement any of these since they're all complicated and have little or no mana behind them relative to implementation complexity.

These formats also seem conceptually reasonable, although not necessarily valuable:

  • Remarkup (standalone is easy; diffs are hard).
  • Normal images like PNG, GIF and JPEG (standalone is trivial but should probably be implemented via engine infrastructure; diffs aren't trivial but may be valuable).
  • Binary as hexdump (standalone and diff are both easy but value is questionable).
  • Prose diff, maybe.
  • Raw binary (it would be nice if the fallback renderer that just says "we can't do anything with this file" was modular, not special cased).
  • XML-as-DOM? JSON/YAML-as-DOM? We could render these formats in an expandable tree view and perform more-semantic diffing on them, but it's not clear that this is very useful relative to the amount of work it likely involves. Viewing these files as text is usually pretty good already, assuming they've been beautified by whatever wrote them. Maybe "Beautify JSON" is reasonable, but it's not clear this really solves a problem anyone has.

See PHI452. An install would like support for PCB schematic files (.sch); layer files (.brd, GDS, Gerber).

See PHI452. See PHI456. Multiple installs are interested in Jupyter notebooks.

Revisions and Commits

rP Phabricator
D19457
D19414
D19378
D19377
D19350
D19349
D19348
D19346
D19314
D19313
D19312
D19311
D19310
D19309
D19307
D19306
D19305
D19302
D19301
D19300
D19274
D19273
AuditedD19260
AuditedD19259
D19258
D19256
D19254
D19253
D19252
D19251
D19239
D19238
D19237

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I really like this, Remarkup idea, I think there are many text based files that people add to reviews which would be easily transformed to something more readable by being transfer directly to Remarkup, even for example something as simple as CSV,

but also potentially something much more complex like MIF (Framemaker) or even the HTML that might be part of a HtmlHelp file. so I could see using this rendering engine as a basis for other possible engines

I've personally already customized an engine to transform a proprietary graph format into an SVG diagram with some success, I'm interested to see how easy it will be for the render engine to be used in Diffusion and Differential

Super excited about this plan..

Changeset rPfb4ce851c4ae: Add a PDF document "rendering" engine allows to view PDFs from the Files application. However, {Fxxx, ...} references to a PDF file in Phriction still only allow to download the file or open a lightbox with only an icon (again with a download link).

Are there plans to add PDF rendering similar to Files also to Phriction ? This would be really helpful.

Junk I'm about to break in Diffusion (but plan to restore):

  • Line numbers are a mess: they don't read off DiffusionRequest properly and the URI doesn't update correctly.
  • Disable syntax highlighting for very large files (should move to document engine)
  • Symbol linking
  • Mobile?
  • Blame
  • Also blame for large files?
  • Coverage
  • Lint messages
  • Many dangling/leftover behaviors and maybe styles?
  • If a file is non-UTF8, we should try the repository's "Encoding" as a default (or maybe just remove this feature).

Compatibility:

  • The secret $1-2,7-9 syntax for highlighting multiple noncontiguous blocks of lines in Diffusion is no longer supported.

Still needs work:

  • Degrading and error messages when blaming large files.
  • "Skip Past This Commit" doesn't retain line numbers.
  • When we know the commit created the file, disable the "skip past this commit" stuff.
  • Symbol highlighting has some glitchiness in Safari, e.g. typehints.
  • Coverage.
  • Lint messages.
  • General cleanup of old behaviors?

Now OK?

  • Tooltips on revisions links are being built wrong.
  • Swapping away from source and back doesn't populate blame out of cache properly.
  • When lines 15, 16, 17... all have the same blame, they should only show the links on the first line.
  • Line number coloration for age (older = darker; newer = brighter).
  • CSS is a little funky, e.g. the "/" in "commit / revision" is off-baseline.
  • The "line-linker" behavior for mapping rows to line numbers is a little sketchy.
  • Line-linker behavior isn't jumping to the linked lines, but Diffusion previously did.
  • Source view rendering in Paste has a missing background color.

Coverage.

See T13125 for a more detailed breakout of coverage plans.

From PHI604, for completeness, on the newer behavior of "Hide Blame":

I'd like to avoid restoring a per-user toggle preference for the default [show/hide blame] behavior. One reason for this is that it meant that links produce materially different pages for different people: if you turned off the old preference, you may see a page without blame. You send someone a link to it, but they have the preference on and see a page with blame (or vice versa). For example, your innocent link might be perceived as finger-pointing when your recipient sees blame, although you didn't. Or your link may be confusing to your recipient when you discuss blame/history information but they can't see it. The new behavior means that your recipient will see the same page with the same information you do.

If the "Hide Blame" option isn't enough, we could look at other options here (maybe per-repository or per-install "disable blame by default" modes, although I really hate adding options) but I'd like to avoid returning to a per-user setting here because I want links to look the same to different viewers.