Page MenuHomePhabricator

Plans: Rich presentation and diff rendering pipelines for various file types
Open, NormalPublic

Description

Some file types (like Markdown and SVG) are text (or binary) files which can be rendered into some sort of document for presentation. For example, an SVG file is a text file but it can be rendered into an image.

In particular, we can render these files when:

  • viewing a file in Diffusion;
  • viewing a file in Files;
  • viewing an attachment on any object, maybe with {Fxxx, ...} options;
  • perhaps in Pholio, although this is maybe not v1; and
  • we could automatically expose these to the inline {{{ ... }}} remarkup renderers.

Additionally, some files can be rendered into a visual diff format in a richer way than just showing two renderings side-by-side. We can potentially render a rich diff when:

  • viewing a commit in Diffusion; or
  • viewing a revision in Differential.

A good example of a rich diff is GitHub's image diffing feature, which has modes to overlay two images on top of one another:

https://help.github.com/articles/rendering-and-diffing-images/

Another good example is GitHub's STL model support:

Even when we can't render a dedicated diff of an file, rendering the old file and the new file may still be better than rendering the raw text diff, although perhaps not always.

For example, one document type where a rendered old vs new may be worse than a text diff is Markdown: if a large section was deleted, added, or moved, the two documents won't line up at all and it may be very difficult to tell what was changed, while a plain text diff may actually be more clear.

A possible related concern is that some files might diff better in Diffusion/Differential using the prose diff algorithm than using the source diff algorithm.


The first step here is to add some sort of modular DocumentRenderingEngine (or something) which can elect to render a given document or a diff between two documents. Files, attachments, the lightbox, Diffusion file views, Diffusion commit diffs, and Differential revisions should all engage this engine, test for support, and provide the user some way to select between rendering options.

In particular, multiple engines should be able to support a given document and the user should be able to choose between rendering engines. One example where this might be useful is is formats like SVG: SVG can be rendered as an image, or it could be rendered by a generic XML renderer. A binary file format could be rendered as an actual document (say, an image or 3D model) or a hexdump -C style listing, or a decoded format-specific text variant.

For sophisticated rendered diffs, we will probably need to sacrifice inline comment support in most cases, at least for now. It's possible that we could eventually support different commenting modes on different renderings but this seems like a substantial undertaking. Comments can still work on the raw text version, we just can't reasonably map a comment on line 32 of an XML file into a point in 3D space when the file is rendered as a model. We could put comments on other views of the file above/below the rendering or just show an indicator that they exist.


Once the engine infrastructure works, we can add support for different file types.

A general concern here is that many file types are complicated to render and may not have secure rendering engines. SVG is a good example here: although browsers can all render SVG, rendering SVG safely is very hard (see T6445). LaTeX is similar (T6609) with major security concerns in the most popular rendering engines. We've previously rewritten figlet and cowsay with in-process/first-party code (see T7785) and removed Graphviz support because we were unable to secure it or port it (T9408).

Many rendering tools were written assuming that input documents are safe and were written by a trusted local user. Often, these tools can be exploited when that assumption is violated with a malicious input. (From T9408, a researcher found a buffer overflow accessible from the command line in figlet, a toy program for drawing big letter using keyboard symbols.)

Thus, upstreamable rendering implementations for some file types may be substantial undertakings where we either need to carefully audit a large external codebase or write a first-party version of the rendering pipeline. For specialized diffs, it's likely that we need first-party pipelines in most cases.

The existence of this engine and pipeline will open up opportunities for third-party code to provide insecure engines, but approaches based largely on faith that a 200,000 line external has no attack surface isn't upstreamable, so just keep it to yourself. 🙈 🙉 🙊


In the past, we've seen interest in:

I don't currently plan to implement any of these since they're all complicated and have little or no mana behind them relative to implementation complexity.

These formats also seem conceptually reasonable, although not necessarily valuable:

  • Remarkup (standalone is easy; diffs are hard).
  • Normal images like PNG, GIF and JPEG (standalone is trivial but should probably be implemented via engine infrastructure; diffs aren't trivial but may be valuable).
  • Binary as hexdump (standalone and diff are both easy but value is questionable).
  • Prose diff, maybe.
  • Raw binary (it would be nice if the fallback renderer that just says "we can't do anything with this file" was modular, not special cased).
  • XML-as-DOM? JSON/YAML-as-DOM? We could render these formats in an expandable tree view and perform more-semantic diffing on them, but it's not clear that this is very useful relative to the amount of work it likely involves. Viewing these files as text is usually pretty good already, assuming they've been beautified by whatever wrote them. Maybe "Beautify JSON" is reasonable, but it's not clear this really solves a problem anyone has.

See PHI452. An install would like support for PCB schematic files (.sch); layer files (.brd, GDS, Gerber).

See PHI452. See PHI456. Multiple installs are interested in Jupyter notebooks.

Details

Commits
D19457 / rPa894c9993512: Add "max-width: 100%;" to stop large images from overflowing the new rendering…
D19414 / rPafc3099ee785: Add a view option to disable blame in Diffusion and fix some view transition…
D19378 / rP665529ab60a1: Restore coverage reporting to Diffusion browse UI
D19377 / rP21bb0215dbc7: Remove obsoleted "diffusion-browse-file" behavior for coverage
D19350 / rP37a03402bc44: When following a link to a particular line ("/example.txt$12"), scroll to that…
D19349 / rP5b3a351852a7: Use pseudoelements, not Zero Width Space, to implement copy/paste behavior in…
D19348 / rPc5c53e277a28: Make line selection in source code views less fragile and more consistent
D19346 / rP55619e89642b: Restore an explicit white background color to files in Paste
D19314 / rP472bc3d90a59: Colorize lines in blame under DocumentEngine, to show relative age of changes
D19313 / rPcf75d63b49a7: When lines 12, 13, 14, etc all blame to the same change, only show it once
D19312 / rPeb80f0a2d95d: When you swap between document rendering engines, populate or redraw blame if…
D19311 / rPeca7dc25f20b: Use javelin_tag(), not phutil_tag(), to render revision blame tooltips properly
D19310 / rP11664277b33d: Make DocumentEngine source line linking behavior better when blame is shown
D19309 / rP09c6d42b9599: Mostly make blame work with DocumentEngine
D19307 / rP90a614778c07: Make repository symbol references work with DocumentEngine
D19306 / rP0363febeb2c6: Disable default syntax highlighting for large files in DocumentEngine
D19305 / rP6dea2ba3b37f: Fix DocumentEngine line behaviors in Diffusion
D19302 / rP1fde4a945061: Move Diffusion browse rendering to DocumentEngine, breaking almost all features
D19301 / rP245132a0b22e: Pull file Document Engine rendering out of "Files" application controllers
D19300 / rP7d4e25614d00: Remove the ability to disable blame in Diffusion
D19274 / rP7189cb7ba816: Support text encoding and syntax highlighting options in document rendering
D19273 / rPccbc8a430f62: Make Jupyter notebooks use the fast builtin Python highlighter
D19260 / rPc5b244bfd04f: Render directly embedded image data represented as a string in Jupyter notebooks
D19259 / rPb7d3101e7cbf: Minor document rendering fixes: dropdown for synchronous files, URI…
D19258 / rP38999e25ac59: Support logged-out access to the document rendering endpoint
D19256 / rPbba1b185f834: Improve minor client behaviors for document rendering
D19254 / rPd2727d24da6e: Add an abstract "Text" document engine and a "Source" document engine
D19253 / rPcbf3d3c37154: Add a very rough, proof-of-concept Jupyter notebook document engine
D19252 / rPfb4ce851c4ae: Add a PDF document "rendering" engine
D19251 / rP8b658706a8c4: Add a basic Remarkup document rendering engine
D19239 / rP4aafce686202: Add filesize limits for document rendering engines and support partial/complete…
D19238 / rPf646153f4dbe: Add an async driver for document rendering and a crude "Hexdump" document engine
D19237 / rP01f22a8d06c9: Roughly modularize document rendering in Files

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I really like this, Remarkup idea, I think there are many text based files that people add to reviews which would be easily transformed to something more readable by being transfer directly to Remarkup, even for example something as simple as CSV,

but also potentially something much more complex like MIF (Framemaker) or even the HTML that might be part of a HtmlHelp file. so I could see using this rendering engine as a basis for other possible engines

I've personally already customized an engine to transform a proprietary graph format into an SVG diagram with some success, I'm interested to see how easy it will be for the render engine to be used in Diffusion and Differential

Super excited about this plan..

NMeral added a subscriber: NMeral.Mar 26 2018, 2:22 PM
NMeral removed a subscriber: NMeral.

Changeset rPfb4ce851c4ae: Add a PDF document "rendering" engine allows to view PDFs from the Files application. However, {Fxxx, ...} references to a PDF file in Phriction still only allow to download the file or open a lightbox with only an icon (again with a download link).

Are there plans to add PDF rendering similar to Files also to Phriction ? This would be really helpful.

Junk I'm about to break in Diffusion (but plan to restore):

  • Line numbers are a mess: they don't read off DiffusionRequest properly and the URI doesn't update correctly.
  • Disable syntax highlighting for very large files (should move to document engine)
  • Symbol linking
  • Mobile?
  • Blame
  • Also blame for large files?
  • Coverage
  • Lint messages
  • Many dangling/leftover behaviors and maybe styles?
  • If a file is non-UTF8, we should try the repository's "Encoding" as a default (or maybe just remove this feature).

Compatibility:

  • The secret $1-2,7-9 syntax for highlighting multiple noncontiguous blocks of lines in Diffusion is no longer supported.
epriestley added a comment.EditedApr 9 2018, 12:15 PM

Still needs work:

  • Degrading and error messages when blaming large files.
  • "Skip Past This Commit" doesn't retain line numbers.
  • When we know the commit created the file, disable the "skip past this commit" stuff.
  • Symbol highlighting has some glitchiness in Safari, e.g. typehints.
  • Coverage.
  • Lint messages.
  • General cleanup of old behaviors?

Now OK?

  • Tooltips on revisions links are being built wrong.
  • Swapping away from source and back doesn't populate blame out of cache properly.
  • When lines 15, 16, 17... all have the same blame, they should only show the links on the first line.
  • Line number coloration for age (older = darker; newer = brighter).
  • CSS is a little funky, e.g. the "/" in "commit / revision" is off-baseline.
  • The "line-linker" behavior for mapping rows to line numbers is a little sketchy.
  • Line-linker behavior isn't jumping to the linked lines, but Diffusion previously did.
  • Source view rendering in Paste has a missing background color.

Coverage.

See T13125 for a more detailed breakout of coverage plans.

From PHI604, for completeness, on the newer behavior of "Hide Blame":

I'd like to avoid restoring a per-user toggle preference for the default [show/hide blame] behavior. One reason for this is that it meant that links produce materially different pages for different people: if you turned off the old preference, you may see a page without blame. You send someone a link to it, but they have the preference on and see a page with blame (or vice versa). For example, your innocent link might be perceived as finger-pointing when your recipient sees blame, although you didn't. Or your link may be confusing to your recipient when you discuss blame/history information but they can't see it. The new behavior means that your recipient will see the same page with the same information you do.
If the "Hide Blame" option isn't enough, we could look at other options here (maybe per-repository or per-install "disable blame by default" modes, although I really hate adding options) but I'd like to avoid returning to a per-user setting here because I want links to look the same to different viewers.

Pawka added a subscriber: Pawka.Feb 6 2019, 3:03 PM