Page MenuHomePhabricator

Remarkup rule to embed images
ClosedPublic

Authored by jcox on Sep 26 2016, 2:01 PM.
Tags
None
Referenced Files
F18807396: D16597.id39952.diff
Sun, Oct 19, 1:24 AM
F18777821: D16597.diff
Sat, Oct 11, 5:42 AM
F18763140: D16597.id39955.diff
Mon, Oct 6, 10:52 PM
F18748532: D16597.id39953.diff
Fri, Oct 3, 11:15 PM
F18743987: D16597.id39956.diff
Fri, Oct 3, 1:31 AM
F18743104: D16597.id39956.diff
Thu, Oct 2, 10:24 PM
F18740755: D16597.id.diff
Thu, Oct 2, 5:17 AM
F18740640: D16597.id39952.diff
Thu, Oct 2, 4:19 AM
Tokens
"Pterodactyl" token, awarded by yelirekim."Party Time" token, awarded by epriestley.

Details

Summary

Ref T4190. Added the remarkup rule to embed images:

Syntax is as follows:

{image <IMAGE_URL>}

Parameters are also supported, like:
{image uri=<IMAGE_URI>, width=500px, height=200px, alt=picture of a moose, href=google.com}

URLs without a protocol are not supported.

Test Plan

Tested with many of the syntax variations. If the provided URL doesn't point to an image, then a broken image icon will be shown.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

jcox retitled this revision from to Added the remarkup rule to embed images. Next step is actually using the imageproxy for it..
jcox updated this object.
jcox edited the test plan for this revision. (Show Details)
jcox retitled this revision from Added the remarkup rule to embed images. Next step is actually using the imageproxy for it. to Remarkup rule to embed images.Sep 26 2016, 2:02 PM
jcox updated this object.
jcox edited edge metadata.
src/applications/files/markup/PhabricatorImageRemarkupRule.php
8

I think this can just be 200 like other {rule <args>} rules. If that isn't right, other rules need to be fixed anyway so it at least makes it easy to fix.

The priority stuff mostly only matters when rules could produce a different result when matched in a different order, but the {rule <args>} rules are pretty unambiguous and just need to run early enough that this:

{rule name=**STARS**}

...gets interpreted as a "call" instead of "some garbage, then some bold text, then some more garbage".

Now using the imageproxy endpoint.

All of the following syntaxes work now:

{image http://i.imgur.com/nTvVrYN.jpg}
{img http://i.imgur.com/nTvVrYN.jpg}
{img uri=http://i.imgur.com/nTvVrYN.jpg}
{img src=http://i.imgur.com/nTvVrYN.jpg}
{img url=http://i.imgur.com/nTvVrYN.jpg}

If it encounters an error, it just returns the input text. So {img fffdsfsf=http://i.imgur.com/nTvVrYN.jpg} just returns {img fffdsfsf=http://i.imgur.com/nTvVrYN.jpg}.
It still doesn't handle the case where the protocol is excluded (actually I think that's an issue with the imageproxy controller itself).

src/applications/files/markup/PhabricatorImageRemarkupRule.php
38

I'm guessing there's a non-hardcoded way to get this

more succinct alias checking for src key

src/applications/files/markup/PhabricatorImageRemarkupRule.php
17

For consistency, use null as the non-value of a variable instead of empty string. This doesn't really cause any behavior differences in reasonable situations, it's just a convention thing.

A minor argument for it is that properties on objects are default-initialized to null, so default-initializing locals to null makes it less likely that refactoring (e.g., to move a local variable to an object property) will cause weird surprises.

If you typehint a method with a class like this:

public function doStuff(Triangle $t) { ... }

...you can also allow the method to accept null by adjusting the typehint:

public function doStuff(Triangle $t = null) { ... }

However, this method can never accept "" (empty string).

(These are pretty weak reasons to use null as the empty/unset value over "", but I don't think there are any real arguments in favor of "" over null.)

38

Not realllly. You can do some hugely complicated nonsense like this:

$application = PhabricatorApplication::blahBlahgetTheFilesApplication();
$uri = $application->getApplicationURI('imageproxy/?uri='.$uri);

But that only gets rid of the file/ hardcode. It's possible that we may add support for installing applications on user-selected paths eventually, which would make getting rid of this hardcode valuable, but we have no real plans to do that.

There's no way to say "get the URI for a particular controller" or, in the general case, "get the URI for a particular object" right now. This feels under-abstracted but doesn't cause too many issues in practice, and it isn't really clear to me where the responsibility for URI generation and resolution "should" be in these cases. There's some discussion somewhere that I'll see if I can dig up, and I think I made it halfway into an approach but wasn't too happy with how things were going.

So hardcoding this is fine until we come up with A Great URI Abstraction. If we do start letting you install applications on custom prefixes, it will probably be a per-application flag ("Can this application be remapped?") and we'll just turn it off for builtin apps to start with, then go through one-by-one and fix all this stuff and turn the flag on.

I think the value of such a flag is almost exclusively in letting you install two different third-party applications that decided to claim the path kanban/ or whatever, and not really relevant for first-party applications, since you'd have to be kind of silly to decide to put your third-party application at file/. Of course, you could pick something reasonable and we might write a conflicting application later, but moving the third-party application is probably the right resolution, ultimately.

Outside of all that, you should make sure $uri gets escaped -- as written, I think URIs which contain & or = won't work properly. You can do this with urisprintf() or new PhutilURI() ... ->setQueryParam().

T9231 has the discussion I was thinking of.

It's not hugely relevant here since this is getting the route for an arbitrary controller, while the discussion there is more about getting object URIs (edit, delete, etc).

I think EditEngine has chipped away at this somewhat (more routes are now automatically or centrally generated than were in the past) and it doesn't feel like a burning need to find a cleaner abstraction for this. Duplicate URI information isn't great and does cause some bugs, but they're usually pretty rare/minor and grep is usually good at finding almost all of the issues in the rare cases where we move stuff.

Use null rather than empty string.

As far as dealing with URLs where the protocol isn't specified I see a few possible approaches:

  1. Require that the user includes the protocol. The downside here is that things like {img funny.com/dog.png} won't work when really they probably should.
  2. If no protocol is specified, assume http. If that is broken, then either silently fail or display an error message to the user.
  3. Assume http, if that doesn't work, try https. If that fails then show an error message

I think #3 should probably just be handled by the domain that is hosting the image if they automatically want to support both. It's also possible (albeit unlikely) that the content served over https differs from that served over http.

#2 is probably going to work best for most cases. The main issues I can think of are "what if the image 404s" which is a case we have to deal with regardless.

src/applications/files/markup/PhabricatorImageRemarkupRule.php
57

Should a utility function like this go in libphutil? Or should I just leave it here for now?

If we assume a protocol, I tend to favor assuming "https": hopefully, this is the correct default most of the time in a few years. However, it might be more confusing than helpful today because HTTPS is not yet as widespread as it probably should be.

I think "it doesn't work" is also completely reasonable -- I'd guess that it's unlikely that users will manually type many URLs or otherwise lose the protocol. Offhand, I can't come up with a way that a reasonable user would end up typing funny.com/dog.jpg and not be better served by the Macros application.

I don't favor assuming "http" (I think this is suboptional from a security viewpoint, and hopefully moreso in a few years -- but it would break things to change the default then) or downgrading from "https" to "http" after a failure.

Added parameters for width, height, alt, layout, and href

jcox edited the test plan for this revision. (Show Details)
src/applications/files/markup/PhabricatorImageRemarkupRule.php
16

You should include the isFlatText() check from NavigationRemarkupRule, etc., here.

This check is a safety feature which protects us from re-evaluating text which has already been matched by another rule. Such text "should" never reach us, but if it does we can end up with security issues -- the text may contain replacement tokens, which we then put inside attributes and are later replaced by other rules with tags, so we end up with something like this:

<img src="<strong>some bold text</strong>" />

There's a short path from here to XSS.

The isFlatText() check makes sure that we haven't matched text containing tokens, avoiding this risk.

53

This should be validated as PhabricatorEnv::isValidURIForLink(). Although we have lower-level protections against javascript: URIs, this defuses weird future attacks with, e.g., unusual protocols that aren't yet known to be vulnerable (tel://, fax://, execute-as-root://) and are not in the uri.allowed-protocols whitelist.

If the value is not valid, ignore it.

54

I don't think this is a real HTML/CSS property. In {Fxxx}, it gets turned into some CSS classes I think. Maybe just drop it for now, width and height should be fine as-is.

Ideally, we should share layout/width/height code with the {Fxxx} file code, but we can consolidate that later.

66

You could probably trim() this too to handle this:

{img                http://...}

(lots of extra spaces, or a newline)

70

Unfairly excludes many fine TLDs including .museum, .horse, and .technology.

75

I don't think we really need most of this. This regexp probably gets a lot of weird edge cases wrong too: beyond the TLD issue, this is a perfectly cromulent URI which it will fail to recognize:

https://secure.phabricator.com/w/☃/

I think the password is optional, the username/password may contain a wider range of characters than the character sets above include, ports 1-9 are technically valid (probably? I guess I've never seen one in use), this matches silly port 00000 which is technically fine, the path can be full of all kinds of stuff like the snowman above (it looks like multiple / are not permitted?) and so on.

I think we're no worse off doing a very simple check for https?://. If users type stuff like this:

{img http://domain+com/blah/blah.jpg}

..and get a broken image because there's a "+" in the URL instead of a ".", when they could have gotten their text verbatim instead with a more aggressive regexp, I think that's perfectly fine.

(Also, fine to put this function here.)

If you do want to match more aggressively, you can just use new PhutilURI($string) and see if it parsed anything as getDomain() as a rough sanity check.

Updated based on CR comments

epriestley added a reviewer: epriestley.

Cool, this looks good to me.

src/applications/files/markup/PhabricatorImageRemarkupRule.php
75

You could probably just inline this regex now for a mild improvement in clarity.

Prefer \z over $ for "end of string" -- $ matches end of string and also "end-of-string, except for a newline". We had some obscure security issues with this in the past (e.g., permitting registration of email addresses with a terminal newline). That is, this regex:

/^[a-z]+$/

...matches the string "dog" but also the string "dog\n". With \z, only "dog" matches.

Writing this as '~^'.$regex.'\z~i' might make it more clear which parts are or aren't variables, although just merging the variable into the string is probably reasonable now.

This revision is now accepted and ready to land.Sep 26 2016, 8:48 PM
jcox edited edge metadata.

Clean up regex

This revision was automatically updated to reflect the committed changes.