Details

Reviewers

epriestley

Group Reviewers

Blessed Reviewers

Maniphest Tasks

T4190: Rebuild remarkup image proxying

Commits

rP26b29a60c0f8: Remarkup rule to embed images

Summary

Ref T4190. Added the remarkup rule to embed images:

Syntax is as follows:

{image <IMAGE_URL>}

Parameters are also supported, like:
{image uri=<IMAGE_URI>, width=500px, height=200px, alt=picture of a moose, href=google.com}

URLs without a protocol are not supported.

Test Plan

Tested with many of the syntax variations. If the provided URL doesn't point to an image, then a broken image icon will be shown.

Diff Detail

Repository

rP Phabricator

Branch

RemarkupImageRule (branched from master)

Lint

Lint Passed

Unit

Tests Passed

Build Status

Buildable 13843
Build 17901: Run Core Tests
Build 17900: arc lint + arc unit

Event Timeline

jcox updated this revision to Diff 39949.Sep 26 2016, 2:01 PM

jcox retitled this revision from to Added the remarkup rule to embed images. Next step is actually using the imageproxy for it..

jcox updated this object.

jcox edited the test plan for this revision. (Show Details)

jcox added a task: T4190: Rebuild remarkup image proxying.

Herald added a reviewer: Blessed Reviewers. · View Herald TranscriptSep 26 2016, 2:01 PM

Herald added subscribers: yelirekim, epriestley. · View Herald Transcript

jcox retitled this revision from Added the remarkup rule to embed images. Next step is actually using the imageproxy for it. to Remarkup rule to embed images.Sep 26 2016, 2:02 PM

jcox updated this object.

jcox edited edge metadata.

epriestley added inline comments.Sep 26 2016, 2:05 PM

src/applications/files/markup/PhabricatorImageRemarkupRule.php
8	I think this can just be `200` like other `{rule <args>}` rules. If that isn't right, other rules need to be fixed anyway so it at least makes it easy to fix. The priority stuff mostly only matters when rules could produce a different result when matched in a different order, but the `{rule <args>}` rules are pretty unambiguous and just need to run early enough that this: {rule name=STARS} ...gets interpreted as a "call" instead of "some garbage, then some bold text, then some more garbage".

Now using the imageproxy endpoint.

All of the following syntaxes work now:

{image http://i.imgur.com/nTvVrYN.jpg}
{img http://i.imgur.com/nTvVrYN.jpg}
{img uri=http://i.imgur.com/nTvVrYN.jpg}
{img src=http://i.imgur.com/nTvVrYN.jpg}
{img url=http://i.imgur.com/nTvVrYN.jpg}

If it encounters an error, it just returns the input text. So {img fffdsfsf=http://i.imgur.com/nTvVrYN.jpg} just returns {img fffdsfsf=http://i.imgur.com/nTvVrYN.jpg}.
It still doesn't handle the case where the protocol is excluded (actually I think that's an issue with the imageproxy controller itself).

jcox added inline comments.Sep 26 2016, 3:40 PM

src/applications/files/markup/PhabricatorImageRemarkupRule.php
38	I'm guessing there's a non-hardcoded way to get this

more succinct alias checking for src key

epriestley added inline comments.Sep 26 2016, 3:54 PM

src/applications/files/markup/PhabricatorImageRemarkupRule.php
17	For consistency, use `null` as the non-value of a variable instead of empty string. This doesn't really cause any behavior differences in reasonable situations, it's just a convention thing. A minor argument for it is that properties on objects are default-initialized to `null`, so default-initializing locals to `null` makes it less likely that refactoring (e.g., to move a local variable to an object property) will cause weird surprises. If you typehint a method with a class like this: public function doStuff(Triangle $t) { ... } ...you can also allow the method to accept `null` by adjusting the typehint: public function doStuff(Triangle $t = null) { ... } However, this method can never accept `""` (empty string). (These are pretty weak reasons to use `null` as the empty/unset value over `""`, but I don't think there are any real arguments in favor of `""` over `null`.)
38	Not realllly. You can do some hugely complicated nonsense like this: $application = PhabricatorApplication::blahBlahgetTheFilesApplication(); $uri = $application->getApplicationURI('imageproxy/?uri='.$uri); But that only gets rid of the `file/` hardcode. It's possible that we may add support for installing applications on user-selected paths eventually, which would make getting rid of this hardcode valuable, but we have no real plans to do that. There's no way to say "get the URI for a particular controller" or, in the general case, "get the URI for a particular object" right now. This feels under-abstracted but doesn't cause too many issues in practice, and it isn't really clear to me where the responsibility for URI generation and resolution "should" be in these cases. There's some discussion somewhere that I'll see if I can dig up, and I think I made it halfway into an approach but wasn't too happy with how things were going. So hardcoding this is fine until we come up with A Great URI Abstraction. If we do start letting you install applications on custom prefixes, it will probably be a per-application flag ("Can this application be remapped?") and we'll just turn it off for builtin apps to start with, then go through one-by-one and fix all this stuff and turn the flag on. I think the value of such a flag is almost exclusively in letting you install two different third-party applications that decided to claim the path `kanban/` or whatever, and not really relevant for first-party applications, since you'd have to be kind of silly to decide to put your third-party application at `file/`. Of course, you could pick something reasonable and we might write a conflicting application later, but moving the third-party application is probably the right resolution, ultimately. Outside of all that, you should make sure `$uri` gets escaped -- as written, I think URIs which contain `&` or `=` won't work properly. You can do this with `urisprintf()` or `new PhutilURI() ... ->setQueryParam()`.

T9231 has the discussion I was thinking of.

It's not hugely relevant here since this is getting the route for an arbitrary controller, while the discussion there is more about getting object URIs (edit, delete, etc).

I think EditEngine has chipped away at this somewhat (more routes are now automatically or centrally generated than were in the past) and it doesn't feel like a burning need to find a cleaner abstraction for this. Duplicate URI information isn't great and does cause some bugs, but they're usually pretty rare/minor and grep is usually good at finding almost all of the issues in the rare cases where we move stuff.

Use null rather than empty string.

As far as dealing with URLs where the protocol isn't specified I see a few possible approaches:

Require that the user includes the protocol. The downside here is that things like {img funny.com/dog.png} won't work when really they probably should.
If no protocol is specified, assume http. If that is broken, then either silently fail or display an error message to the user.
Assume http, if that doesn't work, try https. If that fails then show an error message

I think #3 should probably just be handled by the domain that is hosting the image if they automatically want to support both. It's also possible (albeit unlikely) that the content served over https differs from that served over http.

#2 is probably going to work best for most cases. The main issues I can think of are "what if the image 404s" which is a case we have to deal with regardless.

src/applications/files/markup/PhabricatorImageRemarkupRule.php
57	Should a utility function like this go in libphutil? Or should I just leave it here for now?

If we assume a protocol, I tend to favor assuming "https": hopefully, this is the correct default most of the time in a few years. However, it might be more confusing than helpful today because HTTPS is not yet as widespread as it probably should be.

I think "it doesn't work" is also completely reasonable -- I'd guess that it's unlikely that users will manually type many URLs or otherwise lose the protocol. Offhand, I can't come up with a way that a reasonable user would end up typing funny.com/dog.jpg and not be better served by the Macros application.

I don't favor assuming "http" (I think this is suboptional from a security viewpoint, and hopefully moreso in a few years -- but it would break things to change the default then) or downgrading from "https" to "http" after a failure.

Added parameters for width, height, alt, layout, and href

jcox updated this object.Sep 26 2016, 7:30 PM

jcox edited the test plan for this revision. (Show Details)

epriestley added inline comments.Sep 26 2016, 8:20 PM

src/applications/files/markup/PhabricatorImageRemarkupRule.php
16	You should include the `isFlatText()` check from NavigationRemarkupRule, etc., here. This check is a safety feature which protects us from re-evaluating text which has already been matched by another rule. Such text "should" never reach us, but if it does we can end up with security issues -- the text may contain replacement tokens, which we then put inside attributes and are later replaced by other rules with tags, so we end up with something like this: <img src="<strong>some bold text</strong>" /> There's a short path from here to XSS. The `isFlatText()` check makes sure that we haven't matched text containing tokens, avoiding this risk.
53	This should be validated as `PhabricatorEnv::isValidURIForLink()`. Although we have lower-level protections against `javascript:` URIs, this defuses weird future attacks with, e.g., unusual protocols that aren't yet known to be vulnerable (`tel://`, `fax://`, `execute-as-root://`) and are not in the `uri.allowed-protocols` whitelist. If the value is not valid, ignore it.
54	I don't think this is a real HTML/CSS property. In `{Fxxx}`, it gets turned into some CSS classes I think. Maybe just drop it for now, `width` and `height` should be fine as-is. Ideally, we should share layout/width/height code with the `{Fxxx}` file code, but we can consolidate that later.
66	You could probably `trim()` this too to handle this: {img http://...} (lots of extra spaces, or a newline)
70	Unfairly excludes many fine TLDs including `.museum`, `.horse`, and `.technology`.
75	I don't think we really need most of this. This regexp probably gets a lot of weird edge cases wrong too: beyond the TLD issue, this is a perfectly cromulent URI which it will fail to recognize: https://secure.phabricator.com/w/☃/ I think the password is optional, the username/password may contain a wider range of characters than the character sets above include, ports 1-9 are technically valid (probably? I guess I've never seen one in use), this matches silly port `00000` which is technically fine, the path can be full of all kinds of stuff like the snowman above (it looks like multiple `/` are not permitted?) and so on. I think we're no worse off doing a very simple check for `https?://`. If users type stuff like this: {img http://domain+com/blah/blah.jpg} ..and get a broken image because there's a "+" in the URL instead of a ".", when they could have gotten their text verbatim instead with a more aggressive regexp, I think that's perfectly fine. (Also, fine to put this function here.) If you do want to match more aggressively, you can just use `new PhutilURI($string)` and see if it parsed anything as `getDomain()` as a rough sanity check.

Updated based on CR comments

Cool, this looks good to me.

src/applications/files/markup/PhabricatorImageRemarkupRule.php
75	You could probably just inline this regex now for a mild improvement in clarity. Prefer `\z` over `$` for "end of string" -- `$` matches end of string and also "end-of-string, except for a newline". We had some obscure security issues with this in the past (e.g., permitting registration of email addresses with a terminal newline). That is, this regex: /^[a-z]+$/ ...matches the string `"dog"` but also the string `"dog\n"`. With `\z`, only `"dog"` matches. Writing this as `'~^'.$regex.'\z~i'` might make it more clear which parts are or aren't variables, although just merging the variable into the string is probably reasonable now.

This revision is now accepted and ready to land.Sep 26 2016, 8:48 PM

epriestley awarded a token.Sep 26 2016, 8:48 PM

Clean up regex

Herald added a subscriber: Korvin. · View Herald TranscriptSep 26 2016, 9:37 PM

Closed by commit rP26b29a60c0f8: Remarkup rule to embed images (authored by jcox, committed by jcox). · Explain WhySep 26 2016, 9:46 PM

This revision was automatically updated to reflect the committed changes.

yelirekim awarded a token.Sep 26 2016, 9:57 PM

Remarkup rule to embed images
ClosedPublic
Actions

Details

Diff Detail

Event Timeline

Revision Contents
Changeset List

Diff 39954

src/__phutil_library_map__.php

src/applications/files/application/PhabricatorFilesApplication.php

src/applications/files/markup/PhabricatorImageRemarkupRule.php

Remarkup rule to embed imagesClosedPublicActions

Details

Diff Detail

Event Timeline

Revision ContentsChangeset List

Diff 39954

src/__phutil_library_map__.php

src/applications/files/application/PhabricatorFilesApplication.php

src/applications/files/markup/PhabricatorImageRemarkupRule.php

Remarkup rule to embed images
ClosedPublic
Actions

Revision Contents
Changeset List