Changeset View
Standalone View
src/applications/files/markup/PhabricatorImageRemarkupRule.php
- This file was added.
<?php | |||||
final class PhabricatorImageRemarkupRule extends PhutilRemarkupRule { | |||||
public function getPriority() { | |||||
return 200.0; | |||||
} | |||||
public function apply($text) { | |||||
epriestley: I //think// this can just be `200` like other `{rule <args>}` rules. If that isn't right, other… | |||||
return preg_replace_callback( | |||||
'@{(image|img) ((?:[^}\\\\]+|\\\\.)*)}@m', | |||||
array($this, 'markupImage'), | |||||
$text); | |||||
} | |||||
public function markupImage(array $matches) { | |||||
epriestleyUnsubmitted Not Done Inline ActionsYou should include the isFlatText() check from NavigationRemarkupRule, etc., here. This check is a safety feature which protects us from re-evaluating text which has already been matched by another rule. Such text "should" never reach us, but if it does we can end up with security issues -- the text may contain replacement tokens, which we then put inside attributes and are later replaced by other rules with tags, so we end up with something like this: <img src="<strong>some bold text</strong>" /> There's a short path from here to XSS. The isFlatText() check makes sure that we haven't matched text containing tokens, avoiding this risk. epriestley: You should include the `isFlatText()` check from NavigationRemarkupRule, etc., here.
This… | |||||
$args = array(); | |||||
$defaults = array( | |||||
Not Done Inline ActionsFor consistency, use null as the non-value of a variable instead of empty string. This doesn't really cause any behavior differences in reasonable situations, it's just a convention thing. A minor argument for it is that properties on objects are default-initialized to null, so default-initializing locals to null makes it less likely that refactoring (e.g., to move a local variable to an object property) will cause weird surprises. If you typehint a method with a class like this: public function doStuff(Triangle $t) { ... } ...you can also allow the method to accept null by adjusting the typehint: public function doStuff(Triangle $t = null) { ... } However, this method can never accept "" (empty string). (These are pretty weak reasons to use null as the empty/unset value over "", but I don't think there are any real arguments in favor of "" over null.) epriestley: For consistency, use `null` as the non-value of a variable instead of empty string. This… | |||||
'uri' => null, | |||||
'alt' => null, | |||||
'href' => null, | |||||
'layout' => null, | |||||
'width' => null, | |||||
'height' => null, | |||||
); | |||||
if ($this->isURI($matches[2])) { | |||||
$args['uri'] = new PhutilURI($matches[2]); | |||||
} else { | |||||
$parser = new PhutilSimpleOptions(); | |||||
$keys = $parser->parse($matches[2]); | |||||
$uri_key = ''; | |||||
foreach (array('src', 'uri', 'url') as $key) { | |||||
if (array_key_exists($key, $keys)) { | |||||
$uri_key = $key; | |||||
} | |||||
} | |||||
if ($uri_key) { | |||||
$args['uri'] = new PhutilURI($keys[$uri_key]); | |||||
Not Done Inline ActionsI'm guessing there's a non-hardcoded way to get this jcox: I'm guessing there's a non-hardcoded way to get this | |||||
Not Done Inline ActionsNot realllly. You can do some hugely complicated nonsense like this: $application = PhabricatorApplication::blahBlahgetTheFilesApplication(); $uri = $application->getApplicationURI('imageproxy/?uri='.$uri); But that only gets rid of the file/ hardcode. It's possible that we may add support for installing applications on user-selected paths eventually, which would make getting rid of this hardcode valuable, but we have no real plans to do that. There's no way to say "get the URI for a particular controller" or, in the general case, "get the URI for a particular object" right now. This feels under-abstracted but doesn't cause too many issues in practice, and it isn't really clear to me where the responsibility for URI generation and resolution "should" be in these cases. There's some discussion somewhere that I'll see if I can dig up, and I think I made it halfway into an approach but wasn't too happy with how things were going. So hardcoding this is fine until we come up with A Great URI Abstraction. If we do start letting you install applications on custom prefixes, it will probably be a per-application flag ("Can this application be remapped?") and we'll just turn it off for builtin apps to start with, then go through one-by-one and fix all this stuff and turn the flag on. I think the value of such a flag is almost exclusively in letting you install two different third-party applications that decided to claim the path kanban/ or whatever, and not really relevant for first-party applications, since you'd have to be kind of silly to decide to put your third-party application at file/. Of course, you could pick something reasonable and we might write a conflicting application later, but moving the third-party application is probably the right resolution, ultimately. Outside of all that, you should make sure $uri gets escaped -- as written, I think URIs which contain & or = won't work properly. You can do this with urisprintf() or new PhutilURI() ... ->setQueryParam(). epriestley: Not realllly. You can do some hugely complicated nonsense like this:
```lang=php
$application… | |||||
} | |||||
$args += $keys; | |||||
} | |||||
$args += $defaults; | |||||
if ($args['uri']) { | |||||
$src_uri = id(new PhutilURI('/file/imageproxy/')) | |||||
->setQueryParam('uri', (string)$args['uri']); | |||||
$img = $this->newTag( | |||||
'img', | |||||
array( | |||||
'src' => $src_uri, | |||||
'alt' => $args['alt'], | |||||
'href' => $args['href'], | |||||
epriestleyUnsubmitted Not Done Inline ActionsThis should be validated as PhabricatorEnv::isValidURIForLink(). Although we have lower-level protections against javascript: URIs, this defuses weird future attacks with, e.g., unusual protocols that aren't yet known to be vulnerable (tel://, fax://, execute-as-root://) and are not in the uri.allowed-protocols whitelist. If the value is not valid, ignore it. epriestley: This should be validated as `PhabricatorEnv::isValidURIForLink()`. Although we have lower-level… | |||||
'layout' => $args['layout'], | |||||
epriestleyUnsubmitted Not Done Inline ActionsI don't think this is a real HTML/CSS property. In {Fxxx}, it gets turned into some CSS classes I think. Maybe just drop it for now, width and height should be fine as-is. Ideally, we should share layout/width/height code with the {Fxxx} file code, but we can consolidate that later. epriestley: I don't think this is a real HTML/CSS property. In `{Fxxx}`, it gets turned into some CSS… | |||||
'width' => $args['width'], | |||||
'height' => $args['height'], | |||||
)); | |||||
return $this->getEngine()->storeText($img); | |||||
Not Done Inline ActionsShould a utility function like this go in libphutil? Or should I just leave it here for now? jcox: Should a utility function like this go in libphutil? Or should I just leave it here for now? | |||||
} else { | |||||
return $matches[0]; | |||||
} | |||||
} | |||||
// This can probably be simplified a ton. Even if kept as-is, this probably | |||||
// isn't the right place to put it | |||||
private function isURI($uri_string) { | |||||
epriestleyUnsubmitted Not Done Inline ActionsYou could probably trim() this too to handle this: {img http://...} (lots of extra spaces, or a newline) epriestley: You could probably `trim()` this too to handle this:
```
{img http://...}
```… | |||||
$regex = '(https?\:\/\/)?'; // protocol | |||||
// username and password. Can maybe just remove this piece. | |||||
$regex .= '([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?'; | |||||
$regex .= '([a-z0-9-.]*)\.([a-z]{2,3})'; // Host or IP | |||||
epriestleyUnsubmitted Not Done Inline ActionsUnfairly excludes many fine TLDs including .museum, .horse, and .technology. epriestley: Unfairly excludes many fine TLDs including `.museum`, `.horse`, and `.technology`. | |||||
$regex .= '(\:[0-9]{2,5})?'; // Port | |||||
$regex .= '(\/([a-z0-9+\$_-]\.?)+)*\/?'; // Path | |||||
$regex .= '(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?'; // Query string | |||||
$regex .= '(#[a-z_.-][a-z0-9+\$_.-]*)?'; // Anchor | |||||
epriestleyUnsubmitted Not Done Inline ActionsI don't think we really need most of this. This regexp probably gets a lot of weird edge cases wrong too: beyond the TLD issue, this is a perfectly cromulent URI which it will fail to recognize: https://secure.phabricator.com/w/☃/ I think the password is optional, the username/password may contain a wider range of characters than the character sets above include, ports 1-9 are technically valid (probably? I guess I've never seen one in use), this matches silly port 00000 which is technically fine, the path can be full of all kinds of stuff like the snowman above (it looks like multiple / are not permitted?) and so on. I think we're no worse off doing a very simple check for https?://. If users type stuff like this: {img http://domain+com/blah/blah.jpg} ..and get a broken image because there's a "+" in the URL instead of a ".", when they could have gotten their text verbatim instead with a more aggressive regexp, I think that's perfectly fine. (Also, fine to put this function here.) If you do want to match more aggressively, you can just use new PhutilURI($string) and see if it parsed anything as getDomain() as a rough sanity check. epriestley: I don't think we really need most of this. This regexp probably gets a lot of weird edge cases… | |||||
return preg_match("~^$regex$~i", $uri_string); | |||||
Not Done Inline ActionsYou could probably just inline this regex now for a mild improvement in clarity. Prefer \z over $ for "end of string" -- $ matches end of string and also "end-of-string, except for a newline". We had some obscure security issues with this in the past (e.g., permitting registration of email addresses with a terminal newline). That is, this regex: /^[a-z]+$/ ...matches the string "dog" but also the string "dog\n". With \z, only "dog" matches. Writing this as '~^'.$regex.'\z~i' might make it more clear which parts are or aren't variables, although just merging the variable into the string is probably reasonable now. epriestley: You could probably just inline this regex now for a mild improvement in clarity.
Prefer `\z`… | |||||
} | |||||
} |
I think this can just be 200 like other {rule <args>} rules. If that isn't right, other rules need to be fixed anyway so it at least makes it easy to fix.
The priority stuff mostly only matters when rules could produce a different result when matched in a different order, but the {rule <args>} rules are pretty unambiguous and just need to run early enough that this:
...gets interpreted as a "call" instead of "some garbage, then some bold text, then some more garbage".