Page MenuHomePhabricator

Allow full URIs to be rendered as object references
AbandonedPublic

Authored by epriestley on May 4 2015, 11:25 AM.
Tags
None
Referenced Files
Unknown Object (File)
Thu, Jan 2, 2:29 AM
Unknown Object (File)
Wed, Dec 18, 2:34 AM
Unknown Object (File)
Mon, Dec 16, 5:16 PM
Unknown Object (File)
Sun, Dec 15, 6:45 AM
Unknown Object (File)
Fri, Dec 13, 2:13 AM
Unknown Object (File)
Thu, Dec 12, 2:22 AM
Unknown Object (File)
Thu, Dec 12, 12:13 AM
Unknown Object (File)
Thu, Dec 12, 12:12 AM

Details

Summary

Fixes T5378. Allow object references to optionally include the full URI. This means that http://phabricator.example.com/T123 is rendered equivalently to T123.

Test Plan
= These work as expected, rendering as objects =
D1
{D1}
http://phabricator.local/T1
{https://phabricator.local/T1}
HTTP://PHABRICATOR.LOCAL/D1
http://phabricator.local/rARC977baacc329ce690d4e5c0304f4c50e352e7644f
#foobar

= These are not very well defined =
http://phabricator.local/T1.html

= These work as expected, rendering as hyperlinks =
http://phabricator.local/
http://phabricator.local/xyz
http://phabricator.local/xyz/D1
http://phabricator.local/T1xyz
http://phabricator.local/?id=T1&key=value
http://secure.phabricator.com/T1
http://phabricator.local/D1/D2

= These don't work =
'#foobar'
http://phabricator.local/#foobar
http://phabricator.local/tag/foobar
http://phabricator.local/project/view/1/
http://phabricator.local/diffusion/TEST/

Diff Detail

Repository
rP Phabricator
Branch
master
Lint
Lint Passed
SeverityLocationCodeMessage
Advicesrc/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php:175XHP14Misuse of preg_quote()
Unit
No Test Coverage
Build Status
Buildable 5729
Build 5748: [Placeholder Plan] Wait for 30 Seconds

Event Timeline

joshuaspence retitled this revision from to Allow full URIs to be rendered as object references.
joshuaspence updated this object.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence added a reviewer: epriestley.
joshuaspence added a subscriber: epriestley.

Ideally, it would be nice to run this rule before the URI rule so we don't have to do the URI exception stuff. Particularly, if you type http://install.com/path/to/something/ (that is, an install URI but not a monogram), this approach as written will leave it unlinked.

You can run this rule first by adjusting getPriority() to a number smaller than the hyperlink priority. This may have the undesirable side effect of causing us to start matching monograms in URIs (or in other weird contexts, maybe), but perhaps that isn't too big of an issue and we can avoid it by being more careful with the regexps.

If that isn't fruitful, my next best idea is to let rules reach into the matched hyperlink list and update the tokens. We don't have any rules like this right now, but theoretically it should be possible to do something like:

$uri_rule = $this->getEngine()->getTheRuleThatMatchesLinks();
foreach ($uri_rule->getMatchedURIs() as $key => $uri) {
  if (the $uri is on this install) {
    $uri_rule->useThisTokenInsteadOfTheDefault($key, $pretty_monogram_link);
  }
}

This would probably make hard for us to get {http://.../} to work, but seems conceptually cleaner if we're willing to give that up (I think it's reasonable for it to not work, but I also think it's reasonable to consider it important for it to work).

joshuaspence edited edge metadata.

Increase priority

// These work as expected.
D1
http://phabricator.local/
http://phabricator.local/D1
http://phabricator.local/foobar
http://phabricator.local/T1xyz

// These don't work.
http://phabricator.local/D1/D2
http://phabricator.local/T1.html
http://phabricator.local/?id=T1&key=value
src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
204

We can fix some of those by adding = and / to this blacklist of lookbehind characters.

204

Probably also ? and & for a.html?T1 and a.html?a=1&T1=true.

src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
175

I suspect we need to match "a bunch of stuff that looks like it's probably a URI here", and then check if it's a URI we care about in the callback, so that these work:

  • URIs with uppercase letters.
  • http vs https.
  • All URIs in phabricator.allowed-uris.

Probably something like "some non-whitespace stuff", then "://", then "some more stuff with no whitespace", then "a slash"?

// These work as expected.
D1
http://phabricator.local/
http://phabricator.local/D1
http://phabricator.local/xyz
http://phabricator.local/xyz/D1
http://phabricator.local/T1xyz
http://phabricator.local/?id=T1&key=value
http://secure.phabricator.com/T1
https://phabricator.local/D1
HTTP://PHABRICATOR.LOCAL/D1

// These don't work.
http://phabricator.local/D1/D2 (specifically D1)
http://phabricator.local/T1.html
src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
281

Err, ignore this :P

http://phabricator.local/T1.html is going to be difficult to get right because it could be used as follows:

http://phabricator.local/T1.this is a new sentence.

I think it's correct/reasonable that http://blah/T1.blah doesn't work -- it wouldn't work as a normal URI either.

Hmm, I seem to have broken {D1} somehow.

Ah, this also causes a unit test to fail (I noticed this whilst submitting D12694):

   FAIL  ProjectRemarkupRuleTestCase::testProjectObjectRemarkup
Assertion failed, expected values to be equal (at ProjectRemarkupRuleTestCase.php:119): I like #ducks.
Expected vs Actual Output Diff
--- Old Value
+++ New Value
@@ -1,18 +1,19 @@
 Array
 (
     [embed] => Array
         (
         )
 
     [ref] => Array
         (
             [0] => Array
                 (
-                    [offset] => 8
-                    [id] => ducks
+                    [offset] => -1
+                    [id] => 
+                    [tail] => ducks
                 )
 
         )
 
 )
// These work as expected.
D1
{D1}
http://phabricator.local/
http://phabricator.local/D1
http://phabricator.local/xyz
http://phabricator.local/xyz/D1
http://phabricator.local/T1xyz
http://phabricator.local/?id=T1&key=value
http://secure.phabricator.com/T1
https://phabricator.local/D1
HTTP://PHABRICATOR.LOCAL/D1
http://phabricator.local/T1.html
{http://phabricator.local/D1}
I like #foobar.

// These don't work.
http://phabricator.local/D1/D2 (specifically D1)
I like '#foobar'.
joshuaspence edited the test plan for this revision. (Show Details)
joshuaspence edited the test plan for this revision. (Show Details)

What are your thoughts on this?I think that this approach is worthwhile and that we can probably fix the remaining edge cases.

On another note... Would landing this generate mentions retroactively or would it only generate mentions for new comments?

Yeah, this looks like it's going to work well to me.

This won't generate mentions retroactively, only for new comments.

One thing that might make this easier is a way to easily resolve URI to monogram. Maybe we could achieve this by adding a monogram.query Conduit method (somewhat similar to the phid.lookup and phid.query methods that we have currently.

What problem are you running into which would be solved by a URI -> monogram mapping? It looked like this is basically in a working state? Or is there more stuff that came up which turns out to be harder?

I think that it is maybe wrong to assume that http://phabricator.example.com/$x is the URL for an object with the monogram of $x. I guess this works for most cases, but it fails for project hashtags. That is http://phabricator.example.com/#some_project shouldn't be parsed as being equivalent to #some_project. Really, I guess this is only a problem for URLs which are not very meaningful anyway (e.g. I don't expect that anyone would ever link to http://phabricator.example.com/#some_project).

Additionally, have a convenient way to map URLs to objects would allow us to do the following:

  • Map http://phabricator.local/project/view/1/ to #some_project.
  • Map http://phabricator.local/tag/some_project to #some_project.
  • Map http://phabricator.local/diffusion/TEST to rTEST.
  • Maybe be more complex things like mapping http://phabricator/config/blah to some nav sequence.

Yeah, I think those cases aren't very important, and could probably be handled by adding a few special rules later on.

@epriestley, this definitely needs some more work in its current form... but what do you think about my revised approach?

FireShot Capture - ⚓ T1 Remarkup Test - http___phabricator.local_T1#7.png (991×990 px, 197 KB)

Oh, sorry: I think this is totally reasonable and we should definitely move forward with it. The only thing I'd like to adjust before landing it is supporting all URIs in phabricator.allowed-uris, rather than only phabricator.base-uri.

(I think it's also possible to run remarkup without having base-uri set, so we might need to test it for null.)

Sure, but I personally still prefer my latest approach on this. One big reason is that it handles http://phabricator.local/T1?workflow=create (I don't think my previous attempt would handle this case).

Oh, maybe I'm misunderstanding -- is this diff not the revised approach? What's the revised approach?

Oh derp. I forgot to actually update the diff.

Make everything more complicated. This seems to be cleaner, albeit more complicated.

Roughly, the idea here is that if something looks like a URL, then take the path of the URL and try to process it as an object reference. If it doesn't match an object reference pattern, then render it as a URL (by proxying to the PhutilRemarkupHyperlinkRule rule).

joshuaspence added inline comments.
src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
11

Oh this isn't used... I was planning to add strings which had been matched as a URL (but then rejected because the URL didn't match an object) so that we could reject them in the markupObjectEmbed and markupObjectReference methods. But I don't think that we need to do this.

152–153

We could maybe add markupObjectHyperlinkEmbed(), but I don't think it would be used very often as the users who are more likely to use hyperlinks instead of object references probably don't know about the embed syntax anyway.

211

This was copied from PhutilRemarkupHyperlinkRule.

311–317

This is a little odd and I wanted to just do $this->apply($path) unconditionally... but the issue was that I couldn't return whether $path had actually been matched by $this->getObjectEmbedPattern() or $this->getObjectReferencePattern().

321–322

I don't understand why return $rule->apply($matches[0]) doesn't work here.

src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
312

I think a link like http://secure.phabricator.com/T1 should display like this in the grey bubble:

[ http://secure.phabricator.com/T1 ]

...not like this:

[ T1 ]

That is, we should recognize it and link it up, but visually retain the fact that it's a URL reference, not a monogram reference, so the actual bubble text is the URL, not just the monogram.

This is probably a little worse for new/lazy users, but gets us most of the actual core value (mentions, hovercards, etc) and I think it's better for experienced users (specifically, it can be frustrating when you type text which includes a URL and your meaning is altered by summarization of the token, and retraining the URL avoids that). Generally, this reduces the frustration the rule can cause in cases where we get things wrong.

319–322

Do we need to do this at all? Can we just return $matches[0] and let the other rule pick it up later?

src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
211

If you prefer less magic, you can remove MAGIC_BYTE from the regex -- matching strings which contain a MAGIC_BYTE will be rejected by isFlatText() for not being flat text. The MAGIC_BYTE stuff predates the concept of "flat text". But this is also fine as-is.

src/infrastructure/markup/rule/PhabricatorObjectRemarkupRule.php
312

The idea is that the domain checking happens a few lines before this:

if (strtolower($uri->getDomain()) != strtolower($base_uri->getDomain())) {
  return $matches[0];
}
319–322

I don't think we can do that... my concern was that the following would happen:

  1. http://phabricator.local/?id=T1 would be picked up by markupObjectHyperlinkReference() and then rejected because the path of the URI doesn't match an object.
  2. By returning $matches[0], the text would then be passed to markupObjectReference... transforming it into http://phabricator.local?id= and T1.
  3. PhutilRemarkupHyperlinkRule would then turn http://phabricator.local?id= into a hyperlink.
  4. You end up with "http://phabricator.local?id= T1" (without the whitespace).

The intention was to short-circuit this straight to PhutilRemarkupHyperlinkRule.

Oh -- you're trying to avoid the (?<!=|:#/()<>) negative lookbehind?

I'm not sure exactly what's going wrong with the proxying, but it won't work in the general case because the Engine never learns about the rule (you tell the rule about the engine, but not the engine about the rule). Later in execution, the Engine iterates over all the rules it is running and calls methods on them (like didMarkupText()). Although I don't immediately see exactly what's going wrong (e.g., Hyperlinks do not depend on having didMarkupText() run, I don't think) there are a class of cases like this where something does happen in those methods, and where proxying won't work.

Maybe it's easier to markup object embeds first, then do references and URIs in a single pass? Basically run the rule:

'/(match-uri)|(match-reference)/'

...instead of running the rules separately. Then figure out which one you matched in the callback. If it's a URI, do what you do now and return $matches[0] if you fail. If it's a reference, it skips all the URI stuff and just goes down the normal path. That should make sure text which matched the URI pattern doesn't have a chance to get examined by the ref pattern.

Or a broader negative lookbehind like (?<!\S{3,}://.*) might work too.

@joshuaspence I'm wondering if you had any plans to do work on this since a number of our users are using full URLs :/

epriestley edited reviewers, added: joshuaspence; removed: epriestley.

Let me see if I can get this over the edge of the precipice...

Let me see if I can get this over the edge of the precipice...

I think I have a "working" change for this:

https://secure.phabricator.com/differential/diff/42106/

...but I'm not really satisfied with the approach. Particularly, it won't generalize to objects where the URI pattern and the monogram pattern differ. It would be nice for any approach here to have the potential to annotate any URI, even if it's something like /diffusion/browse/path/to/file.txt.

I'm currently thinking this needs to be three separate passes (the two existing passes, plus one new pass), with the first pass decoupling URI annotation from monogram matching:

  • One pass to extract {self-uri} and self-uri, which needs to be able to delegate most of its rendering into PhabricatorObjectRemarkupRule (this is the tricky part, I guess).
  • One pass to extract normal URIs (essentially unchanged from current code).
  • One pass to extract {monogram} and monogram (essentially unchanged from current code).

D20511 + D20512 appear to give us a way forward here.