Page MenuHomePhabricator

Differential file anchors may contain characters which prevent remarkup linking properly
Closed, ResolvedPublic

Description

For example the file anchor on this revision:
https://secure.phabricator.com/D18465#change-cv4o_.vkNhKk

When using the remarkup to link to it, the period prevents the full anchor from being used.
D18465#change-cv4o_.vkNhKk

The link will still navigate to the revision, it just won't jump to the linked file.

See also T12970#234322

Event Timeline

This can be fixed by adjusting the regexp in PhabricatorObjectRemarkupRule->getObjectReferencePattern() to accept ., but then we'll match #xxx. as an anchor. In most cases, a terminal period is probably a punctuation character, not an anchor, like: For details, see D123#abc.

I think a better solution is probably to stop generating anchors with digestForIndex() and generate them with a new digestForAnchor() instead. This raises a couple issues:

  • This will break existing anchors (depending on how much we change the algorithm), although that's probably not a huuuuge deal.
  • Is DifferentialChangeset->getAnchorName() the only place where we generate these? I thought there was one in Diffusion too, maybe, but it didn't end up linked to T12970.
  • What algorithm should we use?

Naively, we could replace . with -. However, this can generate ----, ____, _-_-, etc., as anchors.

We could limit the set of characters to 0-9A-Za-z, but doing this without changing existing anchors too much and without weird properties in the mapping seems a little tricky (e.g., we could replace . with a, but then a will appear twice as often as other characters). But maybe this is perfectly fine.

We could just put an "X" on the end and call it a day.

I'm inclined to try to be clever about hashing into the domain of 0-9A-Za-z.

This will break existing anchors (depending on how much we change the algorithm), although that's probably not a huuuuge deal.

I think these already broke once from D18465 and presumably it hasn't caused issues for anyone

I thought there was one in Diffusion too

I think I was remembering D18609, which is linked in D18465, just not T12970. That change made us use getAnchorName() in Diffusion too, so I believe D18909 is exhaustive in updating all relevant callsites for anchor generation.

This will break existing anchors which contained "." or "_", but preserve all other anchors. This sounds good, but I think the chance an anchor has no . or _ is (30/32)^12 = 0.460..., so this breaks 54% of anchors.

Arguably, all the cleverness in D18909 isn't worth keeping only 46% of anchors intact. We could just break them all and use a simpler and more general hash algorithm instead. (But we could always break them all later, too -- the cost of breaking them the first time didn't seem large, and it intuitively makes sense that users probably aren't saving these links in durable contexts very often.)

epriestley triaged this task as Normal priority.Jan 22 2018, 5:18 PM
epriestley claimed this task.

This is at least fixed in theory. To verify it, I waved my cursor over a revision's table of contents and didn't see any periods.