Page MenuHomePhabricator

Digest changeset anchors into purely alphanumeric strings
ClosedPublic

Authored by epriestley on Jan 22 2018, 4:22 PM.
Tags
None
Referenced Files
Unknown Object (File)
Sun, Dec 29, 1:08 PM
Unknown Object (File)
Sat, Dec 28, 5:34 PM
Unknown Object (File)
Fri, Dec 27, 11:17 PM
Unknown Object (File)
Fri, Dec 27, 8:38 PM
Unknown Object (File)
Thu, Dec 26, 6:04 PM
Unknown Object (File)
Sat, Dec 21, 11:05 PM
Unknown Object (File)
Sat, Dec 21, 11:05 PM
Unknown Object (File)
Sat, Dec 21, 11:05 PM
Subscribers
None

Details

Summary

Ref T13045. See that task for discussion.

This replaces digestForIndex() with a "clever" algorithm in digestForAnchor(). The new digest is the same as digestForIndex() except when the original output was "." or "_". In those cases, a replacement character is selected based on entropy accumulated by the digest function as it iterates through the string.

Test Plan

Added unit tests.

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

(See T13045 for some additional discussion about tradeoffs; I don't have a strong opinion on the complexity vs compatibility tradeoff.)

Looks good except for nitpicks.

src/infrastructure/util/PhabricatorHash.php
76–77

We've lost an entire 1/2 of a bit of entropy!!
log_2(62) * 12 = 71.45

108

"non-alphanumeric"

111–113

For consistency, use just "digest" or "hash".

This revision is now accepted and ready to land.Jan 23 2018, 8:16 PM
src/infrastructure/util/PhabricatorHash.php
76–77

Haha, yeah. I probably should have just used this approach with digestForIndex() in the first place since the fraction of a bit we get by including . and _ isn't worth even this diff, much less whatever other weird stuff we might hit in the future. Alas, I think that ship has sailed. Almost all other cases where we use digestForIndex() aren't likely to ever cause issues, though, at least.

On the other hand, we'd really start losing bits if we tried to deal with "0" vs "O" vs "o", "1" vs "I" vs "l" vs "i", hashes containing super offensive words like "butt", case sensitivity, and so on.

  • "non-alphanumeric"
  • Use "digest" instead of a mix of "digest" and "hash".
  • Clarify "index hashes", which sound like a real thing, into "digests created with digestForIndex()".
This revision was automatically updated to reflect the committed changes.