Page MenuHomePhabricator

Digest changeset anchors into purely alphanumeric strings
ClosedPublic

Authored by epriestley on Jan 22 2018, 4:22 PM.
Tags
None
Referenced Files
F13819169: D18909.id45360.diff
Fri, Sep 20, 12:30 PM
F13818853: D18909.diff
Fri, Sep 20, 10:34 AM
Unknown Object (File)
Fri, Sep 13, 12:17 AM
Unknown Object (File)
Thu, Sep 12, 1:03 AM
Unknown Object (File)
Wed, Sep 11, 9:43 PM
Unknown Object (File)
Wed, Sep 11, 9:42 PM
Unknown Object (File)
Wed, Sep 11, 9:42 PM
Unknown Object (File)
Wed, Sep 11, 9:41 PM
Subscribers
None

Details

Summary

Ref T13045. See that task for discussion.

This replaces digestForIndex() with a "clever" algorithm in digestForAnchor(). The new digest is the same as digestForIndex() except when the original output was "." or "_". In those cases, a replacement character is selected based on entropy accumulated by the digest function as it iterates through the string.

Test Plan

Added unit tests.

Diff Detail

Repository
rP Phabricator
Branch
anchor1
Lint
Lint Passed
Unit
Tests Passed
Build Status
Buildable 19114
Build 25807: Run Core Tests
Build 25806: arc lint + arc unit

Event Timeline

(See T13045 for some additional discussion about tradeoffs; I don't have a strong opinion on the complexity vs compatibility tradeoff.)

Looks good except for nitpicks.

src/infrastructure/util/PhabricatorHash.php
94–95

We've lost an entire 1/2 of a bit of entropy!!
log_2(62) * 12 = 71.45

126

"non-alphanumeric"

129–131

For consistency, use just "digest" or "hash".

This revision is now accepted and ready to land.Jan 23 2018, 8:16 PM
src/infrastructure/util/PhabricatorHash.php
94–95

Haha, yeah. I probably should have just used this approach with digestForIndex() in the first place since the fraction of a bit we get by including . and _ isn't worth even this diff, much less whatever other weird stuff we might hit in the future. Alas, I think that ship has sailed. Almost all other cases where we use digestForIndex() aren't likely to ever cause issues, though, at least.

On the other hand, we'd really start losing bits if we tried to deal with "0" vs "O" vs "o", "1" vs "I" vs "l" vs "i", hashes containing super offensive words like "butt", case sensitivity, and so on.

  • "non-alphanumeric"
  • Use "digest" instead of a mix of "digest" and "hash".
  • Clarify "index hashes", which sound like a real thing, into "digests created with digestForIndex()".
This revision was automatically updated to reflect the committed changes.