
Strip strange characters out of project hashtags, or support parsing strange hashtag characters in other contexts
Closed, Resolved, Public

Description

I noticed some weirdness when creating projects: it's pretty easy to create a project named My: Awesome, (Project), which results in a #my:_awesome,_(project) hashtag. This is strange but seems harmless until you try to do something like update a revision that has projects with weird hashtags attached, at which point arcanist bails on you:

user@machine:~/thing$ arc diff

Exception
Error parsing field "Projects": The objects you have listed include objects which do not exist (#perception:_odtac_(obstacle_detection, _tracking, _and_classification)).
(Run with `--trace` for a full exception trace.)

The comma is the real culprit in this instance, but I suspect the other strange characters may possess lurking demons as well.
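
To illustrate why the comma in particular is fatal, here is a minimal sketch. It assumes the "Projects" field is parsed as a comma-separated list; that is an inference from the error message, not something confirmed against arcanist's source. The function name `split_project_field` is hypothetical.

```python
def split_project_field(raw: str) -> list[str]:
    """Split a comma-separated field into trimmed, non-empty tokens.

    Hypothetical stand-in for however the "Projects" field is really parsed.
    """
    return [token.strip() for token in raw.split(",") if token.strip()]


# The hashtag from the description above, as a single intended project.
field = "#my:_awesome,_(project)"
tokens = split_project_field(field)

# The single hashtag comes back as two fragments, and neither fragment
# names a project that exists.
print(tokens)
```

Under that assumption, no lookup can succeed for a hashtag containing a comma, no matter how the project itself is stored.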

Event Timeline

yelirekim updated the task description.
yelirekim added a subscriber: yelirekim.

Other known demons are spaces (T7305), colons (T7341), and sort-of-but-not-really hyphens (T9480).

I think all this stuff is pretty much case-by-case but fairly straightforward:

  • Commas should not be permitted in slugs;
  • colons should not be permitted in slugs;
  • spaces should get normalized/redirected when constructed directly in URIs (this probably hits some other characters/cases too, and might just be an Apache config issue);
  • the hyphen case is probably really a subproject issue, although I think we could add a rule to slam _-_ into - without incurring any real costs.
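
The slug rules above could be sketched roughly as follows. This is an illustration only, not Phabricator's actual slug code: the function name `normalize_slug` is hypothetical, and lowercasing and turning whitespace into underscores are assumptions based on the hashtags shown in this task. Parentheses and periods are left alone, since the list above does not rule on them.

```python
import re


def normalize_slug(name: str) -> str:
    """Build a hashtag slug from a project name (illustrative sketch)."""
    slug = name.lower()
    # Rules 1 and 2: commas and colons are not permitted in slugs.
    slug = re.sub(r"[,:]", "", slug)
    # Assumption: runs of whitespace become a single underscore.
    slug = re.sub(r"\s+", "_", slug)
    # Hyphen rule: slam "_-_" into "-".
    slug = slug.replace("_-_", "-")
    # Tidy up any doubled or dangling underscores left by the removals.
    slug = re.sub(r"_+", "_", slug).strip("_")
    return slug


print(normalize_slug("My: Awesome, (Project)"))
print(normalize_slug("Alpha - Beta"))
```

Removing the characters rather than escaping them keeps the resulting hashtag usable in places that treat commas and colons as delimiters.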

T7341 has some discussion of edge cases with periods and projects named #internet.com. These are weird, but I think they're probably fine as-is: I believe all the behavior is reasonable and intuitive, it just implies a result that isn't intuitive (namely, we sometimes generate hashtags which are not usable in comments).

chad renamed this task from Strip strange characters out of projectg hashtags, or support parsing strange hashtag characters in other contexts to Strip strange characters out of project hashtags, or support parsing strange hashtag characters in other contexts. Oct 12 2015, 6:37 PM
chad added a project: Projects.

At HEAD:

  • Hashtags should no longer generate with colons, commas, or other problematic characters.
  • /tag/blah/ URIs should be much better about reasonable behavior for unusual inputs.
  • Stuff with interword hyphens should generate very slightly nicer tags/slugs.

But note:

  • This does not fix anything historically, so you'll have to go clean up the hashtags for existing projects.
  • (If you need to swap the primary hashtag for a janky existing project, you may need to rename it and then rename it back to force an update.)
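
Since existing projects are not fixed automatically, a small audit can help find the ones that need cleanup. The sketch below assumes you already have a list of existing hashtags from somewhere. How you obtain that list is outside this example, and `find_janky_hashtags` and `PROBLEM_CHARS` are hypothetical names. The character set covers only the commas, colons, and spaces called out in this task.

```python
# Characters this task identifies as problematic in hashtags.
PROBLEM_CHARS = set(",: ")


def find_janky_hashtags(hashtags):
    """Map each hashtag containing a problem character to those characters."""
    janky = {}
    for tag in hashtags:
        found = PROBLEM_CHARS & set(tag)
        if found:
            janky[tag] = sorted(found)
    return janky


existing = ["my:_awesome,_(project)", "projects"]
for tag, chars in find_janky_hashtags(existing).items():
    print(f"#{tag} contains {chars}")
```

Each flagged project still has to be fixed by hand, for example by renaming it and then renaming it back as described above.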

One followup in T9573, but otherwise I don't think anything exploded.