Page MenuHomePhabricator

Prevent users from generating an ambiguous commit message by editing revisions with the web UI
Closed, ResolvedPublic

Description

I was pretty sure there was a bug for this, but I can't find it after a quick search of Arcanist bugs.

With a commit message like

...

` ``
[
{
blah: "foobar",
summary: "Improve styling of standup email",
...
},
...
]
` ``

That is, a code block with a leading summary: line Phabricator parses this as a duplicate 'summary' field and errors out.
A similar issue occurs with any other field (reviewers, jira, etc.) even outside of code blocks (for example when discussing Jira issues in the summary).

Event Timeline

Very confused why I received an email on this task. It says I was CCd, but... I don't see where? How did you create this task?

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 6 2016, 4:07 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 6 2016, 4:15 PM
eadler renamed this task from arcanist incorrectly parsers fields inside fields to arcanist incorrectly parses fields inside fields.Jun 14 2016, 10:13 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jun 16 2016, 6:28 PM

What's the behavior you'd expect or prefer when a user writes this commit message?

Fix parsing of fields
Summary: Fix parsing of the
Reviewers: field and empty
Subscribers:
Reviewers: alincoln
Test Plan: Checked how the
Reviewers: field parses and also the
Summary: field and also the
Subscribers: field
Subscribers: htaft

I'm not sure I have a good answer to that. :/

FWIW I think this usually occurs due to editing in the web UI where it is not obvious that the various fields will be concatenated and then parsed.
The ` case is less common.

From a technical point of view I think the ideal solution to this is to make the commit message format unambiguously parseable, but I'm not immediately sure how to do that without making other things worse. It's valuable for messages to be human-readable and human-editable, and I think we'd be worse overall if we tried to escape or encode messages in Git itself (or represent them as XML or whatever).

If this is mostly happening in the web UI, maybe we could just reject messages which we know will create parser problems? For example, if you click Edit Revision and type this:

Fix parsing of the
Reviewers: field

...and then click Save, we would just prevent the edit: "Your summary won't be unambiguously parseable because it contains "Reviewers:", which is a recognized commit message field header. Adjust the message so it will parse unambiguously."


The only other approach I can come up with offhand is trying to cheat somehow and make the message less likely to be ambiguous. For example, we could move to putting fields in the commit message in this form:

>Summary: blah blah

>Reviewers: abc, def

If >Summary: was present, we would ignore Summary:. This isn't as bad as making the messages XML or something, but it's kind of bad/confusing so I'm not sure if it's really better. I can't immediately come up with any kind of "less ambiguous" syntax that feels particularly natural:

Reviewers|:
Reviewers::
Reviewers>:
Reviewers/\/\(^__^)/\/\:
*Reviewers:
<Reviewers>:

In all cases there would still be a risk of ambiguity, but it would be reduced because the text would be less likely to occur naturally.


We could also actually escape text in messages, but this seems particularly bad since when you git show your JSON is now going to have backslashes in it that you didn't write or something. That is, if we rewrite into:

\summary:

...that seems dangerous/confusing/unreasonable. We could write it like this instead when we detect there will be a problem:

Summary:
----8<---- Begin Summary 23nf23n ----8<----
...
summary:
...
----8<---- End Summary 23nf23n ----8<----

But I don't see a way to do that which retains any semblance of human readability.

@eadler Did you have enough insight/recollection as to whether all our reported cases were due to users editing in the web UI?

@eadler Did you have enough insight/recollection as to whether all our reported cases were due to users editing in the web UI?

All of our reported cases were due to editing the web UI with the exception of the single case of referencing the commit format inside of triple quotes.

epriestley renamed this task from arcanist incorrectly parses fields inside fields to Prevent users from generating an ambiguous commit message by editing revisions with the web UI.Jul 2 2016, 7:12 PM
epriestley triaged this task as Normal priority.

We can fix the highly-tractable web UI part, at least.

eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:04 PM
epriestley claimed this task.

I think that will prevent these edits from occurring by accident via the web UI. Yell if you manage to trick it or it causes any weird stuff.

It can't currently point at exactly which line is causing the issue, but I think this parser will get a general improvement pass for T9713.

You're watching #twitter.

Unrelated to this ticket, but should #twitter be converted to "Restricted Project" in this comment? I had no idea that a #twitter project existed until I saw this comment.

I think we'd have to build another Remarkup rendering target to accomplish that (sending sanitized text-only emails, for example). I also believe we do no policy checks in Remarkup (which is cached).