
Fixing up invalid encoding confuses Arcanist's UTF-8 check
Open, Needs Triage, Public

Description

Hello

In our repository I found a text file with a very strange encoding.
I fixed the file and submitted the change for review with "arc diff".

Now the tool warns me that the file is not valid UTF-8, like this:

Invalid Content Encoding (Non-UTF8)
This diff includes a file which is not valid UTF-8 (it has invalid byte
sequences). You can either stop this workflow and fix it, or continue. If you
continue, this file will be marked as binary.

So, following "UTF-8 and Character Encoding", I ran the utf8.php script on the file,
but it reports "OKAY":

$ libphutil/scripts/utils/utf8.php ~/repo/doc/utf8.txt
OKAY  /Users/ml/repo/doc/utf8.txt

I suspect that the check is confused because the old version of the file, and therefore the resulting diff,
still contains invalid UTF-8 byte sequences, even though the new version of the file is valid.
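
To illustrate the suspicion, here is a minimal sketch in Python (not Arcanist's PHP code, and not a claim about how Arcanist actually builds or validates diffs). The file contents and the a/ and b/ path prefixes are made up for the example; the helper is_valid_utf8 is hypothetical.

```python
import difflib


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical old contents: "é" stored as the single Latin-1 byte 0xE9,
# which is not a valid UTF-8 sequence when followed by a space.
old = b"caf\xe9 menu\n"
# Hypothetical fixed contents: "é" stored as the UTF-8 sequence 0xC3 0xA9.
new = "café menu\n".encode("utf-8")

# Build a unified diff at the byte level. The removed ("-") line carries
# the old, invalid bytes along with it.
diff = b"".join(
    difflib.diff_bytes(
        difflib.unified_diff,
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=b"a/doc/utf8.txt",
        tofile=b"b/doc/utf8.txt",
    )
)

print("new file valid:", is_valid_utf8(new))   # True
print("old file valid:", is_valid_utf8(old))   # False
print("diff valid:    ", is_valid_utf8(diff))  # False
```

If the check is applied to the diff text rather than only to the new file, it would fail in exactly this situation, which would match what I am seeing.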

The problem now is that I have to let my file be marked as binary in order to send the diff to Phabricator, so
it is not easy for the reviewer to see my changes.

Event Timeline

martin.lindhe raised the priority of this task to Needs Triage.
martin.lindhe updated the task description.
martin.lindhe added a project: Arcanist.
martin.lindhe added a subscriber: martin.lindhe.

We've seen exactly the same problem on one of our repositories.

We had to work around it by bypassing Arcanist.

clemvangelis added a subscriber: clemvangelis.

I can confirm this behavior. Is there any way to fix it?

I am facing the same issue with a file. Can anyone suggest a way to fix it?

At present, the only way around the problem is to bypass Arcanist and push the affected change directly with Git.

Is Arcanist saving information about files somewhere? Or is it just a problem with the UTF-8 validator?