My external linter reports offsets in UTF-8 characters rather than bytes. For example, if there's a UTF-8 byte order mark at the start of the file, the linter reports the first real text character as offset 1, line 1, character 1, whereas inside Arcanist it is offset 3, line 1, character 3, because the BOM occupies three bytes. I initially tried shifting all reported lint errors by 3 when a BOM is detected, but that doesn't handle source files that contain other multi-byte UTF-8 characters.
The Arcanist linter API needs to offer methods that count offsets in UTF-8 characters instead of assuming every character is a single-byte ASCII character.
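
To illustrate the kind of conversion being requested, here is a minimal sketch in Python (not Arcanist's PHP, and not part of any existing Arcanist API). The function name `char_offset_to_byte_offset` and the sample file contents are hypothetical, and the sketch assumes the external linter counts Unicode code points and counts the BOM as one character.

```python
def char_offset_to_byte_offset(data: bytes, char_offset: int) -> int:
    """Convert an offset counted in code points to an offset counted in UTF-8 bytes.

    Hypothetical helper for illustration only. Assumes `data` is valid UTF-8.
    """
    # Decode with plain "utf-8" (not "utf-8-sig") so a leading BOM is kept
    # as the single character U+FEFF instead of being silently dropped.
    text = data.decode("utf-8")
    if not 0 <= char_offset <= len(text):
        raise ValueError("char_offset is outside the text")
    # Re-encode everything before the character to count the bytes it occupies.
    return len(text[:char_offset].encode("utf-8"))


# Sample file: BOM (3 bytes), then "é" (2 bytes), then ASCII text.
source = "\ufeff\u00e9 = 1\n".encode("utf-8")

after_bom = char_offset_to_byte_offset(source, 1)   # 3: the "é" right after the BOM
equals_sign = char_offset_to_byte_offset(source, 3)  # 6: the "=" after "é" and a space
```

In this sample the gap between the byte offset and the character offset is 2 right after the BOM but 3 after the `é`, which is why no single fixed shift works once other multi-byte characters appear.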