Page MenuHomePhabricator

Improve whitespace expansion of tab literals
Open, WishlistPublic

Description

We are using Phabricator for our project OpendTect (http://opendtect.org). We would like our reviewers to be able to review the code indentation as well. The problem is, the 'diff' content in Phabricator web interface shows either no indentation or a wrong one. How can we make Phabricator show indentation exactly as in the diff generated by SVN?

Event Timeline

raman.rathod added a subscriber: raman.rathod.

Do you use tabs to indent code? We currently convert tabs directly into spaces and do not respect tabstops. This isn't correct, but actually hasn't come up until now.

Assuming you do use tabs, does your source code have comments defining the tab width, like vim local variable blocks?

/*
 * Local variables:
 * tab-width: 4
 * c-basic-offset: 4
 * indent-tabs-mode: t
 * End:
 */

Or do you just have some project convention on tab width (4 spaces / 8 spaces / whatever)?

btrahan raised the priority of this task from Normal to Needs Triage.Feb 6 2013, 9:51 PM
btrahan added projects: Diffusion, Differential.
epriestley triaged this task as Wishlist priority.Feb 15 2013, 4:09 PM

This is a legitimate issue (we handle tabs in a very clumsy way right now) but it's tricky because there's no natural way to figure out where the tabstops are in a file. Some strategies we can use are:

  • Comments in the file: some editors (like vim) put formatted comments in files, which say "this file has tabstops at width X".
  • Configuration:
    • .editorconfig files: these aren't widely established but strike me as a generally good solution to the problem.
    • Some kind of user configuration? This gets tricky with "--raw" diffs and such. This is a mess in general since we have a lot of display contexts where it's expensive and/or difficult to look up configuration.
  • Heuristics: we could examine the file content and pick a tabstop size which seems correct. This is probably plausible but I'm not aware of anyone implementing it. It's also complicated.

I'd like to wait to implement this until we have an install complaining more actively about it, so we can make sure we're solving real problems for users.

Wouldn't assuming tabstops be alright? Isn't that the argument for using tabs rather than spaces?

I think there are two cases here -- suppose we have some code like this:

......int,,,x;
......float,y;

Where . and , represent some mixture of tabs and spaces. . is "indenting` and , is "aligning". The "correct" way to use tabs is to indent with tabs and align with spaces, so writing this code "correctly" means (assuming 2-space tab):

T T T intSSSx;
T T T floatSy;

If code is written like this, any tabstop width will work fine, but it will also already work fine because picking tabstop width 2 is exactly identical to replacing all tabs with two spaces.

A second possibility is that code is aligned with spaces, but indented inconsistently, or aligned and indented inconsistently:

SSSTT intSSSx;
T SSSSfloatSx;
STSST intST z;
SSSST floatTw;

In this case, we must pick the same tabstop positions the original code had.

So code which actually uses tabs correctly should already work. Code which mixes tabs and spaces won't work if we pick the wrong tabstops / tab widths.

I'd guess that only a teensy tiny fraction of codebases use tabs to indent and spaces to align consistently, since stuff like this is very hard to get right:

T T function long_function_name_with_wrapping_parameters(int x,
T T SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSint y) {

That's also difficult to detect automatically, so enforcing it with tools seems difficult too.

I see, thanks for clearing that up.

Sorry, I was away for a while.

We use a tabsize of 8 (vim default) which explains why our code looks messy on Phabricator. Besides, our developers on different platforms use different methods of indentation which means it can be any combination of tabs and spaces. But they all follow the same indentation style and a tabsize of 8.

In my opinion, we need to be able to define the tabsize somewhere, possibly in the Arcanist configuration file so that Arcanist can replace the tabs with appropriate number of spaces while creating a revision.

I think we want to do the replacement only when displaying the revision, not when creating it. Otherwise, arc diff -> arc patch would give you a change back which didn't match the original change (all your tabs would be gone, and replaced with spaces).

The problem is that if we make this a global setting, someone will show up in a month saying "we have a frontend team who uses 4-space tabs to write Ruby, and a backend team who uses 8-space tabs to write COBOL". So we go undo all the global work and make it per-repository.

Then someone shows up and says "we use 2-space tabs in Javascript and 4-space tabs in C, but they're in the same repository". So we go undo all the repository work and make it some highly-configurable mess of rules based on repo and language.

Then someone shows up...

And we can't really do either if someone pastes a diff in, because we don't know which repository it's from.

I'd like to try to cover all the cases with reasonable flexibility, but I also want to avoid writing more configuration DSLs. I asked about the Vim comments because that one's clear-cut: if files have internal annotations about tab size, we should clearly always (or almost always) respect them. But it sounds like you don't use them, so we have to do something more general, while trying to anticipate whatever the next project that runs into this is going to need.

@epriestley

While you work on a universal solution to this problem, could you please point me to the relevant part of the Phabricator code so that I can think of putting a local patch that works for my project?

Well, the adaptation was easy in our case. We simply commented the line https://secure.phabricator.com/diffusion/P/browse/master/src/applications/differential/render/DifferentialChangesetRenderer.php;dbcd38aa881b090c$253 on our Phabricator server which means there is no replacement of tabs by spaces anymore. And it works! We are using a tab-size of 8 and the indentation looks perfect in the Phabricator browser now.

To summarize the state of affairs:

  • We replace tab literals with two spaces when rendering source, ignoring tabstops.
    • This behavior is not correct. It works this way because Facebook uses two-space indents, and thus was a simple rule which produced reasonable results when I wrote it some time in 2007.
  • Code using tabs and spaces "correctly" (tabs to indent, spaces to align) will display correctly (that is, with alignment preserved) with this rule, or with any other rule, and can be displayed correctly at an arbitrary tabstop width. I suspect no such code exists in real life.
  • Code using tabs and spaces "incorrectly" (this is probably all real code) will display incorrectly (that is, with alignment distorted) with any expansion rule other than the original tabstop width that it was written in.
  • If we replace tab literals with four or eight spaces, or leave them as literal tabs (basically, leave the tab width up to the browser), we aren't fixing the root problem. Code which was written with tabs at some other width and which mixes tabs and spaces when aligning or indenting will still display incorrectly.
  • The ideal behavior is to replace tabs with spaces, respecting tabstops, at the original width the file was written with.
    • This is not technically complicated. However, figuring out the original tab width is. Some possible heuristics are discussed earlier in this task, but we'd like to drive any choice here with more real use cases rather than just kind of guessing what might be useful. If "the entire install just has a configurable width" is a rule which actually solves the issue in practice, we can do that and stop there.

We'd like tagstops be respected if at all possible. Short of that, we'll take a configurable setting that allow us to replace a tab with a number of spaces. Thanks.

@staticshock reports that Chromium on Ubuntu doesn't display tabs following a zero-width space, e.g. in Diffusion/Paste (see F76999). Specifically, a construction like this causes the issue, apparently:

<td>(zero width space)(tab)...</td>

Moving to universal tab-to-space conversion seems like the easiest way to resolve this issue.

epriestley renamed this task from Indentation in differential revisions to Improve whitespace expansion of tab literals.Dec 16 2013, 4:04 PM

This is an important issue for us in FreeBSD - our standard is 8-space tabs, and we don't generally have source comments to set an editor's config. For us a global config "tabstops = 8" would probably cover 99% of our files. There are a few exceptions with source comments - e.g.

libexec/bootpd/hwaddr.c

/*                                                                              
 * Local Variables:                                                             
 * tab-width: 4                                                                 
 * c-indent-level: 4                                                            
 * c-argdecl-indent: 4                                                          
 * c-continued-statement-offset: 4                                              
 * c-continued-brace-offset: -4                                                 
 * c-label-offset: -4                                                           
 * c-brace-offset: 0                                                            
 * End:                                                                         
 */

contrib/libpcap/fad-gifc.c

/* -*- Mode: c; tab-width: 8; indent-tabs-mode: 1; c-basic-offset: 8; -*- */

(We probably have some other cases too, but should just fix them rather than inventing crazy heuristics.)

Okay, let's do this:

  • Look for well-known markers (vim/emacs/etc comments in files) in file content, as they're unambiguous. (We already look for them as syntax highlighting hints.)
  • If no markers are present, we'll fall back to a new, configurable global default tab width.
    • This will just be a number for now; we can migrate it forward to a map of regexps of whatever easily enough without breaking anything if that turns out to be necessary.
  • I'll add a "View Options > Set Tab Width..." option as part of T5179 to handle one-off cases and reduce our need to get everything right. This will let you reload a file, rendering it at a different tab width.
  • For all the in-between cases (per-repo, per-language, per-project, per-directory, custom content-based rules, etc.), we can deal with them as they come up. There are other classes of similar rules (like when to mark a file "generated") that suffer from the same problems, and we can eventually find some solution here which balances complexity and power appropriately.

@epriestley - Thanks, that should work perfectly for us

FWIW, most Go code likely encounters this issue since gofmt formats everything "correctly" (tabs to indent, spaces to align) with a tab width of 8. A single global setting would work perfectly for us (and anyone using Go).

chad changed the visibility from "All Users" to "Public (No Login Required)".Jul 3 2015, 4:38 AM

Okay, let's do this:

  • I'll add a "View Options > Set Tab Width..." option as part of T5179 to handle one-off cases and reduce our need to get everything right. This will let you reload a file, rendering it at a different tab width.

Did this ever happen? I see that T5179 is closed, but I don't see "View Options > Set Tab Width" in my setup (or on secure.phabricator.com) -- was this addressed in a different fashion, or is there still an open task somewhere that I can't find? Or is *this* T2495 the open task for it?

eadler added a project: Restricted Project.May 23 2016, 5:43 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.May 23 2016, 6:06 PM
eadler moved this task from Restricted Project Column to Restricted Project Column on the Restricted Project board.Jul 4 2016, 9:05 PM

Is this still actively being worked on, is there a different solution than View Options->Set Tab Width, or should I apply the patch referenced by raman.rathod?

See Planning for help understanding timelines and priorities.

Phabricator messes up tab spacing in patches. It should be very easy to replace tabs with spaces and require the use of only monospace fonts.

After D20194, expansion of tabs now uses the correct rule (expand tabs to the next tabstop) instead of the old rule (hard-coded replacement with a fixed number of spaces).

There is still no way to configure the tabstop width.

I get the sense that most of this discussion is around diffs, but we've been having issues with the default size of 8 showing in code blocks in Phriction causing more wrapping than we would like.

Wouldn't it be appropriate, and very trivial, to have use a config option for Phrication code blocks to ultimately set the tab-size of the remarkup-code class?