Page MenuHomePhabricator

Offer to remove untracked files, so they don't crop up again
Needs ReviewPublic

Authored by alexmv on Jan 4 2018, 9:49 PM.


Group Reviewers
Blessed Reviewers

Often, untracked files will not be salient to users until the
time comes to arc diff, upon which they will cause a prompt. Often
the untracked files are not precious, and can safely be removed -- but
the user needs to remember to do so after the arc diff completes,
which may be after crafting a long commit message, or dealing with
lint prompts. This makes it likely they will entirely forget to clean
up the untracked files, and will be subjected to the same prompt upon
the next arc diff.

Resolve this vicious circle by allowing the user to remove the
untracked files at the self-same prompt. This makes use of
Filesystem::remove rather than using git clean or equivalent because
it guarantees that the files that were shown to the user as
"untracked" is the precise set of files that are removed; it is not
subject to race conditions if something is altering the repository
simultaneously. Additionally, while git clean exists, Mercurial's
hg purge must be enabled as an extension (though this can be done on
the command line), and Subversion has no built-in command to
accomplish this.

Use the '!' key for this behaviour, as that is unlikely to be typed by
accident, as it requires holding down on most keyboards.

This switches the terminology from "untracked files" to "untracked
paths", as git displays untracked directories as merely the directory
name, not all of its contents.

Test Plan

arc diff with untracked files on Git and Mercurial;
observed that shown untracked files were removed with '!' but not 'y',
and 'n' or Enter exited as expected.

Diff Detail

rARC Arcanist
up-untracked-cleanup (branched from master)
Lint OK
Unit Tests OK
Build Status
Buildable 19016
Build 25653: Run Core Tests
Build 25652: arc lint + arc unit

Event Timeline

alexmv created this revision.Jan 4 2018, 9:49 PM
alexmv requested review of this revision.Jan 4 2018, 9:50 PM

What's the specific kind of file driving this?

My general understanding is that there are three major use cases for untracked files:

  • User innocently forgot to get rid of some scratch file like tmp.log~. This change makes sense in this case, but this can't be very common?
  • Toolchain creates a lot of build.tmp.artifact.123 files that should be ignored but aren't. This change makes sense in this case, but I think it's the wrong fix: the user should ignore these files/patterns instead.
  • User intentionally ("maliciously") creates a lot of important-script.pyfiles that they dearly treasure, while cursing this prompt every day. This change doesn't help with this and if one of these users ever hits "!" by accident they will immediately go spew hatred at us on Twitter.

I think pressing ! accidentally is difficult -- it was chosen that way. We could bulletproof further via making it require typing I swear under penalty of perjury that I absolve arcanist from any and all fault when these files are removed at the prompt, but that might be a bit much.

In my experience, the first option is the most common one. But I have only my own gut to go by, here. This does open up the possibility of an a response which would open the .git/info/exclude for immediate editing, but _that_ one seems to be the more far-fetched option in my experience.

For the (2) case I've sort of imagined possibly building an arc ignore ..., since copying paths into .git/info/exclude does seem like it's the major barrier. The UI could then say "use arc ignore <path> or arc ignore --everything --forever" or whatever, and maybe that'd be a lower-enough barrier to get users over it. I'm a little bit sympathetic to this case since I don't remember .git/info/exclude myself offhand.

For the (1) case, maybe we could just remember that you don't care about these files, rather than actually deleting them? Specifically:

  • When you "y" to ignore files, append them to $workflow->readScratchFile('ignore').
  • When generating this prompt, print paths you've previously ignored separately, as "you've previously ignored these untracked files (in .git/arc/ignore): X, Y, Z". If all files are previously-ignored, don't stop.
  • No mechanism to un-ignore files other than editing the file. (In the future, maybe we could just remember them for 24 hours or until you change branches or something.)

This has the downside of catering to group (3) but doesn't seem thaaaat bad / spooky / magical? Maybe?

The code currently shows:

Untracked changes in working copy:
(To ignore this change, add it to ".git/info/exclude".)
  foo remembering the path to edit isn't hard, as it's right there. The problem is more the immediacy -- it can be several minutes until one is back at a prompt to be able to run the hypothetical arc ignore, at which point we have the same problem we have now.

I'm not a fan of the $workflow->readScratchFile('ignore') "arcignore" file because it's effectively hidden state, which feels like it has only a cumbersome way of examining and changing it. Only having it last for 24h makes it unreliable hidden state, which is the worst because it undermines people's confidence in the tooling as a reliable black box.

I guess I'm not convinced that another type of ignore is actually a feature; if you're wary of this, I'd rather raise the bar for (3), than introduce a new mechanism that makes (1) or (2) harder to reason about.

I'm really, really hesitant to have arc destroy user data, even if they sign a pledge in blood that that's what they intend (e.g., see T8879 for a case of user regret over a bright red, clearly labeled, nondestructive prompt). bin/remove destroy prints out a full-paged gigantic red ASCII art skull and only administrators can run it, and we still see occasional confusion about it.

I agree that hidden state isn't great, but since we'd still print out the paths in arc diff (and all the git commands would still respect it) it doesn't feel exceptionally magical to me.

Maybe I'm still just failing to understand the cost here? Doesn't this prompt come up a second or two after running arc diff and just require the user to type "y"? Why is this onerous enough to change? (Does it take like 25 seconds to come up in your environment because doing all the stat calls on a monorepo takes forever?)

A possible reduction in the magic is to give the ignore file the same lifetime that the temporary commit message gets -- basically, if you ^C out of arc diff your choice is remembered, but if arc diff runs to completion it is forgotten.

Or put another way, if you arc diff, hit this, ^C, go fix it, and arc diff again, shouldn't that cost you like 2 seconds vs using this ! option? Or is the stat cost huge and it's more like 30 seconds?

(I don't really like the magic ignore either, but I worry that no level of safeguards can ever make "delete all your files" actually safe, and moving all your files to .git/arc/trash-can is even dumber than .git/arc/ignore.)

This pain may be mostly derived from prior to D18842 and git performance improvements. At that time, delay until the prompt was ~4s, plus the overhead of noticing, pressing ^C, remembering command-line arguments to git clean, and the startup overhead of the arc diff startup again:

$ time arc diff --preview < /dev/null
You have untracked files in this working copy.

  Working copy: /Users/alexmv/src/server/

  Untracked changes in working copy:
  (To ignore these changes, add them to ".git/info/exclude".)

    Ignore these untracked files and continue? [y/N]  Exception
The program is attempting to read user input, but stdin is being piped from some other source (not a TTY).
(Run with `--trace` for a full exception trace.)

real	0m3.910s
user	0m1.320s
sys	0m7.549s

..which comes up to probably 30s, sure. The perf improvements drop the startup overhead to get to the prompt to a more reasonable ~1s.

I'm perfectly willing to accept that this is over your threshold for possibility for accidental user data loss. I'm personally not 100% convinced that this is the right route for general users, to be clear -- it came about because it was a feature I want to have, not necessarily one that I want everyone to have.

Two other ideas for reducing magic here:

  • Maybe put the path and content of files on the "magic" ignore list, so if the content changes they're not on the ignore list anymore. Not sure what kind of error this would really prevent, though.
  • Clear the ignore list the next time we reach this prompt. So if you arc diff, delete files, then arc diff again, your ignore list gets reset -- it's basically saving "the set of things, if anything, that you ignored the last time you went through the ignore prompt".

It's still magic but maybe not as magic as a weird hidden persistent ignore list?

I think "permanently delete a lot of files with one click" is too scary for me to be comfortable with in the upstream -- it's totally reasonable for experts, and is the action I essentially always take personally, but I have total faith that arc users are industrious enough to use this to destroy some very important data they can not recover.