
Diffusion show folders first
Open, Low, Public

Description

It would be nice if there were an option to show folders first when browsing code in Diffusion. Ideally, folders would be sorted by name, followed by files sorted by name.

Event Timeline

Dianoga raised the priority of this task from to Needs Triage.
Dianoga updated the task description.
Dianoga added a project: Diffusion.
Dianoga added a subscriber: Dianoga.
epriestley added a subscriber: epriestley.

This seems reasonable, but bumps into a bunch of design/preference stuff and probably isn't valuable enough on its own to motivate solving those issues in the short term.

I've used a Mac for ~6 years now and still get annoyed that Finder won't list folders above files when sorting by name. For me, folders work as a navigation control rather than content, which is why I usually like them separate. That preference aside, it doesn't really affect me in Diffusion.

I'd probably pass on the preference; I think folders should just go first. My main observation here is that if you instrumented clicks on folders vs. clicks on files, folders would probably be chosen 2:1. My expectation is that people have to dig down first, then open the file they were looking for.

@epriestley Do you care if I just flip this? The hypothesis is that folders are clicked more than files. Having them first makes more sense to me and, in our case, brings them above the page fold currently for rP.

I don't care too much.

Just make sure it works properly in directories with more than 100 files in SVN, Mercurial and Git.

(That is, more than 100 non-directory files.)

Or, specifically, a list of files like this:

a001.txt
a002.txt
...
a100.txt
...
a482.txt
some_directory/
some_other_directory/
...

...such that directories exist but you must read more than 100 paths (one page) to find them. A naive approach (where we just sort the existing list by "is it a directory?") won't handle this case properly.
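To make the failure mode concrete, here is a minimal sketch. It assumes a page size of 100 and that a trailing slash marks a directory; `naive_first_page` and `correct_first_page` are hypothetical names for illustration, not Diffusion code.

```python
PAGE_SIZE = 100  # assumed page size, matching "100 paths (one page)"


def is_directory(path):
    # Assumption for this sketch: directories carry a trailing slash.
    return path.endswith("/")


def naive_first_page(paths):
    # Wrong: takes the first page, then sorts only that page.
    page = paths[:PAGE_SIZE]
    return sorted(page, key=lambda p: (not is_directory(p), p))


def correct_first_page(paths):
    # Right: sorts the whole listing, then takes the first page.
    ordered = sorted(paths, key=lambda p: (not is_directory(p), p))
    return ordered[:PAGE_SIZE]


# The listing from the example: a001.txt .. a482.txt, then two directories.
paths = ["a%03d.txt" % i for i in range(1, 483)]
paths += ["some_directory/", "some_other_directory/"]
```

With this listing, the naive version never sees either directory on page one, while the correct version puts both at the top.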

You should also make sure that performance for a directory with a large number of files remains reasonable. GitHub is broken for me right now and unicorning, but the "poems" repository on my account has a directory with ~16,000 files in it that loads in ~120ms on my machine.

Is poems the best example, or do you have another suggestion? I can try the Linux kernel.

Offhand, I don't know if the kernel has directories with a lot of files in them or not. Phabricator itself probably has only 1-2 directories with more than 100 files (resources/sql/autopatches/) and might have none with 100+ files and directories, so you could implement this wrong (pick the first 100 paths, then sort them by "directoryness") without there being any counterexamples in Phabricator. It's possible the kernel is the same.

There's nothing special about the poems repository, and any repository which can hit both the "100+ files, then some directories" and the "10,000+ files in one directory" cases is fine.

(Poems doesn't have an example of the first kind of directory, and I don't know of any offhand, but you could build one locally in a couple minutes I think.)
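One way to build such a directory locally might look like the sketch below. The function name `build_fixture`, the file contents, and the `keep.txt` placeholder are all made up for illustration; the layout follows the a001.txt ... some_directory/ example from earlier in the thread.

```python
import os
import tempfile


def build_fixture(root, file_count=482):
    """Create a001.txt..aNNN.txt plus two directories that sort after them."""
    for i in range(1, file_count + 1):
        with open(os.path.join(root, "a%03d.txt" % i), "w") as f:
            f.write("placeholder\n")
    for name in ("some_directory", "some_other_directory"):
        os.mkdir(os.path.join(root, name))
        # Git does not track empty directories, so add a file to each one.
        with open(os.path.join(root, name, "keep.txt"), "w") as f:
            f.write("placeholder\n")
    return sorted(os.listdir(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        entries = build_fixture(root)
        print(len(entries), entries[0], entries[-1])
```

The resulting tree would still need to be committed to a Git, Mercurial, or SVN repository before Diffusion could browse it.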

I'd offer assistance with testing Mercurial on a large repository, but the largest repository I have doesn't have more than ~50 items per directory (that I'm aware of), so I don't think it would hit the stress points you're looking for.

Yeah, I just realized this isn't a front-end change; I've got to trace it down to the source.

I've been catfished.

yes I believe you have been coaxed into a clever trap -- and your predicament is of your own making!!!

bwahaha

The reason this isn't trivial is that we ask git, hg, or svn for a list of files, for example with git ls-tree.

git ls-tree shows results in alphabetical order and has no --directories-first flag. Likewise, there is no hg locate --order=directories or hg files --with-the-directories-first. We cache SVN results so we probably can trivially order that by directory, but there's no svn ls --xml --files-on-the-bottom either.
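Since the VCS won't do the ordering, it has to happen on our side after parsing. As a rough sketch for the git case: `git ls-tree` prints one entry per line as `<mode> <type> <object>`, a tab, then the path, with type `tree` for directories. The helper names and the sample listing below are invented for illustration, and the parser ignores path quoting.

```python
def parse_ls_tree(output):
    """Parse `git ls-tree` lines of the form "<mode> <type> <object>TAB<path>"."""
    entries = []
    for line in output.splitlines():
        if not line:
            continue
        meta, path = line.split("\t", 1)
        mode, kind, obj = meta.split(" ")
        # A "tree" entry is a directory; "blob" is a regular file.
        entries.append((path, kind == "tree"))
    return entries


def directories_first(entries):
    # Directories (is_dir=True) sort before files; ties break by path.
    return sorted(entries, key=lambda e: (not e[1], e[0]))


FAKE_HASH = "0" * 40  # placeholder object name, not a real object
SAMPLE = "\n".join([
    "100644 blob %s\tREADME" % FAKE_HASH,
    "100644 blob %s\ta001.txt" % FAKE_HASH,
    "040000 tree %s\tsrc" % FAKE_HASH,
])
```

Calling `directories_first(parse_ls_tree(SAMPLE))` moves `src` ahead of the two files.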

In some cases, we currently use LinesOfALargeExecFuture and other stream-oriented tricks to evaluate only part of the result from these commands: so even if a directory contains 16,000 files, we only need to examine the first 100 to show the first page, since the user is only looking at those. After this change, we'll need to examine all 16,000, then reorder them (because it could be 15,900 files, then 100 directories), then return the first 100 results after reordering. This may require additional cleverness to make it perform as well as the "just ignore 99.9% of the result set" strategy we can currently use.
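One possible shape for that cleverness, as a sketch only: still read the stream once, but keep at most one page's worth of directories and one page's worth of files in memory. This assumes the stream yields hypothetical `(path, is_dir)` pairs already sorted by path; it is not how Diffusion is implemented.

```python
def directories_first_page(entries, offset=0, limit=100):
    """Return one page of a directories-first listing from a sorted stream.

    `entries` yields (path, is_dir) pairs already sorted by path.
    """
    need = offset + limit  # no page can reach past this many of either kind
    dirs, files = [], []
    for path, is_dir in entries:
        if is_dir:
            dirs.append(path)
            if len(dirs) == need:
                break  # the requested page is made up entirely of directories
        elif len(files) < need:
            files.append(path)
        # Files beyond `need` are read but discarded.
    return (dirs + files)[offset:offset + limit]
```

This bounds memory by the page position rather than the directory size, but it still has to read every line unless enough directories turn up early, so it does not fully recover the "ignore 99.9% of the result set" behavior.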

I also suspect that GitHub compromises here for performance. If you look at a very large directory on GitHub, it shows only the first 1,000 files fairly slowly (page takes 1-2 seconds for me) with no pager, and a warning that the directory is too large:

https://github.com/epriestley/poems/tree/master/unreasonably_huge_directory

If you look at that directory in Phabricator, we show the first 100 files in ~120ms with a pager that lets you examine the entire result set.

I would guess that GitHub may have degraded out of the "directories first" behavior; we could prove this by looking at a directory with "a000.txt" through "a999.txt" on GitHub and seeing if "b/" is listed at the top or not.

That is, the algorithm may be:

  • List the first 1,001 files.
  • If there are 1,001 files, return the first 1,000 with an error.
  • If there are fewer than 1,001 files, sort directories to the top and return the results.
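Spelled out as code, that guess might look like the following. This is purely a sketch of the speculation above, not GitHub's actual implementation; `list_with_cap` and the `(path, is_dir)` entry shape are made up for illustration.

```python
import itertools

LIMIT = 1000  # cap taken from the guess above


def list_with_cap(entries, limit=LIMIT):
    """Return (results, truncated) for an iterable of (path, is_dir) pairs."""
    # Read one entry more than the cap so we can tell whether more exist.
    head = list(itertools.islice(entries, limit + 1))
    if len(head) > limit:
        # Too large: give up on reordering and return the raw first `limit`.
        return head[:limit], True
    # Small enough: sort directories to the top, then by path.
    ordered = sorted(head, key=lambda e: (not e[1], e[0]))
    return ordered, False
```

Under this guess, a directory holding "a000.txt" through "a999.txt" plus "b/" would come back truncated with "b/" missing, which is what the experiment described above would reveal.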

Offhand, the severity of this degradation seems needless to me.

(That said, GitHub does show an accurate count so who knows. Maybe the page is slow because they aren't doing anything stream-oriented. In this case, a directory with 3M files probably just crashes.)