It would be nice if there were an option to show folders first when browsing code in Diffusion: ideally, folders sorted by name, followed by files sorted by name.
I've used a Mac for ~6 years now and still get annoyed that Finder won't list folders above files, sorted by name. For me, folders work as a navigation control rather than content, which is why I usually like them separate. My preference aside, it doesn't really affect me in Diffusion.
I'd probably pass on the preference; I think folders should just go first. My main observation: if you instrumented clicks on folders vs. clicks on files, folders would probably be chosen 2 to 1, because people generally have to dig down through directories first, then open the file they were looking for.
Or, specifically, a list of files like this:
```
a001.txt
a002.txt
...
a100.txt
...
a482.txt
some_directory/
some_other_directory/
...
```
...such that directories exist but you must read more than 100 paths (one page) to find them. A naive approach (where we just sort the existing list by "is it a directory?") won't handle this case properly.
You should also make sure that performance for a directory with a large number of files remains reasonable. GitHub is broken for me right now and unicorning, but the "poems" repository on my account has a directory with ~16,000 files in it that loads in ~120ms on my machine.
Offhand, I don't know whether the kernel has directories with a lot of files in them or not. Phabricator itself probably has only 1-2 directories with more than 100 files (resources/sql/autopatches/) and might have none with both 100+ files and subdirectories, so you could implement this wrong (pick the first 100 paths, then sort them by "directoryness") without there being any counterexamples in Phabricator. It's possible the kernel is the same.
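As a hypothetical sketch (names and page size are invented for illustration), the difference between the naive and correct orderings:

```python
# Hypothetical example: 482 files "a001.txt".."a482.txt" followed
# alphabetically by two subdirectories, with a 100-item page size.
PAGE_SIZE = 100

entries = [("a%03d.txt" % i, False) for i in range(1, 483)]  # (name, is_dir)
entries += [("some_directory", True), ("some_other_directory", True)]

# Wrong: take the first page of the alphabetical listing, then sort
# just that page by "directoryness". The subdirectories never appear.
naive = sorted(entries[:PAGE_SIZE], key=lambda e: (not e[1], e[0]))

# Right: sort the entire listing first, then take the first page.
correct = sorted(entries, key=lambda e: (not e[1], e[0]))[:PAGE_SIZE]

print(naive[0])    # ('a001.txt', False) -- no directories on page one
print(correct[0])  # ('some_directory', True) -- directories first
```

On a repository with no directory holding 100+ files plus subdirectories, both versions produce identical pages, which is exactly why the bug could ship unnoticed.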
There's nothing special about the poems repository, and any repository which can hit both the "100+ files, then some directories" and the "10,000+ files in one directory" cases is fine.
I'd offer assistance with testing Mercurial on a large repository, but the largest repository I have doesn't have more than ~50 items per directory (that I'm aware of), so I don't think it would hit the stress points you're looking for.
The reason this isn't trivial is that we ask git, hg, or svn for a list of files, for example with git ls-tree.
git ls-tree shows results in alphabetical order and has no --directories-first flag. Likewise, there is no hg locate --order=directories or hg files --with-the-directories-first. We cache SVN results, so we could probably order those by directory trivially, but there's no svn ls --xml --files-on-the-bottom either.
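Since no VCS offers such a flag, one option for git is to post-process the listing ourselves. A minimal sketch (assuming default ls-tree formatting; real code would use -z output and handle path quoting):

```python
def directories_first(ls_tree_output):
    """Reorder raw `git ls-tree` output so tree entries precede blobs.

    ls-tree prints '<mode> <type> <oid>\t<path>' lines already sorted
    by path, so a stable partition keeps each group alphabetical.
    """
    lines = ls_tree_output.splitlines()
    # The type is the second whitespace-separated field; the path
    # (which may contain spaces) sits after the tab, so this is safe.
    trees = [line for line in lines if line.split()[1] == "tree"]
    blobs = [line for line in lines if line.split()[1] != "tree"]
    return "\n".join(trees + blobs)

raw = (
    "100644 blob aaaa\taardvark.txt\n"
    "040000 tree bbbb\tzoo\n"
)
print(directories_first(raw))  # "zoo" line now comes first
```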
In some cases, we currently use LinesOfALargeExecFuture and other stream-oriented tricks to evaluate only part of the result from these commands: so even if a directory contains 16,000 files, we only need to examine the first 100 to show the first page, since the user is only looking at those. After this change, we'll need to examine all 16,000, then reorder them (because it could be 15,900 files, then 100 directories), then return the first 100 results after reordering. This may require additional cleverness to make it perform as well as the "just ignore 99.9% of the result set" strategy we can currently use.
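Because the VCS already returns each group alphabetically, the reordering itself can be a single streaming pass that retains only paths; the unavoidable cost is that every entry must be consumed before the first page can be emitted. A sketch of that shape (function and parameter names are mine):

```python
def page_directories_first(entries, offset, limit):
    """Partition an already-alphabetical stream of (path, is_dir)
    pairs into directories and files, preserving order within each
    group, then slice out one page. Every entry is consumed, but
    only paths are kept in memory."""
    dirs, files = [], []
    for path, is_dir in entries:
        (dirs if is_dir else files).append(path)
    merged = dirs + files
    return merged[offset:offset + limit]

# Even with 16,000 files sorting ahead of 100 directories, the
# directories land on page one -- at the cost of reading everything.
listing = [("file%05d" % i, False) for i in range(16000)]
listing += [("zdir%03d" % i, True) for i in range(100)]
page = page_directories_first(listing, 0, 100)
print(page[:2])  # ['zdir000', 'zdir001']
```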
I also suspect that GitHub compromises here for performance. If you look at a very large directory on GitHub, it shows only the first 1,000 files fairly slowly (the page takes 1-2 seconds for me), with no pager and a warning that the directory is too large.
If you look at that directory in Phabricator, we show the first 100 files in ~120ms with a pager that lets you examine the entire result set.
I would guess that GitHub may degrade out of the "directories first" behavior for large directories; we could check this by looking at a directory with "a000.txt" through "a999.txt" on GitHub and seeing whether "b/" is listed at the top or not.
That is, the algorithm may be:
- List the first 1,001 files.
- If there are 1,001 files, return the first 1,000 with an error.
- If there are fewer than 1,001 files, sort directories to the top and return the results.
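That guessed algorithm (purely speculative; the limit and the skip-sorting behavior are assumptions, not documented GitHub behavior) would look like:

```python
TRUNCATION_LIMIT = 1000

def github_style_listing(entries):
    """Speculative sketch of the three steps above: fetch limit+1
    entries of (name, is_dir); if the listing is truncated, return
    the first 1,000 as-is with a warning flag and skip the
    directories-first sort; otherwise sort directories to the top."""
    head = entries[:TRUNCATION_LIMIT + 1]
    if len(head) > TRUNCATION_LIMIT:
        return head[:TRUNCATION_LIMIT], True   # truncated, alphabetical
    return sorted(head, key=lambda e: (not e[1], e[0])), False
```

Under this scheme, a directory of "a000.txt".."a999.txt" plus "b/" would exceed the limit and come back alphabetical, with "b/" cut off entirely: which is what the probe in the previous comment would detect.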
Offhand, this degree of degradation seems needless to me.
(That said, GitHub does show an accurate count so who knows. Maybe the page is slow because they aren't doing anything stream-oriented. In this case, a directory with 3M files probably just crashes.)