
Repository content search with `hg grep` returns matches found in old versions of files
Open, Wishlist, Public

Description

Steps to reproduce

  1. Create a new empty hosted Mercurial repository
  2. Locally, create a text file `Text.txt` containing the text `Roses are red.`
  3. Commit this file.
  4. Update the text to `Violets are blue.`
  5. Commit this file.
  6. Push both commits to the Phabricator hosted repository.
  7. Go into Diffusion, select Browse repository and then Show search.
  8. Look for the pattern `Roses` and hit Grep file content.
  9. The search results show a match, `Roses are red`, inside the file `Text.txt`.

This seems odd to me: the text file no longer contains the word Roses, so it should not come up as a match. It is also very confusing, because when you click the link in the search result list you are taken to the latest version of the file, which does not contain the pattern at the highlighted line.

I would suggest one of the following: only look for matches in the current (latest) version of each file, give the user the option to do so, or indicate in the search results which version of the file the hits were found in.

Versions
phabricator ee92a3f25a4172003b6768879219a88de9f03873 (Sat, Apr 23)
arcanist 789aff85dbf96248c903376c7d6704ada31f294b (Sat, Apr 23)
phutil 1ea8d2ad6daa9fd64298db8cebfd1db0b9a1e678 (Sat, Apr 23)
libcore 33b3d50c2ee6dcdb773c308f6d9a1be50f4ec9ce (Tue, Apr 26)
services 9dfc0d95fef805a642a0dbaea82928fd43450455 (Mon, Apr 25)

Event Timeline

epriestley added a subscriber: epriestley.

(This isn't related to the "Search" application, even though it's about searching for things.)

This appears to date back to the introduction of the feature in D5738, where I suggested we use ancestors() without a legitimate reason (or maybe very old Mercurial had weird behavior).

Actually, this is less crazy than I thought.

`hg grep --rev <commit> -- <pattern>` does not search the repository state at `<commit>` like `git grep <commit> -- <pattern>` does. It only searches the files changed by `<commit>`.

No invocation of hg grep appears to have similar behavior to git grep, or do what users are likely to expect when using "search repository". The current behavior is probably (?) the closest approximation? At least, there doesn't appear to be any command we could reasonably substitute here to get better behavior.
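To make the difference concrete, here is a toy model of the two search semantics applied to the history from the reproduction steps. It is only a sketch of the behavior described above, not Mercurial's or Phabricator's implementation: `HISTORY`, `grep_changed_files`, `grep_ancestors`, and `grep_state` are hypothetical names, and the model assumes a linear history with no file deletions.

```python
import re

# Toy history: each commit maps the paths it changed to their new content.
# This mirrors the reproduction steps; it is not how Mercurial stores data.
HISTORY = [
    {"Text.txt": "Roses are red.\n"},     # first commit
    {"Text.txt": "Violets are blue.\n"},  # second commit
]


def _matching_lines(path, content, regex):
    """Yield (path, line number, line) for each line of content matching regex."""
    for lineno, line in enumerate(content.splitlines(), start=1):
        if regex.search(line):
            yield (path, lineno, line)


def grep_changed_files(history, rev, pattern):
    """Search only the files changed by commit `rev`."""
    regex = re.compile(pattern)
    return [
        (rev, path, lineno, line)
        for path, content in sorted(history[rev].items())
        for _, lineno, line in _matching_lines(path, content, regex)
    ]


def grep_ancestors(history, rev, pattern):
    """Run the per-commit search over `rev` and every earlier commit."""
    hits = []
    for r in range(rev, -1, -1):
        hits.extend(grep_changed_files(history, r, pattern))
    return hits


def grep_state(history, rev, pattern):
    """Search the full tree as it exists at `rev`."""
    tree = {}
    for changes in history[: rev + 1]:
        tree.update(changes)  # later commits overwrite earlier content
    regex = re.compile(pattern)
    return [
        hit
        for path, content in sorted(tree.items())
        for hit in _matching_lines(path, content, regex)
    ]
```

In this model, `grep_ancestors(HISTORY, 1, "Roses")` still reports the match from the first commit, while `grep_state(HISTORY, 1, "Roses")` finds nothing, which is the result the reporter expected.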

This may ultimately be a special case of T7472.

See also this Mercurial internals page:

The grep command has a troubled history. One of the most common thing people want to do is grep the current directory, but only files under hg control. Currently, this is inconvenient and at best requires doing something like `hg files -0 | xargs -0 grep` in order to accomplish this. At the same time, almost nobody understands what plain hg grep does and how it differs from `hg grep --all`. The documentation for the command does not explain this, except vague allusions about grepping history.

...

The current plain hg grep is awkward and nonsensical.

It's not clear what has actually become of these changes since my local hg 4.4.1 still seems to have the old behavior.
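For reference, the workaround quoted above (list the tracked files with `hg files -0`, then grep their current contents) could be sketched roughly as follows. This is an illustration, not something Diffusion does: `tracked_files` and `grep_working_copy` are hypothetical helpers, and the sketch assumes a checked-out working copy with `hg` on the PATH, which may not match how hosted repositories are stored.

```python
import os
import re
import subprocess
from pathlib import Path


def tracked_files(repo_root):
    """Return the paths Mercurial tracks, using NUL-separated `hg files -0` output."""
    out = subprocess.run(
        ["hg", "files", "-0"],
        cwd=repo_root,
        check=True,
        capture_output=True,
    ).stdout
    return [os.fsdecode(p) for p in out.split(b"\0") if p]


def grep_working_copy(repo_root, pattern, paths=None):
    """Search the current contents of tracked files for a regex pattern.

    `paths` is an assumption added for illustration: it lets a caller supply
    the file list directly instead of asking `hg` for it.
    """
    regex = re.compile(pattern)
    if paths is None:
        paths = tracked_files(repo_root)
    hits = []
    for rel in paths:
        try:
            text = (Path(repo_root) / rel).read_text(
                encoding="utf-8", errors="replace"
            )
        except OSError:
            continue  # tracked but missing or unreadable in the working copy
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((rel, lineno, line))
    return hits
```

Because this reads files from disk, it only ever sees the checked-out revision. It cannot answer "search the repository state at `<commit>`" for an arbitrary commit the way `git grep` can.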

epriestley renamed this task from Grep file content returns matches found in old versions of files to Repository content search with `hg grep` returns matches found in old versions of files. Jan 4 2018, 7:40 PM
epriestley triaged this task as Wishlist priority.
epriestley moved this task from Backlog to Far Future on the Mercurial board.
epriestley removed a project: Bug Report.