
Support "field present" and "field absent" operators in Ferret
Closed, Resolved, Public

Description

See PHI1693, which would like to query for "field is present" and "field is absent". This is probably reasonable to do.

Event Timeline

epriestley created this task.

Query parsing of certain unusual or ambiguous inputs has changed slightly.

Empty Tokens

Previously, certain tokens like "" and title:="" were allowed by the parser but discarded by query execution. This is arguably correct for "" but clearly not correct for title:="" (which found any document, not only documents with the empty string as a title).

Tokens making an assertion about field content must now provide a nonempty value. These empty tokens are now query compiler syntax errors.
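As a rough illustration of the rule above, here is a minimal Python sketch (not Ferret's actual compiler). The token shape, the QuerySyntaxError name, and the parse_value_token helper are all assumptions made for this example, and only the operators mentioned in this task are recognized. Valueless operator forms such as body:- are outside the scope of this sketch.

```python
import re


class QuerySyntaxError(ValueError):
    """Hypothetical error type for this sketch."""


# Optional "field:" prefix, optional one-character operator, then a bare or
# double-quoted value. This grammar is an assumption, not Ferret's real one.
TOKEN = re.compile(
    r'(?:(?P<field>[a-z]+):)?(?P<op>[-~=])?(?P<value>"[^"]*"|[^\s"]*)'
)


def parse_value_token(raw):
    """Parse one token that asserts something about field content."""
    m = TOKEN.fullmatch(raw)
    if m is None:
        raise QuerySyntaxError(f"Unparseable token: {raw!r}")
    value = m.group("value")
    if value.startswith('"'):
        value = value[1:-1]  # strip the surrounding quotes
    if value == "":
        # Empty values are rejected up front instead of being silently
        # discarded later during query execution.
        raise QuerySyntaxError(f"Token {raw!r} must provide a nonempty value.")
    return (m.group("field") or "all", m.group("op") or "", value)
```

With this sketch, title:="dog" parses normally, while both "" and title:="" raise the hypothetical QuerySyntaxError.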

Present and Absent Operators

Ferret now supports new operators:

  • field:- asserts the field is absent. For example, searching Maniphest for body:- finds tasks with no body text.
  • field:~ asserts the field is present. For example, searching Maniphest for body:~ finds tasks with any body text.
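
To make the intended meaning of the two operators concrete, here is a small Python sketch that evaluates them against in-memory records. It only models the semantics described above and says nothing about how Ferret actually queries its indexes; the sample tasks, the dictionary layout, and the helper names are hypothetical.

```python
# Hypothetical stand-ins for Maniphest tasks.
tasks = [
    {"id": 1, "title": "Fix login", "body": "Steps to reproduce the bug."},
    {"id": 2, "title": "Rename widget", "body": ""},
]


def matches(document, token):
    """Evaluate a 'field:~' (present) or 'field:-' (absent) token."""
    field, op = token.split(":")
    has_text = bool(document.get(field))  # assumption: empty string == absent
    if op == "~":
        return has_text
    if op == "-":
        return not has_text
    raise ValueError(f"Not a present/absent token: {token!r}")


def search(documents, token):
    """Return the ids of all documents satisfying the token."""
    return [doc["id"] for doc in documents if matches(doc, token)]
```

Here search(tasks, "body:~") selects the task with body text, and search(tasks, "body:-") selects the one without.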

The choice of syntax is partly arbitrary and partly a matter of what was easy to implement. There's a vague technical narrative that these sort of make sense as "does not contain the empty substring" and "contains the empty substring". However, this is a bit of a conceit, because field:-"" clearly intends "does not contain the empty substring" but is now a syntax error, not an alternate syntax for the field absence operator.

It is also a syntax error to search for results where a field is both absent and matches some other value. For example, title:- title:dog ("title is absent and title contains dog") can never match any document and is now a query syntax error.
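
A check like this can be sketched in a few lines of Python. This is an illustration under assumptions, not the real compiler: tokens are taken to be already parsed into (field, kind, value) tuples, and the kind labels "absent" and "contains" as well as the QuerySyntaxError name are invented for the example. Combinations the description does not mention (for instance, absent together with present) are deliberately left unchecked.

```python
class QuerySyntaxError(ValueError):
    """Hypothetical error type for this sketch."""


def check_absent_conflicts(tokens):
    """Reject queries where a field is absent and also must contain a value.

    `tokens` is a list of (field, kind, value) tuples, where kind is
    "absent" or "contains" (labels assumed for this sketch).
    """
    absent_fields = {field for field, kind, _ in tokens if kind == "absent"}
    for field, kind, value in tokens:
        if kind == "contains" and field in absent_fields:
            raise QuerySyntaxError(
                f"Field {field!r} cannot be absent and also contain {value!r}."
            )
```

For example, the parsed form of title:- title:dog is rejected, while title:- body:dog passes because the two constraints apply to different fields.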

Function Stickiness

If you search for title:big dog, the field qualifier is "sticky" and applies to both terms. This query is equivalent to title:big title:dog. The rationale is that users typing this probably don't mean "title contains 'big' and any field contains 'dog'", since this kind of query is uncommon to issue informally.

Fields are no longer "sticky" if the first argument is quoted, so title:"big" dog is equivalent to title:big all:dog.

Note that quotes in a casual argument list won't disrupt stickiness, so title:big "red" dog is equivalent to title:big title:red title:dog.

Fields specified with "present" and "absent" operators are not sticky, so body:~ dog is equivalent to body:~ all:dog.

(All this behavior is heuristic and aimed at trying to guess user intent with casual/ambiguous queries. If you want the query to do something precise, specify it precisely.)
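
The four examples above can be reproduced with a short Python sketch. It is a reconstruction of the heuristic from those examples only, not Ferret's implementation: the token grammar, the resolve_fields name, and the choice not to carry an operator over to later terms are assumptions.

```python
import re

# Assumed token grammar: optional "field:", optional one-character operator,
# optional bare or double-quoted value.
TOKEN = re.compile(
    r'(?:(?P<field>[a-z]+):)?'
    r'(?P<op>[-~=])?'
    r'(?:"(?P<quoted>[^"]*)"|(?P<bare>[^\s"]+))?'
)


def resolve_fields(query):
    """Rewrite a query so that every term names its field explicitly."""
    resolved = []
    sticky = None  # field currently applied to unqualified terms
    for m in TOKEN.finditer(query):
        if not m.group(0):
            continue  # skip empty matches between tokens
        field = m.group("field")
        op = m.group("op") or ""
        is_quoted = m.group("quoted") is not None
        value = m.group("quoted") if is_quoted else (m.group("bare") or "")
        if field is not None:
            # An explicit field sticks unless its first argument is quoted
            # or it is a valueless present/absent test.
            is_presence_test = value == "" and not is_quoted and op in ("-", "~")
            sticky = None if (is_quoted or is_presence_test) else field
        else:
            field = sticky or "all"
        resolved.append(f"{field}:{op}{value}")
    return " ".join(resolved)
```

Under these assumptions, resolve_fields('title:big "red" dog') yields title:big title:red title:dog, and resolve_fields('body:~ dog') yields body:~ all:dog.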

Space After Operators

Previously, the query compiler allowed space after operators, so dog - cat meant dog -cat and title:- cat meant title:-cat. Since title:- is now a field absence operator, title:- cat is ambiguous under the old rule. Operators may therefore no longer be followed by spaces; doing so is now a syntax error.

The compiler previously allowed space inside operators as well, although this had no effect in practice since all operators supported today are one character long. This is no longer supported.
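
The difference between the old and new rules can be sketched in Python as follows. This is one reading of the change, not the actual compiler: the function names and QuerySyntaxError are hypothetical, quoting is ignored for brevity, and treating title:- cat as "title is absent" followed by the term cat is inferred from the body:~ dog example above.

```python
import re


class QuerySyntaxError(ValueError):
    """Hypothetical error type for this sketch."""


def old_rule(query):
    """Old behavior: whitespace after an operator is skipped."""
    return re.sub(r'(^|[\s:])([-~=])\s+', r'\1\2', query)


def new_rule(query):
    """New behavior: an operator must be directly followed by its value."""
    tokens = []
    for raw in query.split():  # simplification: quoted spaces not handled
        m = re.fullmatch(r'(?:([a-z]+):)?([-~=])', raw)
        if m is None:
            tokens.append(raw)  # ordinary token carrying its own value
        elif m.group(1) and m.group(2) in "-~":
            tokens.append(raw)  # "field:-" / "field:~" need no value
        else:
            raise QuerySyntaxError(
                f"Operator in {raw!r} must not be followed by a space."
            )
    return tokens
```

Here old_rule("title:- cat") produces title:-cat, whereas new_rule("title:- cat") keeps title:- and cat as separate tokens, and new_rule("dog - cat") raises the hypothetical QuerySyntaxError.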

This appears to be working properly now.