Page MenuHomePhabricator

Make Phabricator compatible with ONLY_FULL_GROUP_BY
Open, WishlistPublic

Description

See PHI1413. MySQL has an sql_mode called ONLY_FULL_GROUP_BY which rejects queries like this:

SELECT veryLongName, COUNT(value) FROM table GROUP BY veryLongNameHash;

This query is rejected because it may be unpredictable. We know that the value in veryLongNameHash is dependent on the value in veryLongName, so GROUP BY veryLongNameHash produces an aggregation where every veryLongName column always has the same value -- at least, assuming we don't have any bugs and users didn't mess with the data manually. MySQL doesn't know that the columns have this relationship, so it's warning us that we might be selecting one value (veryLongName) from an aggregate with many different values.

Without ONLY_FULL_GROUP_BY, MySQL picks one value (at random? first value?). With ONLY_FULL_GROUP_BY, MySQL rejects the query with an error.

Previously, see T6243. D10856 added a setup check recommending installs disable ONLY_FULL_GROUP_BY. At the time, this option seemed to be rarely used in the wild. However, in PHI1413, it looks like it's enabled by default under RDS + MariaDB. We also currently have a query which fails this during login, so if you log out before fixing all the setup issues you may be unable to recover access via normal login.

This warning is also reasonable, so I'd like to reverse course on D10856 and conform to ONLY_FULL_GROUP_BY. The practical techniques this implies are likely:

  • In MySQL 5.7 and newer (circa 2015), we can use ANY_VALUE() to indicate that we're happy to accept any individual value from the aggregate. This release is probably too new to rely on. Prior to 5.7, MIN() seems to have approximately the same effect. We could introduce some kind of dynamic function selector, conceivably.
  • We can GROUP BY veryLongNameHash, veryLongName. This guarantees that all the values aggregated in veryLongName have the same value. This might be a more robust approach than using ANY_VALUE() depending on what we're doing.

Actual steps are roughly:

  • Locally, enable ONLY_FULL_GROUP_BY.
  • Fix whatever issues I hit.
  • Remove the setup warning.
  • Fix whatever issues we hit in the wild.
  • Possibly, recommend enabling ONLY_FULL_GROUP_BY.

Event Timeline

epriestley triaged this task as Wishlist priority.Sep 3 2019, 11:23 PM
epriestley created this task.

There are only 23 occurrences of the string "GROUP BY" in the codebase, and, from inspection, many obviously do not conflict with ONLY_FULL_GROUP_BY.

However, it's less clear in other cases. At least some versions of this query in Multimeter are in violation:

$data = queryfx_all(
  $conn,
  'SELECT *,
      count(*) AS N,
      SUM(sampleRate * resourceCost) AS totalCost,
      SUM(sampleRate * resourceCost) / SUM(sampleRate) AS averageCost
    FROM %T
    WHERE %LA
    GROUP BY %LC
    ORDER BY totalCost DESC, MAX(id) DESC
    LIMIT 100',
  $table->getTableName(),
  $where,
  array_select_keys($group_map, $group));

In some cases, the GROUP BY is built as a fragment so it's hard to determine if a pathway exists which might violate ONLY_FULL_GROUP_BY:

if ($joined_multiple_rows) {
  if ($joined_project_name) {
    return qsprintf($conn, 'GROUP BY task.phid, projectGroup.dst');
  } else {
    return qsprintf($conn, 'GROUP BY task.phid');
  }
}

RDS + MariaDB

That actually using MySQL official Docker image.

Now that we are on RDS I can confirm that MySQL also has ONLY_FULL_GROUP_BY enabled by default.