Page MenuHomePhabricator

Add GROUP BY to commit query
ClosedPublic

Authored by epriestley on Apr 28 2014, 12:30 AM.
Tags
None
Referenced Files
F13179226: D8879.id.diff
Wed, May 8, 9:02 PM
F13176423: D8879.diff
Wed, May 8, 1:52 PM
Unknown Object (File)
Tue, May 7, 8:29 AM
Unknown Object (File)
Fri, May 3, 9:04 AM
Unknown Object (File)
Wed, Apr 24, 11:49 PM
Unknown Object (File)
Wed, Apr 24, 11:30 PM
Unknown Object (File)
Wed, Apr 24, 11:30 PM
Unknown Object (File)
Wed, Apr 24, 11:30 PM

Details

Summary

Ref T4715. Some minor stuff I caught locally while poking around:

  • Since we don't GROUP BY, we can still get duplicate commits. These get silently de-duplicated by loadAllFromArray() because that returns an array keyed by id, but we fetch too much data and this can cause us to execute too many queries to fill pages. Instead, GROUP BY if we joined the audit table.
  • After adding GROUP BY, getting the audit IDs out of the query is no longer reliable. Instead, query audits by the commit PHIDs. This is approximately equiavlent.
  • Since we always JOIN, we currently never return commits that don't have any audits. If we don't know that all results will have an audit, just LEFT JOIN.
  • Add some !== null to catch the withIDs(array()) issue that we hit with Khan Academy a little while ago.
Test Plan
  • Verified that "All Commits" shows commits with no audits of any kind.
  • Verified that the raw data comes out of the query without duplicates.

Diff Detail

Repository
rP Phabricator
Lint
Lint Skipped
Unit
Tests Skipped

Event Timeline

epriestley retitled this revision from to Minor tweaks to audit queries.
epriestley updated this object.
epriestley edited the test plan for this revision. (Show Details)
epriestley added a reviewer: btrahan.

I should have commented the duplicate row thing, as that was intentional. I felt clever even. :D

I think when you query *all* audits by commitPHID you over load audit data (potentially) based on what you query. Something like ConduitAPI_audit_query_Method would get wonky with this change I think.

Basically, once you start querying by audit data, I think you need to do it like it was before (to get accurate audit data) OR do it like it is now AND add some sort of audit filtering stuff to get the audits that match the audit specific query parameter. A third approach would be to make the change as you have it but see if callsites like the audit conduit method can be otherwise tweaked or their use cases served otherwise.

Something like ConduitAPI_audit_query_Method would get wonky with this change I think.

Ah! Good point.

A third approach would be to make the change as you have it but see if callsites like the audit conduit method can be otherwise tweaked or their use cases served otherwise.

I think this is the right approach -- do you remember anything other than audit.query offhand? I can poke around.

For audit.query itself, we don't have any audit.query callsites in the codebase and I don't remember why we have the call at all. I can't think of anything you'd want to do with it that wouldn't be better served by exposing these filters on diffusion.querycommits, but can do a bit of digging.

I'm going to split this into the LEFT JOIN/JOIN and GROUP BY changes separately, since the LEFT JOIN doesn't cause issues and fixes T4911.

audit.query and bin/audit are the two I remember off hand.

(bin/audit tries lets you delete audits by ID. this would return some N audits right now, with 1 of the N having the right ID and the other N-1 just happening to be on the same commit as the one with the right ID.)

epriestley retitled this revision from Minor tweaks to audit queries to Add GROUP BY to commit query.
epriestley edited edge metadata.

Fixes T5433.

  • Use GROUP BY.
  • I grepped for all the getAudits() callsites, and only audit.query and bin/audit delete used them in a way that expected a subset to be returned.
  • In both cases, I just manually filtered results in the caller. (We can probably deprecate audit.query, maybe.)

Test plan:

  • Paged through commits with multiple audits.
  • Used audit.query with various parameters.
  • Used bin/audit delete with various arguments.
src/applications/audit/management/PhabricatorAuditManagementDeleteWorkflow.php
112–124

This fixed an existing bug -- we were iterating over $audits, but should have iterated over $curr_audits. Mostly, this just made the script take a really long time to run.

btrahan edited edge metadata.
This revision is now accepted and ready to land.Jul 10 2014, 4:55 PM
epriestley updated this revision to Diff 23703.

Closed by commit rP16648c28bcd7 (authored by @epriestley).