Page MenuHomePhabricator

add "update" mode to Diffusion coverage Conduit
ClosedPublic

Authored by ddfisher on Nov 6 2015, 11:03 PM.
Tags
None
Referenced Files
Unknown Object (File)
Wed, Aug 3, 7:27 AM
Unknown Object (File)
Sat, Jul 23, 5:02 PM
Unknown Object (File)
Thu, Jul 14, 4:30 AM
Unknown Object (File)
Tue, Jul 12, 5:45 PM
Unknown Object (File)
Mon, Jul 11, 4:52 PM
Unknown Object (File)
Jun 25 2022, 10:28 AM
Unknown Object (File)
Jun 18 2022, 3:57 AM
Unknown Object (File)
Jun 15 2022, 9:35 AM
Subscribers

Details

Summary

This diff adds a new mode argument to the Diffusion Conduit API with two options:

  • "overwrite": the default, maintains the current behavior of deleting all coverage in the specified branch before uploading the new coverage
  • "update": does not delete old coverage, but will overwrite previous coverage information if it's for the same file and commit

DiffusionRequest::loadCoverage already loads a file's coverage from the
latest available commit, so uploading coverage for different files in different
commits with "update" will result in seeing the latest uploaded coverage in
Diffusion.

Test Plan

manual local verification

Diff Detail

Repository
rP Phabricator
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

ddfisher retitled this revision from to add "overwrite" mode to Diffusion coverage Conduit.
ddfisher updated this object.
ddfisher edited the test plan for this revision. (Show Details)
ddfisher added a reviewer: epriestley.

Conspicuously absent is the "merge" mode we discussed because I've forgotten how to write PHP. If you want that in this diff I can add it, though

epriestley edited edge metadata.

This kind of only works accidentally because we ORDER BY commitID DESC, which expands implicitly into ORDER BY commitID DESC[, id DESC], I think, possibly because of how MySQL refuses to mix ASC/DESC (see D10895).

That is, if we have this table:

mysql> SELECT * FROM repository_coverage;
+----+----------+----------+--------+----------+
| id | branchID | commitID | pathID | coverage |
+----+----------+----------+--------+----------+
|  1 |        1 |    82639 |      0 | B        |
|  2 |        1 |    82639 |      0 | C        |
|  3 |        1 |    82639 |      0 | A        |
|  4 |        1 |    82639 |      0 | D        |
+----+----------+----------+--------+----------+
4 rows in set (0.00 sec)

...and we issue the current query against it, we get the desired result:

mysql> SELECT * FROM repository_coverage WHERE branchID = 1 AND pathID = 0 ORDER BY commitID DESC LIMIT 1;
+----+----------+----------+--------+----------+
| id | branchID | commitID | pathID | coverage |
+----+----------+----------+--------+----------+
|  4 |        1 |    82639 |      0 | D        |
+----+----------+----------+--------+----------+
1 row in set (0.00 sec)

...but we get a different result if we make an apparently unrelated change, which seems surprising/fragile and would be hard to catch:

                                                                                             +- Now "ASC" 
                                                                                             |
                                                                                             V
mysql> SELECT * FROM repository_coverage WHERE branchID = 1 AND pathID = 0 ORDER BY commitID ASC LIMIT 1;
+----+----------+----------+--------+----------+
| id | branchID | commitID | pathID | coverage |
+----+----------+----------+--------+----------+
|  1 |        1 |    82639 |      0 | B        |
+----+----------+----------+--------+----------+
1 row in set (0.00 sec)

                                      ^
                                      |
                            Now "B"! -+

Changing DESC to ASC has changed the implicit order from commitID DESC[, id DESC] to commitID ASC[, id ASC].

Although we're unlikely to randomly change that DESC to ASC, it's broadly fragile and this will also make implementing a "merge" mode more difficult. Instead, I think we should act as through the key_path is unique -- I think it is not unique only for historical reasons.

Basically, we should delete the rows we're going to overwrite before writing the new rows, so that the database always has zero or one row per <branchID, commitID, pathID> tuple. Then it's fine if the query changes and merging won't have to deal with weird states that are only reachable if you use the overwrite mode.

(It's fine to not implement the merge mode until someone comes up with a use case for it.)

This revision now requires changes to proceed.Nov 6 2015, 11:23 PM

Oops! I absolutely agree the database should have at most one row per <branchID, commitID, pathID> tuple. I was confused about the behavior of the insert. Should've checked the database. <_<

Will fix.

Actually, would it make sense to enforce this uniqueness at the SQL level?

Yes, that just might be a more involved change than you want to deal with. :)

ddfisher edited edge metadata.

make (branch, commit, file) entries unique in repository_coverage; tweak mode names

ddfisher retitled this revision from add "overwrite" mode to Diffusion coverage Conduit to add "set_file" mode to Diffusion coverage Conduit.Nov 9 2015, 10:34 PM
ddfisher updated this object.
ddfisher edited edge metadata.

To me, set_file and set_branch are less clear than set and overwrite. I could not guess the relationship by only knowing the names, and if I faced the problem you face and knew the names I suspect I could not make the connection that the mode parameter helps me solve the problem. What's the thinking behind this change?

(And the overwrite mode should probably be deleting all changes by commitID, not by branchID. Except there's no key on commitID. And we shouldn't even really have branchID in this table, but we can't efficiently query the history of a branch...)

resources/sql/autopatches/20151109.repository.coverage.1.sql
3

For consistency, just namespace these tables explicitly:

DELETE x FROM {$NAMESPACE}_repository.repository_coverage ...

The name change was motivated because I realized that the two operations differed only by level of granularity. It makes more sense to me if I imagine a set_commit level as well:

  • set/set_branch replaces all previous coverage information for a branch with the provided coverage information.
  • set_commit would replace all previous information for (branch, commit) with the provided information.
  • overwrite/set_file replaces previous information for (branch, commit, file) with the provided information.

I'm honestly not particularly happy with the new names either and was interested to see what you thought. Happy to change them back.

resources/sql/autopatches/20151109.repository.coverage.1.sql
3

It turns out that doesn't work for reasons that I completely don't understand.

mysql> DELETE x FROM phabricator_repository.repository_coverage x
    -> LEFT JOIN phabricator_repository.repository_coverage y
    ->   ON  x.branchID = y.branchID
    ->   AND x.commitID = y.commitID
    ->   AND x.pathID = y.pathID
    ->   AND y.id > x.id
    ->   WHERE y.id IS NOT NULL;
ERROR 1046 (3D000): No database selected

And I get the same error when running through ./bin/storage. If I change the DELETE x to SELECT * it runs fine.

With regards to the previous names: I think you got them backwards from what we originally came up with. Actually, they may be significantly better that way. Hmm...

epriestley edited edge metadata.

Huh, weird on the SQL thing.

I think "set" + "overwrite" or "overwrite" + "update" are more clear than the "set_x" stuff. The branch vs commit distinction probably isn't useful. I think the branch stuff shouldn't really even exist here, and wouldn't if we had an efficient way to execute git/hg log in large repositories in a few milliseconds. Consider some flavor of operation names in that vein.

There's a formatStringConstants() method you can use to make the API documentation a little more clear, see DifferentialFindConduitAPIMethod for an example. That would render something like "optional string<"set", "overwrite">". Consider using getMethodSummary() / getMethodDescription() to make the behavior of the method more explicit/clear on the /conduit/method/... UI page.

You should have commit access now. If you use arc land, you may need to point your origin at this upstream and add a public key first. If you want to use the "Land Revision" button, you need to arc diff with a clean push to the staging area first -- D14420 has a couple of potential pitfalls. The workflow is a bit rough overall but there's a spinny plane icon.

resources/sql/autopatches/20151109.repository.coverage.1.sql
2

Bizarre. Well maybe USE (vs use) at least.

This revision is now accepted and ready to land.Nov 10 2015, 12:37 AM

"overwrite" + "update" sounds good to me! Changed.

Added the formatStringConstants(...) and verified it appears on the conduit page.

I'll probably just go with adding my public key. The spinny plane icon is pretty tempting, though...

resources/sql/autopatches/20151109.repository.coverage.1.sql
2

Done.

ddfisher retitled this revision from add "set_file" mode to Diffusion coverage Conduit to add "update" mode to Diffusion coverage Conduit.Nov 10 2015, 12:49 AM
ddfisher updated this object.
ddfisher edited edge metadata.
This revision was automatically updated to reflect the committed changes.

Thanks for the quick feedback and for spending the time to help me do things The Right Way (instead of the okay but expedient way)!