detectCopiedCode has some kind of explosive runtime behavior on large diffs
Closed, Resolved, Public

Description

See IRC. We should disable this for diffs over a certain number of changes or something like that, since a ~50MB diff took more than 2 hours to process.
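Here is a minimal sketch of the size cutoff proposed above. It assumes the check runs over parsed hunks before copy detection starts. `Hunk`, `should_detect_copied_code` and the `MAX_CHANGED_LINES` value are hypothetical. The task does not name a threshold, and this is not Phabricator's code.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the task only says "a certain number of changes".
MAX_CHANGED_LINES = 100_000


@dataclass
class Hunk:
    """Hypothetical stand-in for a parsed diff hunk."""
    added: list
    removed: list


def should_detect_copied_code(hunks, limit=MAX_CHANGED_LINES):
    """Return False when the diff has too many changed lines to run copy detection."""
    changed = sum(len(h.added) + len(h.removed) for h in hunks)
    return changed <= limit
```

A cap on total changed lines is the bluntest option. It skips detection on every huge diff, not only on the pathological ones.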

Event Timeline

epriestley raised the priority of this task to Normal.
epriestley updated the task description.
epriestley added a project: Differential.

I believe D9178 fixes this:

  • You can support that theory by checking if the diff for the offending revision contains a large number of identical lines.
  • You can confirm that theory by retrying the reparse.php --message ... command after D9178 lands, and seeing if it exits in fewer than 2 hours.
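One plausible reading of why identical lines matter is this: suppose copy detection indexes lines by their text and then pairs up every matching occurrence. A line repeated k times on both sides then produces k * k candidate pairs. The sketch below is hypothetical and is not the code in D9178. It shows that blowup and one common mitigation, which is to skip lines that occur too often to be a meaningful "copied from" signal. `candidate_pairs` and `max_occurrences` are illustrative names.

```python
from collections import defaultdict


def candidate_pairs(removed, added, max_occurrences=None):
    """Yield (i, j) pairs where removed[i] == added[j].

    Hypothetical sketch: with no cap, a line repeated k times on each
    side yields k * k pairs. With a cap, any line that appears more than
    max_occurrences times in `removed` is skipped entirely.
    """
    # Index removed lines by their text.
    index = defaultdict(list)
    for i, line in enumerate(removed):
        index[line].append(i)

    for j, line in enumerate(added):
        positions = index.get(line, [])
        # Very common lines carry little "copied from" information; skip them.
        if max_occurrences is not None and len(positions) > max_occurrences:
            continue
        for i in positions:
            yield (i, j)
```

With 1,000 copies of one line on each side and no cap, this yields 1,000,001 pairs. With `max_occurrences=100`, only the single unique line survives.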

The file itself certainly does:

$ git show c84da7041b1199f051d041ca2cc86f3736e0898b:crowdin_data.pickle | sort | uniq -c | sort -n | tail
   4612 Vlearn.math.cc-eighth-grade-math.exercises
   4674 Vlearn.math.cc-sixth-grade-math.exercises
   4863 Vlearn.math.calculus.exercises
   4944 Vlearn.math.cc-seventh-grade-math.exercises
   6181 Vlearn.math.early-math.exercises
   7603 Vlearn.math.algebra.exercises
   8504 Vlearn.math.trigonometry.exercises
  54731 sg13
  57871 g11
 108380 g2
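To give a sense of scale, here is a rough Python equivalent of that pipeline, plus the pair arithmetic. It assumes, as a hypothesis rather than a measurement, that the matcher compares each occurrence of a repeated line with every other occurrence. `top_repeated_lines` and `pairwise_comparisons` are hypothetical helpers. `Counter.most_common` returns the most frequent line first, whereas `sort -n | tail` lists lines in ascending order.

```python
from collections import Counter


def top_repeated_lines(lines, n=10):
    """Rough equivalent of `sort | uniq -c | sort -n | tail`:
    the n most frequent lines with their counts, most frequent first."""
    return Counter(lines).most_common(n)


def pairwise_comparisons(count):
    """Unordered pairs among `count` identical lines, i.e. C(count, 2)."""
    return count * (count - 1) // 2
```

Under that assumption, `pairwise_comparisons(108380)` is 5,873,058,010 for the `g2` line alone. To count a binary file like the pickle, pass it `data.splitlines()` on the bytes read in binary mode.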

Not sure if there's something special I need to do to make reparse.php --message slow again; P1148 shows its current (fast) output. The Diffusion page for that commit says "Still Importing...".

If --message worked, do:

./scripts/repository/reparse.php --change --owners --herald --force rGTWc84da7041b1199f051d041ca2cc86f3736e0898b

...and you should be good to go. That transcript looks like it didn't try to update the revision this time (just skipping all the work), but ehh whatever.

Sorry if I was unclear -- that paste is the result of running without your new patch, so it should still be slow (but isn't). Is it easy to make it redo the work? Running with --change (again, without D9178) completes quickly and gets rid of the "Still Importing..." message.

I don't think there's an easy way: you have to go find the corresponding Differential revision, reopen it, and then probably manually remove the link between the revision and the commit in the database (and I'm not 100% sure that's sufficient). I can't think of an easy way to get back into this code, offhand.