Page MenuHomePhabricator

MetaMTA - make sure mail garbage collection also cleans up recipient edges
ClosedPublic

Authored by btrahan on Jun 23 2015, 8:10 PM.

Details

Summary

Ref T5791. This edge table grows 2+X faster than the corresponding mail table depending on usage. Ergo, lets make sure to clean that up too in the delete code.

Test Plan

careful thought

Diff Detail

Repository
rP Phabricator
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

btrahan retitled this revision from to MetaMTA - make sure mail garbage collection also cleans up recipient edges.
btrahan updated this object.
btrahan edited the test plan for this revision. (Show Details)
btrahan added a reviewer: epriestley.
btrahan updated this revision to Diff 32470.
epriestley edited edge metadata.EditedJun 24 2015, 12:47 PM
epriestley accepted this revision.

Possibly we should implement DestructibleInterface and use PhabricatorDestructionEngine instead.

That will automatically clean up edges -- including all edge types and inverse edges, should Mail ever get those. These don't matter today but seem like reasonable things that might matter in the future. It will also do stuff like clean up any mail-related tasks in the task queue today, although I don't think this is terribly important (they should automatically fail, and it's unlikely that there are any left after 90 days).

I can come up with two downsides to this approach:

First, this will also write a PhabricatorSystemDestructionLog as a side effect, which is silly/pointless when doing garbage collection, since we'd be trading one row for another one. We could add a flag to it like setDisableLogging() to turn off logging, so we aren't just moving rows around. This seems reasonable but is a little bit of work.

Second, it's much heavier than just doing DELETEs, since it does a bunch of extra queries to look for flags, notifications, etc. The GC is single-process right now, so if an install ever generates mail faster than one process can delete it the GC won't be able to keep up. I suspect this is not a real risk for a very long time, but it might become a more realistic risk for large installs if multiple GCs start using DestructionEngine or the GC process gets heavier (although we could do multi-process GC without too much trouble).

I think this is reasonable as written and avoids both these potential downsides, but if these objects get more full-fledged in the future it might be worth converting.

Perhaps relevant is that DestructionEngine also cleans up HeraldTranscript records "for free", which these probably will have soon.

This revision is now accepted and ready to land.Jun 24 2015, 12:47 PM

Haha, I had a TODO to implement the destruction interface, but then figured that was too heavy weight for garbage collected stuff. For now, I want to just check this in as is to prevent data growing unexpectedly.

I guess the systems design question to me is if you want garbage collection to run through the destruction engine or not?

I don't think the upcoming HeraldTranscript integration (and other feature set) makes maintaining this delete too bad. We could reduce technical cost here by having edges and HeraldTranscript modularize how they delete themselves a bit more if necessary, so some code can be shared here? IDK waving my hands here but feels directionally like it might help.

This revision was automatically updated to reflect the committed changes.
epriestley added inline comments.Jun 28 2015, 2:39 PM
src/applications/metamta/garbagecollector/MetaMTAMailSentGarbageCollector.php
11

I missed this -- this used to be time() - ttl, but was changed to just time(), altering the threshold for deleting mail from "90 days ago" to "0 seconds ago" and deleting all mail in the table.

Not really a big deal, just why /mail/ is empty now. I'll fix this shortly...