Improve the performance of large remarkup documents with many complex rules
Summary:
See PHI1114. An install encountered a multi-megabyte document with approximately 11,000 replacement tokens (complex remarkup rules which store text and evaluate it at display time) that required 17 seconds to render.
Onsite investigation narrowed this down to a large amount of time spent in restore().
Before this change, a document like this had to call str_replace() on the full document once per token, so roughly O(document size * token count) bytes were being shuffled around.
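
For illustration, a minimal PHP sketch of the pre-change shape (the function name and token map are hypothetical, not the actual Phabricator code):

  <?php
  // Hypothetical sketch of the old approach: one full-document
  // str_replace() per token. Each call scans and copies the entire
  // (multi-megabyte) string even though it touches only one token,
  // so total work is O(document size * token count).
  function restore_slow($document, array $tokens) {
    foreach ($tokens as $token => $replacement) {
      $document = str_replace($token, $replacement, $document);
    }
    return $document;
  }
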
We can improve this dramatically by:
- incrementally expanding tokens, so most operations are on one token instead of the entire document (and the total document size has a much smaller performance effect); and
- replacing tokens in a single pass with preg_match() + append + implode() instead of running str_replace() in a loop (see the sketch after this list).
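
Below is a minimal PHP sketch of that single-pass strategy. The function name restore_fast(), the token map, and the "\x01"-delimited marker pattern are assumptions for illustration; the real marker format lives in the Remarkup engine.

  <?php
  // Hypothetical sketch of the single-pass strategy: locate each
  // token with preg_match() (using PREG_OFFSET_CAPTURE to track the
  // scan position), append the preceding literal text and the
  // token's replacement to a buffer, then implode() once at the end.
  function restore_fast($document, array $tokens) {
    $parts = array();
    $offset = 0;
    $pattern = '/\x01\d+\x01/'; // assumed token marker format
    $matches = null;
    while (preg_match($pattern, $document, $matches,
        PREG_OFFSET_CAPTURE, $offset)) {
      list($token, $pos) = $matches[0];
      // Copy the literal text before the token, then the replacement.
      $parts[] = substr($document, $offset, $pos - $offset);
      $parts[] = isset($tokens[$token]) ? $tokens[$token] : $token;
      $offset = $pos + strlen($token);
    }
    $parts[] = substr($document, $offset);
    // One implode() instead of N full-document rewrites.
    return implode('', $parts);
  }

The key property is that each byte of the document is copied a constant number of times, regardless of how many tokens it contains.
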
Test Plan:
On this document:
echo str_repeat("T300 T301 T302 T303 T304 T305 T306 T307 T308 T309 T310\n", 1024).str_repeat("qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq\n", 1024 * 512);
...saw local time in restore() drop from ~3,300ms to ~10ms with no apparent behavioral changes.
Ran all unit tests, browsed around locally, loaded the page in the web UI.
Reviewers: amckinley
Reviewed By: amckinley
Differential Revision: https://secure.phabricator.com/D20522