Improve the performance of large remarkup documents with many complex rules

Summary:
See PHI1114. An install encountered a multi-megabyte document with approximately 11,000 replacement tokens (complex remarkup rules which store text and are evaluated at display time) that took 17 seconds to render.

Onsite investigation narrowed this down to a large amount of time spent in restore(), the method this change modifies.

Before this change, restoring a document like this called str_replace() on the full document once per token, so roughly O(size of the document * number of tokens) bytes were shuffled around.
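To make that cost concrete, here is a minimal Python sketch of the per-token strategy (Python rather than PHP; the token format `\x01<n>Z` and the helper names `make_token` and `naive_restore` are hypothetical, chosen only for illustration). Every token triggers one full pass over the document, so the `work` counter grows with document size times token count.

```python
from typing import List, Tuple


def make_token(i: int) -> str:
    # Hypothetical token format for this sketch: "\x01" + index + "Z".
    # The trailing "Z" keeps token 1 from matching a prefix of token 11.
    return f"\x01{i}Z"


def naive_restore(corpus: str, stored: List[str]) -> Tuple[str, int]:
    """Restore tokens with one full-document replace() per token.

    Returns the restored text and a rough count of characters scanned,
    which grows with len(document) * number_of_tokens.
    """
    work = 0
    # Parent tokens are stored after their children, so replace from the
    # highest index down: expanding a parent exposes child tokens that a
    # later (lower-index) pass will then replace.
    for i in reversed(range(len(stored))):
        work += len(corpus)  # each pass scans and rebuilds the whole document
        corpus = corpus.replace(make_token(i), stored[i])
    return corpus, work


stored = ["<b>x</b>", "[" + make_token(0) + "]"]
doc = "a " + make_token(1) + " b " + make_token(0)
print(naive_restore(doc, stored))
```

With thousands of tokens, each pass rescans the entire corpus even though only a few bytes change per pass.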

We can improve this dramatically by:

  • incrementally expanding tokens, so most operations are on one token instead of the entire document (and the total document size has a much smaller performance effect); and
  • replacing tokens in a single pass with preg_match() + append + implode() instead of running str_replace() in a loop.
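The two improvements above can be sketched as follows. This is a hedged Python illustration, not the libphutil code: `re.finditer()` and `"".join()` stand in for `preg_match()` + append + `implode()`, the token format `\x01<n>Z` and the names `make_token` and `restore` are hypothetical, and it assumes child tokens are always stored before the tokens that contain them.

```python
import re
from typing import List

# Hypothetical token format for this sketch: "\x01" + index + "Z".
TOKEN_RE = re.compile(r"\x01(\d+)Z")


def make_token(i: int) -> str:
    return f"\x01{i}Z"


def restore(corpus: str, stored: List[str]) -> str:
    """Expand tokens incrementally, rebuilding each string in one pass.

    Assumes children are stored before parents, so when entry i is expanded,
    every token it references has already been fully expanded.
    """
    expanded: List[str] = []
    for content in stored + [corpus]:  # the document body is expanded last
        if "\x01" not in content:
            # No tokens: nothing to replace.
            expanded.append(content)
            continue
        parts: List[str] = []
        pos = 0
        for m in TOKEN_RE.finditer(content):
            index = int(m.group(1))
            if index >= len(expanded):
                raise ValueError(f"token {index} referenced before it was stored")
            parts.append(content[pos:m.start()])  # literal text before the token
            parts.append(expanded[index])         # already-expanded token body
            pos = m.end()
        parts.append(content[pos:])  # trailing text after the last token
        expanded.append("".join(parts))
    return expanded[-1]


stored = ["<b>x</b>", "[" + make_token(0) + "]"]
doc = "a " + make_token(1) + " b " + make_token(0)
print(restore(doc, stored))
```

Each stored string is scanned once and joined once, so the large document body is rebuilt in a single pass instead of once per token, and most of the work happens on individual token bodies.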

Test Plan:
On this document:

echo str_repeat("T300 T301 T302 T303 T304 T305 T306 T307 T308 T309 T310\n", 1024).str_repeat("qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq\n", 1024 * 512);

...saw local time in restore() drop from ~3,300ms to ~10ms with no apparent behavioral changes.

Ran all unit tests, browsed around locally, loaded the page in the web UI.

Reviewers: amckinley

Reviewed By: amckinley

Differential Revision: https://secure.phabricator.com/D20522
