
Slightly improve performance of querystring construction by caching results
Changes PlannedPublic

Authored by epriestley on Wed, Jan 30, 8:19 PM.

Details

Summary

Ref T13242. For context, see this transaction.search profile from secure:

https://secure.phabricator.com/xhprof/profile/PHID-FILE-47g3frqfj7vmdcjfaczc/

There's ~80ms spent on query string construction.

Currently, we build each query string twice: once at construction time (to surface errors early) and once at use time. Since we always expect the same result both times, we can cache the rendered strings. This should cut query string construction cost by roughly 50%.

Also, about 25% of the strings are empty: qsprintf($conn, ''). This is important for safety/correctness, but we can short-circuit these when building strings. This should reduce construction cost by ~25%.
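The two optimizations above (render-once caching and the empty-string short-circuit) can be sketched roughly as follows. This is an illustrative Python sketch with hypothetical names; the real implementation is PHP code in libphutil, and its rendering goes through qsprintf-style escaping rather than plain formatting:

```python
class QueryString:
    """Hypothetical sketch of caching rendered query strings.

    Names and the rendering rule are illustrative only; the real
    code is qsprintf() in libphutil (PHP).
    """

    def __init__(self, pattern, args):
        self.pattern = pattern
        self.args = args
        if pattern == '':
            # Short-circuit: ~25% of strings are empty, so skip
            # rendering entirely and cache the empty result.
            self._cache = ''
        else:
            # Render once at construction to surface errors early,
            # and keep the result so use-time rendering is free.
            self._cache = self._render()

    def _render(self):
        # Stand-in for the real qsprintf-style escaping/formatting.
        return self.pattern % tuple(self.args)

    def __str__(self):
        # Previously this re-rendered the string; now it just
        # returns the cached result from construction time.
        return self._cache
```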

Test Plan

Poked around locally without hitting behavioral issues; will compare against production profile.

Diff Detail

Repository
rPHU libphutil
Branch
cache1
Lint
Lint OK
Unit
Unit Tests OK
Build Status
Buildable 21767
Build 29702: Run Core Tests
Build 29701: arc lint + arc unit

Event Timeline

epriestley created this revision. Wed, Jan 30, 8:19 PM
epriestley requested review of this revision. Wed, Jan 30, 8:19 PM

(I did eventually make it far enough through T13232 to get XHProf back online locally, and collected some profiles suggesting that this and D20068 have an actual impact.)

amckinley accepted this revision. Fri, Feb 1, 3:51 AM
This revision is now accepted and ready to land. Fri, Feb 1, 3:51 AM

I'm a little surprised that this would have a significant performance impact, but maybe I'm underestimating the amount of query string building we do.

epriestley planned changes to this revision. Fri, Feb 1, 4:12 PM

I think the overall effect here isn't very large, but it's probably large enough to be worth the relatively small complexity cost of this patch.

Although I profiled ~80ms in the link in the summary, this is probably a bit (maybe "very") misleading. For one, the profiler has a bit of overhead per-function-call (to measure the call) and this stack makes a lot of very cheap calls. That tends to mean that the profiler will over-report the cost. Second, profiling a whole page on a production server isn't super repeatable so who knows.

Third, part of the reason this number was so high is that this API call (at least, before the "don't load handles" change) builds a lot of queries when run against a complex object: each story may have PHIDs for several different object types, and we have to load each object type (project, task, user, revision, commit, etc.) with a separate query. This isn't wildly atypical, but it's on the higher side. The other change ("don't load handles") throws away a ton of these queries and sharply reduces the impact of this change.

So there are some good arguments that the impact here is probably a lot smaller than it looks at first glance. On the "but maybe it's worth it anyway" side, we have: the total number of calls (5,700) was pretty crazy no matter how cheap they are; this change isn't very complicated; and this should impact ~every page, so even if we're only getting a little bit of benefit we may be realizing that benefit on every page.

There's also an argument like "even if there's not much performance impact, reducing the total number of calls we make by a large fraction like this makes profiling more accurate overall, so we can find other performance issues more easily in the future". I'm not sure I like this argument (we definitely shouldn't be inlining tons of code and saying "inline code is fast!!!1111~~") but I don't think it's completely meritless.

Here's what I measure locally for ab -n 100 -c 1 http://local.phacility.com/api/transaction.search ...:

| Test | 50% | 95% |
|---|---|---|
| No Patches | 45ms | 49ms |
| Query String Cache Only | 42ms | 46ms |
| Handles Patch Only | 39ms | 43ms |
| Both Patches | 39ms | 40ms |

There's a lot of jitter and a small overall effect (the page is already fast, I'm not querying a very complex object, and bin/lipsum isn't currently very good at adding a lot of comments/transactions to an object), but the numbers are headed in the right direction at least. A 3ms change isn't much, but it's maybe 7% of the 45ms total page time at the 50th percentile.

The "query string cache" change is completely wiped out by the "handles" change at the 50th percentile, so this patch may not really help transaction.search much now that it issues far fewer queries. But even if it doesn't help here very much, we have plenty of other pages that do run a lot of queries, and it "should" "probably" improve things "somewhere".

I expect this effect to be slightly more pronounced in production (mostly: more transactions), but I'll take a stab at actually measuring it and see if that's true.


Another thought is that this is theoretically a behavioral change. Consider this code:

  $object->setTableName('abc');
  $fragment = qsprintf($conn, '%R', $object);
  $object->setTableName('xyz');
  queryfx($conn, 'SELECT * FROM %Q', $fragment);

Previously, this would query for SELECT * FROM xyz (the xyz call mutates the object before the query is actually rendered into a string). Now, this would query for SELECT * FROM abc.

This pattern is really bad overall and we should never be doing it (and I'm like 99.999% sure we aren't, even by accident), but I think the new behavior is probably slightly (?) better if it does happen. That is, the xyz call having an effect on $fragment seems very surprising.
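The lazy-vs-eager distinction behind this behavioral change can be made concrete with a small sketch. This is hypothetical Python (the real code is qsprintf in libphutil, written in PHP); the class names are invented for illustration:

```python
class Table:
    """Toy stand-in for a mutable object with a table name."""
    def __init__(self, name):
        self.name = name

class LazyFragment:
    """Old behavior: render at use time, so later mutations of the
    underlying object leak into the final query."""
    def __init__(self, obj):
        self.obj = obj
    def render(self):
        return 'SELECT * FROM %s' % self.obj.name

class CachedFragment:
    """New behavior: render at construction time, snapshotting the
    object's state when the fragment is built."""
    def __init__(self, obj):
        self.text = 'SELECT * FROM %s' % obj.name
    def render(self):
        return self.text

t = Table('abc')
lazy = LazyFragment(t)
cached = CachedFragment(t)
t.name = 'xyz'  # mutate after building the fragments
# lazy.render()   -> 'SELECT * FROM xyz' (old, surprising)
# cached.render() -> 'SELECT * FROM abc' (new, snapshot)
```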

A negative consequence of this is that if we have code like the snippet above, the unmasked query (which we execute) will be cached, but the masked query (which we show to users) will not. This could mean that the executed query and displayed query differ. This ain't great, although the mess you have to get yourself into to get different behavior is pretty extreme.


The more I overthink this, the more I'm actually leaning toward holding this change back for now, at least for this release. In particular, possibly the mask/unmask code should work like:

  • When you build a QueryString, it immediately renders both forms internally, caches them, and throws away the original arguments.

This avoids the extreme edge case where the displayed and executed strings differ; often lets us render the masked string for free (since it's usually the same as the unmasked string); and might, in theory, let the GC do its work a little more cheaply, since we no longer retain references to the argument objects.
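A rough sketch of that eager mask/unmask rendering, in hypothetical Python (the real masking rules and class names in libphutil differ; the bytes-based masking rule below is a toy stand-in):

```python
class EagerQueryString:
    """Hypothetical sketch: render both forms at construction and
    drop the original arguments, so no mutable references are kept."""

    def __init__(self, pattern, args):
        # Render the executable form immediately.
        self.unmasked = pattern % tuple(args)
        # Toy masking rule: hide binary values in the display form.
        masked_args = tuple(
            '<mask>' if isinstance(a, bytes) else a for a in args)
        if masked_args == tuple(args):
            # Nothing needed masking, so the masked form is free:
            # it is the very same string as the unmasked one.
            self.masked = self.unmasked
        else:
            self.masked = pattern % masked_args
        # Note: the constructor does not store `args`, so the GC can
        # collect the argument objects once the caller drops them.
```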

Let me hold onto this for a bit. D20068 appears to be a trivial and dramatic improvement here so I'm happy with just shipping that for now.