Ref T13130. I wasn't able make this much better, but it looks like this is about ~20% faster on my system.
This kind of thing is somewhat difficult to micro-optimize because XHProf tends to over-estimate the cost of function calls. In XHProf, this looks much much faster than the old version (~100% faster) but the actual cost of bin/conduit call --method differential.getrawdiff hasn't improved that much. Still, it seems consistently faster across multiple runs.