Page MenuHomePhabricator

Added popularity-based ranking to macros app (WIP)
Changes PlannedPublic

Authored by nipunn on Jul 24 2014, 7:40 AM.
Referenced Files
F13320571: D10035.diff
Fri, Jun 14, 5:29 AM
F13315297: D10035.id30222.diff
Wed, Jun 12, 11:48 AM
F13315296: D10035.id25277.diff
Wed, Jun 12, 11:47 AM
F13315295: D10035.id25018.diff
Wed, Jun 12, 11:47 AM
F13315294: D10035.id24122.diff
Wed, Jun 12, 11:47 AM
Wed, Jun 12, 11:46 AM
F13315281: D10035.diff
Wed, Jun 12, 11:34 AM
F13309450: D10035.diff
Mon, Jun 10, 10:13 AM
"Manufacturing Defect?" token, awarded by angie."Piece of Eight" token, awarded by joshuaspence."Like" token, awarded by lfaraone.


Group Reviewers
Blessed Reviewers

Built this for fun in a few days with Jacob Olson. Here's what it looks like:
It allows you to sort macros by popularity (by usage). Also allows you to
search by user usage to find which macros each user prefers.

By no means is this polished. The queries are nasty. The architecture is odd,
but I didn't want to spend the time figuring that out while building it,
so I went for the path of least resistance. Looking for some feedback here.
I'll add some inline comments at parts we're not particularly proud of.

Test Plan

Patched it onto our internal phabricator build and fumbled around. Happy to
write actual tests eventually, but I suspect heavy refactoring needs to go

Diff Detail

rP Phabricator
Lint Skipped
Tests Skipped

Event Timeline

nipunn retitled this revision from to Added popularity-based ranking to macros app (WIP).
nipunn updated this object.
nipunn edited the test plan for this revision. (Show Details)


18 ↗(On Diff #24122)

Currently, we build an auxiliary table file_imagemacrousage which contains one row for every instance of every macro usage. We query the entire table if we do a popularity-sorted query. One might envision rolled up versions of this table for optimization if necessary.

We use the table file_imagemacrousagecursors to keep track of how far into the other comment tables we've scanned to backfill the imagemacrousage table.

Currently, it only backfills differential_transaction_comment. I could not find a reasonable way to find all tables containing fields that will get remarkup'd.


This is a bit gross because it requires a table join between file_imagemacro and file_imagemacrousage. In order to sort the results by popularity, we also had to add a having clause.


Ugh. Basically grabbed the paging clause and stuck it in the having block since it contains a COUNT(*) aggregation


This is terrible. There isn't really good support in the CursorPagedPolicyAwareQuery for sorting on multiple columns (in our case, popularity then id). We worked around it by overriding getPagingColumn, getPagingValue, and buildPagingClause.

What would be a more reasonable way to do this?


This is nasty because it actually backfills/populates the table synchronous to the request. We cap the amount of backfill done per-request so as not to timeout / overload a mysql query. But there's gotta be a better way.
Is there a way to do a backfill as part of a migration? Perhaps as an async background task?

Macro was targeted at myself of course

Hahaha, see D3022 (and D3021) for some previous discussion (which was a hackathon project from Facebook, I think, but are about 2 years old now).

The two major issues at that time are still relevant. In summary:

  1. The usage table needs to be populated in a reasonable (gradual/daemon/offline) way, not built at query time.
  2. Every application needs to be supported, not just Differential.

However, we're in a much better position to attack those problems today than we were in 2012. In particular, I'd suggest this approach, which I think is mostly practical, can reasonably be implemented today, and is not particularly more complicated than your implementation:

  • Modify the macro remarkup rule to expose usage counts, in the way that the @mention and #project rules expose mentioned users and projects. You can look at those rules for examples (although I think those examples are slightly more complex). The idea is that the Macro rule just saves a list of all the macro PHIDs (and maybe the number of uses) when rendering, and then the caller can later extract that. The major goal here is to avoid duplicating the logic for identifying macros.
  • Modify PhabricatorApplicationTransactionEditor->expandRemarkupBlockTransactions(). Currently, this extracts #project mentions and writes links from them. Add a block of code which extracts macro PHIDs and updates the usage table.
  • I think the table should look like <id, macroPHID, authorPHID, usagePHID, count> (and maybe epoch if you want to eventually implement time decay) but usagePHID needs to store a Transaction PHID (so we can update the data if the comment is edited or removed). This is a little tricky, because when you extract the macro uses the transactions won't have PHIDs yet (and the transactions may be modified later in the Editor). I don't immediately see a clean way to fix this. Moving extraction to PhabricatorApplicationTransactionCommentEditor might be one approach (it may also make edit/delete easier). For now, you could hook other things up and just use the $object->getPHID() (which will be the revision/task/etc PHID), and we can figure out how to get the transaction PHIDs populating properly later.
  • To fill historic data, write a script which iterates over every transaction from every application. The skeleton should look something like this:
$xaction_classes = id(new PhutilSymbolLoader())
foreach ($xaction_classes as $xaction_class) {
  foreach (new LiskMigrationIterator($xaction_class) as $xaction) {
    // $xaction is now an ApplicationTransaction and you can load
    // the comment and update the macro usage table.
  • ManiphestTaskQuery->buildPagingClause() has an example of building paging clauses from multiple columns, but it's not much simpler. (I think this is legitimately complicated and not something we could trivially reduce with better API, but 95% of the time the behaviors are simple so there might be room to improve the API that we just haven't figured out yet.)
  • Simplifying the query capabilities (at least, in v1) might make this a bit easier. For example, you could rank/show only total popularity, and show top users on the macro detail page. That said, having "Favorites" is actually useful, so maybe this isn't really worthwhile.

The actual code mostly looks fine here, but I think the approach will be hard to move forward with.

ETL (mentioned in D3022) is also slightly farther along than it was at the time, but I think it's no longer the best approach in the modern codebase. Particularly, transactions were not generic across applications in 2012 and the approach outlined above would have been far more complicated.

epriestley added a reviewer: epriestley.

Just pushing this out of my queue, no rush. :)

This revision now requires changes to proceed.Jul 30 2014, 9:05 PM
nipunn added a subscriber: eadler.
nipunn edited edge metadata.

Moved logic to remarkup rules.
Couldn't quite get the backfill migration script working. Looking for some help there.
Otherwise, it works great.

nipunn edited edge metadata.

Got the backfill migration script to work on DifferentialTransactionComment.
Still can't quite get it working in the general case. I feel like I misunderstand
PhabricatorApplicationTransaction and PhabricatorApplicationTransactionComment

This looks awesome! What's its status? From @nipunn's last comment, it looks like everything is working except for the backfill?

Whatever happened to this diff? It seems like a really useful and important feature.

wasimsikder1 changed the visibility from "Public (No Login Required)" to "wasimsikder1 (wasim sikder)".Feb 13 2016, 7:36 AM
nipunn changed the visibility from "wasimsikder1 (wasim sikder)" to "Public (No Login Required)".

@jhurwitz and I made some effort toward this mid-last-year, but the backfill script is still not quite working. Figure I'd put up the progress.

Also couldn't figure out how to patch the diff ( into my code without attaching to a revision.