Symbols are currently managed with two scripts, scripts/symbols/clear_repository_symbols.php and scripts/symbols/import_repository_symbols.php.
All of this stuff is very non-modern. I'm also not sure it actually works, although it looks more likely that it works than, e.g., scripts/repository/save_lint.php.
I'd like to move all symbol management to Conduit (T12318). There is nothing inherently operational/administrative about symbol indexes and using a bot to manage them is reasonable and desirable. This will also let these features work on Phacility instances, and let future versions of Diviner that publish documentation over Conduit also perform symbol updates (T8536).
Symbols aren't unique and don't have any stable identifier, so it's hard to update them (versus nuking and rebuilding them). However, we can provide better tools for this than we currently do. In particular, we could support:
- Remove all symbols in a particular file (so you can do an update file-by-file: git diff --raw, analyze each file which was touched since the last update, wipe them all, push the new symbols).
- Remove specific symbols (so you can fully manage a cache and diff the symbol list if you want).
The "remove + add" should probably happen in the same call and commit atomically, so that implies the API looks something like symbols.update with a list of things to remove and a list of things to add.
Supplemental scripts scripts/symbols/generate_ctags_symbols.php and scripts/symbols/generate_php_symbols.php work alongside the existing scripts. These should probably either be removed and replaced with general documentation, moved to arc, moved wholesale to documentation, or integrated into some arc symbol workflow.
I don't see any immediate problems with PhabricatorRepositorySymbol, although it doesn't have an id column, which is a little unusual. I think this is fine for now.
One sort-of-meta-concern here is that symbol indexes are per-repository, but symbols are per-working-copy-state. Your master and experimental branches might have wildly different symbols. No one has cared yet, though, and we can generally punt on this, hopefully indefinitely.
Symbols don't use a modern SearchEngine, but should. DiffusionSymbolController is basically just an ancient version of SearchEngine. Paging may be difficult because Symbol objects do not have an ID column and may be virtual external symbols, but this should be tractable.
T10771 discusses a specific UI deficiency of this page. The UI could use improvements in general. Since it's likely to be "free", we should also probably define diffusion.symbol.search or similar.
When X::y() or namespace x; y(); appears in the code and the user clicks y, we support jumping to the correct definition. However, this requires a very smart highlighter. Only XHPAST is currently smart enough, and making a smart highlighter is hard work.
PHI201 suggests that more information about the path where namespace x; y(); appeared could be used to make an educated guess about the context. I think this is reasonable. In a similar vein, PHI284 asks for filenames and line numbers to be passed to external queries.
We don't always have this information (for example, if a user types s y into global search, we can't provide any information about files or line numbers), but we often do, and can provide more information when it's available.
- Clicking links in Differential and Diffusion should pass file and line context to symbol lookup.
- This context should be passed to DiffusionExternalSymbolsSource if it's available.
- DiffusionExternalSymbolsSource, or some similar EngineExension sort of class, should get some kind of hook for ranking symbols. Since this is a mess, maybe the implementation really looks like doing the query in separate high-priority and low-priority phases. To make this work by default, it probably really looks like each source getting to define its own phases.