See PHI284. See PHI201.
**"APIs"**
Symbols are currently managed with two scripts, `scripts/symbols/clear_repository_symbols.php` and `scripts/symbols/import_repository_symbols.php`.
All of this stuff is very non-modern. I'm also not sure it actually works, although it looks more likely that it works than, e.g., `scripts/repository/save_lint.php`.
I'd like to move all symbol management to Conduit (T12318). There is nothing inherently operational/administrative about symbol indexes and using a bot to manage them is reasonable and desirable. This will also let these features work on Phacility instances, and let future versions of Diviner that publish documentation over Conduit also perform symbol updates (T8536).
Symbols aren't unique and don't have any stable identifier, so it's hard to update them (versus nuking and rebuilding them). However, we can provide better tools for this than we currently do. In particular, we could support:
- Remove all symbols in a particular file (so you can do an update file-by-file: `git diff --raw`, analyze each file which was touched since the last update, wipe them all, push the new symbols).
- Remove specific symbols (so you can fully manage a cache and diff the symbol list if you want).
The "remove + add" should probably happen in the same call and commit atomically, so that implies the API looks something like `symbols.update` with a list of things to remove and a list of things to add.
Supplemental scripts `scripts/symbols/generate_ctags_symbols.php` and `scripts/symbols/generate_php_symbols.php` work alongside the existing scripts. These should probably either be removed and replaced with general documentation, moved to `arc`, moved wholesale to documentation, or integrated into some `arc symbol` workflow.
**Storage**
I don't see any immediate problems with `PhabricatorRepositorySymbol`, although it doesn't have an `id` column, which is a little unusual. I think this is fine for now.
One sort-of-meta-concern here is that symbol indexes are per-repository, but symbols are per-working-copy-state. Your `master` and `experimental` branches might have wildly different symbols. No one has cared yet, though, and we can generally punt on this, hopefully indefinitely.
**Querying**
Symbols don't use a modern `SearchEngine`, but should. `DiffusionSymbolController` is basically just an ancient version of `SearchEngine`. Paging may be difficult because `Symbol` objects do not have an ID column and may be virtual external symbols, but this should be tractable.
T10771 discusses a specific UI deficiency of this page. The UI could use improvements in general. Since it's likely to be "free", we should also probably define `diffusion.symbol.search` or similar.
**Symbol Context**
When `X::y()` or `namespace x; y();` appears in the code and the user clicks `y`, we support jumping to the correct definition. However, this requires a very smart highlighter. Only XHPAST is currently smart enough, and making a smart highlighter is hard work.
PHI201 suggests that more information about the path where `namespace x; y();` appeared could be used to make an educated guess about the context. I think this is reasonable. In a similar vein, PHI284 asks for filenames and line numbers to be passed to external queries.
We don't always have this information (for example, if a user types `s y` into global search, we can't provide any information about files or line numbers), but we often do, and can provide more information when it's available.
Broadly:
- Clicking links in Differential and Diffusion should pass file and line context to symbol lookup.
- This context should be passed to `DiffusionExternalSymbolsSource` if it's available.
- `DiffusionExternalSymbolsSource`, or some similar `EngineExension` sort of class, should get some kind of hook for ranking symbols. Since this is a mess, maybe the implementation really looks like doing the query in separate high-priority and low-priority phases. To make this work by default, it probably //really// looks like each source getting to define its own phases.