For previous work, see T5954.
After T13046, the state of affairs is:
- Phabricator supports modular data export as part of core infrastructure.
- It is implemented by `PullLog` and `User`.
- After the work here, it is also implemented by `Task`.
- ~~Maniphest still has a legacy, hard-coded version of this which exports to Excel in a non-modular way.~~
- After work here, Maniphest is now modern.
---
Completed changes:
{icon check, color=green} **Support Excel**: Currently, CSV, JSON, and text are supported. Elsewhere, in Maniphest, we support building Excel sheets. An Excel exporter should be added to the new infrastructure (e.g., `PhabricatorExcelExportFormat`).
{icon check, color=green} **Upgrade Maniphest**: Replace Maniphest's export process with the core infrastructure process. Remove `ManiphestExcelFormat` and `ManiphestExcelDefaultFormat`. This will break a small number of third parties. I believe third-party exporters were almost exclusively used to hack in custom field support (below), though.
{icon check, color=green} **Modularize More Fields**: The `id` and `phid` fields are currently implemented by each `SearchEngine`, but the base class could implement them.
Extensions (like Custom Fields, maybe Flags, Subscribers, Projects, etc.) should be able to add new field definitions and export data. See T5391 for Custom Fields specifically.
This would start by defining an `ExportEngineExtension` base class, then implementing `CustomFieldExportEngineExtension`, etc. Make the base `SearchEngine` build extensions and call them to add more fields and data while exporting.
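As a rough illustration of the extension flow described above, here is a minimal Python sketch (not Phabricator's PHP). The class names mirror the ones mentioned in the text, but every method name, the `ExportField` type, and the `custom.estimate` field are hypothetical and only show the shape of the idea: the base engine supplies `id` and `phid`, then asks each extension for extra fields and data.

```python
from dataclasses import dataclass


@dataclass
class ExportField:
    # Hypothetical field definition: a stable key plus a human-readable label.
    key: str
    label: str


class ExportEngineExtension:
    """Hypothetical base class: subclasses contribute extra export fields."""

    def new_export_fields(self):
        return []

    def new_export_data(self, objects):
        # One dict of {field key: value} per object, in the same order.
        return [{} for _ in objects]


class CustomFieldExportEngineExtension(ExportEngineExtension):
    """Toy extension that exports a single made-up custom field."""

    def new_export_fields(self):
        return [ExportField("custom.estimate", "Estimate")]

    def new_export_data(self, objects):
        return [{"custom.estimate": obj.get("estimate")} for obj in objects]


class SearchEngine:
    """Stand-in for the base engine: adds id/phid, then calls extensions."""

    def __init__(self, extensions):
        self.extensions = extensions

    def export_fields(self):
        fields = [ExportField("id", "ID"), ExportField("phid", "PHID")]
        for extension in self.extensions:
            fields.extend(extension.new_export_fields())
        return fields

    def export_rows(self, objects):
        rows = [{"id": obj["id"], "phid": obj["phid"]} for obj in objects]
        for extension in self.extensions:
            # Merge each extension's per-object data into the matching row.
            for row, extra in zip(rows, extension.new_export_data(objects)):
                row.update(extra)
        return rows
```

With this shape, individual `SearchEngine` subclasses would not need to know which extensions exist; they would only describe their own application-specific fields.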
{icon check, color=green} **Column Headers/Labels**: JSON is self-labeling but CSV, text, and Excel could use column headers. Before we pass the field data to the exporter, we can pass just the field list (`addHeaders($fields)`). The JSON format can return immediately while other formats can build a header row.
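The header pass described above might look roughly like this. It is a Python sketch under assumed names (`add_headers`, `add_row`, `render`, and both format classes are hypothetical, standing in for whatever the real exporters expose): the JSON format ignores the header call, while the CSV format writes a label row.

```python
import csv
import io
import json


class JSONExportFormat:
    def __init__(self):
        self.rows = []

    def add_headers(self, fields):
        # JSON objects carry their own keys, so there is nothing to emit.
        return

    def add_row(self, fields, row):
        self.rows.append({key: row.get(key) for key, _label in fields})

    def render(self):
        return json.dumps(self.rows)


class CSVExportFormat:
    def __init__(self):
        self.buffer = io.StringIO()
        self.writer = csv.writer(self.buffer, lineterminator="\n")

    def add_headers(self, fields):
        # Build a header row from the human-readable labels.
        self.writer.writerow([label for _key, label in fields])

    def add_row(self, fields, row):
        self.writer.writerow([row.get(key) for key, _label in fields])

    def render(self):
        return self.buffer.getvalue()


def export(export_format, fields, rows):
    # Pass the field list first, then the data, as described in the text.
    export_format.add_headers(fields)
    for row in rows:
        export_format.add_row(fields, row)
    return export_format.render()
```

Here `fields` is assumed to be a list of `(key, label)` pairs, so the same list drives both the header row and the per-row column order.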
{icon check, color=green} **Background file generation**: For very large result sets, foreground generation of export formats will fail (either hitting memory limits or time limits). We can test if we're going to examine more than, say, 1K rows, and queue a background job (like the Bulk jobs) instead of generating these large sheets in the foreground. The current code is written with a soft assumption that we'll pursue this eventually (e.g., you can stream 100 results at a time to the exporters and don't need to load them all into memory at once).
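A small Python sketch of the two ideas above: choosing foreground vs. background by row count, and feeding the exporter in pages rather than all at once. The constants reuse the "1K rows" and "100 results" figures from the text purely as illustrations; the function names are hypothetical and this does not model the actual job queue.

```python
from itertools import islice

FOREGROUND_ROW_LIMIT = 1000  # "say, 1K rows" from the text; not a real setting
PAGE_SIZE = 100              # "100 results at a time" from the text


def choose_mode(row_count, limit=FOREGROUND_ROW_LIMIT):
    # Large exports go to a background job; small ones run in the request.
    return "background" if row_count > limit else "foreground"


def iter_pages(rows, page_size=PAGE_SIZE):
    # Yield successive lists of at most page_size rows from any iterable,
    # so the full result set never has to sit in memory at once.
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def stream_export(rows, write_page):
    # write_page is a caller-supplied callback that appends one page
    # of rows to the output being built.
    total = 0
    for page in iter_pages(rows):
        write_page(page)
        total += len(page)
    return total
```

Because the exporter only ever sees one page at a time, the same `stream_export` loop could run unchanged in either the foreground request or the background job.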
---
More speculative/future stuff, some of which may not have much value. Generally, I think we should probably wait for users to hit use cases for most of this stuff before pursuing it.
**Let users reorder and filter columns**: If an object exports columns X, Y, and Z, let the user select that they only want X and Z in order "Z, X". The APIs are currently built with a soft assumption that we might pursue this eventually. On the other hand, you can do this yourself in Excel and it's probably quite a lot of work to do in the web UI (since we'd realistically need to let you edit/save/share these configurations -- there's no time savings if you have to go uncheck 250 checkboxes every time you click the "Export Data" button).
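For reference, the core of column selection is small; the expensive part is the UI and saved configurations mentioned above. A minimal Python sketch, with hypothetical function names, of selecting "Z, X" from an object that exports X, Y, and Z:

```python
def select_columns(available, wanted):
    # Validate the user's selection against the columns the object exports,
    # keeping the user's order.
    known = set(available)
    unknown = [key for key in wanted if key not in known]
    if unknown:
        raise ValueError("Unknown export columns: " + ", ".join(unknown))
    return list(wanted)


def project_row(row, columns):
    # Emit only the selected values, in the selected order.
    return [row[key] for key in columns]


columns = select_columns(["X", "Y", "Z"], ["Z", "X"])
example = project_row({"X": 1, "Y": 2, "Z": 3}, columns)
```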
**Let users reformat fields**: If we support filtering/reordering fields, we might also support selecting a format for fields. For example, for a date field, you might choose to export it as an epoch timestamp, or as YYYY-MM-DD, or as Saturday, XXth of YYYY Year of Our Lord, or whatever else. As above, you can do this in Excel yourself, so the value isn't clear.
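As a sketch of what a per-field format choice could mean for a date field, assuming the stored value is an epoch timestamp and using made-up format names (`epoch`, `date`):

```python
from datetime import datetime, timezone


def format_date(epoch, style):
    # "epoch" passes the raw timestamp through; "date" renders YYYY-MM-DD.
    # UTC is used here only to keep the example deterministic.
    if style == "epoch":
        return str(epoch)
    if style == "date":
        moment = datetime.fromtimestamp(epoch, tz=timezone.utc)
        return moment.strftime("%Y-%m-%d")
    raise ValueError("Unknown date style: " + style)
```

A real implementation would presumably need to pick a timezone (for example the viewer's) rather than hard-coding UTC.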
**Support more applications**: Adding support for more applications is fairly easy, but it's not clear that exporting most applications to CSV is particularly useful.