Plans: Data Exporters (Excel, JSON, CSV, etc.)
Closed, Resolved (Public)

Description

For previous work, see T5954.

After T13046, the state of affairs is:

  • Phabricator supports modular data export as part of core infrastructure.
  • It is implemented by PullLog and User.
    • After work here, by Task and UserLog and PushLog.
  • Maniphest still has a legacy, hard-coded version of this which exports to Excel in a non-modular way.
    • After work here, Maniphest is now modern.

Completed changes:

Support Excel: Currently, CSV, JSON, and text are supported. Elsewhere, in Maniphest, we support building Excel sheets. An Excel exporter should be added to the new infrastructure (e.g., PhabricatorExcelExportFormat).

Upgrade Maniphest: Replace Maniphest's export process with the core infrastructure process. Remove ManiphestExcelFormat and ManiphestExcelDefaultFormat. This will break a small number of third-party exporters. I believe third-party exporters were almost exclusively used to hack in custom field support (below), though.

Modularize More Fields: The id and phid fields are currently implemented by each SearchEngine, but the base class could implement them.

Extensions (like Custom Fields, maybe Flags, Subscribers, Projects, etc.) should be able to add new field definitions and export data. See T5391 for Custom Fields specifically.

This would start by defining an ExportEngineExtension base class, then implementing CustomFieldExportEngineExtension, etc. Make the base SearchEngine build extensions and call them to add more fields and data while exporting.
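
The extension flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not Phabricator's PHP implementation: the class names ExportEngineExtension and CustomFieldExportEngineExtension come from the plan, but the method names (new_export_fields, new_export_data), the "custom.severity" field, and the dict-based objects are assumptions made up for the example.

```python
class ExportEngineExtension:
    """Hypothetical base class: an extension contributes extra export columns."""

    def new_export_fields(self):
        # Return a list of (key, label) pairs for the columns this extension adds.
        raise NotImplementedError

    def new_export_data(self, objects):
        # Return one dict per object, mapping field key -> value.
        raise NotImplementedError


class CustomFieldExportEngineExtension(ExportEngineExtension):
    """Hypothetical extension exporting a single made-up custom field."""

    def new_export_fields(self):
        return [("custom.severity", "Severity")]

    def new_export_data(self, objects):
        return [{"custom.severity": obj.get("severity")} for obj in objects]


class SearchEngine:
    """Base engine: owns the shared id/phid fields and calls each extension."""

    def __init__(self, extensions):
        self.extensions = extensions

    def export_fields(self):
        # The base class implements id and phid once for every application.
        fields = [("id", "ID"), ("phid", "PHID")]
        for ext in self.extensions:
            fields.extend(ext.new_export_fields())
        return fields

    def export_rows(self, objects):
        rows = [{"id": o["id"], "phid": o["phid"]} for o in objects]
        for ext in self.extensions:
            # Merge each extension's per-object data into the matching row.
            for row, extra in zip(rows, ext.new_export_data(objects)):
                row.update(extra)
        return rows
```

With this shape, supporting Flags or Subscribers would mean adding another extension class rather than touching each SearchEngine.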

Column Headers/Labels: JSON is self-labeling but CSV, text, and Excel could use column headers. Before we pass the field data to the exporter, we can pass just the field list (addHeaders($fields)). The JSON format can return immediately while other formats can build a header row.
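
A minimal sketch of the header step, in Python for illustration only (the real exporters are PHP classes). The add_headers method mirrors the addHeaders($fields) call suggested above; the class names, add_object, and build are assumptions for the example.

```python
import csv
import io
import json


class CSVExportFormat:
    """Hypothetical CSV exporter that emits a header row before any data."""

    def __init__(self):
        self.buffer = io.StringIO()
        self.writer = csv.writer(self.buffer, lineterminator="\n")
        self.keys = []

    def add_headers(self, fields):
        # fields is a list of (key, label) pairs; labels become the header row.
        self.keys = [key for key, _label in fields]
        self.writer.writerow([label for _key, label in fields])

    def add_object(self, row):
        # Emit values in header order; missing keys become empty cells.
        self.writer.writerow([row.get(key, "") for key in self.keys])

    def build(self):
        return self.buffer.getvalue()


class JSONExportFormat:
    """Hypothetical JSON exporter: keys label each value, so no header row."""

    def __init__(self):
        self.rows = []

    def add_headers(self, fields):
        # JSON is self-labeling, so this returns immediately.
        return

    def add_object(self, row):
        self.rows.append(row)

    def build(self):
        return json.dumps(self.rows)
```

The caller does not need to know which formats want headers: it always calls add_headers once, then add_object per row.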

Background file generation: For very large result sets, foreground generation of export formats will fail (either hitting memory limits or time limits). We can test if we're going to examine more than, say, 1K rows, and queue a background job (like the Bulk jobs) instead of generating these large sheets in the foreground. The current stuff is written with a soft assumption that we'll pursue this eventually (e.g., you can stream 100 results at a time to the exporters and don't need to load them all into memory at once).
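
The foreground/background decision and the paged streaming could look something like this. It is a Python sketch under assumptions: the 1,000-row limit and 100-row page size are the figures mentioned above, while the function names and the write_row/enqueue_job callbacks are hypothetical stand-ins for the exporter and the job queue.

```python
from itertools import islice

FOREGROUND_ROW_LIMIT = 1000  # the "say, 1K rows" threshold from the plan
PAGE_SIZE = 100              # rows handed to the exporter at a time


def iter_pages(rows, page_size=PAGE_SIZE):
    """Yield lists of at most page_size rows without loading all rows at once."""
    iterator = iter(rows)
    while True:
        page = list(islice(iterator, page_size))
        if not page:
            return
        yield page


def should_run_in_background(row_count, limit=FOREGROUND_ROW_LIMIT):
    """True when the result set is too large to generate in the foreground."""
    return row_count > limit


def export(rows, row_count, write_row, enqueue_job):
    """Queue a background job for large exports; otherwise stream page by page."""
    if should_run_in_background(row_count):
        enqueue_job()
        return "queued"
    for page in iter_pages(rows):
        for row in page:
            write_row(row)
    return "done"
```

A background worker could reuse iter_pages unchanged, since only the place where the loop runs differs.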


More speculative/future stuff, some of which may not have much value. Generally, I think we should probably wait for users to hit use cases for most of this stuff before pursuing it.

Let users reorder and filter columns: If an object exports columns X, Y, and Z, let the user select that they only want X and Z in order "Z, X". The APIs are currently built with a soft assumption that we might pursue this eventually. On the other hand, you can do this yourself in Excel and it's probably quite a lot of work to do in the web UI (since we'd realistically need to let you edit/save/share these configurations -- there's no time savings if you have to go uncheck 250 checkboxes every time you click the "Export Data" button).
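
For reference, the data-side part of this is small; the cost is in the UI and saved configurations. A rough Python sketch, assuming fields are (key, label) pairs and rows are dicts (both assumptions, as is the select_columns name):

```python
def select_columns(fields, rows, wanted_keys):
    """Keep only wanted_keys, in the order given, for both fields and rows."""
    labels_by_key = dict(fields)
    unknown = [key for key in wanted_keys if key not in labels_by_key]
    if unknown:
        raise KeyError(f"unknown export columns: {unknown}")
    new_fields = [(key, labels_by_key[key]) for key in wanted_keys]
    new_rows = [{key: row.get(key) for key in wanted_keys} for row in rows]
    return new_fields, new_rows
```

Selecting ["z", "x"] from columns X, Y, and Z yields the "Z, X" ordering from the example above.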

Let users reformat fields: If we support filtering/reordering fields, we might also support selecting a format for fields. For example, for a date field, you might choose to export it as an epoch timestamp, or as YYYY-MM-DD, or as Saturday, XXth of YYYY Year of Our Lord, or whatever else. As above, you can do this in Excel yourself, so the value isn't clear.
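
A sketch of what per-field format selection might look like for a date field, in Python for illustration. The style names ("epoch", "iso-date") and the format_date helper are hypothetical, and dates are rendered in UTC as a simplifying assumption.

```python
from datetime import datetime, timezone

# Hypothetical registry mapping a user-selected style to a formatter.
DATE_FORMATS = {
    "epoch": lambda epoch: str(epoch),
    "iso-date": lambda epoch: datetime.fromtimestamp(
        epoch, tz=timezone.utc
    ).strftime("%Y-%m-%d"),
}


def format_date(epoch, style):
    """Render an epoch timestamp using the selected export style."""
    return DATE_FORMATS[style](epoch)
```

For example, format_date(1516924800, "iso-date") gives "2018-01-26", while the "epoch" style passes the number through as text.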

Support more applications: Adding support for more applications is fairly easy, but it's not clear that exporting most applications to CSV is particularly useful.

Revisions and Commits

rP Phabricator
D18973
D18972
D18971
D18970
D18969
D18968
D18967
D18966
D18965
D18962
D18961
D18960
D18959
D18958
D18957
D18956
D18955
D18954
D18953
D18952
D18951

Event Timeline

epriestley triaged this task as Normal priority. (Jan 26 2018, 4:57 PM)
epriestley created this task.

See PHI323.

  • These exports can be >4MB, so the piece that shoves the data into Files should support chunked storage.
  • The install wants to export enough data to hit reasonable generation limits, and we should support backgrounding larger result sets.
  • They're also interested in push logs and user activity logs.

See PHI324.

  • An install wants custom fields and "Author" in the Maniphest export, motivating the Maniphest changes.

It looks like PHPExcel was formally deprecated on December 24, 2017, but I'm not planning to switch to its successor, PhpSpreadsheet, until we run into some specific motivator.

epriestley claimed this task.

That's as far as I plan to take this for now. I think the additional features described above (filter, reorder, reformat, and broader field and application support) are all reasonable, but I'm not currently aware of use cases for them. We can return to that stuff when use cases arise.