Plans: Data Exporters (Excel, JSON, CSV, etc.)
Closed, ResolvedPublic
Authored By

	epriestley
	Jan 26 2018, 4:57 PM

Description

For previous work, see T5954.

After T13046, the state of affairs is:

- Phabricator supports modular data export as part of core infrastructure.
  - It is implemented by PullLog and User.
  - After work here, it is also implemented by Task, UserLog, and PushLog.
- ~~Maniphest still has a legacy, hard-coded version of this which exports to Excel in a non-modular way.~~
  - After work here, Maniphest is now modern.

Completed changes:

Support Excel: Currently, CSV, JSON, and text are supported. Elsewhere, in Maniphest, we support building Excel sheets. An Excel exporter should be added to the new infrastructure (e.g., PhabricatorExcelExportFormat).

Upgrade Maniphest: Replace Maniphest's export process with the core infrastructure process. Remove ManiphestExcelFormat and ManiphestExcelDefaultFormat. This will break a small number of third-parties. I believe third-party exporters were almost exclusively used to hack in custom field support (below), though.

Modularize More Fields: The id and phid fields are currently implemented by each SearchEngine, but the base class could implement them.

Extensions (like Custom Fields, maybe Flags, Subscribers, Projects, etc) should be able to add new field definitions and export data. See T5391 for Custom Fields specifically.

This would start by defining an ExportEngineExtension base class, then implementing CustomFieldExportEngineExtension, etc. Make the base SearchEngine build extensions and call them to add more fields and data while exporting.
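
A rough sketch of the extension shape described above, written in Python rather than Phabricator's PHP. The method names (`field_keys`, `field_values`, `export_fields`, `export_row`) and the dictionary-based objects are assumptions for illustration, not the real API:

```python
from abc import ABC, abstractmethod


class ExportEngineExtension(ABC):
    """Hypothetical base class: contributes extra fields to an export."""

    @abstractmethod
    def field_keys(self):
        """Return the list of column keys this extension adds."""

    @abstractmethod
    def field_values(self, obj):
        """Return a dict mapping each added key to its value for obj."""


class CustomFieldExportEngineExtension(ExportEngineExtension):
    """Hypothetical extension that exports an object's custom fields."""

    def __init__(self, custom_keys):
        self.custom_keys = list(custom_keys)

    def field_keys(self):
        return [f"custom.{key}" for key in self.custom_keys]

    def field_values(self, obj):
        custom = obj.get("custom", {})
        return {f"custom.{key}": custom.get(key) for key in self.custom_keys}


class SearchEngine:
    """Hypothetical base engine: supplies id/phid, then asks extensions."""

    def __init__(self, extensions=()):
        self.extensions = list(extensions)

    def export_fields(self):
        # The base class owns the common fields so subclasses need not.
        keys = ["id", "phid"]
        for extension in self.extensions:
            keys.extend(extension.field_keys())
        return keys

    def export_row(self, obj):
        row = {"id": obj["id"], "phid": obj["phid"]}
        for extension in self.extensions:
            row.update(extension.field_values(obj))
        return row
```

With this shape, adding Flags or Subscribers to an export would mean writing another extension class rather than touching each SearchEngine.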

Column Headers/Labels: JSON is self-labeling but CSV, text, and Excel could use column headers. Before we pass the field data to the exporter, we can pass just the field list (addHeaders($fields)). The JSON format can return immediately while other formats can build a header row.
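
The header pass might look roughly like this. This is a Python sketch with hypothetical class names; `add_headers` stands in for the `addHeaders($fields)` call mentioned above, and only the standard library `csv` and `json` modules are used:

```python
import csv
import io
import json


class JSONExportFormat:
    """Hypothetical JSON exporter: rows are self-labeling objects."""

    def __init__(self):
        self.rows = []

    def add_headers(self, fields):
        # JSON objects carry their own keys, so there is nothing to do.
        return

    def add_row(self, fields, values):
        self.rows.append(dict(zip(fields, values)))

    def render(self):
        return json.dumps(self.rows)


class CSVExportFormat:
    """Hypothetical CSV exporter: emits one header row before the data."""

    def __init__(self):
        self.buffer = io.StringIO()
        self.writer = csv.writer(self.buffer, lineterminator="\n")

    def add_headers(self, fields):
        self.writer.writerow(fields)

    def add_row(self, fields, values):
        self.writer.writerow(values)

    def render(self):
        return self.buffer.getvalue()


def export(fmt, fields, rows):
    # Pass just the field list first, then the data rows.
    fmt.add_headers(fields)
    for values in rows:
        fmt.add_row(fields, values)
    return fmt.render()
```

The caller does not need to know which formats want headers; each format decides for itself.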

Background file generation: For very large result sets, foreground generation of export formats will fail (either hitting memory limits or time limits). We can test if we're going to examine more than, say, 1K rows, and queue a background job (like the Bulk jobs) instead of generating these large sheets in the foreground. The current stuff is written with a soft assumption that we'll pursue this eventually (e.g., you can stream 100 results at a time to the exporters and don't need to load them all into memory at once).
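
A minimal Python sketch of the two ideas in that paragraph: choosing foreground or background based on a row threshold, and feeding the exporter one page at a time. The function names and the `fetch_page(offset, limit)` callback are assumptions; the 1,000 and 100 figures come from the text above:

```python
FOREGROUND_LIMIT = 1000  # "say, 1K rows"
PAGE_SIZE = 100          # "stream 100 results at a time"


def choose_export_mode(result_count, limit=FOREGROUND_LIMIT):
    """Return 'background' when the result set exceeds the limit."""
    return "background" if result_count > limit else "foreground"


def iter_pages(fetch_page, page_size=PAGE_SIZE):
    """Yield pages of results until the source returns an empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield page
        offset += len(page)


def stream_export(fetch_page, write_row):
    """Write every row via write_row, holding only one page in memory."""
    written = 0
    for page in iter_pages(fetch_page):
        for row in page:
            write_row(row)
            written += 1
    return written
```

Because the exporter only ever sees one page, the same loop can run inside a web request or a queued job.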

More speculative/future stuff, some of which may not have much value. Generally, I think we should probably wait for users to hit use cases for most of this stuff before pursuing it.

Let users reorder and filter columns: If an object exports columns X, Y, and Z, let the user select that they only want X and Z in order "Z, X". The APIs are currently built with a soft assumption that we might pursue this eventually. On the other hand, you can do this yourself in Excel and it's probably quite a lot of work to do in the web UI (since we'd realistically need to let you edit/save/share these configurations -- there's no time savings if you have to go uncheck 250 checkboxes every time you click the "Export Data" button).
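
For reference, the data side of this is small; the cost is in the UI for saving and sharing configurations. A Python sketch with a hypothetical `select_columns` helper:

```python
def select_columns(fields, rows, wanted):
    """Return (fields, rows) restricted to `wanted`, in that order."""
    index = {name: position for position, name in enumerate(fields)}
    missing = [name for name in wanted if name not in index]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    positions = [index[name] for name in wanted]
    return list(wanted), [[row[p] for p in positions] for row in rows]
```

For example, `select_columns(["X", "Y", "Z"], rows, ["Z", "X"])` drops Y and puts Z first.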

Let users reformat fields: If we support filtering/reordering fields, we might also support selecting a format for fields. For example, for a date field, you might choose to export it as an epoch timestamp, or as YYYY-MM-DD, or as Saturday, XXth of YYYY Year of Our Lord, or whatever else. As above, you can do this in Excel yourself, so the value isn't clear.
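
A selectable date format could be as simple as a lookup table of formatters. This Python sketch assumes dates are stored as epoch seconds and rendered in UTC; the style names `epoch` and `iso-date` are made up for illustration:

```python
from datetime import datetime, timezone

# Hypothetical style names mapped to formatting functions.
DATE_FORMATS = {
    "epoch": lambda moment: str(int(moment.timestamp())),
    "iso-date": lambda moment: moment.strftime("%Y-%m-%d"),
}


def format_date(epoch_seconds, style):
    """Render an epoch timestamp (UTC) in the requested style."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return DATE_FORMATS[style](moment)
```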

Support more applications: Adding support for more applications is fairly easy, but it's not clear that exporting most applications to CSV is particularly useful.

Revisions and Commits

rP Phabricator
	D18973	rP1e3d1271ada0 Make push log "flags", "reject code" human readable; add crumbs to pull/push…
	D18972	rPff98f6f522b1 Make the remote address rules for Settings > Activity Logs more consistent
	D18971	rP8a2863e3f7f8 Change the "can see remote address?" policy to "is administrator?" everywhere
	D18970	rP75bc86589f10 Add date range filtering for activity, push, and pull logs
	D18969	rP0d5379ee1778 Fix an export bug where queries specified in the URI ("?param=value") were…
	D18968	rP5b22412f246e Support data export on push logs
	D18967	rPa5b8be0316ce Support export of user activity logs
	D18966	rP91108cf83826 Upgrade user account activity logs to modern construction
	D18965	rPb27fd05eef0a Add a `bin/bulk export` CLI tool to make debugging and profiling large exports…
	D18962	rP84df1220858f When exporting more than 1,000 records, export in the background
	D18961	rPea58b6aceae3 Remove the old, non-modular Excel export workflow from Maniphest
	D18960	rPc00838878a2d Implement common infrastructure fields as export extensions
	D18959	rP2ac4e1991b4d Support new data export infrastructure in Maniphest
	D18958	rP00b4eae1f4a8 When PHPExcel is not installed, detect it and provide install instructions
	D18957	rP61b8c12970be Make the data export format selector remember your last setting
	D18956	rP5b61f863fdc1 Organize the export code into subdirectories
	D18955	rP040927959546 Support Excel as a data export format
	D18954	rPa067f64ebb32 Support export engine extensions and implement an extension for custom fields
	D18953	rP8b8a3142b310 Support export of data in files larger than 8MB
	D18952	rP0de6210808d7 Give data exporters a header row
	D18951	rP213eb8e93de5 Define common ID and PHID export fields in SearchEngine

Related Objects

Mentioned In:
	D18958: When PHPExcel is not installed, detect it and provide install instructions
	T6187: Allow query results to be exported to spreadsheet rendered in the browser
	T12800: When Excel opens a CSV file, it just runs whatever arbitrary code might be in the file
	T3461: Improve documentation around Maniphest Excel exports
	T5391: Support custom fields in 'Export to Excel'
	2018 Week 4 (Late January)
	T13046: Surface repository pull logs in the web UI
	T5954: Modularize "Export to Excel" as a feature of ApplicationSearch
Mentioned Here:
	T5391: Support custom fields in 'Export to Excel'
	T5954: Modularize "Export to Excel" as a feature of ApplicationSearch
	T13046: Surface repository pull logs in the web UI

Event Timeline

epriestley triaged this task as Normal priority. Jan 26 2018, 4:57 PM

epriestley created this task.

Herald added a subscriber: eadler. Jan 26 2018, 4:57 PM

epriestley mentioned this in T5954: Modularize "Export to Excel" as a feature of ApplicationSearch. Jan 26 2018, 4:59 PM

tkriener added a subscriber: tkriener. Jan 26 2018, 5:03 PM

epriestley mentioned this in T13046: Surface repository pull logs in the web UI. Jan 26 2018, 5:57 PM

epriestley mentioned this in 2018 Week 4 (Late January). Jan 26 2018, 11:23 PM

See PHI323.

These exports can be >4MB, so the piece that shoves the data into Files should support chunked storage.
The install wants to export enough data to hit reasonable generation limits, and we should support backgrounding larger result sets.
They're also interested in push logs and user activity logs.
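
Chunked storage here just means the file is written as a series of bounded pieces instead of one large blob. A Python sketch of the splitting step, with a hypothetical `iter_chunks` helper and a caller-chosen chunk size:

```python
def iter_chunks(data, chunk_size):
    """Yield consecutive slices of `data`, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```

Joining the chunks in order reproduces the original data, so the reader side only needs to concatenate.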

See PHI324.

An install wants custom fields and "Author" in the Maniphest export, motivating the Maniphest changes.

epriestley added a revision: D18951: Define common ID and PHID export fields in SearchEngine. Jan 29 2018, 2:28 AM

epriestley added a revision: D18952: Give data exporters a header row. Jan 29 2018, 2:35 AM

epriestley added a revision: D18953: Support export of data in files larger than 8MB. Jan 29 2018, 3:04 AM

epriestley added a revision: D18954: Support export engine extensions and implement an extension for custom fields. Jan 29 2018, 5:28 AM

mbishopim3 added a subscriber: mbishopim3. Jan 29 2018, 2:23 PM

epriestley added a revision: D18955: Support Excel as a data export format. Jan 29 2018, 3:22 PM

epriestley added a revision: D18956: Organize the export code into subdirectories. Jan 29 2018, 3:26 PM

epriestley added a revision: D18957: Make the data export format selector remember your last setting. Jan 29 2018, 3:35 PM

It looks like PHPExcel was formally deprecated on December 24, 2017 but I'm not planning to switch to its successor, PhpSpreadsheet, until we run into some specific motivator.

epriestley added a revision: D18958: When PHPExcel is not installed, detect it and provide install instructions. Jan 29 2018, 3:57 PM

epriestley added a revision: D18959: Support new data export infrastructure in Maniphest. Jan 29 2018, 4:25 PM

epriestley added a revision: D18960: Implement common infrastructure fields as export extensions. Jan 29 2018, 4:52 PM

epriestley added a revision: D18961: Remove the old, non-modular Excel export workflow from Maniphest. Jan 29 2018, 4:59 PM

epriestley added a revision: D18962: When exporting more than 1,000 records, export in the background. Jan 29 2018, 7:31 PM

epriestley mentioned this in T5391: Support custom fields in 'Export to Excel'. Jan 29 2018, 11:08 PM

epriestley updated the task description. Jan 29 2018, 11:10 PM

epriestley added a commit: rP213eb8e93de5: Define common ID and PHID export fields in SearchEngine. Jan 29 2018, 11:17 PM

epriestley added a commit: rP0de6210808d7: Give data exporters a header row.

epriestley added a commit: rP8b8a3142b310: Support export of data in files larger than 8MB. Jan 29 2018, 11:58 PM

epriestley added a commit: rPa067f64ebb32: Support export engine extensions and implement an extension for custom fields.

epriestley added a commit: rP040927959546: Support Excel as a data export format. Jan 30 2018, 12:00 AM

epriestley added a commit: rP5b61f863fdc1: Organize the export code into subdirectories.

epriestley added a commit: rP61b8c12970be: Make the data export format selector remember your last setting.

epriestley added a commit: rP00b4eae1f4a8: When PHPExcel is not installed, detect it and provide install instructions. Jan 30 2018, 12:03 AM

epriestley added a commit: rP2ac4e1991b4d: Support new data export infrastructure in Maniphest.

epriestley added a commit: rPc00838878a2d: Implement common infrastructure fields as export extensions.

epriestley added a commit: rPea58b6aceae3: Remove the old, non-modular Excel export workflow from Maniphest. Jan 30 2018, 12:06 AM

epriestley added a commit: rP84df1220858f: When exporting more than 1,000 records, export in the background.

epriestley added a revision: D18965: Add a `bin/bulk export` CLI tool to make debugging and profiling large exports easier. Jan 30 2018, 1:54 PM

epriestley added a revision: D18966: Upgrade user account activity logs to modern construction. Jan 30 2018, 2:12 PM

epriestley added a revision: D18967: Support export of user activity logs. Jan 30 2018, 3:06 PM

epriestley added a revision: D18968: Support data export on push logs. Jan 30 2018, 3:59 PM

epriestley added a revision: D18969: Fix an export bug where queries specified in the URI ("?param=value") were ignored when filtering the result set. Jan 30 2018, 4:06 PM

amckinley added a subscriber: amckinley. Jan 30 2018, 6:57 PM

epriestley added a commit: rPb27fd05eef0a: Add a `bin/bulk export` CLI tool to make debugging and profiling large exports…. Jan 30 2018, 7:11 PM

epriestley added a commit: rP91108cf83826: Upgrade user account activity logs to modern construction.

epriestley added a commit: rPa5b8be0316ce: Support export of user activity logs.

epriestley added a commit: rP5b22412f246e: Support data export on push logs. Jan 30 2018, 7:19 PM

epriestley added a commit: rP0d5379ee1778: Fix an export bug where queries specified in the URI ("?param=value") were….

epriestley added a revision: D18970: Add date range filtering for activity, push, and pull logs. Jan 30 2018, 7:52 PM

epriestley added a revision: D18971: Change the "can see remote address?" policy to "is administrator?" everywhere. Jan 30 2018, 8:07 PM

epriestley added a revision: D18972: Make the remote address rules for Settings > Activity Logs more consistent. Jan 30 2018, 8:16 PM

epriestley updated the task description. Jan 30 2018, 8:38 PM

epriestley added a revision: D18973: Make push log "flags", "reject code" human readable; add crumbs to pull/push logs. Jan 30 2018, 9:35 PM

epriestley added a commit: rP75bc86589f10: Add date range filtering for activity, push, and pull logs. Jan 30 2018, 11:36 PM

epriestley added a commit: rP8a2863e3f7f8: Change the "can see remote address?" policy to "is administrator?" everywhere. Jan 30 2018, 11:45 PM

epriestley added a commit: rPff98f6f522b1: Make the remote address rules for Settings > Activity Logs more consistent.

epriestley added a commit: rP1e3d1271ada0: Make push log "flags", "reject code" human readable; add crumbs to pull/push….

That's as far as I plan to take this for now. I think the additional features described above (filter, reorder, reformat, and broader field and application support) are all reasonable, but I'm not currently aware of use cases for them. We can return to that stuff when use cases arise.

epriestley mentioned this in T3461: Improve documentation around Maniphest Excel exports. Jan 30 2018, 11:49 PM

epriestley mentioned this in T12800: When Excel opens a CSV file, it just runs whatever arbitrary code might be in the file. Jan 30 2018, 11:54 PM

epriestley mentioned this in T6187: Allow query results to be exported to spreadsheet rendered in the browser. Jan 31 2018, 12:38 AM

epriestley mentioned this in D18958: When PHPExcel is not installed, detect it and provide install instructions. Feb 6 2018, 6:25 PM

epriestley added a project: Plans. Feb 25 2018, 3:28 PM