Survive import of artificially manufactured Git commits with 64-bit timestamps
Closed, Resolved (Public)

Description

A recent issue with importing a repository (discussed elsewhere) boiled down to Phabricator failing to import artificial commits with 64-bit timestamps, corresponding to an authorship date in 40,000 AD. The script generating these commits was likely writing milliseconds-since-epoch instead of seconds-since-epoch into the date fields.

These timestamps do not fit into a Phabricator "epoch" column (currently uint32) and MySQL fails when inserting them with a "value out of range" error:

(AphrontQueryException) #1264: Out of range value for column 'epoch' at row 1

Specifically, the %ct for these commits was approximately 1000x larger than the current timestamp:

$ git log --format='%H %ct'
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1470000000000
# Current Epoch Timestamp:               1470000000

git itself parses and displays these commits, but with a date in 40,000 AD, because it interprets the timestamp as seconds-since-epoch.
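To make the factor-of-1000 relationship concrete, here is a minimal sketch of the arithmetic. The helper names (`approx_year`, `looks_like_milliseconds`) are made up for illustration and are not part of git or Phabricator; the year estimate uses a mean Gregorian year, so it is only approximate.

```python
from datetime import datetime, timedelta, timezone

UINT32_MAX = 2**32 - 1
SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year (365.2425 days)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

raw = 1_470_000_000_000  # the %ct value shown by git log above


def approx_year(seconds):
    """Rough calendar year for a seconds-since-epoch value."""
    return 1970 + seconds // SECONDS_PER_YEAR


def looks_like_milliseconds(value):
    """Heuristic (an assumption): too big for uint32 as seconds,
    but fits once divided by 1000."""
    return value > UINT32_MAX and value // 1000 <= UINT32_MAX


# Read as seconds, the value lands tens of thousands of years in the future.
print(approx_year(raw))

# Read as milliseconds, it is an ordinary 2016 timestamp.
print(EPOCH + timedelta(seconds=raw // 1000))

print(looks_like_milliseconds(raw))
```

Python's `datetime` cannot represent years past 9999, which is why the far-future year is estimated with integer arithmetic rather than converted directly.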

One way we could probably fix this is by changing epoch to uint64 internally. This is a small change in a technical sense, but it might have far-reaching implications, and it is not a change we otherwise need to make until approximately 90 years from now (circa 2106 AD).
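The 2106 figure follows directly from the size of an unsigned 32-bit integer. A quick check, as a sketch:

```python
from datetime import datetime, timedelta, timezone

UINT32_MAX = 2**32 - 1  # 4294967295, the largest value a uint32 column can hold
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last instant representable as seconds-since-epoch in a uint32.
last_representable = EPOCH + timedelta(seconds=UINT32_MAX)
print(last_representable.isoformat())  # 2106-02-07T06:28:15+00:00
```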

These timestamps are also incorrect (the commits were not committed in 40,000 AD in any sense, even a mythological or narrative sense, and the values are purely mistaken) and appear to be impossible to create with user-level git commands. In particular, using --date to set a commit date does not appear to allow you to choose a date beyond 2099 AD.

We can alternatively clamp these dates into the uint32 range, e.g. by replacing them with the current timestamp. This is similar to the behavior of git commit --date, which replaces dates beyond 2100 AD with the current date.

At least for now, I am inclined to clamp the dates. It would be relatively easy to unclamp them later if/when we do move to 64-bit epoch timestamps.
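A minimal sketch of what clamping could look like. This is illustrative Python rather than Phabricator's actual implementation, and `clamp_epoch` is a hypothetical name. It assumes "clamp" means "replace with the current time", as described above.

```python
import time

UINT32_MAX = 2**32 - 1


def clamp_epoch(epoch, now=None):
    """Return `epoch` unchanged if it fits a uint32 column.

    Otherwise substitute the current time, mirroring what
    `git commit --date` appears to do with far-future dates.
    """
    if now is None:
        now = int(time.time())
    if 0 <= epoch <= UINT32_MAX:
        return epoch
    return now


# A sane timestamp passes through; a 64-bit one is replaced.
print(clamp_epoch(1_470_000_000, now=1_470_000_123))
print(clamp_epoch(1_470_000_000_000, now=1_470_000_123))
```

Clamping only affects the stored copy. The original value stays in the commit object itself, which is presumably what would make it possible to unclamp later by reading the commits again.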

Event Timeline

After some effort, I'm unable to figure out how to generate these commits using vanilla git. I can't figure out how to get them by passing --date or GIT_COMMITTER_DATE to git commit: dates after 2100 are treated like the current date.

Using raw git commit-tree plumbing does not seem to work either:

$ echo test | GIT_COMMITTER_DATE="Fri Jan 1 03:33:58 2100 -0800" git commit-tree 16320dd390300c79434e94028db093db8c809cea
fatal: invalid date format: Fri Jan 1 03:33:58 2100 -0800

The 2100-04-07T22:13:13 and 1470000000000 -800 forms from the Git documentation don't work for me either.

So I suspect the generating script was not a wrapper on git, although maybe I can get this to work with git hash-object?

That does work:

$ git cat-file commit HEAD > message.original
$ nano message.original # Adjust timestamps to crazy values.
$ git hash-object -t commit --stdin -w < message.original
23c3584bd6030c4f8b9ad9ce5419d218990b75a5
$ git reset --hard 23c3584bd6030c4f8b9ad9ce5419d218990b75a5
HEAD is now at 23c3584 Time travel! Spooky!
$ git show
commit 23c3584bd6030c4f8b9ad9ce5419d218990b75a5
Author: epriestley <git@epriestley.com>
Date:   Mon Jul 22 06:20:00 48622 -0700

    Time travel! Spooky!

diff --git a/future.txt b/future.txt
new file mode 100644
index 0000000..5a8dff0
--- /dev/null
+++ b/future.txt
@@ -0,0 +1 @@
+This commit is from 40,000 AD!

Note the authorship date in 48,622 AD.
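The manual edit in nano can also be scripted. The sketch below rewrites the timestamps on the author and committer lines of a raw commit object (the text printed by git cat-file commit) and computes the ID that git hash-object -t commit would assign. It assumes the default SHA-1 object format. The function names and the sample commit text are made up for illustration.

```python
import hashlib
import re

# Matches "author NAME <EMAIL> EPOCH TZ" and "committer NAME <EMAIL> EPOCH TZ".
_IDENT_LINE = re.compile(
    rb"^(author|committer) (.*) (\d+) ([+-]\d{4})$", re.MULTILINE
)


def set_commit_epoch(raw_commit: bytes, epoch: int) -> bytes:
    """Replace the author and committer timestamps in a raw commit object."""
    # Only touch the header; the message after the first blank line is kept as-is.
    header, sep, message = raw_commit.partition(b"\n\n")
    header = _IDENT_LINE.sub(
        lambda m: b"%s %s %d %s" % (m.group(1), m.group(2), epoch, m.group(4)),
        header,
    )
    return header + sep + message


def git_object_id(raw_commit: bytes) -> str:
    """SHA-1 of b"commit <size>\\0<content>", as used for commit objects."""
    store = b"commit %d\x00" % len(raw_commit) + raw_commit
    return hashlib.sha1(store).hexdigest()


# Hypothetical commit text; the tree hash is the one used earlier in this task.
sample = (
    b"tree 16320dd390300c79434e94028db093db8c809cea\n"
    b"author A U Thor <author@example.com> 1470000000 -0700\n"
    b"committer A U Thor <author@example.com> 1470000000 -0700\n"
    b"\n"
    b"Time travel! Spooky!\n"
)

edited = set_commit_epoch(sample, 1_470_000_000_000)
print(edited.decode())
print(git_object_id(edited))
```

Writing the edited bytes to a file and feeding it to git hash-object -t commit --stdin -w, as in the session above, would store the object in a repository.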

This commit reproduces the issue:

https://github.com/epriestley/poems/commit/23c3584bd6030c4f8b9ad9ce5419d218990b75a5

To its credit, GitHub displays this "correctly" (timestamp in 48622 AD).

Perhaps a better solution is to introduce an epoch64 datatype and store these timestamps faithfully, since they're at least sort of possible to write and do have a relatively unambiguous meaning. Introducing a new type instead of changing the meaning of the existing epoch would let us accommodate this without requiring tons and tons of migrations or exposing us to a lot of weird fallout.

Introducing an epoch64 type ends up being pretty messy:

  • We use the timestamp in several places, including the potentially huge pathchange table.
  • When we generate feed stories and transactions we use the timestamp too, so we pretty much need to change every epoch at once. Selectively moving just the commit epochs to epoch64 only gets us a little bit further into the process.

Changing epoch to be a 64-bit type triggers 568 column adjustments from bin/storage adjust. I suspect this would work, but I'm really hesitant to force every install through that today just because one install is writing weird future commits with deep plumbing commands.