Page MenuHomePhabricator

S3 storage completely broken after 10 jan 2016
Closed, InvalidPublic

Description

Hi, we've just tried to upgrade phabricator to the latest git master (and stable). This broke S3 storage completely. Previously it was working perfectly.

I added the new settings:

  • "amazon-s3.endpoint": "s3-eu-west-1.amazonaws.com",
  • "amazon-s3.region": "eu-west-1"

as required for the new 'v4' s3 client.

Could this be some difference with eu-west-1? (I notice the committed tests are all against us-east-1)

In the meantime, we've downgraded to the commits immediately prior to this in phabricator and libphutil, and we're working again for the moment.

After this, all s3 requests from phabricator raise errors as follows in the logs, with similar appearing in the UI.

Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: [2016-01-25 15:50:11] EXCEPTION: (PhutilAWSException) AWS Request Failed

HTTP Status Code: 400
Response Body: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>XAmzContentSHA256Mismatch</Code><Message>The provided 'x-amz-content-sha256' header does not match what was computed.</Message><ClientComputedContentSHA256>69ddc87ce75b40364c899be

Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: arcanist(head=master, ref.master=57f6fb59d739), phabricator(head=master, ref.master=1e69f06d7447), phutil(head=adb8a9c074ba41b9566d340db4e3599bcf097958, ref.master=f43291e99d36
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #0 <#2> PhutilAWSFuture::didReceiveResult(array) called at [<phutil>/src/future/aws/PhutilAWSS3Future.php:63]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #1 <#2> PhutilAWSS3Future::didReceiveResult(array) called at [<phutil>/src/future/FutureProxy.php:58]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #2 <#2> FutureProxy::getResult() called at [<phutil>/src/future/FutureProxy.php:35]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #3 <#2> FutureProxy::resolve() called at [<phabricator>/src/applications/files/engine/PhabricatorS3FileStorageEngine.php:80]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #4 <#2> PhabricatorS3FileStorageEngine::writeFile(string, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:433]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #5 <#2> PhabricatorFile::writeToEngine(PhabricatorS3FileStorageEngine, string, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:330]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #6 phlog(PhutilAWSException) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:340]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #7 PhabricatorFile::buildFromFileData(string, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:391]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #8 PhabricatorFile::newFromFileData(string, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:1010]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #9 PhabricatorFile::loadBuiltins(PhabricatorUser, array) called at [<phabricator>/src/applications/files/storage/PhabricatorFile.php:1041]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #10 PhabricatorFile::loadBuiltin(PhabricatorUser, string) called at [<phabricator>/src/applications/project/query/PhabricatorProjectQuery.php:348]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #11 PhabricatorProjectQuery::didFilterPage(array) called at [<phabricator>/src/infrastructure/query/policy/PhabricatorPolicyAwareQuery.php:259]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #12 PhabricatorPolicyAwareQuery::execute() called at [<phabricator>/src/applications/phid/type/PhabricatorPHIDType.php:91]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #13 PhabricatorPHIDType::loadObjects(PhabricatorObjectQuery, array) called at [<phabricator>/src/applications/phid/query/PhabricatorObjectQuery.php:143]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #14 PhabricatorObjectQuery::loadObjectsByPHID(array, array) called at [<phabricator>/src/applications/phid/query/PhabricatorObjectQuery.php:63]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #15 PhabricatorPolicyAwareQuery::execute() called at [<phabricator>/src/applications/phid/query/PhabricatorHandleQuery.php:46]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #16 PhabricatorHandleQuery::loadPage() called at [<phabricator>/src/infrastructure/query/policy/PhabricatorPolicyAwareQuery.php:227]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #17 PhabricatorPolicyAwareQuery::execute() called at [<phabricator>/src/applications/maniphest/view/ManiphestTaskListView.php:154]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #18 ManiphestTaskListView::loadTaskHandles(PhabricatorUser, array) called at [<phabricator>/src/applications/maniphest/view/ManiphestTaskResultListView.php:50]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #19 phutil_escape_html(array) called at [<phutil>/src/markup/render.php:157]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #20 phutil_tag(string, array, array) called at [<phabricator>/src/infrastructure/javelin/markup.php:53]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #21 javelin_tag(string, array, array) called at [<phabricator>/src/view/AphrontTagView.php:152]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #22 AphrontTagView::render() called at [<phabricator>/src/view/AphrontView.php:176]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #23 AphrontView::producePhutilSafeHTML() called at [<phutil>/src/markup/render.php:133]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #24 phutil_escape_html(PHUIBoxView) called at [<phutil>/src/markup/render.php:139]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #25 phutil_escape_html(PHUIObjectBoxView) called at [<phutil>/src/markup/render.php:119]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #26 phutil_tag(string, array, PHUIObjectBoxView) called at [<phutil>/src/markup/render.php:126]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #27 phutil_tag_div(string, PHUIObjectBoxView) called at [<phabricator>/src/applications/dashboard/engine/PhabricatorDashboardPanelRenderingEngine.php:211]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #28 PhabricatorDashboardPanelRenderingEngine::renderPanelDiv(PhabricatorApplicationSearchResultView, NULL) called at [<phabricator>/src/applications/dashboard/engine/Phabrica
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #29 PhabricatorDashboardPanelRenderingEngine::renderNormalPanel(PhabricatorUser, PhabricatorDashboardPanel, PhabricatorDashboardPanelRenderingEngine) called at [<phabricator>
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #30 PhabricatorDashboardPanelRenderingEngine::renderPanel() called at [<phabricator>/src/applications/dashboard/controller/PhabricatorDashboardPanelRenderController.php:40]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #31 PhabricatorDashboardPanelRenderController::handleRequest(AphrontRequest) called at [<phabricator>/src/aphront/configuration/AphrontApplicationConfiguration.php:237]
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #32 AphrontApplicationConfiguration::processRequest(AphrontRequest, PhutilDeferredLog, AphrontPHPHTTPSink, MultimeterControl) called at [<phabricator>/src/aphront/configurati
Jan 25 15:50:11 phabricator php5-fpm[13698]: ool www[13698]: #33 AphrontApplicationConfiguration::runHTTPRequest(AphrontPHPHTTPSink) called at [<phabricator>/webroot/index.php:17]

Event Timeline

Can you show me the output of this command on your system?

$ php -r "echo hash_hmac('sha256', 'a', 'b');"

The expected output is:

$ php -r "echo hash_hmac('sha256', 'a', 'b');"
cb448b440c42ac8ad084fc8a8795c98f5b7802359c305eabd57ecdb20e248896

Hi, here you go:

root@phabricator:/home/ubuntu$ php -r "echo hash_hmac('sha256', 'a', 'b');"
cb448b440c42ac8ad084fc8a8795c98f5b7802359c305eabd57ecdb20e248896

root@phabricator:/home/ubuntu$

We're running ubuntu 15.10,
php5 5.6.11+dfsg-1ubuntu3.1 all server-side, HTML-embedded scripting language (metapackage)

Thanks. I'll bring up a eu-west-1 bucket and see if I can reproduce this.

Oh, we're authenticating using an accesskey/secretkey for an IAM user, if that makes a difference.

We are too; that's expected to work correctly.

I can't reproduce this. Before we move further, it looks like libphutil may be out of date?

phutil(head=adb8a9c074ba41b9566d340db4e3599bcf097958, ref.master=f43291e99d36

This suggests you have a libphutil from Jan 2 and a Phabricator from Jan 25. This isn't expected to work, because the v4 API wasn't implemented until Jan 9, so libphutil prior to Jan 9 would not be able to make properly formatted v4 API requests.

Broadly, here's a command string (with live, active credentials) which works for me to put data into an eu-west-1 bucket:

libphutil/ $ echo hello | ./bin/aws-s3 put --access-key AKIAIDLC4ZNYXLRM36FQ --secret-key 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 --region eu-west-1 --endpoint s3-eu-west-1.amazonaws.com --bucket phabricator-ireland --key hello.txt

If this isn't a version issue, does running that command exactly as written work on your system?

We had to downgrade libphutil+phabricator to get it working again, so they're currently on the commit immediately prior to the v4 stuff.

I'll upgrade to latest master and retry later this evening, when no one is using our phabricator.

It may also be useful to run arc unit --everything in libphutil/, as that will trigger a number of signature-related test cases.

https://secure.phabricator.com/diffusion/PHU/browse/master/src/future/aws/__tests__/PhutilAWSv4SignatureTestCase.php

This comment was removed by Adq.

Ok, I've reverted our prod system to last night's snapshots of mysql + root disk and it is back to working again (with zero changes in AWS).

I now have the dead system to play with I can get to the bottom of this. I've updated it to the latest master versions of each repo. Still seeing the same problem.

I tried the sample command line you sent, and it works from the server.

I'm logging into the dead system using "auth recover" as something seems to have broken our google auth on this system as well (I've deleted the comment I added about that to avoid confusing this issue.. lets get S3 working, then figure out google if its still a problem). Note that google auth is working fine on the restored system.

Has something changed with AWS permission requirements? Perhaps this is just aws reporting a permissions error in a really weird way? What perms does your test user have?

I will try running the test cases with arc tomorrow.

This is the policy on the IAM user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1453741324000",
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::phabricator-ireland/*"
            ]
        }
    ]
}

Hi, I've accidentally fixed it. I needed to copy some tickets across from the defunct system to our new live system, so I did:

./bin/config set storage.local-disk.path /home/phabricator/storage/
./bin/config delete storage.s3.bucket

as I needed to search for tickets and it uses storage to cache results (and s3 was borked).

Once I'd done that, I then restored the original (s3) configuration with:
./bin/config delete storage.local-disk.path
./bin/config set storage.s3.bucket our-phabricator-bucket

And now s3 is working properly! It must have cleared out some sort of internal state I guess.

Note that I restarted the daemons and rebooted the box several times and that didn't fix it either, only removing the setting and adding it again seemed to do the trick.

Anyway, s3 is now working fine for me on my test system.

epriestley claimed this task.

There is no cache mechanism which would be cleared by adjusting configuration that is not also cleared by restarting the webserver and daemons, so it sounds like this was caused by evil spirits.