The Phacility cluster experienced a major service disruption which affected some instances for about 12 hours between 8:30 PM PDT April 20, 2016 and 8:30 AM PDT April 21, 2016. This task details our response to this issue.
Incident Summary
See below for a more detailed timeline.
Time | Since Previous | Event |
---|---|---|
April 20, 8:30 PM PDT | - | db003 begins to falter. 20% of instances likely began to see disruptions shortly after this. |
April 20, 11:50 PM PDT | 3 hours, 20 minutes | web001 hangs and is removed from the loadbalancer. All instances now experience a total web service disruption. |
April 21, 5:55 AM PDT | 6 hours, 5 minutes | Initial operational response. |
April 21, 6:02 AM PDT | 7 minutes | 80% of instances restored to service. |
April 21, 7:05 AM PDT | 1 hour, 3 minutes | 100% of instances restored to service. |
Instance Credit
We've issued all instances a week of service credit for this disruption. This will be reflected on your next invoice, or over the course of the next two invoices if you receive an invoice in the next week.