AWS us-east-1 single availability zone outage

Incident Report for Buildkite

Postmortem

Service impact

On 8th May 2026, between 00:00 and 07:30 UTC, some customers would have seen intermittent errors and latency spikes across many areas of the platform.

Incident Summary

The AWS availability zone incident in use1-az4 triggered the automatic failover mechanisms on our database and cache clusters, as per our AZ-failure-tolerant design. During the failover we saw some isolated request errors that were handled by client-side retries in the agent. Customer workloads were either entirely undisrupted or, in the worst case, saw elevated latency for a period of up to 5 minutes.

Throughout the incident we monitored customer impact and prepared additional resources in a healthy availability zone to manually fail over to if the automated systems proved insufficient. These were not necessary, and all our infrastructure self-healed.
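
For readers who want to see how client-side retries can absorb isolated request errors during a failover, here is a minimal sketch. It is not the agent's actual implementation, which this report does not describe. It assumes retries with capped exponential backoff and jitter, and every name in it (TransientError, retry_with_backoff, make_flaky, and the default delays) is hypothetical and chosen only for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, e.g. a connection dropped mid-failover."""


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is exhausted.

    The defaults are illustrative assumptions, not values taken from the report.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Exponential backoff capped at max_delay, scaled by a random
            # factor so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * rng())


def make_flaky(failures):
    """Build an operation that fails `failures` times, then returns "ok"."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("failover in progress")
        return "ok"

    return operation, state


if __name__ == "__main__":
    op, state = make_flaky(failures=2)
    print(retry_with_backoff(op), "after", state["calls"], "calls")
```

The jitter matters during a failover in particular: without it, every client that saw the same error would retry at the same moment and could add load to the newly promoted node.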

Posted May 13, 2026 - 05:08 UTC

Resolved

The upstream AWS incident in us-east-1 has been resolved by AWS, and all Buildkite services are operating normally. No further customer impact is expected. We appreciate your patience during this incident.

Posted May 09, 2026 - 04:34 UTC

Monitoring

Despite the ongoing AWS incident, our own services are now stable. We are continuing to monitor our services closely, and are ready for further action should the need arise. We are also watching AWS services closely as they recover.

Posted May 08, 2026 - 08:07 UTC

Update

We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips may continue while these manual failovers occur.

(Apologies if you receive duplicated notifications for this update.)

Posted May 08, 2026 - 07:22 UTC

Update

We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips may continue while these manual failovers occur.

Posted May 08, 2026 - 07:04 UTC

Update

We are continuing to move infrastructure resources out of the affected AWS Availability Zone. Brief latency and error blips will unfortunately continue while these manual failovers occur.

Posted May 08, 2026 - 05:45 UTC

Update

We are actively moving resources out of us-east-1c. Similar brief latency and error blips will be visible to customers while these manual failovers occur.

Posted May 08, 2026 - 05:10 UTC

Update

We have provisioned additional capacity in unaffected availability zones so that they are able to support the additional load. Automatic failovers continue to occur where necessary.

Some latency and transient errors will be visible to customers.

Posted May 08, 2026 - 04:08 UTC

Update

We are continuing to actively monitor the impacts of this availability zone outage for Buildkite customers. Some transient errors are visible due to availability zone failover events.

Posted May 08, 2026 - 02:35 UTC

Update

A small subset of our customers are experiencing delayed notifications. We are actively provisioning additional capacity for these customers.

Availability zone automatic failovers are occurring in response to the outage, and this is causing some brief error blips for some customers.

Posted May 08, 2026 - 01:50 UTC

Investigating

We're aware that AWS is reporting availability zone failures in us-east-1. We are monitoring the situation but so far there is no customer impact.

Posted May 08, 2026 - 01:12 UTC