Slow Response

Incident Report for Buildkite

Postmortem

This was one of five incidents with a common root cause. The postmortem is available here.

Posted Jun 23, 2022 - 07:32 UTC

Resolved

We have been advised by AWS that the root cause has been fixed and performance has returned to normal. This incident is now resolved.
Posted Apr 22, 2022 - 04:52 UTC

Monitoring

We are seeing performance return to normal and builds running again. We are confirming that the root cause has been fixed.
Posted Apr 22, 2022 - 04:19 UTC

Update

We are in constant communication with AWS. Their internal service team has identified that the issue lies with EBS storage and is working to resolve it as quickly as possible.
Posted Apr 22, 2022 - 03:38 UTC

Update

We are continuing to work with the vendor to mitigate the issue. We are also continuing to investigate other mitigation options.
Posted Apr 22, 2022 - 03:01 UTC

Update

We are continuing to work with the vendor to mitigate the issue. We are also continuing to investigate other mitigation options.
Posted Apr 22, 2022 - 02:24 UTC

Update

We are continuing to work with the vendor and have been escalated to the responsible team. We are also pursuing other options to mitigate the problem.
Posted Apr 22, 2022 - 01:53 UTC

Update

We are continuing to investigate the underlying storage performance with the vendor. We are observing that job dispatch is succeeding but is delayed.
Posted Apr 22, 2022 - 01:23 UTC

Identified

The problem has been identified as degraded performance in the underlying storage. We are confirming this with the vendor and working on mitigations.
Posted Apr 22, 2022 - 01:03 UTC

Investigating

We are currently investigating this issue.
Posted Apr 22, 2022 - 00:48 UTC
This incident affected: Web, Agent API, REST API, and Job Queue.