Elevated error rates

Incident Report for Buildkite

Resolved

Latency has returned to normal levels, and the issue is now resolved.
Posted Feb 28, 2023 - 22:54 UTC

Update

Latency continues to return to normal levels, and we are continuing to monitor the affected services.
Posted Feb 28, 2023 - 22:22 UTC

Monitoring

We are seeing latency return to normal levels and are continuing to monitor the affected services.
Posted Feb 28, 2023 - 21:51 UTC

Update

Latency has improved but remains elevated. We are continuing to manage load while we investigate performance issues.
Posted Feb 28, 2023 - 21:35 UTC

Update

A customer's automation was mistakenly generating a high volume of builds. At that customer's request, we have cancelled those builds. We have also shipped and enabled a change that allows us to rate limit new builds on a per-customer basis. This new rate limit currently applies to only a single customer.

Job dispatch is still significantly delayed, and we are actively managing capacity to restore service to normal levels.
Posted Feb 28, 2023 - 21:10 UTC

Update

A customer's automation was mistakenly generating a high volume of builds. At that customer's request, we have cancelled those builds. We have also shipped and enabled a change that allows us to rate limit new builds on a per-customer basis. New build requests that exceed the rate limit will receive an HTTP 429 (Too Many Requests) response.

Job dispatch is still significantly delayed, and we are actively managing capacity to restore service to normal levels.
Posted Feb 28, 2023 - 21:04 UTC
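The per-customer rate limit described in the 21:04 UTC update can be illustrated with a token bucket keyed by customer. This is a minimal sketch under stated assumptions, not Buildkite's implementation: the class and method names, the bucket capacity and refill rate, and the 201 success status are all hypothetical. Only the 429 response for over-limit requests comes from this report. Customers without an enabled limit are never throttled, mirroring a limit applied to a single customer.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens/second up to `capacity`."""
    capacity: float  # maximum burst of new builds
    rate: float      # tokens refilled per second
    tokens: Optional[float] = None
    updated: Optional[float] = None

    def try_take(self, now: float) -> bool:
        if self.tokens is None or self.updated is None:
            # First request for this customer: start with a full bucket.
            self.tokens, self.updated = self.capacity, now
        else:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerCustomerBuildLimiter:
    """Hypothetical limiter: only customers with an explicitly enabled limit are throttled."""

    def __init__(self) -> None:
        self._buckets: dict[str, TokenBucket] = {}

    def enable_limit(self, customer_id: str, capacity: float, rate: float) -> None:
        self._buckets[customer_id] = TokenBucket(capacity, rate)

    def status_for_new_build(self, customer_id: str, now: Optional[float] = None) -> int:
        bucket = self._buckets.get(customer_id)
        if bucket is None:
            return 201  # assumed success status; no limit enabled for this customer
        if now is None:
            now = time.monotonic()
        # Over-limit requests get 429 Too Many Requests, as stated in the update.
        return 201 if bucket.try_take(now) else 429
```

For example, with `enable_limit("noisy", capacity=2, rate=1.0)`, two immediate build requests from `"noisy"` succeed, a third is rejected with 429, and another request is accepted once a second has passed, while every other customer is unaffected.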

Update

We continue to investigate possible mitigations for the issue. We are implementing additional controls to reduce the amount of work in the system.
Posted Feb 28, 2023 - 19:55 UTC

Update

We continue to investigate possible mitigations for the issue. We are implementing additional controls to reduce the amount of work in the system.
Posted Feb 28, 2023 - 19:23 UTC

Update

We are activating additional controls to reduce the amount of work in the system.
Posted Feb 28, 2023 - 18:44 UTC

Identified

We have identified the cause of the high latency and are working to reduce the load.
Posted Feb 28, 2023 - 18:29 UTC

Update

We are continuing to investigate the issue and are observing an elevated error rate.
Posted Feb 28, 2023 - 18:09 UTC

Investigating

We are currently investigating this issue.
Posted Feb 28, 2023 - 17:54 UTC
This incident affected: Web, Agent API, and REST API.