Elevated Error Rates on Agent API
Incident Report for Buildkite

On Monday, 3rd September at 01:37 UTC the Buildkite Agent API experienced elevated error rates. This was due to high load on our backend RDS database, where data migrations and a vacuum process were running on top of already high load. The migration and vacuum processes were cancelled and service was restored by 01:39 UTC. Agent retry behaviour should have handled these failures without data loss. Some build pipeline uploads may have failed and required a retry.

The vacuums have been rescheduled to run on weekends during quiet periods, and the data migrations have been modified to pause more so that normal operations can proceed.
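
As a rough illustration of what a migration that pauses between units of work can look like, here is a minimal Python sketch. It is not our actual migration code: the function name `run_throttled_migration`, the `migrate_batch` callback, and the batch size and pause values are all hypothetical and chosen only for the example.

```python
import time
from typing import Callable, Sequence


def run_throttled_migration(
    row_ids: Sequence[int],
    migrate_batch: Callable[[Sequence[int]], None],
    batch_size: int = 1000,
    pause_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Apply migrate_batch to row_ids in fixed-size batches, pausing between them.

    All names and defaults here are illustrative assumptions. The pause gives
    the database room to serve regular traffic between batches.
    Returns the number of batches processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")

    batches_done = 0
    for start in range(0, len(row_ids), batch_size):
        # Pause before every batch except the first one.
        if batches_done > 0:
            sleep(pause_seconds)
        migrate_batch(row_ids[start:start + batch_size])
        batches_done += 1
    return batches_done
```

Smaller batches and longer pauses slow the migration down but reduce how much load it adds at any one time. The `sleep` parameter is injectable only so the pacing can be checked without actually waiting.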

Posted 10 months ago. Sep 03, 2018 - 16:39 AEST

This incident has been resolved.
Posted 10 months ago. Sep 03, 2018 - 13:38 AEST
Agent API error rates have returned to normal.
Posted 10 months ago. Sep 03, 2018 - 12:06 AEST
We're seeing elevated error rates on the Agent API, investigating.
Posted 10 months ago. Sep 03, 2018 - 11:55 AEST
This incident affected: Agent API.