On 29th November 2021 at 05:06 UTC a deployment caused degraded performance of the agent artifact API endpoints. This degradation led to high latency and timeout errors across Buildkite. A revert was applied and regular performance was restored by 05:19 UTC.
The deployment included a database migration which changed the performance of some key queries powering the agent artifacts API. These changes had been tested extensively including sampling in production, but unfortunately still had some unexpected side effects when applied to the whole production workload. We’ll be doing more extensive testing before re-deploying this in a weekend maintenance window to minimise the risk of any further disruption.