Increased Agent API error rate and latency
Incident Report for Buildkite
Postmortem

On 29th November 2021 at 05:06 UTC, a deployment degraded the performance of the agent artifact API endpoints, which led to high latency and timeout errors across Buildkite. We applied a revert, and regular performance was restored by 05:19 UTC.

The deployment included a database migration that changed the performance of some key queries powering the agent artifact API. These changes had been tested extensively, including sampling in production, but they still had unexpected side effects when applied to the whole production workload. We’ll be doing more extensive testing before re-deploying the migration in a weekend maintenance window to minimise the risk of any further disruption.

Posted Nov 29, 2021 - 06:20 UTC

Resolved
This incident has been resolved.
Posted Nov 29, 2021 - 05:23 UTC
Monitoring
We've identified the issue as an in-progress migration on one of our database tables and have reverted the migration.
Posted Nov 29, 2021 - 05:21 UTC
Investigating
We are currently investigating this issue.
Posted Nov 29, 2021 - 05:17 UTC
This incident affected: Agent API.