Elevated Processing Times
Incident Report for Buildkite
Resolved
The system has stabilised, and we continue to work on longer-term mitigations for the latency issues we've experienced over the past few days.
Posted Feb 02, 2023 - 04:33 UTC
Monitoring
Our systems are operating at normal levels, and we continue to monitor performance.
Posted Feb 02, 2023 - 01:53 UTC
Update
Increased load has led to higher notification latency. Our mitigations continue to keep Job Dispatch and Web UI latency within acceptable levels.

We are continuing to work on ways to improve notification latency.
Posted Feb 02, 2023 - 01:06 UTC
Identified
Notifications are delayed and we're investigating this.
Posted Feb 02, 2023 - 00:33 UTC
Monitoring
Job dispatch and the web interface are back to normal. Outbound notifications are still delayed by up to 6 minutes, but latency is improving. We understand this has some impact on customers and we continue to work on longer-term mitigations for this ongoing issue.
Posted Feb 02, 2023 - 00:13 UTC
Update
Job dispatch latency has stabilised to within our SLA, but notification latency remains above our SLA.

We continue to investigate how to improve notification latency.
Posted Feb 01, 2023 - 23:27 UTC
Update
We have deployed a change to prioritize job dispatch over notifications (e.g. commit statuses). As a result, customers will see commit statuses delayed by up to 30 minutes. This is a once-off impact, and notification latency is expected to return to normal after the initial backlog has been processed.
Posted Feb 01, 2023 - 23:03 UTC
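For readers unfamiliar with this kind of mitigation, the following is a minimal sketch of prioritizing one class of work over another in a shared queue. It is an illustration only and does not describe Buildkite's actual implementation: the class name, the priority constants, and the task labels are all hypothetical.

```python
import heapq
import itertools

# Hypothetical priority levels (assumption): lower number is served first.
JOB_DISPATCH = 0
NOTIFICATION = 1


class PriorityWorkQueue:
    """Toy in-memory queue that serves higher-priority work first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter preserves FIFO order within the same priority.
        self._counter = itertools.count()

    def push(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self):
        _priority, _seq, task = heapq.heappop(self._heap)
        return task

    def __len__(self):
        return len(self._heap)


def drain(queue):
    """Return tasks in the order a single worker would process them."""
    order = []
    while len(queue):
        order.append(queue.pop())
    return order
```

In a scheme like this, notifications wait whenever dispatch work is pending, which is consistent with a temporary notification backlog that clears once dispatch load drops.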
Update
We continue to take steps to stabilize Job Dispatch and we hope to have those changes implemented in the next 90 minutes.

Notifications (including commit statuses) will continue to be delayed and may get worse as we limit their load on the system in order to prioritize job dispatch. We continue to work on stabilizing system load and will provide an ETA when available.
Posted Feb 01, 2023 - 22:14 UTC
Update
We continue to investigate database load and are working to reduce its impact, primarily on Job Dispatch.
We have multiple streams of work under way, with improving Job Dispatch as our first priority.
Posted Feb 01, 2023 - 21:55 UTC
Identified
We are investigating latency spikes across many of our asynchronous processing queues, which are causing slowness in notifications and job assignment. We have identified an issue with database load and we continue to investigate while taking steps to mitigate database load and keep the system stable.
Posted Feb 01, 2023 - 21:30 UTC
Investigating
We are investigating reports of a sluggish UI and latency in assigning jobs to agents.
Posted Feb 01, 2023 - 20:50 UTC
This incident affected: Web, Agent API and Notifications (GitHub Commit Status Notifications, Email Notifications, Slack Notifications, Webhook Notifications).