Delayed job dispatches
Incident Report for Buildkite
Postmortem

On Saturday, 18th May at 4am AEST (UTC+10), we received a high database load alert. Our investigation revealed that our process for assigning jobs to agents in batches (dispatching) was delayed by the elevated database load. The load was caused by some very particular pipeline operations that triggered far more background work than was required, in a way we had not previously identified, and this delayed job dispatching system-wide.

We intervened to manage the background work, and once it had completed, job dispatch returned to normal speed.

Since then, we’ve corrected the scope of the background work that caused the elevated load and have implemented some improvements to our monitoring based on the symptoms we saw.

Posted May 23, 2019 - 02:01 UTC

Resolved
This incident has been resolved.
Posted May 17, 2019 - 20:10 UTC
Monitoring
An unusual request pattern created a backlog of high-intensity database operations. The queue has now returned to normal and dispatches are flowing.
Posted May 17, 2019 - 19:33 UTC
Investigating
Monitoring has detected delays in job dispatching.
Posted May 17, 2019 - 18:01 UTC
This incident affected: Job Queue.