Delayed job dispatches

Incident Report for Buildkite

Postmortem

On Saturday, 18th May at 4am AEST (UTC+10) we received a high database load alert. Investigations revealed that our process for assigning jobs to agents in batches (dispatching) was delayed by the elevated database load. The load was caused by a very particular pattern of pipeline operations, one we had not previously identified, that triggered far more background work than was required. The resulting database load delayed job dispatching system-wide.

We intervened to manage the background work, and once it had completed, job dispatch returned to its normal speed.

Since then, we’ve corrected the scope of the background work that caused the elevated load and have made improvements to our monitoring based on the symptoms we observed.

Posted May 23, 2019 - 02:01 UTC

Resolved

This incident has been resolved.

Posted May 17, 2019 - 20:10 UTC

Monitoring

An unusual request pattern created a backlog of high-intensity database operations. The queue has now returned to normal, and job dispatch times are following suit.

Posted May 17, 2019 - 19:33 UTC

Investigating

Monitoring has detected delays in job dispatching.

Posted May 17, 2019 - 18:01 UTC

This incident affected: Job Queue.