Between approximately 18:00 and 22:50 UTC on February 26, 2026, a subset of customers experienced increased latency when dispatching jobs to agents. Affected customers observed agents sitting idle for several minutes despite having matching jobs waiting in the queue. Job dispatch eventually succeeded, but with significantly elevated latency. The impact was concentrated on specific database shards at any given time, but affected customers across multiple shards over the course of the incident.
A database maintenance task designed to improve job ordering performance was running across all production database shards. However, the task itself generated significant database load, degrading normal job dispatch and pipeline upload operations. Under this load, dispatch operations queued up, producing the observed delays in matching jobs to agents.
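A common mitigation for this failure mode is to gate each batch of maintenance work on a live database-load signal and back off when load is high, rather than running at full speed. The sketch below is purely illustrative and assumes nothing about the actual task: the names (`run_maintenance`, `load_sampler`) and the utilization threshold are hypothetical.

```python
import time

# Back off when database utilization exceeds this fraction (illustrative value).
LOAD_THRESHOLD = 0.75


def run_maintenance(batches, load_sampler, threshold=LOAD_THRESHOLD):
    """Process maintenance batches, pausing whenever DB load is high.

    `batches` is an iterable of work items; `load_sampler` is a callable
    returning current database utilization in [0, 1]. Returns a tuple of
    (batches processed, number of back-off pauses taken).
    """
    processed, backoffs = 0, 0
    for batch in batches:
        # Only proceed when the database has headroom for maintenance work.
        while load_sampler() > threshold:
            backoffs += 1
            time.sleep(0)  # placeholder for a real back-off sleep
        processed += 1  # stand-in for applying one batch of maintenance work
    return processed, backoffs
```

With a throttle like this, the maintenance task trades completion time for headroom, so foreground dispatch queries are not starved during peak load.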
The issue was compounded by several containers of a connection pooling service running on underperforming infrastructure, which reduced the available database throughput.
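One way to limit the blast radius of slow pool members is a latency-based health check that evicts containers whose recent query latency exceeds a budget, so they stop dragging down aggregate throughput. This is a minimal sketch under assumed names (`healthy_members`, the per-query latency budget), not the actual pooling service's behavior.

```python
def healthy_members(latencies_ms, budget_ms=50.0):
    """Return pool member ids whose recent query latency is within budget.

    `latencies_ms` maps a member id to its observed per-query latency in
    milliseconds (e.g. a rolling p95). Members over budget are excluded
    from routing until they recover.
    """
    return sorted(m for m, lat in latencies_ms.items() if lat <= budget_ms)
```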
Contributing factors: