Increased error rates from Test Plan API

Incident Report for Buildkite

Resolved

Our mitigations have resolved the elevated latency and the increased likelihood of suboptimal fallback test plans. We have also identified and fixed a blind spot in our automated alerting, which was previously unable to detect this scenario as an issue. Work continues this week to resolve the underlying performance issue by restructuring how the relevant data is ingested and accessed.
Posted Mar 10, 2026 - 09:34 UTC

Monitoring

We have implemented several mitigations and continue working on fixing the underlying cause. Our team is actively monitoring the situation to ensure stability. We will provide further updates as we make progress on resolving this issue.
Posted Mar 10, 2026 - 02:25 UTC

Investigating

We've observed test splitting plan requests periodically timing out and falling back to non-intelligent splitting. Performance appears to have returned to normal about an hour ago. We are continuing to investigate the root cause and resolve the underlying issue.
Posted Mar 10, 2026 - 01:21 UTC
This incident affected: Test Engine (REST API).