Job log API elevated error responses
Incident Report for Buildkite
Resolved
Agent API requests for job logs continue to be served correctly, and all systems are stable.
Posted Feb 10, 2019 - 04:22 UTC
Monitoring
One of the job log storage databases experienced an unplanned failover. As a result, some Agent API requests to store job logs failed between 03:05 UTC and 03:10 UTC.

Buildkite Agents v3.8.3 or above will have continued to retry posting their job logs to the Agent API, but jobs that ran on earlier agents between 03:05 UTC and 03:10 UTC may have truncated job logs. We recommend upgrading to Buildkite Agent v3.8.3 or above, which provides improved retry behaviour in the case of job log API problems.
Posted Feb 10, 2019 - 03:47 UTC
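For readers unfamiliar with the retry behaviour mentioned in the update above, here is a minimal Python sketch of the general idea: retrying a failed log upload with exponential backoff. This is not the Buildkite Agent's implementation. The function name `post_with_retry`, the `post_chunk` callback, and the attempt and delay values are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable


def post_with_retry(
    post_chunk: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call post_chunk until it succeeds or max_attempts is reached.

    post_chunk is a hypothetical callback that returns True when the
    upload was accepted and False when the API returned an error.
    The wait doubles after each failure: base_delay, 2x, 4x, and so on.
    """
    for attempt in range(max_attempts):
        if post_chunk():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

A client that retries in this way can ride out a short outage such as a database failover, whereas a client that gives up after the first error loses the log data it was trying to send.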
Identified
An elevated error response rate was detected from our Job Log Agent API endpoint. An automatic failover has already taken place and responses have returned to normal. We'll continue to investigate the underlying cause and monitor the related systems.
Posted Feb 10, 2019 - 03:39 UTC
This incident affected: Agent API.