Job log API elevated error responses
Incident Report for Buildkite
Resolved
Agent API requests for job logs continue to be served correctly, and all systems are stable.
Posted Feb 10, 2019 - 15:22 AEDT
Monitoring
One of the job log storage databases experienced an unplanned failover, and as a result some Agent API requests for job log storage failed between 03:05 UTC and 03:10 UTC.

Buildkite Agents v3.8.3 and above will have continued to retry posting their job logs to the Agent API, but earlier agent versions may have truncated logs for jobs run between 03:05 UTC and 03:10 UTC. We recommend upgrading to Buildkite Agent v3.8.3 or above, which provides improved retry behaviour when the job log API has problems.
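As a rough illustration of why retrying matters here, the sketch below shows a generic retry loop with exponential backoff. It is not the Buildkite Agent's actual implementation: the function name `post_with_retry`, the `send` callback, and the attempt and delay values are all hypothetical, chosen only to show how a client can ride out a short outage such as a database failover.

```python
import time


def post_with_retry(send, chunk, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(chunk) until it succeeds or the attempts run out.

    `send` is a hypothetical callable that uploads one log chunk and raises
    ConnectionError on failure. Returns the number of attempts used and
    re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(chunk)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A client without a loop like this drops the chunk on the first failed request, which is how a log ends up truncated.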
Posted Feb 10, 2019 - 14:47 AEDT
Identified
An elevated error response rate was detected from our Job Log Agent API endpoint and has since recovered. An automatic failover has taken place and responses have returned to normal. We'll continue to investigate the underlying cause and monitor the related systems.
Posted Feb 10, 2019 - 14:39 AEDT
This incident affected: Agent API.