Delayed notifications
Incident Report for Buildkite
Postmortem

This incident was caused by insufficient capacity in the group of workers that services outbound notification queues, which allowed notification queue latency to reach 8 minutes. During a blue/green rollout of these workers, autoscaling was manually disabled so that the number of workers serving the queue could be increased in preparation for migrating background job processing between the blue and green sides of the deployment. After the background workload was shifted to the new workers, autoscaling was not re-enabled, which prevented the deployment from scaling up to maximum capacity. We’re improving the documentation and tooling involved in this blue/green deployment process to prevent a recurrence of this issue and to improve visibility into the status of autoscaling.
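
As an illustration only, and not a description of Buildkite's actual tooling, the failure mode above (a manual "disable autoscaling" step with no guaranteed "re-enable" step) can be guarded against by pairing the two steps in code. The sketch below is a minimal Python example under stated assumptions: `WorkerGroup`, `autoscaling_paused`, and `migrate_background_jobs` are hypothetical names, and the worker group is an in-memory stand-in, not a real infrastructure API.

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class WorkerGroup:
    """Hypothetical stand-in for one side of a blue/green deployment."""
    name: str
    worker_count: int
    autoscaling_enabled: bool = True


@contextmanager
def autoscaling_paused(group: WorkerGroup, pinned_workers: int):
    """Turn autoscaling off and pin the worker count, then always turn it back on."""
    group.autoscaling_enabled = False
    group.worker_count = pinned_workers
    try:
        yield group
    finally:
        # Runs whether the migration succeeds or raises, so the
        # re-enable step cannot be skipped by accident.
        group.autoscaling_enabled = True


def migrate_background_jobs(old: WorkerGroup, new: WorkerGroup, pinned_workers: int) -> None:
    """Shift background work from old to new with autoscaling paused on new."""
    with autoscaling_paused(new, pinned_workers):
        # Placeholder: the real workload shift from `old` to `new` would go here.
        pass
    # Explicit post-condition, giving visibility into the autoscaling status.
    assert new.autoscaling_enabled, "autoscaling must be re-enabled after migration"
```

The design choice here is to make re-enabling autoscaling a structural consequence of leaving the migration step, with no separate checklist item to remember.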

Posted Apr 04, 2023 - 03:46 UTC

Resolved
This incident has been resolved.
Posted Apr 04, 2023 - 02:50 UTC
Identified
We've identified an issue with notification delays and have scaled up capacity to deal with it.
Posted Apr 04, 2023 - 02:40 UTC
Investigating
We are investigating delays to build and job notifications, such as commit statuses and other webhooks.
Posted Apr 04, 2023 - 02:26 UTC
This incident affected: Notifications (GitHub Commit Status Notifications, Email Notifications, Slack Notifications, Webhook Notifications).