All Systems Operational

About This Site

Status updates for Buildkite’s services and components. You can also follow @buildkitestatus on Twitter for updates.

Web   Operational
99.93% uptime over the past 90 days
Agent API   Operational
99.84% uptime over the past 90 days
REST API   Operational
99.92% uptime over the past 90 days
Job Queue   Operational
99.82% uptime over the past 90 days
Notifications Operational
99.93% uptime over the past 90 days
GitHub Commit Status Notifications   Operational
Email Notifications   Operational
Slack Notifications   Operational
HipChat Notifications   Operational
Pusher Pusher REST API   Operational
Pusher WebSocket client API   Operational
Webhook Notifications   Operational
99.93% uptime over the past 90 days
SCM Providers   Operational
GitHub   Operational
Atlassian Bitbucket SSH   Operational
Atlassian Bitbucket Website and API   Operational
Atlassian Bitbucket Git via HTTPS   Operational
Third Party Services   Operational
AWS ec2-us-east-1   Operational
Mandrill Mandrill Global   Operational
PagerDuty Notification Delivery   Operational
Past Incidents
Aug 15, 2018

No incidents reported today.

Aug 14, 2018

No incidents reported.

Aug 13, 2018

No incidents reported.

Aug 12, 2018

No incidents reported.

Aug 11, 2018

No incidents reported.

Aug 10, 2018

No incidents reported.

Aug 9, 2018

No incidents reported.

Aug 8, 2018

No incidents reported.

Aug 7, 2018

No incidents reported.

Aug 6, 2018

No incidents reported.

Aug 5, 2018

No incidents reported.

Aug 4, 2018

No incidents reported.

Aug 3, 2018
Postmortem - Read details
Aug 7, 17:17 AEST
Resolved - The incident has been resolved and the system is running as normal.
Aug 3, 08:43 AEST
Monitoring - All systems are functioning as normal at this stage. Thanks for your patience, everyone. We'll absolutely be following up with more analysis and a post-mortem.
Aug 3, 08:11 AEST
Update - We're seeing performance return to normal levels, and we've turned off maintenance mode. The background queues are also reaching normal levels.
Aug 3, 08:04 AEST
Update - We're continuing to investigate the cause of the ongoing database issues with Amazon. We'll provide updates as soon as we know more.
Aug 3, 07:19 AEST
Update - The affected database has been restarted, but we're investigating timeouts that are preventing the web nodes from returning to service.
Aug 3, 06:44 AEST
Update - We've been advised by our upstream provider that we are likely affected by a Postgres bug that prevents autovacuums from completing. We will be restarting the affected database, which will cause an outage of 3-5 minutes.
Aug 3, 06:19 AEST
Update - We've rolled out a new queue cluster, so new commit statuses and webhook events will be processed immediately while we work through the backlog from the past few hours.
Aug 3, 05:56 AEST
Update - We're still working through the webhook and commit status backlog. We're spinning up extra infrastructure to assist.
Aug 3, 05:17 AEST
Identified - We've isolated the initial problem as database lock contention caused by unusually high load. The database performance issue resolved quickly, but it caused a cascade of follow-on problems as traffic dog-piled back onto the system. We've resolved the issues with the Agent API and agents should be functioning normally, but we are still processing a large backlog of background tasks, so things like webhooks will be delayed until the backlog clears.
Aug 3, 03:54 AEST
Update - We're also seeing elevated response times on the Agent API, which is related to the same underlying database issues.
Aug 3, 02:07 AEST
Investigating - We're seeing long delays in service notifications, including webhooks.
Aug 3, 01:56 AEST
Aug 2, 2018
Resolved - The queue has been working correctly again for more than an hour. Apologies for the interruptions!
Aug 2, 07:21 AEST
Monitoring - The queue has caught up — service notifications including webhooks and commit statuses should now be up to date.
Aug 2, 06:08 AEST
Investigating - We're seeing long delays in service notifications, including webhooks.
Aug 2, 05:41 AEST
Aug 1, 2018

No incidents reported.