Aug 7, 17:17 AEST
The incident has been resolved and the system is running as normal.
Aug 3, 08:43 AEST
All systems are functioning as normal at this stage. Thanks for your patience, everyone. We'll absolutely be following up with further analysis and a post-mortem.
Aug 3, 08:11 AEST
Performance is returning to normal levels and we've turned off maintenance mode. The background queues are also returning to normal levels.
Aug 3, 08:04 AEST
We're continuing to work with Amazon to investigate the cause of the ongoing database issues. We'll provide updates as soon as we know more.
Aug 3, 07:19 AEST
The affected database has been restarted, but we're investigating timeouts that are preventing the web nodes from returning to service.
Aug 3, 06:44 AEST
We've been advised by our upstream provider that we are likely being affected by a Postgres bug that prevents autovacuums from completing. We will restart the affected database, which will cause 3-5 minutes of outage.
Aug 3, 06:19 AEST
We've rolled out a new queue cluster, so new commit statuses and webhook events will now be processed immediately whilst we work through the backlog from the past few hours.
Aug 3, 05:56 AEST
We're still working through the webhook and commit status backlog, and we're spinning up extra infrastructure to help.
Aug 3, 05:17 AEST
We've identified the initial problem as database lock contention caused by unusually high load. The database performance issue resolved quickly, but it caused a cascade of follow-on problems as traffic dog-piled back onto the system. We've resolved the issues with the Agent API and agents should be functioning normally, but we are still processing a large backlog of background tasks, so things like webhooks will be delayed whilst we work through it.
Aug 3, 03:54 AEST
We're also seeing elevated response times on the Agent API, which is related to the same underlying database issues.
Aug 3, 02:07 AEST
We're seeing significant delays in service notifications, including webhooks.
Aug 3, 01:56 AEST