On Saturday, May 25th at 6:10PM AEST we were notified by a customer that they believed data in one of their build pipeline environments had been accessed by an unauthorized attacker. We immediately called all-hands-on-deck and the engineering team investigated what had happened.
We identified a "credential stuffing" attack in which a large volume of requests was made to our REST API using email/password authentication (rather than the more modern access token-based authentication) between May 17th and May 21st. The credentials used in the attack appear to have come from "public breach datasets" made up of credentials stolen in previous hacking incidents not related to Buildkite. We saw about 3.58 million requests, of which around 300 successfully authenticated as users who had re-used previously disclosed passwords.
At 8:10PM (two hours after the initial report), we disabled email/password authentication for the API, specifically for organizations that had the "SSO Required" flag set on their users. We removed passwords from all compromised users, and in some cases removed the user from the organization entirely, to ensure the attacker no longer had access via the REST API, login form, or any other means.
Once we had confidence that further attacks weren't possible, we investigated what actions the attacker had taken. Once authenticated, the attacker:
- Enumerated all organizations that the user had access to
- Listed each pipeline for those organizations
- Listed the most recent builds for each pipeline
After this activity the attack ceased. Using server logs we can see exactly what steps were taken and what information was seen. This investigation allowed us to be confident that we had a clear picture of what had happened. We saw no signs of builds being triggered, artifacts being downloaded or subsequent access to organizations via secondary methods other than the API.
The following day at 1:22PM AEST, we notified organization administrators that there had been unauthorized access to their pipelines. We then followed up on any questions and provided server logs to those that required them. We posted a public status page at 5:40PM and confirmed with Enterprise customers who weren't affected that their data had not been accessed. We've contacted all affected organizations and users, so if you weren't contacted, your data was not accessed.
A final concern was that the webhook URLs used to trigger builds from GitHub, Bitbucket, etc. were included in the API responses the attacker had access to. These URLs can only initiate a build for a given git commit, which limits their security impact. We confirmed at 6:30PM that no suspicious builds had been triggered using them, and at 7:00PM we rolled out a series of additional security checks and added further monitoring.
Over the course of the following two weeks, we rolled out a wide variety of security improvements both internally and externally. We closed the incident on May 29th.
A number of factors came together to make this attack possible.
- Users needed to have used email/password combinations for their Buildkite accounts that had been used on other sites and were available in public breach datasets.
- Our REST API allowed email/password authentication that worked for any of the email addresses that a user had on their account including their primary work email, but also the secondary personal emails that are more frequently included in data breaches.
- In recent SSO enhancements, an "SSO Required" flag was added for organization administrators to control whether or not a user required a valid SSO authorization before accessing any pipelines or builds. This wasn't checked when using email/password authentication in the API, although MFA was, as was organization membership.
- Users needed to have stored valuable credentials in plain-text areas such as pipeline environment variables. Our documentation on best practice here didn't clearly discourage users from doing this.
- Our monitoring systems didn't alert us to the spike in failed logins to the REST API.
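To make the last factor concrete, here is a minimal Python sketch of the kind of check that can flag a burst of failed logins from a single source. It is an illustration only, not Buildkite's monitoring: the class name `FailedLoginMonitor`, the threshold, and the window length are all hypothetical, and a production system would likely keep these counters in a shared store rather than in process memory.

```python
from collections import defaultdict, deque


class FailedLoginMonitor:
    """Sliding-window counter of authentication failures per source IP.

    Hypothetical helper for illustration; names and limits are assumptions.
    """

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Maps a source IP to the timestamps of its recent failures.
        self._events = defaultdict(deque)

    def record_failure(self, source_ip: str, now: float) -> bool:
        """Record one failed login and report whether an alert should fire.

        Returns True when the source has reached `threshold` failures
        within the last `window_seconds`.
        """
        events = self._events[source_ip]
        events.append(now)

        # Drop failures that have fallen out of the window.
        cutoff = now - self.window_seconds
        while events and events[0] <= cutoff:
            events.popleft()

        return len(events) >= self.threshold
```

In use, the authentication layer would call `record_failure(ip, time.time())` on every 401 response and page someone, or block the source, when it returns `True`.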
Changes we've made
We've made a lot of changes at many different layers of our system following this attack. Some of them we're choosing to keep private for a layer of "security through obscurity", but the majority we feel comfortable sharing and we're happy to answer questions about.
- Compromised password detection with the [haveibeenpwned.com](https://haveibeenpwned.com) API (using k-anonymity hashes) for checking if passwords used at login or password reset are included in public breaches. This means we'll reject any password that is known to be leaked and require a password reset.
- Removal of email/password as an authentication method for our REST API.
- Stronger system-wide rate-limiting for authentication failures with automated interventions and internal alerting/monitoring.
- A review of all the points in our codebase where authentication occurs for similar attack vectors.
- A tool and API for rotating the GitHub webhooks that trigger builds. Whilst we believe this is low-priority, teams that are concerned can regularly rotate their webhook endpoints.
- Customizable access-control rules for what IP addresses can trigger builds using build webhooks.
- Clearer documentation on best practices for secrets management.
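To illustrate the compromised-password check in the first item above, here is a minimal Python sketch of a k-anonymity lookup against the Pwned Passwords range endpoint. Only the first five characters of the password's SHA-1 hash leave the server; the response lists every known hash suffix for that prefix, and the comparison happens locally. This is a sketch rather than Buildkite's actual code: the function names and the User-Agent string are made up for the example, and the `fetch` parameter exists only so the lookup can be exercised without a network call.

```python
import hashlib
import urllib.request
from typing import Callable

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def fetch_range(prefix: str) -> str:
    """Fetch the 'SUFFIX:COUNT' lines for a 5-character SHA-1 prefix."""
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-check-sketch"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.read().decode("utf-8")


def breach_count(password: str, fetch: Callable[[str], str] = fetch_range) -> int:
    """Return how many times the password appears in known breaches (0 if none)."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]

    # Only the 5-character prefix is sent; the full hash never leaves this process.
    for line in fetch(prefix).splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix:
            return int(count.strip())
    return 0
```

A login or password-reset handler could then reject the password whenever `breach_count(password) > 0` and prompt the user to choose a new one.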
As a follow-on from the attack, we've been thinking hard about access tokens (which weren't used in the attack, but are a key point in our security model). We've rolled out:
- An API for admins to revoke tokens that they find
- Removal of the "All Organizations" scope for access tokens. Access tokens now need to have organizations added at the point of creation, or be updated later to add additional organizations.
We'll soon be releasing UI changes that will give organization admins visibility into which tokens have access to their organization, how old they are, and when they were last used, and that will help with revoking access.
- May 25th 6:10PM AEST - We were notified by a customer that they believed data in one of their build pipeline environments had been accessed by an unauthorized individual. They provided an IP address that they had seen used. We immediately set up a Shared Slack to collaborate on identifying the cause.
- May 25th 7:00PM AEST - We identified 3.58M requests to our API from the attacker IP address that returned a 401 Unauthorized, and around 300 requests that were successful, across 19 organizations. Based on the pattern, we determined it was a Credential Stuffing attack, where the attacker used a database of emails and passwords from other breached sites. The breached accounts had re-used previously disclosed passwords listed on haveibeenpwned.com. The attack occurred several days earlier, and attacker activity hadn't been seen since.
- May 25th 8:00PM AEST - We determined that the attacker had used a legacy email/password authentication option on the REST API to bypass SSO requirements to access organization information. As it was via the API, data exposed was limited, but included environment variables set in the pipeline settings and the commands executed for each step.
- May 25th 8:10PM AEST - We rolled out an update to the REST API that disabled email/password authentication for accounts that participated in SSO organizations.
- May 25th 8:30PM AEST - We removed the passwords for all affected users to make sure the attacker couldn't use compromised passwords to retain access.
- May 25th 9:00PM AEST - After an in-depth search, we concluded that the attacker access was limited to REST API usage and that we had a clear picture of all of the actions they had taken via our API logs. The main target appeared to be secrets stored in plain-text in pipeline settings.
- May 25th 10:47PM AEST - By phone and shared Slack, we contacted the teams for which we had a nominated security contact. Several responded immediately and we filled them in on details and fielded questions.
- May 25th 11:30PM AEST - We provided server logs of the REST API activity to teams that required them.
- May 26th 01:50AM AEST - We signed off for the night, leaving the security teams we were working with a pager address.
- May 26th 07:00AM AEST - We requested that DataDog extend our log retention period to ensure we didn't lose any forensic data.
- May 26th 01:22PM AEST - We emailed a detailed account of what had happened to all administrators from all affected user accounts.
- May 26th 05:40PM AEST - We posted a status page incident https://www.buildkitestatus.com/incidents/z4dn9qzvzt93
- May 26th 07:00PM AEST - To mitigate concerns about build webhook endpoints being included in the responses the attacker had access to, we checked all webhook deliveries and confirmed that none came from unexpected IP addresses. We added a check on the source IP of any inbound webhooks from GitHub sources.
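The source IP check in the final entry can be sketched in a few lines of Python using the standard `ipaddress` module. This is an assumption-laden illustration, not the check Buildkite deployed: the function name is hypothetical, and the networks below are reserved documentation ranges used as placeholders. A real deployment would load the provider's published ranges instead (GitHub, for example, lists its webhook ranges through its REST API meta endpoint).

```python
import ipaddress

# Placeholder networks (reserved documentation ranges), NOT real provider ranges.
ALLOWED_WEBHOOK_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def is_allowed_source(ip: str, networks=ALLOWED_WEBHOOK_NETWORKS) -> bool:
    """Return True if the webhook's source IP falls inside an allowed network."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        # Malformed addresses are never allowed.
        return False

    # Compare only against networks of the same IP version (v4 vs v6).
    return any(
        address.version == network.version and address in network
        for network in networks
    )
```

A webhook handler would call `is_allowed_source(request_ip)` before triggering a build and reject the delivery when it returns `False`. Behind a load balancer, `request_ip` must be the real client address rather than the proxy's.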