The exodus from GitHub Actions to Buildkite

May 28, 2024

Aditya Jayaprakash

GitHub Actions is great for a small company: it's super easy to start with and natively integrates with GitHub. However, at Blacksmith, we’ve noticed a pattern where once companies hit the ~75 engineer mark, GitHub Actions doesn’t quite cut it anymore. Buildkite has emerged as the clear choice for such companies. We found a few key reasons for this as we dug deeper:

1. Being able to run CI on your infrastructure easily

As companies scale, they notice that their CI costs grow almost quadratically with the number of engineers. This happens for two reasons:

  1. As they scale, the size of their test suite grows.

  2. A growing number of engineers run this test suite every time they push code, so total test execution time grows roughly quadratically with headcount.

This leads to an explosion in CI costs, and self-hosting is one of the ways companies can control their CI spending. The other motivation for self-hosting is security since self-hosting can ensure that their code is only being run within their VPC.
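To make the quadratic growth argument above concrete, here is a minimal sketch of a toy cost model. Every parameter (pushes per engineer, base suite length, suite growth per engineer) is a made-up assumption for illustration, not data from any real company.

```python
def weekly_ci_minutes(engineers, pushes_per_engineer=20,
                      base_suite_minutes=5.0, minutes_per_engineer=0.5):
    """Estimate weekly CI minutes under a simple, hypothetical growth model."""
    # Assumption: the test suite grows linearly with headcount.
    suite_minutes = base_suite_minutes + minutes_per_engineer * engineers
    # Assumption: every push runs the full suite once.
    runs = engineers * pushes_per_engineer
    # Runs and suite length both scale with headcount, so the product
    # grows roughly with the square of the number of engineers.
    return runs * suite_minutes


if __name__ == "__main__":
    for n in (10, 20, 40, 80):
        print(n, weekly_ci_minutes(n))
```

With these illustrative numbers, going from 40 to 80 engineers raises the estimate from 20,000 to 72,000 minutes per week: headcount doubles, but CI time grows 3.6x.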

GitHub Actions allows companies to self-host runners using the Actions Runner Controller (ARC), a Kubernetes Controller. This helps companies orchestrate and autoscale the number of self-hosted runners, typically involving the following steps.

  1. Setting up an AWS EKS cluster (or equivalent)

  2. Installing ARC using its Helm chart

  3. Configuring GitHub App authentication and Kubernetes secrets

  4. Setting up autoscaling to scale runners based on the queue length of GitHub Actions jobs.

  5. Setting up logging and monitoring with CloudWatch, Prometheus, or Grafana.

  6. Security configurations like setting up IAM roles and policies attached to your EKS cluster and worker nodes. Security groups must also be defined to control inbound and outbound traffic to your Kubernetes nodes.
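As a rough illustration of the queue-based autoscaling in step 4, the sketch below computes a target runner count from the number of queued and running jobs. It is not ARC's actual scaling logic; the function name, parameters, and default bounds are all hypothetical.

```python
import math


def desired_runners(queued_jobs, busy_runners, min_runners=1,
                    max_runners=50, jobs_per_runner=1):
    """Return a target runner count for the current job queue (hypothetical policy)."""
    # Keep every busy runner and add enough capacity for the queued jobs.
    needed = busy_runners + math.ceil(queued_jobs / jobs_per_runner)
    # Clamp to the configured floor and ceiling so the fleet never
    # scales to zero or grows without bound.
    return max(min_runners, min(max_runners, needed))
```

A real controller would also need to smooth out short spikes and wait for idle runners to drain before scaling down, which is part of the operational overhead described here.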

Not only does ARC require Kubernetes experience, but it also brings the performance and operational overhead of Docker-in-Docker and of dealing with GitHub’s unreliable webhook delivery. ARC also isn’t a full drop-in replacement for the official hosted GitHub runner environment since it runs jobs in a container, as opposed to a full virtual machine.

Buildkite makes self-hosting much easier with its AWS-centric plugin, a CloudFormation stack that spins up a Buildkite agent fleet on AWS. This is highly streamlined and almost a one-click solution for setting up Buildkite agents on AWS since most settings are pre-configured. Since the agents run directly on EC2, the setup is much simpler and avoids Kubernetes. Buildkite also offers other plugins, like this one, that make getting metrics on agent health easier.

2. Meeting enterprise needs better

GitHub Actions is notorious for having poor uptime, with outages at least once a week. Buildkite, on the other hand, is a lot more reliable, as reflected in its uptime status page.

Buildkite also has a better audit trail and permissions system, which matters less for a small company but becomes more important as an organization grows.

Cloning a repo during CI can be slow for large companies, especially those with large monorepos. Buildkite has a feature called git mirrors that helps clone repos faster. During a git clone, an agent hits the git mirror on the same host machine, so it does not repeatedly download the repository from the internet. Cloning from the git mirror is substantially faster since it reads from local disk rather than going over the network. This feature is easy to set up: configure the CloudFormation stack to enable git mirrors, and configure the git clone flags so that the agent sets up a mirror on the host machine.
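The sketch below shows the general git technique that a host-local mirror relies on: keep one bare mirror per repository on the machine, then create each checkout with `--reference` so objects are read from local disk. This is an illustration under stated assumptions, not necessarily the exact commands the Buildkite agent runs. The helper name, directory layout, and example URL are hypothetical, and the commands are only built, not executed.

```python
from pathlib import PurePosixPath


def mirror_checkout_commands(repo_url, mirrors_dir, checkout_dir, mirror_exists):
    """Build the git commands for a mirror-backed checkout (returned, not run)."""
    # Hypothetical layout: one bare mirror directory per repository name.
    name = repo_url.rstrip("/").split("/")[-1]
    mirror = str(PurePosixPath(mirrors_dir) / name)

    commands = []
    if mirror_exists:
        # Refresh the existing mirror; only new objects cross the network.
        commands.append(["git", "--git-dir", mirror, "remote", "update", "--prune"])
    else:
        # First build on this host: pay the full download cost once.
        commands.append(["git", "clone", "--mirror", repo_url, mirror])

    # --reference lets the new checkout borrow objects from the local mirror.
    commands.append(["git", "clone", "--reference", mirror, repo_url, checkout_dir])
    return commands


if __name__ == "__main__":
    for cmd in mirror_checkout_commands(
        "https://example.com/acme/monorepo.git",
        "/var/lib/git-mirrors",
        "/tmp/build",
        mirror_exists=False,
    ):
        print(" ".join(cmd))
```

The design point is that the expensive full download happens once per host rather than once per build.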

3. Deeper test analytics and observability into flaky tests

Test analytics is truly missing from GitHub Actions, which lacks even the most basic observability into the health of a test suite. To overcome this shortcoming, many companies using GitHub Actions use Datadog’s offerings for insights into workflow analytics and flaky tests.

Especially when you have a large test suite, it's important to have a good understanding of its health since flaky and failing tests often impede developer velocity. Buildkite lets you get a detailed overview of your test suite's health. It shows you the number of tests that are passing, failing, and flaky. It also helps you rank your tests from least reliable and from slowest.

Buildkite provides test collectors for the most popular languages and frameworks. These collectors gather test results and metadata from your test suite and send them to Buildkite, where they can be viewed in the dashboard.
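To illustrate the kind of summary described above, here is a minimal sketch that takes collected test results and ranks tests by reliability. It assumes a test counts as flaky when it both passed and failed on the same commit. That definition, the tuple format, and the function name are assumptions for this example, not Buildkite's implementation.

```python
from collections import defaultdict


def summarize_tests(runs):
    """Rank tests from least to most reliable.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(lambda: defaultdict(set))  # test -> commit -> {True, False}
    totals = defaultdict(lambda: [0, 0])              # test -> [passes, executions]

    for name, commit, passed in runs:
        outcomes[name][commit].add(passed)
        totals[name][0] += int(passed)
        totals[name][1] += 1

    summary = []
    for name, (passes, executions) in totals.items():
        # Assumed definition: flaky means mixed results on a single commit.
        flaky = any(len(results) == 2 for results in outcomes[name].values())
        summary.append({
            "test": name,
            "reliability": passes / executions,
            "flaky": flaky,
        })
    # Least reliable tests first.
    return sorted(summary, key=lambda row: row["reliability"])
```

Sorting by reliability puts the tests most likely to block a merge at the top, which is the view a team usually wants when deciding what to fix first.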

GitHub Actions lacks this through and through, and it is a major shortcoming of Actions as a CI system today.

4. Automatic retries are a first-class primitive in the DSL

Buildkite has an automatic retries feature that allows you to automatically retry a step if it fails.

You can specify the conditions under which a step is rerun and how many times it can be rerun. This configurability is helpful since you might want to retry a failing test but not a failed deployment.

Below is an actual example. We specify that if the exit_status is 5, the step is automatically retried up to twice; if it is 3, it is never retried; and in all other cases, it is retried once.

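steps:
  - label: "Tests"
    command: "tests.sh"
    retry:
      automatic:
        # Exit status 5: retry up to two times
        - exit_status: 5
          limit: 2
        # Exit status 3: never retry
        - exit_status: 3
          limit: 0
        # Any other failure: retry once
        - exit_status: "*"
          limit: 1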
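The retry rules in the pipeline above can be modeled in a few lines. This is a sketch of the behavior as described in this post, assuming rules are checked in order and the first match wins; it is not Buildkite's implementation, and the function names are hypothetical.

```python
# Mirrors the pipeline example: exit status 5 retries twice,
# exit status 3 never retries, anything else retries once.
RETRY_RULES = [
    {"exit_status": 5, "limit": 2},
    {"exit_status": 3, "limit": 0},
    {"exit_status": "*", "limit": 1},
]


def retry_limit(exit_status, rules=RETRY_RULES):
    """Return the retry limit of the first rule matching exit_status."""
    for rule in rules:
        if rule["exit_status"] == "*" or rule["exit_status"] == exit_status:
            return rule["limit"]
    return 0  # No matching rule: do not retry.


def should_retry(exit_status, retries_so_far, rules=RETRY_RULES):
    """Decide whether a finished step should be retried automatically."""
    # A successful step (exit status 0) is never retried.
    if exit_status == 0:
        return False
    return retries_so_far < retry_limit(exit_status, rules)
```

For example, a step that exits with status 5 is retried on its first and second failure but not on its third, while a step that exits with status 3 fails immediately.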

Automatic retries are especially useful in the case of flaky tests. Developers often close and reopen their pull requests to rerun CI because of flaky tests, hoping that CI will succeed on the next attempt.

Having retries in place can help lower the likelihood of flaky tests preventing a developer from merging code. In an ideal world, no test is flaky, but this is rarely true.

One could argue that retrying is not the solution for flaky tests; it merely papers over the flakiness. However, Buildkite has a dashboard showing which steps were automatically retried, which lets you observe your test health and identify flaky tests more easily.

Although GitHub Actions does not have retries baked into its DSL, third-party actions like this allow you to retry a step when it fails. Since such actions are not first-party, they are not as well-known or widely adopted. Moreover, there is no dashboard or way to visualize and understand how often a given step was retried.

At Blacksmith, we seek ways to bridge this gap between GitHub Actions and Buildkite. We see no reason enterprises shouldn't be able to write their code and run their tests in the GitHub ecosystem. We'd love to hear from you if you're currently using GitHub Actions and are looking to migrate or have migrated to Buildkite. If you want to talk to us, don't hesitate to contact us at hello@blacksmith.sh.