Ashby’s engineering team was facing performance and reliability issues with their CI test workflow in GitHub Actions, which ultimately was impacting their deployment frequency.
Now, Ashby’s developers can reliably and efficiently run their tests for every git push and continuously deploy to their more than 2500 customers without having to slow down.
Ashby is an all-in-one recruiting platform used by startups and enterprises alike. They are used by hundreds of companies, including Snowflake, Reddit, Ramp, Mercury, Vanta, and us!
With their adoption of a continuous delivery practice that releases every change passing its test suite to customers, it’s no surprise that they take the performance and reliability of their CI/CD pipeline seriously—so much so that they even explicitly highlight faster CI/CD in their job postings.

The Problem
Moving away from a legacy CI provider.
Prior to Blacksmith, Ashby relied on hosted runners from a third-party CI provider. At the time, they were dealing with their CI taking longer and longer, making their engineers wait more than 7 minutes to be able to merge their code and delaying their deployments. Their initial workaround was to vertically scale and get a larger runner with 16 vCPUs to run their CI jobs on. This helped—for a few months—but their CI times eventually started to regress as they wrote more tests.
When vertical scaling fails, it’s time to try horizontal scaling. So, the next solution they tried was to shard their tests across multiple runners in parallel. This worked when they had only a few shards, but as the number grew, the effectiveness quickly went away. Moreover, as a result of all the runners being requested concurrently, they soon began to notice a new issue: Their runners weren’t being assigned quickly enough, causing jobs to stay in the queue for longer than even the actual job execution itself. Talk about a bottleneck.
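To make the sharding idea concrete, here is a minimal sketch of how a test suite might be split across parallel runners. It is not Ashby's actual setup: the function name `shard_tests`, its parameters, and the round-robin strategy are all assumptions chosen for illustration.

```python
from typing import List


def shard_tests(test_files: List[str], shard_index: int, total_shards: int) -> List[str]:
    """Return the subset of test files that one runner (shard) should execute.

    Hypothetical helper for illustration; not taken from Ashby's pipeline.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in the range [0, total_shards)")
    # Sort first so every runner sees the same order and the split is deterministic.
    ordered = sorted(test_files)
    # Round-robin: shard k takes files k, k + total_shards, k + 2 * total_shards, ...
    return ordered[shard_index::total_shards]
```

Each runner would call this with its own `shard_index` and run only the files it gets back. Splitting by file count does not balance by test duration, so a real setup might weight files by historical runtime instead. Sharding also only pays off if every runner starts promptly, since the slowest or longest-queued shard sets the overall time.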
As if the long job durations and cold starts weren’t enough, they started dealing with reliability issues. This became a major headache for Aaron Norby, Founding Engineer at Ashby, who noted that outages seemed to be happening “every other week” over the past year, bringing their deployment process to a halt.
During an outage, it’s not always clear if you’ll be able to roll back after deploying. So, to play it safe, they’d prioritize customer experience and hit pause on their deployment entirely whenever their legacy CI provider started having a meltdown. After all, shipping code is risky enough—no need to gamble with whether or not you can undo your mistake! Unsurprisingly, the recurring reliability issues became increasingly frustrating—a sentiment perfectly captured by Abhik Pramanik, the Co-Founder of Ashby:

It seems that some CI providers have forgotten that reliability is important to their customers.
— Abhik Pramanik, Co-Founder, Ashby.
Ultimately, the combined performance and reliability issues with their legacy CI provider began to feel too painful, and they decided it was time to search for an alternative.
The Solution
Selecting the fastest option in the business.
Ideally, you wouldn't want to switch to a new solution and lose both the GitHub Actions ecosystem and the domain expertise your engineers have developed. What you really want is a better version of GitHub Actions. This is why alternatives like Buildkite are a non-starter for many engineering teams—including Ashby’s.
Given this, Blacksmith, which integrates with the GitHub Actions ecosystem, became an obvious low-friction option for Ashby. However, Blacksmith still needed to prove it could live up to its promises.
To put it to the test, Aaron spun up a 16 vCPU runner on both Blacksmith and their legacy CI provider to benchmark whether Blacksmith really was 2x faster with no cold start delays and at half the cost. Spoiler alert: We weren’t bluffing.
After successfully passing the sniff test and with a smooth approval process from Ashby’s security team, Blacksmith was selected to run their CI test workflow!

Blacksmith was super easy to get started with. All we had to do was change a few lines of code, and boom, our CI was taking half the time it had been. Instant win.
— Aaron Norby, Software Engineer, Ashby.
The Results
From 0 to 3500 vCPUs in seconds.
Today, Ashby runs thousands of jobs on Blacksmith weekly, each running 31 test shards in parallel for every git push. At peak usage, when several of their developers push code at the same time, Blacksmith allows them to go from 0 to 3500 vCPUs in seconds. A PR that used to take more than 7 minutes to pass all checks now wraps up in around 4 minutes, and their CI infrastructure costs have dropped by a whopping 75%. On top of all of this, they no longer have to deal with subpar support.

The Blacksmith team is super-responsive on Slack. We haven’t had to wait more than 5 minutes for a response. The difference was night and day compared to dealing with other CI providers.
— Aaron Norby, Software Engineer, Ashby.
If you're using GitHub Actions but pulling your hair out over your CI provider’s performance and reliability issues, let us help you too. We promise there is a better way!