Monorepo tools use task caching to optimise build times and enhance developer productivity. In large codebases, it's often unnecessary to rebuild or retest every component for every change. Thus, caching task runs provides a way to remember and reuse previous results, saving substantial amounts of time.
How does task caching work?
When a task (like a build or test) is run, the tool generates a hash of its inputs. Inputs could include the source code, configuration files, environment variables, `node` version, etc. The resulting hash string is a unique identifier that represents the exact circumstances of this task.
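To make that concrete, here is a rough sketch of how such a hash could be computed with Node's built-in `crypto` module. This is just an illustration, not how Nx actually does it, and the file paths and environment variables are made-up examples:

```ts
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hypothetical inputs for one task: source files plus relevant environment details.
const inputFiles = ["packages/ui/src/index.ts", "packages/ui/project.json"];
const envInputs = { NODE_ENV: process.env.NODE_ENV ?? "", NODE_VERSION: process.version };

// Hash every input in a stable order, so identical circumstances produce an identical hash.
const hash = createHash("sha256");
for (const file of [...inputFiles].sort()) {
  hash.update(file);
  hash.update(readFileSync(file));
}
hash.update(JSON.stringify(envInputs));

// A stable fingerprint of this exact task run.
console.log(hash.digest("hex"));
```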
You can then run the task and store the output in a folder named after the hash string:
├── cache
│   ├── w4lkgfjw9SDLDjgkj02isdflalsdkfjsSGWskdgj   <-- hash
│   │   └── output.txt
│   └── w0skljsSQASalkjQLKjqoijasdSDFmaQLAKSJDLK
│       └── output.txt
It follows, therefore, that if you want to run that task at a later date, you first re-calculate the hash, then check the `cache` for a folder of that name, and instead of running the task again, you just pipe the output from `output.txt` directly to the console.
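A minimal sketch of that lookup-or-run flow might look like the following. The `computeTaskHash` and `runTask` helpers here are hypothetical stubs, and a real tool would also cache exit codes and build artefacts, not just console output:

```ts
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical stand-ins: a real tool hashes the task's actual inputs and spawns its command.
const computeTaskHash = (task: string): string =>
  createHash("sha256").update(task + process.version).digest("hex");
const runTask = (task: string): string => `pretend output of "${task}"\n`;

function runWithCache(task: string): string {
  const hash = computeTaskHash(task);
  const outputFile = join("cache", hash, "output.txt");

  if (existsSync(outputFile)) {
    // Cache hit: replay the stored output instead of re-running the task.
    return readFileSync(outputFile, "utf8");
  }

  // Cache miss: run the task, then remember its output for next time.
  const output = runTask(task);
  mkdirSync(join("cache", hash), { recursive: true });
  writeFileSync(outputFile, output);
  return output;
}

console.log(runWithCache("build"));
```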
Think of all the code you have in your repo that almost never gets touched, yet the tasks get re-run every time you make a PR. With monorepo tooling, you can instead split your code up into isolated packages, which will then take advantage of this caching mechanism.
This is already a win!
But we can go further...
How many machines do your tasks run on? Well, yours, obviously. Then there are those of your team. Then of course they run again on CI. What if you could share the cache between all of these machines? The number of times an individual task is run could be significantly reduced. This is where Nx Cloud steps in.
Essentially, instead of having your `cache` folder on your local machine, Nx Cloud manages that remotely. So when you run a task, it first checks the remote to see if that hash exists. If it does, you get the cached output. If it doesn't, you run the task, then send your result to the cache.
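In other words, the lookup becomes remote-first. The sketch below is purely conceptual; the endpoint, payload shape, and helpers are invented for illustration and are not Nx Cloud's real API:

```ts
import { createHash } from "node:crypto";

// Reusing the same hypothetical helpers as in the local-cache sketch above.
const computeTaskHash = (task: string): string =>
  createHash("sha256").update(task + process.version).digest("hex");
const runTask = (task: string): string => `pretend output of "${task}"\n`;

// Entirely made-up endpoint, standing in for a remote cache service.
const CACHE_URL = "https://remote-cache.example.com";

async function runWithRemoteCache(task: string): Promise<string> {
  const hash = computeTaskHash(task);

  // 1. Ask the remote cache whether any machine has already run this exact task.
  const hit = await fetch(`${CACHE_URL}/${hash}`);
  if (hit.ok) {
    return hit.text(); // served from cache: you, a teammate, or CI ran it before
  }

  // 2. Cache miss everywhere: run the task locally...
  const output = runTask(task);

  // 3. ...and publish the result so no other machine has to repeat the work.
  await fetch(`${CACHE_URL}/${hash}`, { method: "PUT", body: output });
  return output;
}

runWithRemoteCache("build").then(console.log);
```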
There are a few little gotchas here, of course. Most importantly, you have to ensure that all machines are using compatible `node` versions. I'm doing it on this project by setting the `engines` property in my `package.json`:
"engines": {
"node": "18.13.0"
}
And using a `.npmrc` file to ensure that `npm install` respects that `engines` property:
engine-strict=true
I won't go into the details of how to set up Nx Cloud; you can find that on their website at https://nx.app.
When do the tests get run?
As I'm coding, I can be running tasks on affected code locally, using the `nx affected` command. This will keep the cache up to date.
Then, when it's time to push, I have a pre-push hook set up to execute all these tasks before the code leaves my machine. This mostly involves drawing the majority of results from the cache, and just running a few tasks for the code that has changed.
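For reference, a pre-push hook along these lines can be very small. This assumes husky (or any other git-hooks tool) and a reasonably recent Nx; the specific targets are just examples:

```sh
#!/bin/sh
# .husky/pre-push (example): run every affected task before the code leaves the machine.
# Anything unchanged is served straight from the (remote) cache.
npx nx affected --target=lint
npx nx affected --target=test
npx nx affected --target=build
```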
By the time the code has left my machine, therefore, there's no more work to be done in terms of executing tasks, and CI is free to just rubber-stamp the commit.
I've found this workflow to be extremely efficient as it reduces nearly all friction between writing the code and committing it to the code base.
Results
You can see all the results of my CI pipelines here. Apart from `dependabot` updates (which are usually run on the CI machine first, i.e. not cached), most of my CI pipelines take less than a minute to run. This includes:
- Setting up the machine, and installing dependencies.
- Linting source files.
- Unit and integration tests.
- Linting styles.
It's fair to say that the only significant time spent in CI is actually setting up these jobs, because the results are nearly always pulled from remote cache.
Admittedly, it's not the biggest repo in the world, but that's kind of not the point. The repo could grow 10x and still in theory be in the same ball-park for CI times, because Nx will only execute tasks for the code that has changed.
Personally, I don't see how I can ever go back after experiencing this.