There are three things you can watch forever: fire burning, water falling, and a build passing the pipeline stages after your next commit. To make the wait less tedious, it's best to take care of the CI setup from the very beginning.
GitHub Actions has a cache that is restored to the runner's virtual machine in seconds. In this article I'd like to share examples of how to set up aggressive dependency caching. Why do I call this approach "aggressive caching"? Because we will cache not only the package archives but also the state of the environment after installation.
For Node.js this is the `node_modules` directory, and for Python it's the virtualenv directory with installed dependencies.
Node.js Example
Let's take the typical dependency-caching setup mentioned in the documentation. If you don't need anything exotic, you can use the standard `actions/setup-node` action with a package manager specified:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-node@v3
    with:
      node-version: 16
      cache: 'npm'
  - run: npm ci
  - run: npm test
```
This will save the `.npm` directory with the global package cache. Sounds great! But remember: if several workflow jobs each run `npm ci`, every one of those installs will still take time.
Let's imagine a pipeline with several jobs:
```
     Create      │      Reuse
  Dependencies   │   Dependencies
     Cache       │
                 │
                 │      ┌──────────┐
        ┌────────┼─────►│ Lint Job ├──────────────────┐
        │        │      └──────────┘                  │
┌───────┴───┐    │      ┌──────────┐            ┌─────▼──────┐
│ Build Job ├────┼─────►│ Test Job ├───────────►│ Deploy Job │
└───────┬───┘    │      └──────────┘            └─────▲──────┘
        │        │      ┌──────────┐                  │
        └────────┼─────►│ E2E Job  ├──────────────────┘
                 │      └──────────┘
                 │
```
Ideally, we want to install dependencies only in the first job and get a state with available dependencies in all subsequent jobs.
I'll show how to achieve it using a sample repo — redux-react-realworld-example-app.
The first (`build`) job might look like this:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-node@v3
    with:
      node-version-file: '.nvmrc' # (1)
      cache: 'npm'
  - name: Cache NPM dependencies # (2)
    uses: actions/cache@v3
    id: cache-primes
    with:
      path: node_modules
      key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }}
  - name: Install dependencies # (3)
    if: steps.cache-primes.outputs.cache-hit != 'true'
    run: npm ci
  - name: Build
    run: npm run build
```
Line #1 specifies the Node.js version using the `.nvmrc` file. It's an alternative way to specify the version, and it helps follow the DRY (Don't Repeat Yourself) principle.
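For illustration (the version number here is hypothetical), `.nvmrc` is just a one-line file that both local tooling (`nvm use`) and `actions/setup-node` read:

```shell
# Create a .nvmrc with a pinned Node.js version (hypothetical value)
echo "16.17.0" > .nvmrc

# Locally you'd run `nvm use`; in CI, setup-node reads the same file
cat .nvmrc
```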
In line #2 we use `actions/cache` to cache the `node_modules` directory, with the hash of the `package-lock.json` file as the key. In line #3 we install dependencies only when there is no cache hit.
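To build intuition for how the key invalidates, here is a rough local approximation (`hashFiles()` computes a SHA-256 over the matched files, so any change to `package-lock.json` yields a new key and forces a fresh install; the file contents below are made up):

```shell
# Simulate a lock-file change and watch the cache key change with it
printf '{"lockfileVersion": 2}' > package-lock.json
KEY1="Linux-node-$(sha256sum package-lock.json | cut -d' ' -f1)"

printf '{"lockfileVersion": 3}' > package-lock.json
KEY2="Linux-node-$(sha256sum package-lock.json | cut -d' ' -f1)"

# Different lock file => different key => cache miss => npm ci runs
[ "$KEY1" != "$KEY2" ] && echo "cache key invalidated"
```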
To automatically retrieve `node_modules` in subsequent jobs, you must declare `actions/cache` with the same key. For example, the `test` job can be configured as:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-node@v3
    with:
      node-version-file: '.nvmrc'
      cache: 'npm'
  - name: Cache NPM dependencies
    uses: actions/cache@v3
    with:
      path: node_modules
      key: ${{ runner.os }}-node-${{ hashFiles('package-lock.json') }} # (1)
  - name: Tests
    run: npm run test # (2)
```
Line #1 specifies the cache key; it must be the same as in the `build` job. After the `actions/cache` step, we assume the dependencies are in place and run the tests in line #2.
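As a side note not taken from the sample repo: `actions/cache` also supports `restore-keys` for prefix-based fallback when the exact key misses. A partially restored `node_modules` brings little benefit here, because `npm ci` deletes the directory before installing, so a fallback makes more sense for the package-archive cache. A sketch:

```yaml
- uses: actions/cache@v3
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-
```

On a fallback match, the `cache-hit` output is not `'true'`, so a conditional install step would still run and refresh the cache.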
Check out the complete workflow on GitHub.
Python Example
Here is the standard scenario from the `actions/setup-python` docs:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    with:
      python-version: '3.9'
      cache: 'pip' # caching pip dependencies
  - run: pip install -r requirements.txt
```
This workflow will cache pip packages in `~/.cache/pip`, but the installation step will still run every time, just like `npm ci` in the previous example.
Let's see how we can optimize the installation of dependencies. I'll use the Django-based education-backend repo.
Let's dive into the `build` job:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    id: setup-python
    with:
      python-version-file: '.python-version'
  - uses: actions/cache@v3
    id: cache-primes
    with:
      path: venv
      key: ${{ runner.os }}-venv-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/*requirements.txt') }} # (1)
  - name: Install dependencies # (2)
    if: steps.cache-primes.outputs.cache-hit != 'true'
    run: |
      python -m venv venv
      . venv/bin/activate
      pip install --upgrade pip pip-tools
      pip-sync requirements.txt dev-requirements.txt
  - name: Run the linter
    run: |
      . venv/bin/activate # (3)
      cp src/app/.env.ci src/app/.env
      make lint
```
As you can see, we apply the same caching idea as in the Node.js project, with a few small but important changes. We need a separate cache key for each Python version involved in the workflow; line #1 includes the `steps.setup-python.outputs.python-version` variable exactly for this purpose.
The dependency installation in line #2 is trickier. For Python we use a virtual environment created with the `venv` module, and the `venv` directory itself is what gets cached. You can think of it as `node_modules` for Python.
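One caveat worth knowing when treating `venv` like `node_modules`: the environment's `activate` script records the environment's absolute path, so a cached `venv` only works when restored to the same location. That holds here because every job checks out into the same workspace path. You can see the recorded path locally:

```shell
# Create a throwaway virtual environment
python3 -m venv venv

# The activate script records the path of the environment
grep -m1 '^VIRTUAL_ENV=' venv/bin/activate
```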
Line #3 has one more trick: after the cache is restored, you must activate the virtualenv in each subsequent step. Otherwise, the Python interpreter will not find the installed libraries.
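A hypothetical alternative to sourcing `activate` in every step is to append the venv's `bin` directory to the runner's `PATH` once, via the `GITHUB_PATH` file; all subsequent steps in the job then pick up the venv's interpreter (this sets only `PATH`, not the `VIRTUAL_ENV` variable, which is usually enough):

```yaml
- name: Add venv to PATH
  run: echo "$GITHUB_WORKSPACE/venv/bin" >> "$GITHUB_PATH"
```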
The simplified `test` job may look as follows:
```yaml
steps:
  - uses: actions/checkout@v3
  - uses: actions/setup-python@v4
    id: setup-python
    with:
      python-version-file: '.python-version'
  - uses: actions/cache@v3
    with:
      path: venv
      key: ${{ runner.os }}-venv-${{ steps.setup-python.outputs.python-version }}-${{ hashFiles('**/*requirements.txt') }} # (1)
  - name: Run the tests
    run: |
      . venv/bin/activate # (2)
      cp src/app/.env.ci src/app/.env
      make test
```
Ensure you use the same key for the caching step (line #1) and remember to activate the virtual environment before running the tests (line #2).
Check out the complete workflow on GitHub.
Summary
We've practiced "aggressive caching" with Node.js and Python examples. If you have a significant number of dependencies, these changes can noticeably speed up your GitHub workflows. I recommend setting up workflows for your project using the repositories I've mentioned above.
If you still have questions about caching in GitHub Actions, don't hesitate to ask in the comments. I'll try to help.
I would also be grateful if you shared your own tips for speeding up GitHub workflows.