GitLab CI: Cache and Artifacts explained by example

Anton Yakutovich on August 04, 2021

Hi, DEV Community! I've been working in the software testing field for more than eight years. Apart from web services testing, I maintain CI/CD Pip...
 
Michiel Hendriks

Instead of caching node_modules, consider caching node's cache directory.
The difference is that you cache the downloaded tar.gz packages instead of thousands of small files. Despite GitLab's efforts, their caching mechanism struggles badly with a large number of small files.
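In .gitlab-ci.yml terms the idea looks roughly like this (a sketch, not the article's exact config; the job name, image and .npm path are just examples):

```yaml
variables:
  # Keep npm's download cache inside the workspace so the runner can cache it.
  npm_config_cache: "$CI_PROJECT_DIR/.npm"

install:
  image: node:16
  script:
    - npm ci        # reuses already-downloaded tarballs from .npm when possible
  cache:
    paths:
      - .npm/       # a modest number of tar.gz files instead of thousands of tiny files
```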

Artiom Neganov

Sorry, what do you mean by "node's cache directory"? Which one is that?
And which tar.gz files do you mean?

Anders Ramsay

This is pure CI gold. Thank you!

Rick Stoopman

Why do you create a hidden job when you only extend it in one job? This could all be included in the setup job, right? And right now the cache.policy is always overwritten to pull-push. Or am I missing something?
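For reference, the pattern I mean looks roughly like this (simplified from memory, so details may differ from the article):

```yaml
# Hidden template job: shared cache settings, read-only ("pull") by default.
.node-cache:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
    policy: pull

# The only job that extends the template, and it overrides the policy anyway.
setup:
  extends: .node-cache
  cache:
    policy: pull-push   # uploads a refreshed cache after npm ci
  script:
    - npm ci
```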

Lumin

Should we add dependencies in other jobs too?

Benoit COUETIL 💫

node_modules can be huge in the real world, and therefore unsuitable for artifacts, which are limited in size. It's also worth knowing that artifacts are uploaded to the central GitLab instance, which can become a bottleneck on a large GitLab installation with lots of runners uploading to it.

Other than that, thank you. I learned that npm ci is slow because it deletes node_modules first 🙏

Anton Yakutovich

If you compare the timings on a clean system, I bet npm ci would be faster than npm install, because it just downloads the full dependency tree pinned in package-lock.json. npm install will check which deps can be updated and build a new dependency tree.
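If you want to verify it in your own pipeline, a throwaway job like this will show the difference (the image and job name are just examples):

```yaml
benchmark-install:
  image: node:16
  script:
    - time npm ci           # installs exactly what package-lock.json pins
    - rm -rf node_modules
    - time npm install      # re-resolves version ranges from package.json first
```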

Agata Zurek

Yes, this! My project's node_modules is 2GB and is too big for artifacts. What is the recommended solution to deal with that? I've had to include npm ci on every step to get my pipeline to work at all.

Benoit COUETIL 💫 • Edited

You should use the cache. This is exactly what the cache is for, and it can even be shared across pipelines.

But distributed caching has to be configured on your runners, or you will get cache misses whenever your jobs land on a different runner (which should not break anything; npm will simply re-download what's missing).
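For example, a lockfile-keyed cache like this is reused by every pipeline (on any branch) that has the same package-lock.json, provided the runners share their cache storage (paths here are just an example):

```yaml
cache:
  key:
    files:
      - package-lock.json   # same lockfile => same cache archive, across pipelines
  paths:
    - .npm/
```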

Weam Adel

Thank you so much for your effort, but I still don't get why we need to add artifacts. You described the problem artifacts solve like so:

Second, every job installs the package dependencies and wastes time.

Isn't this why we use the cache in the first place, so that packages aren't installed again? We had already added the cache at that point, so why do we need to add artifacts too?
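To make the question concrete, this is roughly the final setup as I understand it (simplified; names may differ from the article):

```yaml
stages: [setup, test]

setup:
  stage: setup
  script:
    - npm ci
  cache:
    paths:
      - .npm/           # cache: best-effort, shared between pipelines
  artifacts:
    paths:
      - node_modules/   # artifacts: passed to later jobs in this pipeline

test:
  stage: test
  dependencies:
    - setup             # downloads node_modules produced by the setup job
  script:
    - npm test
```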

madhead

We need an article about GitHub Actions!

dcg90

Thanks! Just one doubt: don't you need to pass the cache location to the npm ci command? Something like npm ci --cache ${npm_config_cache} --prefer-offline?

Anton Yakutovich

The variables section sets npm_config_cache, which npm picks up automatically.
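In other words (simplified):

```yaml
variables:
  # npm treats any npm_config_<option> environment variable as if the matching
  # --<option> flag had been passed, so an explicit --cache flag isn't needed.
  npm_config_cache: "$CI_PROJECT_DIR/.npm"
```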

Minaro

Please let us know what the (5) refers to.

Josh Martens

It looks like that is just for describing the "lines of the code" being talked about, instead of using actual line numbers (since they aren't visible).