Reusing the build cache is one of the most important parts of building Docker images.
To dockerize an app efficiently, you need to split copying the source code and installing the dependencies into a few steps:
- Copy dependencies files.
- Install dependencies.
- Copy source code.
For a Node.js application these steps look like this:
COPY package.json yarn.lock ./
RUN yarn install
COPY . .
However, this solution does not work for an application with yarn workspaces, because the root package.json and yarn.lock are not enough to install the dependencies of the whole project.
When I faced this task for the first time, I thought: what if I find all the nested package.json files and copy them into a src directory:
COPY src/**/package.json src/
The src/**/package.json pattern matches all the package.json files that I need. But COPY does not work the way I expected: with wildcards it flattens the matched files into the destination directory, so each file overwrites the previous one. Instead of the expected directory structure, I got a single file under src.
# The original project's tree
app
├── package.json
├── src
│ ├── backend
│ │ ├── backend.js
│ │ └── package.json
│ ├── notifier
│ │ ├── notifier.js
│ │ └── package.json
│ └── scraper
│ ├── package.json
│ └── scraper.js
└── yarn.lock
# The expected tree
app
├── package.json
├── src
│ ├── backend
│ │ └── package.json
│ ├── notifier
│ │ └── package.json
│ └── scraper
│ └── package.json
└── yarn.lock
# The result tree
app
├── package.json
├── src
│ └── package.json
└── yarn.lock
For a second I thought I could replace the single pattern line with a separate COPY operation for every workspace, as shown below. But I wanted a scalable solution, a solution without duplication.
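For the tree above, that "copy-paste" variant would look something like this; every new workspace would require another COPY line:
COPY package.json yarn.lock ./
COPY src/backend/package.json src/backend/
COPY src/notifier/package.json src/notifier/
COPY src/scraper/package.json src/scraper/
RUN yarn install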
Shell solution
I googled for alternative solutions. Commonly they suggest wrapping docker build with a script that creates a tmp folder, builds the expected tree of package.json files there, and COPYs that folder into the image.
This "shell solution" is much better than the "copy-paste" solution. But it did not make me feel pleased.
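A minimal sketch of such a wrapper, assuming the workspaces live directly under src and the script runs from the project root (the tmp path and the image tag are illustrative):
#!/bin/sh
# Rebuild the package.json tree under tmp/, then build the image.
# The Dockerfile would COPY tmp/ instead of the real sources.
rm -rf tmp
mkdir -p tmp
cp package.json yarn.lock tmp/
for f in src/*/package.json; do
  mkdir -p "tmp/$(dirname "$f")"   # e.g. tmp/src/backend
  cp "$f" "tmp/$f"
done
docker build -t app .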
Multi-stage builds solution
At some point I thought of multi-stage builds. I had used them in another project to build a tiny production image. "What if I prepare the tree in the first stage and copy it in the second stage?"
In addition to the root package.json and yarn.lock files, I copied the src directory and removed everything except the package.json files:
COPY package.json yarn.lock ./
COPY src src
# Remove everything except package.json files
RUN find src -mindepth 2 -maxdepth 2 \
    \! -name "package.json" \
    -print | xargs rm -rf
In the second stage I copied the tree and installed the dependencies:
COPY --from=0 /app .
RUN yarn install --frozen-lockfile --production=true
Under the hood, yarn workspaces use symlinks, so it's important to recreate them after copying the src directory:
COPY . .
# Restore workspaces symlinks
RUN yarn install --frozen-lockfile --production=true
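After this step node_modules contains symlinks to the workspaces, looking roughly like this (assuming each workspace's name field matches its directory name):
node_modules
├── backend -> ../src/backend
├── notifier -> ../src/notifier
└── scraper -> ../src/scraper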
The final solution
The complete Dockerfile:
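# Stage 1: build a tree that contains only the package.json files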
FROM node:14.15.0-alpine3.10
WORKDIR /app
COPY package.json yarn.lock ./
COPY src src
# Remove everything except package.json files
RUN find src -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf
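# Stage 2: install production dependencies and copy the source code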
FROM node:14.15.0-alpine3.10
ENV NODE_ENV production
WORKDIR /app
COPY --from=0 /app .
RUN yarn install --frozen-lockfile --production=true
COPY . .
# Restore workspaces symlinks
RUN yarn install --frozen-lockfile --production=true
CMD ["yarn", "start"]
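With this Dockerfile the image builds with a plain docker build, no wrapper script needed (the tag is illustrative):
docker build -t app .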
Top comments (2)
Thank you for the fix. However, you may not need the second yarn install to restore the links, since node_modules should not be copied into the image and should be listed in .dockerignore. Also, the first-stage image can be smaller, ubuntu for example (ubuntu ~63MB vs node:alpine ~120MB), because you do not really need Node.js there.
Igor, sorry for the late reply.
This is an interesting point about choosing stage images. I use node:alpine twice because I will need it in the second stage anyway, so Docker downloads it for the first stage and takes it from the cache for the second. With different images for the first and second stages, Docker would download two images and spend more time in total.