DEV Community

Max Belsky

Posted on • Originally published at

Dockerizing a Workspaced Node.js Application

Reusing the build cache is one of the most important things when building Docker images.

To dockerize an app efficiently, you need to split copying the source code and installing the dependencies into a few steps:

  1. Copy dependencies files.
  2. Install dependencies.
  3. Copy source code.

For a Node.js application, these steps look like this:

COPY package.json yarn.lock ./

RUN yarn install

COPY . .

However, this does not work for an application that uses yarn workspaces: every workspace has its own package.json, so the root package.json and yarn.lock are not enough to install the whole project's dependencies.

The first time I faced this task, I thought: what if I match all the nested package.json files and copy them into the src directory?

COPY src/**/package.json src/

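The src/**/package.json pattern matches every package.json I need, but COPY did not work as I expected. When a source pattern matches several files, COPY puts each matched file directly into the destination directory without recreating its parent directories. All three files are named package.json, so instead of the expected directory structure I got a single package.json under src: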

# The original project's tree
├── package.json
├── src
│   ├── backend
│   │   ├── backend.js
│   │   └── package.json
│   ├── notifier
│   │   ├── notifier.js
│   │   └── package.json
│   └── scraper
│       ├── package.json
│       └── scraper.js
└── yarn.lock

# The expected tree
├── package.json
├── src
│   ├── backend
│   │   └── package.json
│   ├── notifier
│   │   └── package.json
│   └── scraper
│       └── package.json
└── yarn.lock

# The result tree
├── package.json
├── src
│   └── package.json
└── yarn.lock

For a moment I thought about replacing the single pattern line with a separate COPY instruction for every workspace. But I wanted a scalable solution, one without duplication.

Shell solution

I googled for alternatives. Most of them suggest wrapping docker build in a script that creates a temporary folder, builds the expected package.json tree there, and then COPYs that folder into the image.

The shell solution is much better than the previous copy-paste one, but it still did not satisfy me.
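For comparison, here is roughly what such a wrapper script could look like. This is a minimal, hypothetical sketch: the function names `collect_manifests` and `build_image` and the folder name `.docker-manifests` are made up for illustration, and the Dockerfile is assumed to start with `COPY .docker-manifests/ ./` followed by `RUN yarn install`.

```shell
#!/bin/sh
# Hypothetical sketch of the "shell solution": rebuild only the manifest tree
# (root package.json, yarn.lock and every src/<workspace>/package.json)
# in a temporary folder inside the build context, then run docker build.
set -e

# collect_manifests PROJECT_ROOT DEST_DIR
# DEST_DIR is relative to PROJECT_ROOT so it stays inside the build context.
# The body runs in a subshell, so the cd and variables do not leak out.
collect_manifests() (
  project_root=$1
  dest=$2
  cd "$project_root"
  rm -rf "$dest"
  mkdir -p "$dest"
  cp package.json yarn.lock "$dest"/
  # Only src/<workspace>/package.json files are picked up (depth 2 below src).
  find src -mindepth 2 -maxdepth 2 -name package.json | while IFS= read -r manifest; do
    # Recreate the workspace directory, then copy its manifest into it.
    mkdir -p "$dest/$(dirname "$manifest")"
    cp "$manifest" "$dest/$manifest"
  done
)

# build_image PROJECT_ROOT TAG
# Assumes the Dockerfile in PROJECT_ROOT starts with
# "COPY .docker-manifests/ ./" and then "RUN yarn install".
build_image() {
  collect_manifests "$1" .docker-manifests
  docker build -t "$2" "$1"
}
```

The temporary folder is created inside the project directory on purpose: COPY can only read files from the build context.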

Multi-stage builds solution

At some point I thought of multi-stage builds. I had used them in another project to build a tiny production image. "What if I prepare the tree in a first stage and copy it into a second stage?"

In the first stage, in addition to the root package.json and yarn.lock files, I copied the whole src directory and removed everything inside each workspace except its package.json:

COPY package.json yarn.lock ./
COPY src src

# Remove everything except the workspaces' package.json files
RUN find src -mindepth 2 -maxdepth 2 \
  \! -name "package.json" \
  -print \
  | xargs rm -rf
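The pruning step can be tried locally before it goes into a Dockerfile. The sketch below wraps the same find command in a hypothetical `prune_to_manifests` function (the name is made up for illustration) that runs it inside a given directory. It deletes files, so it should be pointed at a scratch copy of the project, never at the working tree itself.

```shell
#!/bin/sh
# Hypothetical helper: apply the first-stage pruning to a copy of a project
# so the surviving tree can be inspected without building an image.
set -e

# prune_to_manifests DIR
# DIR must contain a src/ folder. Every entry at depth 2 below src/
# (src/<workspace>/<entry>) that is not named package.json is deleted;
# directories at that depth are removed recursively by rm -rf.
# The body runs in a subshell, so the cd does not affect the caller.
prune_to_manifests() (
  cd "$1"
  find src -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf
)
```

Run against a copy of the example project, it should leave only the root package.json, yarn.lock and the three workspace package.json files, i.e. the expected tree shown earlier. Entries directly under src (depth 1) are not touched, and since xargs splits its input on whitespace, the command assumes that workspace file names contain no spaces.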

In the second stage, I copied the tree and installed the dependencies:

COPY --from=0 /app .

RUN yarn install --frozen-lockfile --production=true

Under the hood, yarn workspaces use symlinks, so it's important to (re)create them after copying the src directory:

COPY . .

# Restore workspaces symlinks
RUN yarn install --frozen-lockfile --production=true

The final Dockerfile

FROM node:14.15.0-alpine3.10

# The second stage copies from /app, so the first stage must work there
WORKDIR /app

COPY package.json yarn.lock ./
COPY src src

# Remove everything except the workspaces' package.json files
RUN find src -mindepth 2 -maxdepth 2 \! -name "package.json" -print | xargs rm -rf

FROM node:14.15.0-alpine3.10

ENV NODE_ENV=production

WORKDIR /app

COPY --from=0 /app .

RUN yarn install --frozen-lockfile --production=true

COPY . .

# Restore workspaces symlinks
RUN yarn install --frozen-lockfile --production=true

CMD ["yarn", "start"]

Top comments (2)

Igor

Thank you for the fix. However, you may not need the second yarn install to restore the links, since node_modules should not be copied into the image anyway and should be listed in .dockerignore. Also, the first-stage image can be smaller, ubuntu for example (ubuntu ~63MB vs node:alpine ~120MB), because you do not really need Node.js there.

Max Belsky

Igor, sorry for the late reply.

This is an interesting point about choosing stage images. I use node:alpine in both stages because I need it in the second stage anyway: Docker pulls it once for the first stage and reuses the cached image for the second. With different images for the two stages, Docker would have to download two images and would spend more time in total.