At Dataform we maintain a handful of NPM packages and a documentation site in a single monorepo, and we build it all with Bazel.
I'm going to quickly talk through monorepos and Bazel, then take a deep dive into the interesting parts of our monorepo with some real code examples, covering:
- Bazel TypeScript basics
- Managing multiple packages in a single monorepo
- Building and publishing NPM packages with Bazel
Our project is open-source, so you can view all the code, or clone and build it with Bazel at: https://github.com/dataform-co/dataform
All your code in one repo
a.k.a. the monorepo, so hot right now.
I used to work at Google, so I may be biased, but there is a huge amount of value in having all your code in a single repository. You end up spending a lot less time on repetitive tasks like updating git submodules, pushing new packages and running bash scripts: the kind of things that distract you from the important task at hand.
With a single code base, it becomes very easy to re-use code and libraries between different projects, but you need a good build system to make it work.
Bazel
_{Fast, Correct} - Choose two_
- https://bazel.build
Ever tried to clone and compile an open-source repo, only to spend 30 minutes wrangling with missing or broken system dependencies, mismatched versions and a myriad of bash scripts that just don't work? Yeah...
Bazel is a build system. It's highly opinionated and tricky to master, but leaves you with an extremely fast, hermetic, and reproducible build process once adopted.
Bazel is still fairly young, but the ecosystem is evolving extremely quickly. It's also built on solid foundations: internally at Google, Bazel is called Blaze, and it helps power Google's one colossal monorepo (literally all the code).
The problem
We maintain several NPM packages with interdependencies. Our goals here are:
- To manage all packages in a single repository
- For our builds to be fast and reliable
- To test changes to multiple packages at the same time
- To manage versions easily across all these packages
- To write as few bash scripts as possible
A basic Bazel TS rule
Here's an example of how to build a TypeScript library with Bazel. Our simplest package in the repo is @dataform/core, and we'll use it as the example for most of the post.
The folder looks like a normal TS package, except for the BUILD file. Here's the part of that file that actually compiles the TypeScript:
ts_library(
    name = "core",
    srcs = glob(["**/*.ts"]),
    module_name = "@dataform/core",
    deps = [
        "//protos",
        "@npm//@types/moo",
        "@npm//@types/node",
        "@npm//moo",
        "@npm//protobufjs",
    ],
)
This rule in the BUILD file tells Bazel:
- The library is called core
- It should include all .ts files within this folder and its subfolders
- Its (Node) module name is @dataform/core
- It has one internal dependency, //protos
- It has a few NPM dependencies, just like a package.json file
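The deps entries in the rule are Bazel labels. As a rough aid to reading them, here is a small TypeScript sketch that splits a label into repository, package and target. It assumes, as in the BUILD file above, that NPM dependencies come from the external repository that rules_nodejs generates, conventionally named `npm` (hence the `@npm//` prefix). The `parseLabel` and `describeDep` helpers are hypothetical and are not part of Bazel, rules_nodejs or our repo.

```typescript
// Hypothetical helpers for reading Bazel labels; not part of Bazel or rules_nodejs.
interface Label {
  repo: string; // external repository name, "" for the main workspace
  pkg: string; // package path inside that repository
  target: string; // target name; defaults to the last package path segment
}

// Parse an absolute label such as "//protos", "//protos:protos" or "@npm//@types/node".
function parseLabel(label: string): Label {
  const match = /^(?:@([^/]*))?\/\/([^:]*)(?::(.+))?$/.exec(label);
  if (match === null) {
    throw new Error(`Not an absolute Bazel label: ${label}`);
  }
  const repo = match[1] ?? "";
  const pkg = match[2] ?? "";
  // "//protos" is shorthand for "//protos:protos".
  const target = match[3] ?? pkg.split("/").pop() ?? "";
  return { repo, pkg, target };
}

// Describe a dependency in the terms used above (assumes NPM deps live in "@npm").
function describeDep(label: string): string {
  const { repo, pkg, target } = parseLabel(label);
  if (repo === "npm") {
    return `NPM package "${pkg}"`;
  }
  if (repo === "") {
    return `internal target //${pkg}:${target}`;
  }
  return `external target @${repo}//${pkg}:${target}`;
}

const coreDeps = ["//protos", "@npm//@types/moo", "@npm//@types/node", "@npm//moo", "@npm//protobufjs"];
for (const dep of coreDeps) {
  console.log(`${dep} -> ${describeDep(dep)}`);
}
```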
To build this TS library you can run:
bazel build core
Note: ts_library rules and other Node-related rulesets are not built into Bazel itself; they are imported from rules_nodejs. You can read more about them here: https://github.com/bazelbuild/rules_nodejs.
Dependencies between packages
If you've worked with an NPM-based monorepo before, you've probably used a tool like Lerna.
Lerna makes it easy to link packages locally so you can test changes across multiple NPM packages. It also makes it easier to manage versioning between them. We want that.
Bazel builds and links packages without going anywhere near an actual NPM package. In our @dataform/core example above, the ts_library rule depends on //protos, which is just another ts_library rule.
ts_library(
    name = "protos",
    srcs = glob(["index.ts"]),
    module_name = "@dataform/protos",
    deps = [
        ...
    ],
)
The ts_library rule does some magic to make sure that built packages are available under the provided module_name attribute, which matches the name of the NPM package they will be published as.
So in our @dataform/core package, we can import from the //protos package, whose module_name is @dataform/protos, like this:
import { dataform } from "@dataform/protos";
When we publish to NPM, these imports resolve correctly too, because the module names match the published package names.
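One way to picture that "magic" is as a TypeScript `paths`-style mapping from each module_name to the directory of the target that provides it. The sketch below builds such a mapping. It is a conceptual model only, assuming each ts_library's sources live at the package path in its label, and not necessarily how rules_nodejs implements module_name; the `LibraryTarget` type and the `packageDir` and `toPaths` helpers are hypothetical.

```typescript
// Hypothetical model of ts_library's module_name attribute.
interface LibraryTarget {
  label: string; // Bazel label, e.g. "//protos"
  moduleName: string; // value of the module_name attribute
}

// A tsconfig-style "paths" object: import specifier -> candidate locations.
type Paths = Record<string, string[]>;

// "//protos" or "//protos:protos" -> "protos" (the workspace-relative package dir).
function packageDir(label: string): string {
  return label.replace(/^\/\//, "").split(":")[0] ?? "";
}

function toPaths(targets: LibraryTarget[]): Paths {
  const paths: Paths = {};
  for (const { label, moduleName } of targets) {
    const dir = packageDir(label);
    paths[moduleName] = [dir]; // import ... from "@dataform/protos"
    paths[`${moduleName}/*`] = [`${dir}/*`]; // deep imports like "@dataform/protos/foo"
  }
  return paths;
}

const paths = toPaths([
  { label: "//protos", moduleName: "@dataform/protos" },
  { label: "//core", moduleName: "@dataform/core" },
]);
console.log(JSON.stringify(paths, null, 2));
```

Because the published packages use the same names, the same import specifier points at the Bazel-built directory during development and at `node_modules` for NPM consumers.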
Managing multiple packages
Lerna also helps you manage multiple package.json files, updating all versions together and publishing them. We would like a way to do the same thing in Bazel.
To generate package.json files, we built a small tool in our monorepo that Bazel runs to assemble them from layers of JSON templates and string substitutions.
For the @dataform/core package, we have a core.package.json file that looks like this:
{
  "name": "@dataform/core",
  "description": "Dataform core API.",
  "main": "index.js",
  "types": "index.d.ts",
  "dependencies": {
    "@dataform/protos": "$DF_VERSION",
    "moo": "^0.5.0",
    "protobufjs": "^6.8.8"
  }
}
Any extra info (license, homepage, etc.) is inherited from the base common.package.json, so we don't have to keep several files in sync.
The special string $DF_VERSION gets replaced with a global version constant defined in version.bzl, as part of the Bazel build.
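The tool itself isn't shown in this post, but the layering idea is simple enough to sketch. The TypeScript below is a minimal, hypothetical version: it deep-merges JSON layers in order (later layers win, and arrays are replaced rather than concatenated), then substitutes `$NAME` placeholders in string values. The merge rules, the `buildPackageJson` name, the contents of the common layer and the version number are all assumptions for illustration, not our actual implementation.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Merge `overlay` onto `base`: objects merge key by key, anything else is replaced.
function mergeLayers(base: Json, overlay: Json): Json {
  if (!isObject(base) || !isObject(overlay)) {
    return overlay;
  }
  const result: JsonObject = { ...base };
  for (const key of Object.keys(overlay)) {
    const value = overlay[key];
    const existing = Object.prototype.hasOwnProperty.call(base, key) ? base[key] : undefined;
    result[key] = existing === undefined ? value : mergeLayers(existing, value);
  }
  return result;
}

// Replace every $NAME placeholder in string values; unknown names are left untouched.
function substitute(value: Json, vars: Record<string, string>): Json {
  if (typeof value === "string") {
    return value.replace(/\$([A-Z_]+)/g, (placeholder: string, name: string) =>
      Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : placeholder
    );
  }
  if (Array.isArray(value)) {
    return value.map(item => substitute(item, vars));
  }
  if (isObject(value)) {
    const result: JsonObject = {};
    for (const key of Object.keys(value)) {
      result[key] = substitute(value[key], vars);
    }
    return result;
  }
  return value;
}

// Apply layers in order (base first), then substitute variables.
function buildPackageJson(layers: Json[], vars: Record<string, string>): Json {
  const merged = layers.reduce<Json>((acc, layer) => mergeLayers(acc, layer), {});
  return substitute(merged, vars);
}

// Hypothetical common layer; the real common.package.json is not shown, values are placeholders.
const commonLayer: Json = {
  version: "$DF_VERSION",
  license: "MIT",
  homepage: "https://github.com/dataform-co/dataform",
};

// Mirrors core.package.json from the post.
const coreLayer: Json = {
  name: "@dataform/core",
  description: "Dataform core API.",
  main: "index.js",
  types: "index.d.ts",
  dependencies: {
    "@dataform/protos": "$DF_VERSION",
    moo: "^0.5.0",
    protobufjs: "^6.8.8",
  },
};

// "1.0.0" stands in for the constant defined in version.bzl.
const corePackageJson = buildPackageJson([commonLayer, coreLayer], { DF_VERSION: "1.0.0" });
console.log(JSON.stringify(corePackageJson, null, 2));
```

Keeping the version in one variable means bumping every package, including the internal `@dataform/protos` dependency, is a single edit.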
Building NPM packages
To evaluate these JSON templates, we wrote a Bazel macro that invokes the tool above, and we call it in the @dataform/core BUILD file like so:
load("//tools/npm:package.bzl", "dataform_npm_package")

dataform_npm_package(
    name = "package",
    package_layers = [
        "//:common.package.json",
        "core.package.json",
    ],
    deps = [":core"],
)
This custom Bazel macro both generates a final package.json from the two listed templates and creates an output dist folder with the compiled TypeScript, ready to be published.
To see the output of this and the final generated package.json, you can run the following command:
bazel build core:package
Bazel tells us it has put the package in the folder bazel-bin/core/package. It contains the .js and .d.ts files as well as the final package.json, and (much like a dist folder) it's ready to publish!
Note: We haven't fully automated this step yet, and it's still necessary to make sure that the dependencies and package name in the BUILD file match those in the package.json template, but it's certainly feasible to automate this entirely.
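One possible shape for that automation is a check that compares a ts_library's module_name and deps with the name and dependencies of its package.json template, and reports any drift. The TypeScript below is a hypothetical sketch: it assumes NPM deps use the `@npm//` prefix, treats `@types/*` packages as compile-time only (so they are not expected in package.json), and maps internal labels such as `//protos` to module names through a lookup table you supply. `checkPackage` and its input types are assumptions, not part of our tooling.

```typescript
// Hypothetical consistency check between a BUILD target and its package.json template.
interface BuildInfo {
  moduleName: string; // ts_library module_name
  deps: string[]; // ts_library deps labels
}

interface PackageJsonTemplate {
  name: string;
  dependencies?: Record<string, string>;
}

// Map a dep label to the NPM package it implies at runtime, or null if none.
function runtimePackageFor(label: string, internalModules: Record<string, string>): string | null {
  const npm = /^@npm\/\/([^:]+)/.exec(label);
  if (npm !== null) {
    // @types packages are only needed at compile time.
    return /^@types\//.test(npm[1]) ? null : npm[1];
  }
  // Internal labels not in the lookup table are ignored.
  return Object.prototype.hasOwnProperty.call(internalModules, label) ? internalModules[label] : null;
}

function checkPackage(
  build: BuildInfo,
  template: PackageJsonTemplate,
  internalModules: Record<string, string>
): string[] {
  const problems: string[] = [];
  if (build.moduleName !== template.name) {
    problems.push(`module_name "${build.moduleName}" does not match package name "${template.name}"`);
  }
  const expected: string[] = [];
  for (const label of build.deps) {
    const name = runtimePackageFor(label, internalModules);
    if (name !== null && expected.indexOf(name) === -1) {
      expected.push(name);
    }
  }
  const declared = Object.keys(template.dependencies ?? {});
  for (const name of expected) {
    if (declared.indexOf(name) === -1) problems.push(`"${name}" is a BUILD dep but missing from package.json`);
  }
  for (const name of declared) {
    if (expected.indexOf(name) === -1) problems.push(`"${name}" is in package.json but not a BUILD dep`);
  }
  return problems;
}

const problems = checkPackage(
  {
    moduleName: "@dataform/core",
    deps: ["//protos", "@npm//@types/moo", "@npm//@types/node", "@npm//moo", "@npm//protobufjs"],
  },
  {
    name: "@dataform/core",
    dependencies: { "@dataform/protos": "$DF_VERSION", moo: "^0.5.0", protobufjs: "^6.8.8" },
  },
  { "//protos": "@dataform/protos" }
);
console.log(problems.length === 0 ? "BUILD and package.json agree" : problems.join("\n"));
```

Run as a test, a check like this would fail the build as soon as someone adds a dep to one file and forgets the other.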
Publishing NPM packages
Publishing is easy at this point, and the rules_nodejs libraries have this built in. To publish a package, we can run:
bazel run //core:package.publish
We still have a bash script to do this for all packages, but all it does is invoke Bazel commands:
#!/bin/bash
set -e
# Test all the things.
bazel test //...
# Publish all the things.
bazel run //api:package.publish
bazel run //core:package.publish
bazel run //cli:package.publish
bazel run //crossdb:package.publish
bazel run //protos:package.publish
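If you'd rather not keep even that script in bash, the same sequence is easy to express in TypeScript. This is a hypothetical sketch: it builds the same list of Bazel commands as the script above and hands each one to a runner function, so a runner that throws on failure stops the release just as `set -e` does. The `releaseCommands` and `release` names and the injected runner are assumptions, not something in our repo.

```typescript
// Hypothetical TypeScript version of the release script above.
// A runner executes one command and throws if it fails.
type Runner = (command: string[]) => void;

// The same sequence as the bash script: test everything, then publish each package.
function releaseCommands(packages: string[]): string[][] {
  return [
    ["bazel", "test", "//..."],
    ...packages.map(pkg => ["bazel", "run", `//${pkg}:package.publish`]),
  ];
}

// Run commands in order; an exception from the runner aborts the rest, like `set -e`.
function release(packages: string[], run: Runner): void {
  for (const command of releaseCommands(packages)) {
    run(command);
  }
}

// Dry run: print the commands instead of executing them.
release(["api", "core", "cli", "crossdb", "protos"], command => console.log(command.join(" ")));
```

In Node, a real runner could be `(cmd) => execFileSync(cmd[0], cmd.slice(1), { stdio: "inherit" })` from `child_process`, since `execFileSync` throws when the command exits with a non-zero status.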
Conclusion
This is by no means a complete solution yet, and in practice it will require you to learn quite a lot about Bazel to get it working on your own project. If you do try, hopefully our repo can serve as a good reference!
Despite that, I hope this demonstrates that Bazel is a great solution for managing complex projects and many Node/TypeScript packages inside a single repository.
With a few small extra Bazel rules, you can build a TypeScript monorepo that is lightning fast and will scale as your project does.
If you found this post interesting and would be interested in a Bazel TypeScript starter pack repo, reach out and we'll see what we can do!