Using dbt with CrateDB

Dbt is a tool for transforming data in data warehouses using Python and SQL. The idea is that Data Engineers make source data available to an environment where dbt projects run (for instance with Debezium or with Airflow), and Data Analysts can then run their dbt projects against this data to produce models (tables and views) that can be used with BI tools.
This layer allows the decoupling of the models on which reports and dashboards rely from the source data, and if our business rules or our source systems change we can still maintain the same models as a stable interface.

Some of the things that dbt can do include:

import reference data from csv files
track changes in source data with different strategies so that downstream models do not need to be built every time from scratch
run tests on data, to confirm assumptions remain valid, and to validate any changes made to the models' logic

Due to its unique capabilities, CrateDB is an excellent warehouse choice for data transformation projects. It offers automatic indexing, fast aggregations, easy partitioning, and the ability to scale horizontally. In this article, I will illustrate how to get the most important functionalities of dbt working by doing the necessary changes in the configuration.

Our starting point will be a fresh install of dbt-postgres:

pip install dbt-postgres==1.6.0

We can then create a profiles file with our connection details:

cd ~
mkdir .dbt
cat << EOF > .dbt/profiles.yml
example_datawarehouse_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      database: crate
      schema: doc
      search_path: doc
      user: dbt   
      password: pwd1234567A
EOF

(please note the values for database, schema, and search_path in this example)

We will not go into the details of how the project files are structured (for more information check out dbt’s documentation), but in general, a dbt project consists of a combination of SQL, Jinja, YAML, and markdown files. In our project folder, alongside the models folder that most projects have, we can also create a folder called macros where we can place macro overrides.
Let's then create a macros folder and place some files with overrides on it:

mkdir macros
cd macros
wget https://community.crate.io/uploads/short-url/fKupQCFUHtuoKom3jAfKrldUXkt.sql
wget https://community.crate.io/uploads/short-url/qvQExEq1OopiVUcXACLGfpdGHYF.sql
wget https://community.crate.io/uploads/short-url/3jcFxL1EExLrERJSTc6ScnzTS9f.sql
cd ..

A few things I have tested with these overrides:

models with view, table, and ephemeral materializations
dbt source freshness
dbt test
dbt seed
Incremental materializations (with incremental_strategy='delete+insert' and without involving OBJECT columns)

I hope you find this useful. CrateDB is continuously adding new features and I will endeavor to come back and update this article if there are any developments and some of these overrides require changes or become obsolete.

DEV Community

Using dbt with CrateDB

Top comments (0)

Read next

Understanding the Fundamentals of AI

What are the best logging and monitoring tools for .NET apps

AI Testing: A Technical Deep Dive into the Future of Software Quality Assurance

SVG hacks for you