Disclaimer: I have submitted the PRs for two of the Breeze features mentioned in this article (the start-airflow command and the --init-scripts flag). I feel responsible for your user experience using them, so if you have questions or feedback please reach out to me.
TL;DR
To have Airflow running on your machine do the following:
- Install Docker and Docker Compose
- Clone the Airflow repository
git clone git@github.com:apache/airflow.git
- In the Airflow folder run
./breeze start-airflow
On the first run Breeze creates the folder files/dags in the repo folder. DAG files added to that folder will appear in Airflow.
Go to http://localhost:28080 to see your Airflow running.
Intro
If you do not like it when food recipes start with pages of blabbing, skip this part.
My problem
I started to write a few blog posts for people who are approaching Python and Apache Airflow for the first time. I needed a quick way for my readers to set up their own Airflow and an even quicker way to explain how to do it.
I wanted something so simple that you and I could focus only on the DAG code. Enter Breeze.
What is Breeze?
Breeze is a command line tool to spin up a dockerized* Airflow instance for development or testing. It can be used to create an environment with specific properties to run tests, before deploying to production. This is pretty cool if you are into CI/CD.
The first time I met Breeze I was working on automating the creation of our own environment to run tests (for a data warehouse, not for Airflow), and I was very intrigued by the idea.
Therefore, when I started thinking about how to easily get a dev Airflow running, Breeze was at the top of my list.
* Dockerized means "running in containers, something like lightweight virtual machines with no impact on your computer (called the host, while the container is the guest)." Well, no impact besides consuming CPU and RAM 😕
Setup
Prerequisites
I run my Airflow/Breeze using WSL2 on Windows 10. People using a Mac or a Linux machine will probably have a smoother experience than me, since Breeze runs more easily on a Linux box (or a Mac), but WSL2 with Ubuntu is quite good (if you are on Windows 10, the WSL2 setup is covered here).
What you need:
- Install Docker
- Install Docker Compose
- Install git (this is usually already installed)
In my case I installed all these tools in my Ubuntu WSL.
Installation and first run
Clone the Airflow repository from GitHub with:
git clone git@github.com:apache/airflow.git
Once the repo is downloaded, go to the Airflow folder and run Breeze:
cd airflow
./breeze start-airflow
Breeze will download a number of docker images and will ask you if you want to build some of them. Just say "yes" when asked (you can use the flags --assume-yes or --assume-no if you find this annoying). The first build can take a few minutes, depending on your internet speed and machine.
If everything goes as expected you should see a screen like this:
"I love it when a plan comes together"
Congratulations, your Airflow is up and running.
If you go to http://localhost:28080/ you will see the Airflow UI. The default credentials are admin/admin.
Username: admin - Password: admin
How to use this?
What you see are three tmux panes (tmux is a Linux tool to create a terminal session and split it into multiple parts, called panes). In the lower left corner you have the Airflow Scheduler, which takes care of running things; on the right the Webserver is waiting for you to visit the Airflow Web UI. The top pane is for running additional commands.
If you press Ctrl+b followed by an arrow key you will be able to move between panes. There is not much you need to do in the bottom panes, other than stopping the scheduler and the webserver with Ctrl+C. The top one is for running Airflow CLI commands (run airflow --help if you want to know more).
To quickly get out of tmux, run the following command:
./stop_airflow.sh
The purpose of having these three panes is to let you observe what is happening in Airflow and, if needed, use the command line interface (although this is for more advanced use cases).
Developing with Breeze
Now that Airflow is running, you can just put your DAGs in the folder files/dags created in your Airflow repository folder. If the folder is not there, Breeze will create it. The DAGs can take a few minutes to appear in the web UI.
If a DAG has a syntax error, the bottom right pane (the Webserver one) shows the errors.
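As a starting point, here is a minimal sketch that drops a one-task DAG into files/dags from the root of the cloned repository. The file name, the DAG id hello_breeze and the task id are made up for illustration. The import path airflow.operators.bash_operator is an assumption: it should work on 1.10.x and still be accepted (with a deprecation warning) on 2.0, so adjust it to the version you run.

```shell
# Run from the root of the cloned Airflow repository.
mkdir -p files/dags

# Write a one-task DAG; the quoted 'EOF' stops the shell from expanding anything inside.
cat > files/dags/hello_breeze.py <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# Hypothetical example DAG: no schedule, so it only runs when triggered manually.
with DAG(
    dag_id="hello_breeze",
    start_date=datetime(2020, 1, 1),
    schedule_interval=None,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from Breeze'")
EOF
```

Once the file is in place it should show up in the DAG list as hello_breeze, and you can trigger it from the web UI.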
A few additional notes:
- If you run Breeze using an SQLite database as the Airflow backend (see below), that database is recreated with every run. If you want to keep Airflow configuration objects (like connections to your databases, users, etc.) use a different backend or an initialization script (again, see below).
- Environment variables can be entered in the file files/airflow-breeze-config/variables.env (create it if it is not there); they are set while the Airflow environment is being prepared.
- If you want to initialize Airflow, you can put a file called init.sh in the folder files/airflow-breeze-config. The instructions in this file will be executed before the Airflow Scheduler and Webserver start.
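The two configuration files from the notes above could be created along these lines. This is only a sketch: the variable names MY_DWH_HOST and MY_DWH_PORT are hypothetical, and the export form assumes that Breeze sources variables.env as a shell script, so check the Breeze documentation for your version.

```shell
# Run from the root of the cloned Airflow repository.
mkdir -p files/airflow-breeze-config

# Hypothetical variables; "export" assumes the file is sourced as a shell script.
cat > files/airflow-breeze-config/variables.env <<'EOF'
export MY_DWH_HOST=localhost
export MY_DWH_PORT=9457
EOF

# Placeholder init script; replace the echo with your own setup commands.
cat > files/airflow-breeze-config/init.sh <<'EOF'
echo "Initializing Airflow for local development"
EOF
```

Both files live under files/, so they sit next to your DAGs in the repository folder and are picked up the next time you start Breeze.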
Some details and recipes
The start-airflow command provides a simple way to start Airflow and monitor it. Behind the scenes Breeze initializes the Airflow backend database and creates an admin user that can be used to log in to the web UI (credentials admin/admin).
Recipe 1 - A persistent backend
As mentioned above, the default database is recreated with every execution. If you want something more persistent you can use a different backend, for example PostgreSQL:
./breeze start-airflow -b postgres
This starts an additional container with a database dedicated to Airflow. Now your changes will survive a restart.
Recipe 2 - A different Airflow version
By default Breeze will start the most recent version of Airflow (currently 2.0.0dev) which is probably different from what you have in production. The good thing is that Breeze allows you to pick the version you need with another flag:
./breeze start-airflow --install-airflow-version 1.10.10
Of course you can compose multiple flags:
./breeze start-airflow --install-airflow-version 1.10.10 -b postgres
Feel free to go ahead and explore the other possible flags.
Recipe 3 - Initialize Airflow with your own database connection
One way to do it is to use a persistent backend: you can add your connection in the web UI and use it. At least this is what I was doing when I first started using Airflow.
A more interesting approach is to use the optional initialization script for Breeze to create the connection. This makes it easier to maintain the connections and other Airflow settings, plus you can store this file in your versioning tool (e.g. git).
Here is an example of an init.sh file:
# Connections
airflow connections add \
--conn-login my_user \
--conn-password my_pwd \
--conn-type jdbc \
--conn-host localhost \
--conn-port 9457 \
--conn-extra {} \
my_connection
# Variable
airflow variables set my_variable variable_content
Using this file will create a JDBC connection called my_connection and a Variable called my_variable. You can see them in the Web UI by clicking on the corresponding section in the Admin menu.
Additional information
The main point of Breeze was to provide an easy way to run automated tests for the core Airflow developers, the people building Airflow itself rather than building with Airflow. Breeze's goal is to lay the foundation to easily run Airflow, taking care of:
- starting the needed docker containers
- exposing the ports for the Airflow components (e.g. webserver and backend database)
- providing a convenient way to run new code in Airflow (e.g. put the DAGs in files/dags)
- eventually running tests
These features were too interesting to leave them just to the core developers ;)
But this is not everything. If you want to know more about the possibilities offered by Breeze, I suggest you take a look at this video (Airflow Breeze - Development and Test environment for Apache Airflow); it will not make your DAGs better, but it will give you more ideas on how to use Breeze and your new dev environment.
Final words
If you are still here, feel free to leave a comment and provide your feedback. I will be happy to assist you and answer your questions (if I am able to).
Shameless plug
In case you need support or assistance, feel free to reach out to me in the comments or via direct message. On Twitter you can find me under the handle @mucio.
If you need more structured help, the nice people at Untitled Data Company (which includes me) will be happy to help you with all your data needs.
Top comments (5)
Thanks for this very useful post. One q: I am adding my custom dags inside the files/dags folder. Does the airflow webserver automatically reload the changes to dags? Or will I need to stop and start again using breeze?
sorry for the late reply, did you find an answer to this?
If I remember correctly I needed to restart airflow when changing the dags, but I feel this was not the expected behaviour
thanks for this guide! it's just what i was looking for.
one question. did you have an issue with the frontend? i can run airflow, but when i go to 127.0.0.1:28080/home the UI looks like its missing all the CSS.
i also get an error message:
Please make sure to build the frontend in static/ directory and restart the server
i'm guessing the frontend is not getting built for some reason?
I never had this problem, I just pulled from git and started breeze and everything looks fine.
Can you try to pull from git and have breeze re-download the docker images?
Ah I think this was it ...
stackoverflow.com/questions/652875...