In this post, we are going to learn how to set up an Airflow Postgres connection.

This post follows from our earlier post about Airflow Installation. Therefore, I would recommend going through that post and setting up Airflow first in case you have not already done so.

1 – Why Does Airflow Need a Backend?

Airflow stores a bunch of metadata about the DAGs it manages. On top of that, it also has a user management system.

All of that data needs to be stored somewhere.

Airflow interacts with this metadata using SQLAlchemy. The default installation uses SQLite as the database.

However, SQLite is not a good choice for production deployments. This is because SQLite only works with the SequentialExecutor, which runs tasks one at a time. More on executors in a later post.

2 – Airflow DB Connection Configuration

SQLAlchemy needs to know the database URL in order to connect. We provide this URL in airflow.cfg using the sql_alchemy_conn parameter, which lives in the [core] section of the configuration file.

sql_alchemy_conn = sqlite:////Users/saurabhdashora/airflow/airflow.db
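If you want to double-check which value Airflow is actually picking up, the config CLI can print it. Below is a sketch, assuming an Airflow 2.x installation where the config get-value subcommand is available:

$ airflow config get-value core sql_alchemy_conn
sqlite:////Users/saurabhdashora/airflow/airflow.db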

We can also set this up using the AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variable.
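For example, the same value shown above could be exported before starting Airflow. This is just a sketch and the path will differ on your machine:

$ export AIRFLOW__CORE__SQL_ALCHEMY_CONN=sqlite:////Users/saurabhdashora/airflow/airflow.db

Note that environment variables take precedence over values in airflow.cfg, which makes this approach handy for containerized deployments.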

3 – PostgreSQL Server

As per the official documentation at the time of writing this post, Airflow supports the following databases:

  • PostgreSQL – 9.6, 10, 11, 12, 13
  • MySQL – 5.7, 8
  • MsSQL – 2017, 2019

We will focus on PostgreSQL in this post.

You can perform a local PostgreSQL installation for testing, but I would recommend using Docker. If you are new to Docker, we have a detailed post about the basics of Docker.

If Docker is set up, we can simply use the below command to start a Postgres container.

$ docker run --name demo-postgres -p 5432:5432 -e POSTGRES_PASSWORD=password -d postgres

As you can see, there is nothing special here. We are just starting a basic Postgres server and exposing it on port 5432.
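Before moving on, it is worth a quick check that the container is actually running and that port 5432 is mapped:

$ docker ps --filter name=demo-postgres

You should see the demo-postgres container listed with a status of Up.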

4 – Setting up the Postgres Database

Once the container is up and running, we need to set up the Postgres database. Basically, we need to create a database for storing Airflow metadata.

We will first create a database named airflow_db along with a user airflow_user that has the password airflow_pass. The below SQL commands help us achieve this:

CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
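If you are using the Docker container started earlier, one simple way to run these statements is to open a psql shell inside it. The below assumes the container name demo-postgres and the default postgres superuser from the earlier docker run command:

$ docker exec -it demo-postgres psql -U postgres

You can then paste the three SQL statements at the psql prompt.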

You can also create the database and user using any other connection manager tool such as DBeaver.

5 – Configuring Airflow Postgres Connection

The last step is to tell our Airflow installation to connect to the Postgres server instead of using SQLite. In other words, we need to alter the connection configuration in the airflow.cfg file.

We will replace the existing sql_alchemy_conn parameter with the below value:

sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost:5432/airflow_db

We are using the psycopg2 package to establish the connection. Unless you have installed it before, the subsequent airflow db init command will throw an error as below:

ModuleNotFoundError: No module named 'psycopg2'

To get around this, we have to install the psycopg2 package using the below command.

$ pip install psycopg2

In case this does not work for some reason, we can install the psycopg2-binary package instead.

$ pip install psycopg2-binary

Once the package is installed, we need to reinitialize the Airflow database. Below is the command:

$ airflow db init
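To verify that Airflow can actually reach the new backend, the db check command (available in Airflow 2.x) is handy:

$ airflow db check

If the connection string is correct, this should report a successful connection. If it fails, re-check the sql_alchemy_conn value.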

After successful initialization, we can start the Airflow webserver and scheduler in the usual manner. However, this time our Airflow installation will use Postgres as its database backend.
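For reference, the usual commands look like below (8080 is the default webserver port):

$ airflow webserver --port 8080
$ airflow scheduler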

Conclusion

With this, we have learned how to set up an Airflow Postgres connection using the psycopg2 package. In case you are new to Airflow and wish to learn more, you can also check out this post on how to create your first Airflow DAG.

If you have any comments or queries about this post, please feel free to write in the comments section below.


