Diego Souza for Devs Norte

Posted on May 20, 2024

Setting up a standalone SQLAlchemy 2.0 ORM application

#python #learning #database #sqlalchemy

SQLAlchemy is a widely used database toolkit for Python, providing a SQL abstraction layer covering most, if not all, your relational database needs. It's often used together with popular frameworks such as Flask and FastAPI, using either its Core library and/or its ORM components.

This article will take a look at how to set up a basic, standalone SQLAlchemy application, meaning we'll take a look into SQLAlchemy itself and skip other frameworks like the ones we just mentioned. We'll explore some of SQAlchemy's foundational concepts, so we better understand how it works and have some insight about how other frameworks may be using it under their hood.

We will also focus on SQLAlchemy's ORM component. As we'll learn along this article, SQLALchemy is divided into a Core and an ORM component, that represent different ways to achieve the same result, that being interacting with a database. This article is ORM-centered. I suggest checking the documentation for a full introduction to the library, and there is also a comprehensive unified tutorial provided by the SQLAlchemy team, which explores both sides in-depth.

Finally, we'll see some very brief CRUD examples, just so we have a basis on data manipulation. Overall the article works as a simple setup guide with some refreshers about the framework. It does not intend to analyze concepts in great depth, nor a full-fledged tutorial on relational databases or data operations for the sake of brevity. I'll leave the final code available in this GitHub repository

With all that being said, let's start with a few key concepts to familiarize ourselves with SQLAlchemy.

1. Key concepts

The first thing to understand about SQLAlchemy is that it is divided into two major parts: the Object Relational Mapper (ORM) and the Core. Basically, they are different ways to implement the same thing: database interactions. While developers tend to choose and stick to one of them when creating applications, it's not impossible to find codebases with both out in the wild. It's a good idea to understand both so you can make educated implementation choices and be able to comprehend the choices made in pre-existing applications.

The Core component contains the foundation for the whole SQLAlchemy's framework, with most of the library's database services and integrations. It's most significant part is the SQL Expression Language, which in fact works as a query builder system. In other words, a toolkit to construct SQL expressions through composable objects, which can then be executed against the database.

The ORM is built on top of this Core foundation. It expands upon it through a domain object model that is mapped to a database schema. This means you can have classes that are mapped to the database tables and columns, and implement different operations in an object-oriented way. This can be further expanded with external tools such as Alembic, which is a migration tool, or marshmallow, which is a serialization tool.

While using an ORM or a query builder can be quite different, both are able to implement the same operations, and SQLAlchemy's ORM is an extension of the Core. As such, both will rely on a few identical fundamental pieces, such as the database connection. We'll begin our exploration by them.

Our starting point will be the engine and the aforementioned connection.

2. Getting started

Let's get some initial setup out of the way, starting by an empty directory and a virtual environment. Create a new directory with any name you like, and a virtual environment with your preferred tool. I personally use Pyenv. It is worth noting that this tutorial was created with Python 3.11.3, and SQLAlchemy version 2.0.27.

pyenv virtualenv 3.11.3 standalone-sqlalchemy
pyenv activate standalone-sqlalchemy

With the virtual environment activated, install SQLAlchemy.

pyenv exec pip install SQLAlchemy
pyenv exec pip freeze > requirements.txt

Create a database subdirectory with an empty __init__.py file, which will contain our files as we progress along the article. The directory tree will look like this:

.
├── database
│   └── __init__.py
└── requirements.txt

3. Creating an engine

The Engine is the starting point of a SQLAlchemy application. It is the central source of connections to the database and holds the application's connection pool. It's independent from either the Core or the ORM and will be used in any and all SQLAlchemy applications.

The usual implementation is to have the engine object created only once in a particular database server. For that, we can use a factory function provided by SQLAlchemy, where we can create and tune database connections. It's the create_engine() function. We use it by passing a database URL as the first positional argument, followed by keyword arguments containing the configuration options we wish to have in place.

The database URL is a usual connection URL. It indicates the database dialect to be used and other required connection parameters. Here's an example:

"dialect+driver://username:password@host:port/database"

The dialect part tells SQLAlchemy what kind of database it will be interacting with. The driver part is pointing to a specific DBAPI implementation, tuned to the database type being used. The DBAPI, in short, is a widely used Python specification providing a low-level system to talk to particular databases. Together, the dialect and driver combination ensures the Python code we write is translated to the correct SQL expressions before they're sent to the database. Changing the dialect changes the SQL translation to other database types without having to modify our Python code.

Refer to this documentation if you want to explore the dialects in detail, including current support level for different database types.

Let's see some examples. We could connect to an in-memory SQLite database:

from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///:memory:")

Or to one in the filesystem:

from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///dbname.db")

On a local port:

from sqlalchemy import create_engine

engine = create_engine("sqlite+pysqlite:///127.0.0.1:3307/dbname")

And so on. It's worth knowing that you may also create database URLs programmatically through the URL.create() constructor. Hardcoding credentials is generally a bad idea, and this constructor is handy for defining credentials through environment variables, for example. The resulting object can be passed to create_engine() directly:

from sqlalchemy import create_engine
from sqlalchemy import URL

url_object = URL.create(
    "postgresql+pg8000",
    username="dbadmin",
    password="zjjk@4863/v",
    host="pghost10",
    database="dbname",
)

engine = create_engine(url_object, echo=True)

This documentation section goes in-depth about the engine options we have available. For the purposes of this article we'll set everything to default, save for an echo=True parameter that causes SQL generated by our code to be logged to stdout. It'll be handy to demonstrate our code being translated to SQL.

4. Connecting

Remember that the engine is independent from both the Core and the ORM, but is used by both implementations. However, there are slight differences in how these implementations use the engine object to establish a connection.

On the Core implementation we usually work with Connection and Result objects. Put simply, the Connection object interacts with the engine to connect to the database. The Result objects are simply a set of database results, generated whenever we perform any given database operation.

The way we tend to use Connection objects is to have them represent a specific resource open against the database (usually a table), such that the scope of operations we want to perform is kept well-defined. We do the by means of Python's with statements, which also leaves the code organized and easy to read. Here's an example:

from sqlalchemy import text
from database.engine import engine

with engine.connect() as conn:
    result = conn.execute(text("SELECT 'hello world'"))
    print(result.all())

Here used a connection call wrapped into the engine object itself and defined a connection context within the with block. This also leads the connection to be automatically closed. We then used the text() construct to write textual SQL, just so we can demonstrate this connection. We perform a SELECT statement and bind the results to the result variable.

Remember we configured our engine with the echo=True setting so we can see the emitted SQL logged to stdout. Here's what running this code results in:

BEGIN (implicit)
select 'hello world'
[...]
[('hello world',)]
ROLLBACK

The SELECT statement is executed, giving the [('hello world',)] result you see above. SQLAlchemy then retrieves this data and returns it as Result objects to the result variable in our code above. SQLAlchemy did all the heavy lift of opening and closing the connection, translating the correct SQL, executing it and parsing the results.

There's another thing of notice on the translated SQL snippet above: the ROLLBACK statement. The reason it's there is because of one major aspect of SQLAlchemy and the underlying Python DBAPI: transactions, by default, are not committed automatically. If you want to persist changes, you must explicitly use the Connection.commit() method:

from sqlalchemy import text

with engine.connect() as conn:
    result = conn.execute(text("select 'hello world'"))
    print(result.all())
    conn.commit()

BEGIN (implicit)
select 'hello world'
[...]
[('hello world',)]
COMMIT

Now, while the Connection pattern I just showed you is typically used on Core-centric implementations, the patterns we used are very similar to what is done on the ORM. The difference is that we'll use Session objects instead of Connection.

The Session is the ORM's fundamental interface for database interaction. It holds the connection and uses with statements just like we saw, but it also has additional features to manage operations for ORM objects. In order to effectively demonstrate this, however, we first need to understand what these ORM objects are. We'll do this in the next section.

5. Setting up mapped models

ORM models, ORM classes, mapped classes or mapped objects are, in short, objects that correspond to a particular database table. Each column is mapped to a class attribute, and each class attribute is implemented in a way that describes the column's properties and constraints. Thus the "mapped objects" designation.

The ORM implements this via Declarative Mapping. Its most common pattern is to define a Base class that inherits from the built-in DeclarativeBase superclass. We then subclass this Base class for each of our database tables. The underlying inheritance chain causes each subclass to acquire ORM mappings and functionalities.

We'll demonstrate this with two simple classes, Artist and Track. First, let's also organize some code into dedicated modules. On the database directory we created earlier, create a engine.py, a models.py, a utils.py and a __init__py file. Our project's directory tree will look like this:

.
├── db
│   ├── __init__.py
│   ├── engine.py
│   ├── models.py
│   └── utils.py
└── requirements.txt

Add the code below to engine.py:

import os
from sqlalchemy import create_engine

current_path = os.path.dirname(os.path.realpath(__file__))
database_url = f"sqlite+pysqlite:///{current_path}/database.db"
engine = create_engine(database_url, echo=True)

This will point the application to the database.bd file, in the same directory as the engine.py file, where we'll keep a SQLite database. Next, add the following bits to the models.py file. I'll explain the classes shortly.

from typing import List
from sqlalchemy import ForeignKey
from sqlalchemy import String
from sqlalchemy.orm import DeclarativeBase
from sqlalchemy.orm import Mapped
from sqlalchemy.orm import mapped_column
from sqlalchemy.orm import relationship


class Base(DeclarativeBase):
    pass

class Artist(Base):
    __tablename__ = "artist"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(30))
    tracks: Mapped[List["Track"]] = relationship(
        back_populates="artist", cascade="all, delete-orphan"
    )
    def __repr__(self) -> str:
        return f"Artist(id={self.id!r}, name={self.name!r})"

class Track(Base):
    __tablename__ = "track"
    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    artist_id: Mapped[int] = mapped_column(ForeignKey("artist.id"))
    artist: Mapped["Artist"] = relationship(back_populates="tracks")
    def __repr__(self) -> str:
        return f"Track(id={self.id!r}, title={self.title!r})"

Here we're defining a Base class that inherits from DeclarativeBase, and then defining the Artist and Track subclasses. As discussed earlier, this informs SQLAlchemy that Artist and Track are mapped ORM objects, providing them with all instrumentation needed to interact with the database.

Because of this, we are also able to interact with the table schemas themselves. This is exemplified by the __tablename__ class-level attribute, which defines table names for the respective ORM classes.

The other class attributes are each mapped to a table column. This mapping, more specifically, is established by the mapped_column() directive. Alongside it, the Mapped typing is used by SQLAlchemy to derive the corresponding SQL types and their nullability constraints.

The Column.primary_key parameter, as the name implies, identifies the columns defined as the table's primary key. Table relationships can be defined through the the relationship() construct. We won't go in depth, but relationships and mapped columns have quite a lot of options and arguments that help define every aspect of the underlying relationship. Be sure to check the documentation if you need more detail on this.

Finally, we come to the concept of table metadata. In this context, table metadata is a SQLAlchemy construct defined by our mapped classes, which describes what each underlying database table looks like. This construct is defined by both class-level attributes such as __tablename__ and the attributes that are mapped to table columns. The metadata enables us to interact with the database in some interesting ways. Let's demonstrate some by adding the helper functions below to the utils.py file:

from database import models
from database.engine import engine


def create_database() -> None:
    models.Base.metadata.create_all(engine)

def drop_database() -> None:
    models.Base.metadata.drop_all(engine)

def reset_database() -> None:
    drop_database()
    create_database()

create_database() will create all tables described in the mapped classes in the models module. If the file pointed to in the engine does not yet exist, it will be automatically created. drop_database() will do the opposite, leaving us with an empty database (but still keep the file). reset_database() will drop and re-create the database leaving us with empty tables, which is useful if you want to play around with the mapped objects. Try opening a Python shell in your virtual environment and executing the first function. We'll be able to visualize the emitted SQL due to the echo=True property we set on the engine:

PRAGMA main.table_info("artist")
...
PRAGMA temp.table_info("track")
...
CREATE TABLE artist (
    id INTEGER NOT NULL, 
    name VARCHAR(30) NOT NULL, 
    PRIMARY KEY (id)
)
...

CREATE TABLE track (
    id INTEGER NOT NULL, 
    title VARCHAR NOT NULL, 
    artist_id INTEGER NOT NULL, 
    PRIMARY KEY (id), 
    FOREIGN KEY(artist_id) REFERENCES artist (id)
)
...
COMMIT

Note that handling the schema directly like will will cause changes to be auto-committed. The following statements are logged to stdout as result from running the drop function, showing the reverse operation.

BEGIN (implicit)
PRAGMA main.table_info("artist")
...
PRAGMA main.table_info("track")
... 
DROP TABLE track
...
DROP TABLE artist
...
COMMIT

With functional mapped classes, we can return to the matter of using Session objects for out connection. We'll see this in practice, as we go over some basic CRUD operations on the next section, where we'll also get the hang of data manipulation.

6. Basic CRUD examples

Going forth, we will be adding, fetching, updating and deleting records from the database by means of Session objects. Let us revise some key points about them:

Sessions are how the SQLAlchemy ORM connects to the database and performs transactions.
A Session object connects to the database via the Engine object, where we configure various connection parameters.
Transactions are to be executed within a session context, which is usually done through Python's with statements.
Transactions are not committed automatically, requiring us to call Session.commit() to persist changes.

There's a lot more detail we could talk about sessions, such as how the engine connection is handled or how to create session factories with sessionmaker. We'll stick to the basics for brevity, but you may start exploring these topics in this part of the documentation.

With that being said, let's dig in.

Create

To add rows to a table, in this example, we'll be using session contexts to create instances of our mapped objects. Then, we use Session.add() and Session.commit() to save them. Let's demonstrate this with our Artist model. In the database directory, create a operations.py file:

.
├── database
│   ├── __init__.py
│   ├── engine.py
│   ├── models.py
│   ├── operations.py
│   └── utils.py
└── requirements.txt

And add the following bits:

from sqlalchemy.orm import Session
from database.engine import engine
from database.models import Artist, Track
# from database.utils import reset_database


def create_artist(name: str) -> None:
    with Session(engine) as session:
      artist = Artist(name=name)
      session.add(artist)
      session.commit()

def populate_artists() -> None:
    create_artist("David Bowie")
    create_artist("Queen")
    create_artist("Iggy Pop")

The commented-out reset_database method above can be used to reset the database if at any point you want to return it to a blank slate with empty tables.

We can create mapped object instances simply by passing named arguments to our mapped classes. By doing this within a session context, and then committing the changes, will lead ORM to do all the work of adding new database records for each of the instances we created. Adding data is as simple as that. This can be verified by looking at the SQL statements logged to stdout. Here's an excerpt for the snippet above:

INSERT INTO artist (name) VALUES ('David Bowie')
[...] ('David Bowie',)
COMMIT

We'll discuss reading data on the next section, but here's a sneak peek so I can show you some additional details into one of the rows we just added:

from sqlalchemy import select


def find_bowie():
    with Session(engine) as session:
        query = select(Artist).filter_by(name="David Bowie")
        result = session.execute(query)
        print(result)

Here's the print statement output:

[(Artist(id=1, name='David Bowie'),)]

We can confirm the David Bowie row has been saved successfully, and also that a primary key was automatically assigned to it. By default, primary key columns will be auto-incrementing integers starting at 1, unless configured otherwise.

Also, not the contents of the result variable. It contains a list with a tuple-like data structure. This tuple is actually a SQLAlchemy data structure that we'll come back to in the next section.

Let's shift up a gear. Replace the find_bowie function with the code below:

from sqlalchemy import Row
from typing import Tuple


def find_artist_by_name(name: str) -> Row[Tuple[Artist]] | None:
    with Session(engine) as session:
        query = select(Artist).filter_by(name=name)
        result = session.execute(query)
        return result.first()

def find_track_by_title(title: str) -> (Track | None):
    with Session(engine) as session:
        query = select(Track).filter_by(title=title)
        result = session.scalars(query)
        return result.first()

def create_track(title: str, artist: Artist) -> None:
    with Session(engine) as session:
        track = Track(title=title, artist=artist)
        session.add(track)
        session.commit()

def populate_bowie_tracks() -> None:
    result = find_artist_by_name("David Bowie")
    if result is None:
        return

    bowie = result[0]
    with Session(engine) as session:
        track1 = Track(title="Under Pressure", artist=bowie)
        track2 = Track(title="Starman", artist=bowie)
        track3 = Track(title="Heroes", artist=bowie)
        track4 = Track(title="Space Oddity", artist=bowie)
        track5 = Track(title="Rebel Rebel", artist=bowie)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_queen_tracks() -> None:
    result = find_artist_by_name("Queen")
    if result is None:
        return

    queen = result[0]
    with Session(engine) as session:
        track1 = Track(title="Bohemian Rhapsody", artist=queen)
        track2 = Track(title="Don't Stop Me Now", artist=queen)
        track3 = Track(title="We Will Rock You", artist=queen)
        track4 = Track(title="We Are The Champions", artist=queen)
        track5 = Track(title="I Want To Break Free", artist=queen)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_iggy_tracks() -> None:
    result = find_artist_by_name("Iggy Pop")
    if result is None:
        return

    iggy = result[0]
    with Session(engine) as session:
        track1 = Track(title="The Passenger", artist=iggy)
        track2 = Track(title="Lust For Life", artist=iggy)
        track3 = Track(title="Candy", artist=iggy)
        track4 = Track(title="Livin' On The Edge Of The Night", artist=iggy)
        track5 = Track(title="Hold The Line", artist=iggy)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_tracks() -> None:
    populate_bowie_tracks()
    populate_queen_tracks()
    populate_iggy_tracks()

This gives us a proper artist search function which returns either the searched artist or None if it is not found. There's also a track creation function, and some utility functions to populate our tables with some data.

There's also a relationship creation example in there. You can create one by passing an existing mapped object instance, to another mapped object instance, to establish a relationship. Note that this is done specifically in the attribute defined as a relationship, or in this case the artist attribute in the Track class.

Alternatively, we could achieve the same by passing an instance's primary key to an attribute defined as a foreign key:

Track(title="Ziggy Stardust", artist_id=bowie.id)

Now that we have added some data, let's explore querying.

Read

Querying is a wide topic, with many strategies available in SQLAlchemy, such as filtering, ordering, joins and others. Whichever the case, we'll usually use a combination of Sessions, mapped objects and the SQL Expression Language system.

Once again, we'll stick to the basics for brevity. To start, add this bit to operations.py:

def find_artists() -> None:
    with Session(engine) as session:
        query = select(Artist)
        result = session.execute(query)
        print(result.all())

Here's the printed statement:

[
    (Artist(id=1, name='David Bowie'),),
    (Artist(id=2, name='Queen'),),
    (Artist(id=3, name='Iggy Pop'),)
]

Remember that whenever we execute a SQL statement against the database (in this case a SELECT statement through the select function), it generates a Result object. It represents a set of database results for the performed operation, from which we can retrieve the resulting data.

There are a few ways retrieving this data. One is calling the .all() method like we did in the snippet above. It returns a list of Row objects. The Row object represents a single database row, that is, a table record.

We can also use the .fist() method to extract just the first row from the Result object, or None if it doesn't exists:

print(find_artist_by_name("Lou Reed"))

# Output:
None

Row and Result objects are used on both Core or ORM implementations. However, when we are using mapped ORM objects, rows may be mapped to one of our ORM classes. For example, let's check data types at different levels: the Row object itself, and the data within.

def find_artists() -> None:
    with Session(engine) as session:
        query = select(Artist)
        result = session.execute(query)
        first_artist = result.first()
        print(type(first_artist))
        print(type(first_artist[0]))
        print(first_artist[0])

# First print output:
<class 'sqlalchemy.engine.row.Row'>
# Second print output:
<class 'database.models.Artist'>
# Third print output:
Artist(id=1, name='David Bowie')

Better explaining the outputs above:

When we perform any given operation in the database, we get Result objects wrapping the obtained data. We got one on the result variable.
When we call .first(), .all() or any such methods of the Result object, we get access to one or more Row objects contained in it. These objects usually represent table rows.
Because we're working with the ORM, the data within the Row objects was transformed to an instance of the mapped ORM class we just queried upon.

Because the first_artist[0] is a mapped object instance, we can access any of its attributes to get the corresponding data:

print(first_artist[0].id)
# Output:
1

print(first_artist[0].name)
# Output:
"David Bowie"

Remember that we defined a relationship between artists and tracks. Try accessing the tracks attribute:

print(first_artist[0].tracks)

# Output:
[
    Track(id=1, title='Under Pressure'),
    Track(id=2, title='Starman'),
    Track(id=3, title='Heroes'),
    Track(id=4, title='Space Oddity'),
    Track(id=5, title='Rebel Rebel')
]

If you look back at how we defined this relationship in the Artist and Track classes you'll see that, in addition to a foreign key attribute in the Track class, we added relationship attributes with a back_populate parameter in both of them. The effect of this is that relationships are kept synchronized and easily accessible, which is a nice ORM feature.

The next thing I'd like to introduce to you is the concept of scalars. No doubt you noticed we needed to use some intricate index syntax above (first_artist[0]), to access the mapped ORM instance from the Row object. You might have foreseen even worse situations, like this:

all_artists = result.all()
desired_record = all_artists[1][0]

Scalars are a way to avoid this by bypassing the Row objects entirely, getting direct access to the mapped instances. We can do this in two ways. Either call a .scalars() method in a result object:

def find_artists() -> None:
    with Session(engine) as session:
        query = select(Artist)
        result = session.execute(query)
        all_artists = result.scalars().all() 
        print(all_artists)

Or use Session.scalars() to execute the operation, which is often preferred:

def find_artists() -> None:
    with Session(engine) as session:
        query = select(Artist)
        result = session.scalars(query)
        all_artists = result.all()
        print(all_artists)

Both will lead to the same result:

[
    Artist(id=1, name='David Bowie'),
    Artist(id=2, name='Queen'),
    Artist(id=3, name='Iggy Pop')
]

As you can see, we skipped the Row objects entirely and obtained mapped objects instances directly. This is a very common ORM pattern and reduces code complexity by a fair amount.

Still, you may be asking whether Rows still have use cases. The answer is mostly yes, although it's more common outside the ORM. That's because scalars only work like with queries that return ORM objects. You may encounter scenarios where this is not the case, however. A good example of this is selecting specific table columns, common when doing joins. The snippet below demonstrates this. It uses Row objects to pick data coming from from multiple tables:

def find_tracks() -> None:
    with Session(engine) as session:
        query = select(Artist.name, Track.title).join(Artist.tracks)
        result = session.execute(query)
        for row in result:
            print(f"Track: {row.title}, by {row.name}")

Track: Under Pressure, by David Bowie
Track: Starman, by David Bowie
Track: Heroes, by David Bowie
Track: Space Oddity, by David Bowie
Track: Rebel Rebel, by David Bowie
Track: Bohemian Rhapsody, by Queen
Track: Don't Stop Me Now, by Queen
Track: We Will Rock You, by Queen
Track: We Are The Champions, by Queen
Track: I Want To Break Free, by Queen
Track: The Passenger, by Iggy Pop
Track: Lust For Life, by Iggy Pop
Track: Candy, by Iggy Pop
Track: Livin' On The Edge Of The Night, by Iggy Pop
Track: Hold The Line, by Iggy Pop

If we call session.scalars() instead of the session.execute() in the code above, it runs but the Row object casting does not work as expected. It erroneously become a string:

def find_tracks() -> None:
    with Session(engine) as session:
        query = select(Artist.name, Track.title).join(Artist.tracks)
        result = session.scalar(query)
        print(result)

# Output:
"David Bowie"

So we lost ORM capabilities in exchange for constructing a tailored dataset. While we could still get the same data while sticking to the ORM, this example serves to demonstrate the usability of Row objects. Nevertheless, it's always a good idea to keep the codebase concise and reduce complexity to the minimum. It's better to choose a paradigm and stick to it unless the need really arises.

TO wrap this section up, we saw that Result, Row and Scalar objects are the base blocks we need to understand how querying works in SQLAlchemy. Knowing this makes it easier to read and dive deeper into the ORM querying guide and other related documentation. There's still a lot of querying topics such as joins, set operations, pagination, ordering, subqueries and others that we won't have the time to explore.

Finally before we move on, let's tidy up our operations.py file and make use of scalars in our functions:

from sqlalchemy import select
from sqlalchemy.orm import Session
from typing import Sequence

from database.engine import engine
from database.models import Artist, Track


def find_artists() -> Sequence[Artist]:
    with Session(engine) as session:
        query = select(Artist)
        result = session.scalars(query)
        return result.all()

def find_artist_by_name(name: str) -> (Artist | None):
    with Session(engine) as session:
        query = select(Artist).filter_by(name=name)
        result = session.scalars(query)
        return result.first()

def find_tracks() -> Sequence[Track]:
    with Session(engine) as session:
        query = select(Track).where(Track.id < 10)
        result = session.scalars(query)
        return result.all()

def find_track_by_title(title: str) -> (Track | None):
    with Session(engine) as session:
        query = select(Track).filter_by(title=title)
        result = session.scalars(query)
        return result.first()

def create_artist(name: str) -> None:
    with Session(engine) as session:
        artist = Artist(name=name)
        session.add(artist)
        session.commit()

def create_track(title: str, artist: Artist) -> None:
    with Session(engine) as session:
        track = Track(title=title, artist=artist)
        session.add(track)
        session.commit()

def populate_artists() -> None:
    create_artist("David Bowie")
    create_artist("Queen")
    create_artist("Iggy Pop")

def populate_bowie_tracks() -> None:
    bowie = find_artist_by_name("David Bowie")
    if bowie is None:
        return

    with Session(engine) as session:
        track1 = Track(title="Under Pressure", artist=bowie)
        track2 = Track(title="Starman", artist=bowie)
        track3 = Track(title="Heroes", artist=bowie)
        track4 = Track(title="Space Oddity", artist=bowie)
        track5 = Track(title="Rebel Rebel", artist=bowie)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_queen_tracks() -> None:
    queen = find_artist_by_name("Queen")
    if queen is None:
        return

    with Session(engine) as session:
        track1 = Track(title="Bohemian Rhapsody", artist=queen)
        track2 = Track(title="Don't Stop Me Now", artist=queen)
        track3 = Track(title="We Will Rock You", artist=queen)
        track4 = Track(title="We Are The Champions", artist=queen)
        track5 = Track(title="I Want To Break Free", artist=queen)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_iggy_tracks() -> None:
    iggy = find_artist_by_name("Iggy Pop")
    if iggy is None:
        return

    with Session(engine) as session:
        track1 = Track(title="The Passenger", artist=iggy)
        track2 = Track(title="Lust For Life", artist=iggy)
        track3 = Track(title="Candy", artist=iggy)
        track4 = Track(title="Livin' On The Edge Of The Night", artist=iggy)
        track5 = Track(title="Hold The Line", artist=iggy)

        session.add(track1)
        session.add(track2)
        session.add(track3)
        session.add(track4)
        session.add(track5)

        session.commit()

def populate_tracks() -> None:
    populate_bowie_tracks()
    populate_queen_tracks()
    populate_iggy_tracks()

Update

Updating records is really easy. All we need is a mapped object instance, usually obtained after querying the database in any way. If we change one of its attributes within a session context and commit the changes, the ORM will save them in the database. To demonstrate, let's create two update functions for our classes:

def update_artist(artist: Artist, name: str) -> None:
    with Session(engine) as session:
        artist.name = name
        session.add(artist)
        session.commit()

def update_track(track: Track, title: str, artist_id: int) -> None:
    with Session(engine) as session:
        track.title = title
        track.artist_id = artist_id
        session.add(track)
        session.commit()

These methods take a mapped instance of one of the models and their corresponding attributes as arguments. Now, assume we create an artist with a typo, for example:

def create_m83():
    with Session(engine) as session:
        create_artist("m83")
        m83 = find_artist_by_name("m83")
        print(m83)

# Output:
# Artist(id=4, name='m83')

All we need is to pass the instance and updated data to the update function:

def fix_m83():
    with Session(engine) as session:
        m83 = find_artist_by_name("m83")
        print(m83)

        update_artist(m83, "M83")
        m83 = find_artist_by_name("M83")
        print(m83)

        # First output:
        # Artist(id=4, name='m83')

        # Second output:
        # Artist(id=4, name='M83')

It's similar if we want to update a relationship:

def create_midnight_city():
    # Here we create Midnight City with a typo and the wrong artist
    with Session(engine) as session:
        artist = find_artist_by_name("David Bowie")
        create_track("Midnight CIty", artist)

    # Here we query the track and its artist to show data currently in the database
    with Session(engine) as session:
        track = find_track_by_title("Midnight CIty")
        artist = session.scalars(
            select(Artist).where(Artist.id == track.artist_id)
        ).first()
        print(track)
        print(artist)

        # First output:
        # Track(id=16, title='Midnight CIty')
        # Second output:
        # Artist(id=1, name='David Bowie')

def fix_midnight_city():
    with Session(engine) as session:
        artist = find_artist_by_name("M83")
        track = find_track_by_title("Midnight CIty")
        update_track(track, "Midnight City", artist_id=artist.id)

        track = find_track_by_title("Midnight City")
        artist = session.scalars(
            select(Artist).where(Artist.id == track.artist_id)
        ).first()
        print(track)
        print(artist)

        # First output:
        # Track(id=16, title='Midnight City')
        # Second output:
        # Artist(id=4, name='M83')

Surely, these methods are just for demonstration purposes and there are better implementations. This is a good opportunity to play around with them to practice querying and modifying data. Save for the update methods, they will be discarded from the final code.

Delete

Like updating, deletion operations will require a mapped object instance and a session context. Then you call session.delete() and session.commit():

def delete_artist(artist: Artist) -> None:
    with Session(engine) as session:
        session.delete(artist)
        session.commit()

def delete_track(track: Track) -> None:
    with Session(engine) as session:
        session.delete(track)
        session.commit()

Let's try deleting an artist:

with Session(engine) as session:
    m83 = find_artist_by_name("M83")
    print(m83)

    delete_artist(m83)
    session.commit()

# Output:
# Artist(id=4, name='m83')

Now if you try finding the same artist again:

with Session(engine) as session:
    m83 = find_artist_by_name("M83")
    print(m83)

# Output:
# None

Now try running the find_tracks function. You'll notice any tracks from the deleted artist are no longer present in the database. This is because of the deletion constraints we defined in our Artist class, specifically:

tracks: Mapped[List["Track"]] = relationship(
    back_populates="artist", cascade="all, delete-orphan"
)

Which defines an ON DELETE CASCADE behavior when deleting from a parent table. For more details on cascades, check this documentation.

This covers the basic CRUD operations for our application. Next, let's wrap up.

7. Wrapping up

This article has explored the basics of SQLAlchemy and leaves you with a very simple ORM application. From here you're able to head to the documentation to analyze the concepts we discussed in more detail, as well as delve into more advanced topics. I'll leave you with a list of ideas to try and learn more, in no particular order. Write an application of your own, or feel free to reuse the code we have created. It's available here. Best of luck!

Learn more about sessions. Learn about sessionmaker, try refactoring the application around it.
Add more mapped ORM classes with different column definitions. Learn the available data types and constraints, and how to define them.
Try creating different kinds of relationships and testing their behavior.
Write more complex queries. Try sorting, filtering and pagination. Try performing searches that involve relationships.
Head over to Alembic and learn about migrations. Think how you could implement them.
Try writing serializers to convert instances or you mapped objects to simple Python dictionaries. For a whole framework dedicated to this purpose, head over to marshmallow.

Top comments (2)

Charles Brown • May 23 '24

This is a fantastic guide! How would you recommend handling large-scale migrations with SQLAlchemy for complex applications?

Diego Souza • May 29 '24

Hey there! Specific approaches tend to vary with each database, so I'll be better able to provide some guidelines.

Backups are the most important, especially with anything large-scale and on databases with pre-existing data. Ensure data is safe and recoverable before applying any extensive operations.

Keep development and production environments separate. Ideally, we would be working with a separate database instance, in complete isolation from production data. This could be a remote database on another URL or a local instance using tools like Docker Compose to replicate the production environment. This way, you have somewhere to run and test migrations without worry of breaking things.

Having the development database full of data can be very useful to visualize the end results of your changes. I've worked in companies that provided developers with a database dump to populate their local instances with, which was great. However, this is not always an option. A workaround is to create seeding scripts that populate tables with dummy data for you, similar to what we did in the article. Some codebases may already have seeding scripts ready to use.

Finally, we come to handling the migrations themselves. As mentioned, the details will depend on your specific database, but some common approaches I find useful are:

Perform assessment and planning. What is the current state of the application, and what do you need to change? Do the changes come across multiple domains of the application?
- It may be interesting to break down cross-domain migrations into a few smaller ones, which perform related sets of changes. To use an example similar to the application we made in this article: say we have a music festival ticket purchase platform. We need to change some tables so a developer can add a new payment method. In addition to that, another team is working on a "tour" feature where users can track their favorite artists going across multiple festivals, and you need to create some new tables for that. You could therefore create a "payments" migration and a "tours" migration, so that the database changes can be analyzed, implemented, tested, and fixed in context.
Do the changes involve only schema migrations, or also include data migrations? In such cases I often separate data operations in their own migration, and write data migrations myself.
- Some migration tools were designed with schema migrations in mind. While they have some data migration helpers, their implementation is left to the developer's discretion. This includes Alembic (see here).
Ensure migrations are reversible. That's part of their purpose. You may come across scenarios where, even with thorough planning and testing, your migration still has issues or the implementation details change. In such cases, it's crucial to be able to rollback the database to its previous state and retry.
- Despite what I just said, there are migrations that inevitably result in data loss, such as some data migrations or dropping tables or columns. In such cases, it's important to discuss the loss with the team and ensure the loss is acknowledged by the relevant parts.
- Data loss may become a problem even after discussing it beforehand. That's where the backups come in. It's a good idea to perform a backup before sending any changes to production.
Migration tools like Alembic and Django's ORM have migration auto-generation tools that create the migration scripts for you. Be sure to review auto-generated scripts thoroughly and ensure they conform to your intended changes.