Paolo Melchiorre

Posted on Dec 12, 2023 • Originally published at paulox.net on Dec 11, 2023

Database generated columns (3): GeoDjango & PostGIS

#django #database #python #postgis

An introduction to database generated columns, using PostgGIS, GeoDjango and the new GeneratedField added in Django 5.0.

⚡ TL;DR

GeneratedField is a new Django 5.0 feature and has these characteristics:

its value is calculated entirely by the database
it works in all supported backend databases

Here is a model example with a GeneratedField to use with GeoDjango and PostgGIS as the database backend:

from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
    GeoHash,
)


class City(models.Model):
    name = models.TextField()
    point = models.PointField()
    geohash = models.GeneratedField(
        expression=GeoHash("point"),
        output_field=models.TextField(),
        db_persist=True,
    )

🚀 Introduction

I've been thinking for a long time that database generated columns would be a killer feature for Django and now they will finally be available in Django 5.0 with the new GeneratedField.

Given my great interest in this functionality, I proposed it to the Django developer mailing list, I tried to help in the revision phase of the Pull Request on the Django code, but above all I tried to test them as much as possible after the merge, trying to highlight any bugs before the release of Django 5.0 as a stable version.

In this article, I want to share with you the work I have done to allow you to experiment with GeneratedField using common model field types using PostGIS as a database backend.

🤖 Generated Field

The GeneratedField are full-fledged fields that can be used in queries, displayed and even indexed but their values cannot be set or modified because they are automatically calculated by the database itself whenever the other fields of the same row are modified.

For this reason, they are very useful for having immediately available values calculated starting from the other fields of the same model as long as these are in the same database table.

The definition of the GeneratedField class requires specifying the expression, output_field, and db_persist attributes.

class GeneratedField(
    expression, output_field, db_persist=None, **kwargs
):
    ...

🕶️ Attributes

`db_persist`

In this article, we're going to use PostGIS as a database backend with persisted columns, specified with db_persist=True, in that the database column will occupy storage as if it were a real column.

`output_field`

We have to specify an explicit model field instance to define the field’s data type and its attributes.

`expression`

The expression attributes is the one we're going to play more with, using various example of Expression that the database will use to automatically set the field value each time the model is changed.

↔️ Migrations

The SQL code generated from the migrations produced in our examples will all have the GENERATED ALWAYS SQL (...) STORED syntax, varying only in the internal logic of the expression.

💻 Set Up

The requirements to experiment with Django GeneratedField are:

🐍 Python

A stable and supported version of Python 3 (tested with Python 3.11-3.12):

$ python3 --version
Python 3.12.0

⚗️ Virtual environment

A Python virtual environment:

$ python3 -m venv ~/.generatedfields
$ source ~/.generatedfields/bin/activate

🦄 Django

The latest version of Django (tested with Django 5.0):

$ python3 -m pip install django==5.0

🐘 Psycopg

We’ll use the Python PostgreSQL database adapter which requires installing psycopg greater than the 3.1.8 version.

$ python3 -m pip install psycopg[binary]~=3.1.8

⚠️ Warning

The psycopg[binary] package is meant for beginners to start playing with Python and PostgreSQL without the need to meet the build requirements. For production use, you are advised to use the "Local installation". ⬀

🌎 GDAL

GDAL (Geospatial Data Abstraction Library) is a mandatory GeoDjango requirement and we need to install its binaries:

on Debian-based GNU/Linux distributions (e.g. Debian 10-12, Ubuntu 20.04-23.04, …):

$ sudo apt install gdal-bin

⚠️ Warning

For other platform-specific instructions read the Installing Geospatial libraries ⬀ in the Django documentation.

🗃️ The `generatedfields` project

To create the generatedfields project I'll switch to my projects directory:

$ cd ~/projects

and then use the startproject Django command:

$ python3 -m django startproject generatedfields

The basic files of our project will be created in the generatedfields directory:

$ tree --noreport generatedfields/
generatedfields/
├── manage.py
└── generatedfields
    ├── asgi.py
    ├── __init__.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py

🎛️ Activating PostGIS

To use Postgres as the database in our Django project, we modify the project database settings, adding the PostGIS engine and the connection parameters of our PostgreSQL database:

generatedfields/generatedfields/settings.py

DATABASES = {
    "default": {
        "ENGINE": "django.contrib.gis.db.backends.postgis",
        "HOST": "<my_database_host>",
        "NAME": "<my_database_name",
        "PASSWORD": "<my_database_password>",
        "PORT": "<my_database_port>",
        "USER": "<my_database_user>",
    }
}

⚠️ Warning

You need to create/activate a PostgreSQL database instance (e.g. with system packages, with Docker or as a remote service) and then replace your credentials in the DATABASE settings.

🧩 Activating GeoDjango

We can now activate GeoDjango by adding the django.contrib.gis module to the INSTALLED_APPS, in our project settings.

generatedfields/generatedfields/settings.py

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.gis",
]

🗂️ The `samples` app

After switching to the generatedfields directory:

$ cd generatedfields

We can create our samples app with the Django startapp command:

$ python3 -m django startapp samples

Again, all the necessary files will be created for us in the samples directory:

$ tree --noreport samples/
samples/
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│   └── __init__.py
├── models.py
├── tests.py
└── views.py

✅ Activating the `samples` app

Now, we have to activate our samples application by inserting its name in the list of the INSTALLED_APPS in the generatedfields settings file.

generatedfields/generatedfields/settings.py

INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.gis",
    "samples",
]

📚 Sample models

The example models I have experimented with have expressions with complexity that progressively increase from example to example.

We start by combining two fields of the model to create the value for the generated one and then using database functions, combining them, using them with conditions and even generating multiple fields from the same starting field.

These experiments have allowed us to identify some bugs and fix them during this Django 5.0 testing period, but I leave off some edge cases to take into account which we'll mention at the end of the article.

🔧 Useful commands

In the following examples, we will use some commands that we show here to avoid repetition.

`makemigrations`

Generates a new database migration to apply it to our database:

$ python3 -m manage makemigrations --no-header

`sqlmigrate`

See the SQL code that our migration will apply to the database:

$ python3 -m manage sqlmigrate samples <migration-id>

`migrate`

Apply migration to our database with:

$ python3 -m manage migrate

`shell`

Start an interactive Django shell:

$ python3 -m manage shell

📇 A calculated GeoHash field

The geographical field from which we start is the Point. Using a pair of latitude and longitude we can define the space position of an object. We created a City template with a text field to store the name and point field.

GeoDjango provides many functions that can be applied to a geographic field to extract information or apply transformations.

In our case, we created two calculated fields: the first generates the GeoHash starting from the Point field, and the second generates the GeoJSON of the Point field. GeoHash can be very convenient for representing coordinates, while GeoJson could be used to transmit geographic information in a standard way.

generatedfields/samples/models.py

from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
    AsGeoJSON,
    GeoHash,
)
from django.db.models.functions import (
    Cast,
)


class City(models.Model):
    name = models.TextField()
    point = models.PointField()
    geohash = models.GeneratedField(
        expression=GeoHash("point"),
        output_field=models.TextField(),
        db_persist=True,
    )
    geojson = models.GeneratedField(
        expression=Cast(
            AsGeoJSON("point"),
            models.JSONField(),
        ),
        output_field=models.JSONField(),
        db_persist=True,
    )

    def __str__(self):
        return (
            f"{self.name} "
            f"({self.geohash})"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model City
--
CREATE TABLE "samples_city" (
  "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  "name" text NOT NULL,
  "point" geometry(POINT, 4326) NOT NULL,
  "geohash" text GENERATED ALWAYS AS (
    ST_GeoHash("point")
  ) STORED,
  "geojson" jsonb GENERATED ALWAYS AS (
    (ST_AsGeoJSON("point", 8, 0))::jsonb
  ) STORED
);
CREATE INDEX "samples_city_point_id"
ON "samples_city" USING GIST ("point");
COMMIT;

The SQL code generated for PostGIS uses the ST_GeoHash function for the creation of the GeoHash and the ST_AsGeoJSON function for the generation of the GeoJSON to which a cast to JSONB format is applied.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import City
>>> name = "Pescara, IT"
>>> point = "POINT(14.21 42.47)"
>>> City.objects.create(name=name, point=point)
<City: Pescara, IT (sfw0y6venxzrb89yzr2q)>

I added the data of my hometown, Pescara, Italy, simply by providing its latitude and longitude directly using the well-known text (WKT) format.

🔔 Note

In the Django shell, we only show the automatically calculated GeoHash field, but in a very similar way, we could use the GeoJson field in a view to return all the other model instances without having to recalculate the GeoJSON.

📏 A calculated length field

In GeoDjango we can define fields more complex than a point, and in this case, we used a LineStringField to represent a Route, in addition to a text field to store its name.

We added three example calculated fields from the line: the line length field, the field containing the number of points that define the line, and a Point field that defines the center of the line.

The Length field is defined in meters and its use is immediate in the case of a LineStringField. The field that defines the number of points instead denotes the complexity of a line. Finally, the Centroid is a point that more or less defines the center of the line itself, it can be convenient for example to represent it on a map when it is displayed at such a high zoom level as to make the detail of the line useless.

generatedfields/samples/models.py

from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
    Centroid,
    Length,
    NumPoints,
)


class Route(models.Model):
    name = models.TextField()
    line = models.LineStringField()
    length = models.GeneratedField(
        db_persist=True,
        expression=Length("line"),
        output_field=models.FloatField(),
    )
    numpoints = models.GeneratedField(
        db_persist=True,
        expression=NumPoints("line"),
        output_field=models.IntegerField(),
    )
    centroid = models.GeneratedField(
        db_persist=True,
        expression=Centroid("line"),
        output_field=models.PointField(),
    )

    def __str__(self):
        return (
            f"{self.name} "
            f"(~{self.length/1000:.1f} km)"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model Route
--
CREATE TABLE "samples_route" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "name" text NOT NULL,
    "line" geometry(LINESTRING, 4326) NOT NULL,
    "length" double precision GENERATED ALWAYS AS (
        ST_LengthSpheroid(
            "line",
            'SPHEROID["WGS 84",6378137.0,298.257223563]'
        )
    ) STORED,
    "numpoints" integer GENERATED ALWAYS AS (
        ST_NPoints("line")
    ) STORED,
    "centroid" geometry(POINT, 4326) GENERATED ALWAYS AS (
        ST_Centroid("line")
    ) STORED
);
CREATE INDEX "samples_route_line_id"
ON "samples_route" USING GIST ("line");
COMMIT;

The SQL code generated for PostGIS uses the functions: ST_NPoints to extract the number of points on the line, ST_Centroid to calculate the center point of the line, and ST_LengthSpheroid to calculate the length of the line on a spheroid.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import Route
>>> name = "90 Mile Straight, AU"
>>> line = "LINESTRING(123.944 -32.455, 125.484 -32.27)"
>>> Route.objects.create(name=name, line=line)
<Route: 90 Mile Straight, AU (~146.4 km)>

In this example we have represented the longest straight route in the world as a straight line, it is a stretch of the Eyre Highway between Perth and Adelaide, 90 miles long, in Australia.

🔔 Note

In the Django shell we have only represented the length converted into kilometers, but the other two automatically calculated fields are also very useful, for example, to indicate the complexity of a route or to represent the route via the point that represents its center.

📐 A calculated area field

Moving up the scale of complexity of the geometries that can be represented in GeoDjango, we used a polygon field to approximate the surface of a State in addition to the text field to specify its name.

We have specified three example calculated fields that use the polygon as the base. The area field automatically calculates the surface of the polygon, the bounding box field represents the rectangle containing the polygon and the point field represents the approximation of the polygon center via a point on its surface.

generatedfields/samples/models.py

from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
    Area,
    Envelope,
    PointOnSurface,
)


class State(models.Model):
    name = models.TextField()
    polygon = models.PolygonField()
    area = models.GeneratedField(
        db_persist=True,
        expression=Area("polygon"),
        output_field=models.FloatField(),
    )
    bbox = models.GeneratedField(
        db_persist=True,
        expression=Envelope("polygon"),
        output_field=models.PolygonField(),
    )
    point = models.GeneratedField(
        db_persist=True,
        expression=PointOnSurface(
            "polygon"
        ),
        output_field=models.PointField(),
    )

    def __str__(self):
        return (
            f"{self.name} "
            f"(~{self.area*10000:.0f} km²)"
        )

After running makemigrations, use sqlmigrate and see the SQL code that will be applied to the database:

BEGIN;
--
-- Create model State
--
CREATE TABLE "samples_state" (
    "id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    "name" text NOT NULL,
    "polygon" geometry(POLYGON, 4326) NOT NULL,
    "area" double precision GENERATED ALWAYS AS (
        ST_Area("polygon")
    ) STORED,
    "bbox" geometry(POLYGON, 4326) GENERATED ALWAYS AS (
        ST_Envelope("polygon")
    ) STORED,
    "point" geometry(POINT, 4326) GENERATED ALWAYS AS (
        ST_PointOnSurface("polygon")
    ) STORED
);
CREATE INDEX "samples_state_polygon_8801f79a_id"
ON "samples_state" USING GIST ("polygon");
COMMIT;

Three functions are used in the SQL code generated for PostGIS: ST_Area to calculate the area of the polygon, ST_Envelope to calculate the bounding box containing the polygon and ST_PointOnSurface to calculate the point on the surface.

After running migrate, open a shell and check the model behaves as intended:

>>> from samples.models import State
>>> name = "Colorado, US"
>>> polygon = "POLYGON((-109 37, -109 41, -102 41, -102 37, -109 37))"
>>> State.objects.create(name=name, polygon=polygon)
<State: Colorado, US (~280000 km²)>

In this example, I have inserted an approximation of the polygon representing the US state of Colorado. Although I believe it is the state with a perimeter as close as possible to a rectangle, in reality, it contains several irregularities and this is why the calculated area value is greater than its real size.

🔔 Note

This time too in the Django shell we have only represented one of the calculated fields: the area. The other fields are also very useful for representing a polygon on a map, for example, the bounding box is useful for defining the area in which to position the map so that the entire polygon is visible inside it, and the calculated point is similar to the Centroid defined in the previous example, but it ensures that it is always on the surface in the case of a very irregular polygon, and is very useful for representing the geometry as a point in case we are viewing the map at a very high zoom level.

🗺️ Maps

If you want to view the automatically generated geographic fields on a map, I recommend you follow the instructions I have already published to create a map with Django in my 2-part series "Maps with Django":

🙋 Get involved

Thank you for getting this far with reading this article.

I hope you found it interesting to read the examples I proposed, as it was for me to formulate them and report them here. The motivation behind this work and this article is to collaborate with all Django developers to churn out features that are as solid and stable as possible.

As I have repeatedly invited you to do in my articles and talks, I invite you to get involved.

Try Django 5.0 and experiment with its new features, but above all report any strange behavior or bugs so that they end up being better for everyone.

🥠 Conclusion

As we have seen, the generated fields of Django 5.0 are a very powerful tool, which lend themselves to solving a wide variety of problems.

I think they also have the merit of bringing a lot of logic back into the database, which in most projects is the lowest layer of the project stack.

Together with the database-computed default values they also make it possible to simplify the collaboration of different applications on the same database.

That's all!

— Paolo

⚠️ Disclaimer

This code is for demonstration purposes only and should not be used in production as is. However, the code is released without any guarantee from the author and no liability can be attributed. Use at your own risk.

⚡ TL;DR

🚀 Introduction

🤖 Generated Field

🕶️ Attributes

db_persist

output_field

expression

↔️ Migrations

💻 Set Up

🐍 Python

⚗️ Virtual environment

🦄 Django

🐘 Psycopg

⚠️ Warning

🌎 GDAL

⚠️ Warning

🗃️ The generatedfields project

🎛️ Activating PostGIS

⚠️ Warning

🧩 Activating GeoDjango

🗂️ The samples app

✅ Activating the samples app

📚 Sample models

🔧 Useful commands

makemigrations

sqlmigrate

migrate

shell

📇 A calculated GeoHash field

🔔 Note

📏 A calculated length field

🔔 Note

📐 A calculated area field

🔔 Note

🗺️ Maps

🙋 Get involved

🥠 Conclusion

⚠️ Disclaimer

🔗 Resources

Read next

Explaining defaultdict in Python

[Python-CV2] Image Segmentation : Canny Edges, Watershed, and K-Means Methods

2025 Guide to Architecting an Iceberg Lakehouse

BigQuery's Capacity-Based Compute Pricing Model Explained

`db_persist`

`output_field`

`expression`

🗃️ The `generatedfields` project

🗂️ The `samples` app

✅ Activating the `samples` app

`makemigrations`

`sqlmigrate`

`migrate`

`shell`