An introduction to database generated columns, using PostgGIS, GeoDjango and the new GeneratedField added in Django 5.0.
⚡ TL;DR
GeneratedField
is a new Django 5.0 feature and has these characteristics:
- its value is calculated entirely by the database
- it works in all supported backend databases
Here is a model example with a GeneratedField
to use with GeoDjango and PostgGIS as the database backend:
from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
GeoHash,
)
class City(models.Model):
name = models.TextField()
point = models.PointField()
geohash = models.GeneratedField(
expression=GeoHash("point"),
output_field=models.TextField(),
db_persist=True,
)
🚀 Introduction
I've been thinking for a long time that database generated columns would be a killer feature for Django and now they will finally be available in Django 5.0 with the new GeneratedField
.
Given my great interest in this functionality, I proposed it to the Django developer mailing list, I tried to help in the revision phase of the Pull Request on the Django code, but above all I tried to test them as much as possible after the merge, trying to highlight any bugs before the release of Django 5.0 as a stable version.
In this article, I want to share with you the work I have done to allow you to experiment with GeneratedField
using common model field types using PostGIS as a database backend.
🤖 Generated Field
The GeneratedField
are full-fledged fields that can be used in queries, displayed and even indexed but their values cannot be set or modified because they are automatically calculated by the database itself whenever the other fields of the same row are modified.
For this reason, they are very useful for having immediately available values calculated starting from the other fields of the same model as long as these are in the same database table.
The definition of the GeneratedField
class requires specifying the expression
, output_field
, and db_persist
attributes.
class GeneratedField(
expression, output_field, db_persist=None, **kwargs
):
...
🕶️ Attributes
db_persist
In this article, we're going to use PostGIS as a database backend with persisted columns, specified with db_persist=True
, in that the database column will occupy storage as if it were a real column.
output_field
We have to specify an explicit model field instance to define the field’s data type and its attributes.
expression
The expression
attributes is the one we're going to play more with, using various example of Expression
that the database will use to automatically set the field value each time the model is changed.
↔️ Migrations
The SQL code generated from the migrations produced in our examples will all have the GENERATED ALWAYS SQL (...) STORED
syntax, varying only in the internal logic of the expression.
💻 Set Up
The requirements to experiment with Django GeneratedField
are:
🐍 Python
A stable and supported version of Python 3 (tested with Python 3.11-3.12):
$ python3 --version
Python 3.12.0
⚗️ Virtual environment
A Python virtual environment:
$ python3 -m venv ~/.generatedfields
$ source ~/.generatedfields/bin/activate
🦄 Django
The latest version of Django (tested with Django 5.0):
$ python3 -m pip install django==5.0
🐘 Psycopg
We’ll use the Python PostgreSQL database adapter which requires installing psycopg
greater than the 3.1.8
version.
$ python3 -m pip install psycopg[binary]~=3.1.8
⚠️ Warning
The psycopg[binary]
package is meant for beginners to start playing with Python and PostgreSQL without the need to meet the build requirements. For production use, you are advised to use the "Local installation". ⬀
🌎 GDAL
GDAL
(Geospatial Data Abstraction Library) is a mandatory GeoDjango requirement and we need to install its binaries:
- on Debian-based GNU/Linux distributions (e.g. Debian 10-12, Ubuntu 20.04-23.04, …):
$ sudo apt install gdal-bin
⚠️ Warning
For other platform-specific instructions read the Installing Geospatial libraries ⬀ in the Django documentation.
🗃️ The generatedfields
project
To create the generatedfields
project I'll switch to my projects
directory:
$ cd ~/projects
and then use the startproject
Django command:
$ python3 -m django startproject generatedfields
The basic files of our project will be created in the generatedfields
directory:
$ tree --noreport generatedfields/
generatedfields/
├── manage.py
└── generatedfields
├── asgi.py
├── __init__.py
├── settings.py
├── urls.py
└── wsgi.py
🎛️ Activating PostGIS
To use Postgres as the database in our Django project, we modify the project database settings, adding the PostGIS engine and the connection parameters of our PostgreSQL database:
generatedfields/generatedfields/settings.py
DATABASES = {
"default": {
"ENGINE": "django.contrib.gis.db.backends.postgis",
"HOST": "<my_database_host>",
"NAME": "<my_database_name",
"PASSWORD": "<my_database_password>",
"PORT": "<my_database_port>",
"USER": "<my_database_user>",
}
}
⚠️ Warning
You need to create/activate a PostgreSQL database instance (e.g. with system packages, with Docker or as a remote service) and then replace your credentials in the DATABASE settings.
🧩 Activating GeoDjango
We can now activate GeoDjango by adding the django.contrib.gis
module to the INSTALLED_APPS
, in our project settings.
generatedfields/generatedfields/settings.py
INSTALLED_APPS = [
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
"django.contrib.sessions",
"django.contrib.messages",
"django.contrib.staticfiles",
"django.contrib.gis",
]
🗂️ The samples
app
After switching to the generatedfields
directory:
$ cd generatedfields
We can create our samples
app with the Django startapp
command:
$ python3 -m django startapp samples
Again, all the necessary files will be created for us in the samples
directory:
$ tree --noreport samples/
samples/
├── admin.py
├── apps.py
├── __init__.py
├── migrations
│ └── __init__.py
├── models.py
├── tests.py
└── views.py
✅ Activating the samples
app
Now, we have to activate our samples
application by inserting its name in the list of the INSTALLED_APPS
in the generatedfields
settings file.
generatedfields/generatedfields/settings.py
INSTALLED_APPS = [
"django.contrib.admin",
"django.contrib.auth",
"django.contrib.contenttypes",
"django.contrib.sessions",
"django.contrib.messages",
"django.contrib.staticfiles",
"django.contrib.gis",
"samples",
]
📚 Sample models
The example models I have experimented with have expressions with complexity that progressively increase from example to example.
We start by combining two fields of the model to create the value for the generated one and then using database functions, combining them, using them with conditions and even generating multiple fields from the same starting field.
These experiments have allowed us to identify some bugs and fix them during this Django 5.0 testing period, but I leave off some edge cases to take into account which we'll mention at the end of the article.
🔧 Useful commands
In the following examples, we will use some commands that we show here to avoid repetition.
makemigrations
Generates a new database migration to apply it to our database:
$ python3 -m manage makemigrations --no-header
sqlmigrate
See the SQL code that our migration will apply to the database:
$ python3 -m manage sqlmigrate samples <migration-id>
migrate
Apply migration to our database with:
$ python3 -m manage migrate
shell
Start an interactive Django shell:
$ python3 -m manage shell
📇 A calculated GeoHash field
The geographical field from which we start is the Point. Using a pair of latitude and longitude we can define the space position of an object. We created a City template with a text field to store the name and point field.
GeoDjango provides many functions that can be applied to a geographic field to extract information or apply transformations.
In our case, we created two calculated fields: the first generates the GeoHash starting from the Point field, and the second generates the GeoJSON of the Point field. GeoHash can be very convenient for representing coordinates, while GeoJson could be used to transmit geographic information in a standard way.
generatedfields/samples/models.py
from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
AsGeoJSON,
GeoHash,
)
from django.db.models.functions import (
Cast,
)
class City(models.Model):
name = models.TextField()
point = models.PointField()
geohash = models.GeneratedField(
expression=GeoHash("point"),
output_field=models.TextField(),
db_persist=True,
)
geojson = models.GeneratedField(
expression=Cast(
AsGeoJSON("point"),
models.JSONField(),
),
output_field=models.JSONField(),
db_persist=True,
)
def __str__(self):
return (
f"{self.name} "
f"({self.geohash})"
)
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model City
--
CREATE TABLE "samples_city" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"name" text NOT NULL,
"point" geometry(POINT, 4326) NOT NULL,
"geohash" text GENERATED ALWAYS AS (
ST_GeoHash("point")
) STORED,
"geojson" jsonb GENERATED ALWAYS AS (
(ST_AsGeoJSON("point", 8, 0))::jsonb
) STORED
);
CREATE INDEX "samples_city_point_id"
ON "samples_city" USING GIST ("point");
COMMIT;
The SQL code generated for PostGIS uses the ST_GeoHash
function for the creation of the GeoHash and the ST_AsGeoJSON
function for the generation of the GeoJSON to which a cast to JSONB format is applied.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import City
>>> name = "Pescara, IT"
>>> point = "POINT(14.21 42.47)"
>>> City.objects.create(name=name, point=point)
<City: Pescara, IT (sfw0y6venxzrb89yzr2q)>
I added the data of my hometown, Pescara, Italy, simply by providing its latitude and longitude directly using the well-known text (WKT) format.
🔔 Note
In the Django shell, we only show the automatically calculated GeoHash field, but in a very similar way, we could use the GeoJson field in a view to return all the other model instances without having to recalculate the GeoJSON.
📏 A calculated length field
In GeoDjango we can define fields more complex than a point, and in this case, we used a LineStringField to represent a Route, in addition to a text field to store its name.
We added three example calculated fields from the line: the line length field, the field containing the number of points that define the line, and a Point field that defines the center of the line.
The Length field is defined in meters and its use is immediate in the case of a LineStringField. The field that defines the number of points instead denotes the complexity of a line. Finally, the Centroid is a point that more or less defines the center of the line itself, it can be convenient for example to represent it on a map when it is displayed at such a high zoom level as to make the detail of the line useless.
generatedfields/samples/models.py
from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
Centroid,
Length,
NumPoints,
)
class Route(models.Model):
name = models.TextField()
line = models.LineStringField()
length = models.GeneratedField(
db_persist=True,
expression=Length("line"),
output_field=models.FloatField(),
)
numpoints = models.GeneratedField(
db_persist=True,
expression=NumPoints("line"),
output_field=models.IntegerField(),
)
centroid = models.GeneratedField(
db_persist=True,
expression=Centroid("line"),
output_field=models.PointField(),
)
def __str__(self):
return (
f"{self.name} "
f"(~{self.length/1000:.1f} km)"
)
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model Route
--
CREATE TABLE "samples_route" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"name" text NOT NULL,
"line" geometry(LINESTRING, 4326) NOT NULL,
"length" double precision GENERATED ALWAYS AS (
ST_LengthSpheroid(
"line",
'SPHEROID["WGS 84",6378137.0,298.257223563]'
)
) STORED,
"numpoints" integer GENERATED ALWAYS AS (
ST_NPoints("line")
) STORED,
"centroid" geometry(POINT, 4326) GENERATED ALWAYS AS (
ST_Centroid("line")
) STORED
);
CREATE INDEX "samples_route_line_id"
ON "samples_route" USING GIST ("line");
COMMIT;
The SQL code generated for PostGIS uses the functions: ST_NPoints
to extract the number of points on the line, ST_Centroid
to calculate the center point of the line, and ST_LengthSpheroid
to calculate the length of the line on a spheroid.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import Route
>>> name = "90 Mile Straight, AU"
>>> line = "LINESTRING(123.944 -32.455, 125.484 -32.27)"
>>> Route.objects.create(name=name, line=line)
<Route: 90 Mile Straight, AU (~146.4 km)>
In this example we have represented the longest straight route in the world as a straight line, it is a stretch of the Eyre Highway between Perth and Adelaide, 90 miles long, in Australia.
🔔 Note
In the Django shell we have only represented the length converted into kilometers, but the other two automatically calculated fields are also very useful, for example, to indicate the complexity of a route or to represent the route via the point that represents its center.
📐 A calculated area field
Moving up the scale of complexity of the geometries that can be represented in GeoDjango, we used a polygon field to approximate the surface of a State in addition to the text field to specify its name.
We have specified three example calculated fields that use the polygon as the base. The area field automatically calculates the surface of the polygon, the bounding box field represents the rectangle containing the polygon and the point field represents the approximation of the polygon center via a point on its surface.
generatedfields/samples/models.py
from django.contrib.gis.db import models
from django.contrib.gis.db.models.functions import (
Area,
Envelope,
PointOnSurface,
)
class State(models.Model):
name = models.TextField()
polygon = models.PolygonField()
area = models.GeneratedField(
db_persist=True,
expression=Area("polygon"),
output_field=models.FloatField(),
)
bbox = models.GeneratedField(
db_persist=True,
expression=Envelope("polygon"),
output_field=models.PolygonField(),
)
point = models.GeneratedField(
db_persist=True,
expression=PointOnSurface(
"polygon"
),
output_field=models.PointField(),
)
def __str__(self):
return (
f"{self.name} "
f"(~{self.area*10000:.0f} km²)"
)
After running makemigrations
, use sqlmigrate
and see the SQL code that will be applied to the database:
BEGIN;
--
-- Create model State
--
CREATE TABLE "samples_state" (
"id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
"name" text NOT NULL,
"polygon" geometry(POLYGON, 4326) NOT NULL,
"area" double precision GENERATED ALWAYS AS (
ST_Area("polygon")
) STORED,
"bbox" geometry(POLYGON, 4326) GENERATED ALWAYS AS (
ST_Envelope("polygon")
) STORED,
"point" geometry(POINT, 4326) GENERATED ALWAYS AS (
ST_PointOnSurface("polygon")
) STORED
);
CREATE INDEX "samples_state_polygon_8801f79a_id"
ON "samples_state" USING GIST ("polygon");
COMMIT;
Three functions are used in the SQL code generated for PostGIS: ST_Area
to calculate the area of the polygon, ST_Envelope
to calculate the bounding box containing the polygon and ST_PointOnSurface
to calculate the point on the surface.
After running migrate
, open a shell
and check the model behaves as intended:
>>> from samples.models import State
>>> name = "Colorado, US"
>>> polygon = "POLYGON((-109 37, -109 41, -102 41, -102 37, -109 37))"
>>> State.objects.create(name=name, polygon=polygon)
<State: Colorado, US (~280000 km²)>
In this example, I have inserted an approximation of the polygon representing the US state of Colorado. Although I believe it is the state with a perimeter as close as possible to a rectangle, in reality, it contains several irregularities and this is why the calculated area value is greater than its real size.
🔔 Note
This time too in the Django shell we have only represented one of the calculated fields: the area. The other fields are also very useful for representing a polygon on a map, for example, the bounding box is useful for defining the area in which to position the map so that the entire polygon is visible inside it, and the calculated point is similar to the Centroid defined in the previous example, but it ensures that it is always on the surface in the case of a very irregular polygon, and is very useful for representing the geometry as a point in case we are viewing the map at a very high zoom level.
🗺️ Maps
If you want to view the automatically generated geographic fields on a map, I recommend you follow the instructions I have already published to create a map with Django in my 2-part series "Maps with Django":
- Maps with Django⁽¹⁾: GeoDjango, SpatiaLite & Leaflet
- Maps with Django⁽²⁾: GeoDjango, PostGIS & Leaflet
🙋 Get involved
Thank you for getting this far with reading this article.
I hope you found it interesting to read the examples I proposed, as it was for me to formulate them and report them here. The motivation behind this work and this article is to collaborate with all Django developers to churn out features that are as solid and stable as possible.
As I have repeatedly invited you to do in my articles and talks, I invite you to get involved.
Try Django 5.0 and experiment with its new features, but above all report any strange behavior or bugs so that they end up being better for everyone.
🥠 Conclusion
As we have seen, the generated fields of Django 5.0 are a very powerful tool, which lend themselves to solving a wide variety of problems.
I think they also have the merit of bringing a lot of logic back into the database, which in most projects is the lowest layer of the project stack.
Together with the database-computed default values they also make it possible to simplify the collaboration of different applications on the same database.
That's all!
— Paolo
⚠️ Disclaimer
This code is for demonstration purposes only and should not be used in production as is. However, the code is released without any guarantee from the author and no liability can be attributed. Use at your own risk.
Top comments (0)