"This is my django partitioning solution. There are many like it, but this one is mine."
The impatient can jump to the codey bits.
Background
The axiom "it's great... until it isn't" can be applied to almost anything, and django is no exception. There is no explicit or implicit support for partitioning in django, but that doesn't make it impossible. Join me for my tale of hacking together a solution to this problem.
Are you sitting comfortably? Then we'll begin.
I had already written some code to "convert" an existing table in postgres to a partitioned table. It, too, worked great... until it didn't. I had tried some other modules, but they all had their problems and simply did not work with our app or how we deploy it. Even with our convert code, we still could not create a table as partitioned in the first place. So I took a deep dive into the django code and eventually found the pieces I needed in the BaseDatabaseSchemaEditor class.
So I tried to subclass this and figure out how to tell the migration app to use it. But after all of my searching (and with a definite time window for completion), it looked like I would have to write my own migration app. This was distasteful because we were already using a tenant module's app for schema migrations, and I really did not want to write yet another django app just to change one SQL statement and emit a few deferred ones. So I focused on the methods in BaseDatabaseSchemaEditor to see exactly what happens when a model (table) is created and deleted.
In the BaseDatabaseSchemaEditor class, there were two methods I focused on: table_sql, which builds the CREATE TABLE statement, and delete_model, which drops a table. Once I had read over these methods and understood what they were doing, I formulated my hack. It required some method overrides, an enable function, a disable function, a list of known partitioned model names, a partition information class attached to each partitioned model, and two support functions.
Python Code
Support Functions
I wanted a function that could return any model, globally, given a model_name, an app_label.model_name, or a table_name. Its data would need to be cached, and the cache built at runtime to ensure that all models had already been compiled and loaded into django's app/model registry.
_load_db_models
def _load_db_models():
    """Initialize the global dict that will hold the table_name/model_name -> Model map"""
    qualified_only = set()
    for app, _models in django.apps.apps.all_models.items():
        for lmodel_name, model in _models.items():
            qualified_model_name = f"{app}.{lmodel_name}"
            if lmodel_name in DB_MODELS:
                qualified_only.add(lmodel_name)
                del DB_MODELS[lmodel_name]
            DB_MODELS[qualified_model_name] = model
            DB_MODELS[model._meta.db_table] = model
            if lmodel_name not in qualified_only:
                DB_MODELS[lmodel_name] = model
This uses a module-level cache variable named DB_MODELS, defined as an empty dict, plus a module-level lock:
import threading

import django.apps  # used by _load_db_models above
...
DB_MODELS = {}
DB_MODELS_LOCK = threading.Lock()
get_model
def get_model(model_or_table_name):
    """Get a model class from the model name or table name"""
    with DB_MODELS_LOCK:
        if not DB_MODELS:
            _load_db_models()
        return DB_MODELS[model_or_table_name.lower()]
Thus, if I had an app named eek and an ORM model defined in that app as:
from django.db import models
...

class MySuperCoolModel(models.Model):
    class Meta:
        db_table = "my_super_cool_model"

    id = models.UUIDField(primary_key=True)
    start_date = models.DateField()
    ...
Then each of these calls would return the MySuperCoolModel class:
get_model("mysupercoolmodel")
get_model("MySuperCoolModel")
get_model("eek.MySuperCoolModel")
get_model("my_super_cool_model")
Partitioned Models
This is simply a list of model names that should be partitioned when created. I called my list PARTITIONED_MODEL_NAMES.
PartitionInfo Class
This is an ORM model attribute that is itself a nested class containing the information needed to make a partitioned table. It takes this form:
class PartitionInfo:
    partition_type = <str: a partition type such as "RANGE", "LIST", or "HASH">
    partition_cols = <list[str]: a list of ORM model columns (these must match the column names as they appear in the database)>
As applied to MySuperCoolModel:
PARTITIONED_MODEL_NAMES = [
    ...
    "MySuperCoolModel",
]

class MySuperCoolModel(models.Model):
    class PartitionInfo:
        partition_type = "RANGE"
        partition_cols = ["start_date"]

    class Meta:
        db_table = "my_super_cool_model"

    id = models.UUIDField(primary_key=True)
    start_date = models.DateField()
    ...
Override Methods
table_sql
I made a new function called p_table_sql (p_ for partitioned). This function calls the original function, which I call o_table_sql (o_ for original), to get the initial SQL created as normal. Then I check whether the name of the model class passed in matches an entry in the PARTITIONED_MODEL_NAMES list. If there is a match, I use get_model to retrieve the full ORM model class.
If I have a name match and the ORM model has the PartitionInfo class, then I proceed with the new functionality. Otherwise, I return the original SQL.
The new functionality consists of adding the PARTITION BY <type> (<cols>) clause to the CREATE TABLE SQL and removing the PRIMARY KEY qualifier from it. The primary key is then added to the deferred SQL list as an ALTER TABLE statement, inserted into the deferred_sql list before any FOREIGN KEY statements. If you need tracking information or other post-create SQL to run for the new table, append that SQL to the deferred_sql list as well.
The primary key is handled this way because django does not support multi-column primary keys well, if at all, while PostgreSQL requires the partition column(s) to be part of a partitioned table's primary key. For partitioned models, it is best that any column you would consider a primary key be created explicitly and not be an automatically incrementing field; a UUID field is a good candidate.
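To make the rewrite concrete, here is a rough, illustrative sketch (my own example, not part of the module) of what this does to the SQL for MySuperCoolModel, assuming RANGE partitioning on start_date:
# Illustrative only: roughly the CREATE TABLE SQL django would emit for MySuperCoolModel
original_sql = (
    'CREATE TABLE "my_super_cool_model" '
    '("id" uuid NOT NULL PRIMARY KEY, "start_date" date NOT NULL)'
)

# The PRIMARY KEY qualifier is stripped and the partition clause is appended
partition_clause = ' PARTITION BY RANGE ("start_date") '
create_sql = original_sql.replace("PRIMARY KEY", "") + partition_clause

# The primary key comes back as deferred SQL and must include the partition column(s)
deferred_pk_sql = (
    'ALTER TABLE "my_super_cool_model" ADD CONSTRAINT "my_super_cool_model_pk" '
    'PRIMARY KEY ("start_date", "id")'
)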
import logging

from django.db.models import ForeignKey
...
LOG = logging.getLogger(__name__)


def _count_fk_fields(model):
    """Count the foreign key fields defined on the model"""
    num_fk = 0
    for f in model._meta.fields:
        num_fk += int(isinstance(f, ForeignKey))
    return num_fk


def p_table_sql(self, model):
    # Use the default model class for the original django SQL generation
    sql, params = self.o_table_sql(model)
    # Based on a model name match, get the defined model from the app registry
    # rather than the historical model from django migration processing
    if model.__name__ in PARTITIONED_MODEL_NAMES:
        pmodel = get_model(model.__name__)
    else:
        pmodel = None
    # If there was a partition name match and the class has the required attribute,
    # use this information to add the partition clause to the create table sql.
    # Otherwise, return the original sql and params.
    if pmodel is not None and hasattr(pmodel, "PartitionInfo"):
        LOG.info(f"Creating PARTITIONED TABLE {pmodel._meta.db_table}")
        partition_cols = pmodel.PartitionInfo.partition_cols
        sparams = {
            "partition_type": pmodel.PartitionInfo.partition_type.upper(),
            "partition_cols": ", ".join(f'"{c}"' for c in partition_cols),
        }
        # The primary key will be overridden here.
        # Partitioned tables require that the partition column(s) be part of the primary key.
        p_sql = self.sql_partitioned_table % sparams
        sql = sql.replace("PRIMARY KEY", "") + p_sql
        pk_cols = partition_cols[:]
        try:
            mod_pk = pmodel._meta.pk.get_attname()
        except Exception:
            pass
        else:
            pk_cols.append(mod_pk)
        sparams = {
            "table_name": f'"{pmodel._meta.db_table}"',
            "constraint_name": f'"{pmodel._meta.db_table}_pk"',
            "constraint_cols": ", ".join(f'"{c}"' for c in pk_cols),
        }
        pk_constraint = self.sql_partitioned_pk % sparams
        # Insert the primary key constraint ahead of any deferred FOREIGN KEY statements
        num_fk = _count_fk_fields(pmodel)
        if num_fk:
            self.deferred_sql.insert(-num_fk - 1, pk_constraint)
        else:
            self.deferred_sql.append(pk_constraint)
        # Add any deferred sql statements for post-create processing to
        # the self.deferred_sql list here
    else:
        LOG.info(f"Creating TABLE {model._meta.db_table}")
    return sql, params
delete_model
This function (which I've named p_delete_model) needs to do any pre-processing before the table is dropped, then use the original functionality (which I've reassigned to o_delete_model) to do the actual table drop.
For my implementation, I have a tracking table that lists all of the partitions of any of my partitioned tables. So I want to clean that up.
def p_delete_model(self, model):
    if model.__name__ in PARTITIONED_MODEL_NAMES:
        pmodel = get_model(model.__name__)
    else:
        pmodel = None
    if pmodel is not None and hasattr(pmodel, "PartitionInfo"):
        sparams = {"partitioned_table_name": pmodel._meta.db_table}
        with self.connection.cursor() as cur:
            drop_partitions_sql = cur.mogrify(self.sql_drop_partitions, sparams).decode("utf-8")
        self.execute(drop_partitions_sql)
    self.o_delete_model(model)
Enable Function
This function sets the SQL templates used for partitioning, backs up the original functionality onto o_-prefixed attributes, and assigns the new functionality to the original attribute names.
import types
...

def set_partitioned_schema_editor(schema_editor):
    """
    Add attributes and override methods of the given schema_editor to allow partitioned table
    SQL statements to be emitted.
    """
    # Add SQL templates, if not already present
    if not hasattr(schema_editor, "sql_partitioned_table"):
        setattr(schema_editor, "sql_partitioned_table", " PARTITION BY %(partition_type)s (%(partition_cols)s) ")
    if not hasattr(schema_editor, "sql_partitioned_pk"):
        setattr(
            schema_editor,
            "sql_partitioned_pk",
            "ALTER TABLE %(table_name)s ADD CONSTRAINT %(constraint_name)s PRIMARY KEY (%(constraint_cols)s)",
        )
    # Template to drop partitions by using the partition manager trigger function set
    # on the tracking table
    if not hasattr(schema_editor, "sql_drop_partitions"):
        drop_partitions_sql = """
            DELETE
              FROM partitioned_tables
             WHERE schema_name = current_schema
               AND partition_of_table_name = %(partitioned_table_name)s
        """
        setattr(schema_editor, "sql_drop_partitions", drop_partitions_sql)
    # Back up the original method that emits create table sql and replace it with the new method
    if not hasattr(schema_editor, "o_table_sql"):
        setattr(schema_editor, "o_table_sql", schema_editor.table_sql)
        setattr(schema_editor, "table_sql", types.MethodType(p_table_sql, schema_editor))
    # Back up the original delete_model method and replace it with the new method
    if not hasattr(schema_editor, "o_delete_model"):
        setattr(schema_editor, "o_delete_model", schema_editor.delete_model)
        setattr(schema_editor, "delete_model", types.MethodType(p_delete_model, schema_editor))
Disable Function
This function will restore the schema_editor instance back to default.
def unset_partitioned_schema_editor(schema_editor):
    # Delete the partition template attributes, if present
    if hasattr(schema_editor, "sql_partitioned_table"):
        delattr(schema_editor, "sql_partitioned_table")
    if hasattr(schema_editor, "sql_partitioned_pk"):
        delattr(schema_editor, "sql_partitioned_pk")
    if hasattr(schema_editor, "sql_drop_partitions"):
        delattr(schema_editor, "sql_drop_partitions")
    # Restore the original functionality for create table sql emit
    if hasattr(schema_editor, "o_table_sql"):
        setattr(schema_editor, "table_sql", schema_editor.o_table_sql)
        delattr(schema_editor, "o_table_sql")
    # Restore the original functionality for the delete_model method
    if hasattr(schema_editor, "o_delete_model"):
        setattr(schema_editor, "delete_model", schema_editor.o_delete_model)
        delattr(schema_editor, "o_delete_model")
Migrations
Once all of this was done, I had to use it in a migration file. After digging through the django logic that runs migrations, I could see (at least with migrate_schemas from django_tenant_schemas) that the schema_editor is created as a context manager and reused across all of the migration files being applied from the current state.
This means that I can set and unset the schema editor from within a migration file as discrete RunPython operations. The first operation should set the schema_editor for partitioning, with the reverse action unsetting it. The last operation should unset the schema_editor, with the reverse action setting it.
Example:
from django.db import migrations, models

from <your_partition_code_module> import set_partitioned_schema_editor
from <your_partition_code_module> import unset_partitioned_schema_editor


def set_partition_mode(apps, schema_editor):
    set_partitioned_schema_editor(schema_editor)


def unset_partition_mode(apps, schema_editor):
    unset_partitioned_schema_editor(schema_editor)


class Migration(migrations.Migration):
    dependencies = []

    operations = [
        migrations.RunPython(code=set_partition_mode, reverse_code=unset_partition_mode),
        migrations.CreateModel(
            name="OCPAllCostLineItemProjectDailySummaryP",
            fields=[
                ("id", models.UUIDField(primary_key=True, serialize=False)),
                ("source_type", models.TextField()),
                ...
            ],
        ),
        migrations.RunPython(code=unset_partition_mode, reverse_code=set_partition_mode),
    ]
It's a little complicated, but it does work.
I hope that this can be adapted to your projects and open up the possibility of partitioning to those projects that need it.
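If you want to verify that a migration actually produced a partitioned table, one quick option (my own suggestion, not part of the workflow above) is to check PostgreSQL's catalog from a Django shell:
from django.db import connection

# pg_partitioned_table has one row per partitioned parent table;
# partstrat is 'r' (RANGE), 'l' (LIST) or 'h' (HASH)
with connection.cursor() as cur:
    cur.execute(
        """
        SELECT c.relname, p.partstrat
          FROM pg_partitioned_table p
          JOIN pg_class c ON c.oid = p.partrelid
         WHERE c.relname = %s
        """,
        ["my_super_cool_model"],
    )
    print(cur.fetchone())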