Dynamically Create ORM Models for Individual Table Partitions

#django #python #postgres

Some examples you may find on the internet will describe creating models for partitions by hand. What a pain! This does not allow for dynamic data expansion and utilization. This is a short article regarding a solution for the Django to dynamically create ORM models for individual table partitions. The database backend is PostgreSQL, for those interested.

For those of you who haven't followed my partitioning adventures with django, I do have statically defined models for the partitioned tables, just not for the individual partitions. Currently, all work utilized the interface provided by the partitioned table. But I wanted to see if there was a way to possibly increase efficiency by accessing table partitions directly. I just didn't want to write another SQL builder. But how to do this without writing one?

I first tried to dynamically create a subclass of the partitioned table. Here be dragons! What initially surprised me the most was that it worked in terms of compiling and creating the class object. What harshed my mellow was that dynamically created pointer fields to the superclass were created and would cause their own havoc. So what to do? Now I had to figure out how to create the necessary meta internal class and field definitions with all of the correct (or at least application-required) options set properly. With all of the myriad options for fields, that is a daunting task.

Or so I thought. Taking another deep dive through all of the settings in the model class and fields for an application ORM model, I struck gold here:

<model_class>._meta.fields[<some_index>].db_type_parameters()

This returns a dict of all of the parameters that were set for the field (including any derived ones). I tried to just use the dict directly, but you end up with errors because of the derived fields (those keys starting with an underscore, among some others). There were also issues because the to and on_delete are not populated in that dict for ForeignKey fields. So, knowing the minimum number of parameters needed to properly create an ORM model class for a partition of a table, I constructed a set of "allowed" keys to be the minimum to create a model field and I had to probe the field relation to get the to and on_delete values for the ForeignKey fields.

Here's the final form of the suite of functions:

from django.db import transaction
from django.db.models.fields.related import RelatedField

def _create_partition_model(partition_info):
    class _Meta:
        db_table = partition_info.table_name

    model = get_model(partition_info.partitioned_table_name)

    partition_model_name = model.__name__ + _get_partition_model_suffix(partition_instance)

    partition_model_attrs = {
        "Meta": _Meta,
        "__module__": model.__module__,
    }
    allowed_fields = {"null", "primary_key", "default", "editable", "serialize", "db_column", "base_field"}

    for field in model._meta.fields:
        partition_field_attrs = {
            k: v for k, v in field.db_type_parameters(transaction.get_connection()).items() if k in allowed_fields
        }
        if isinstance(field, RelatedField):
            partition_field_attrs["to"] = field.related_model.__name__
            partition_field_attrs["on_delete"] = models.DO_NOTHING
            for rel_obj in field.related_model._meta.related_objects:
                if rel_obj.remote_field == field:
                    partition_field_attrs["on_delete"] = rel_obj.on_delete

        partition_model_attrs[field.name] = type(field)(**partition_field_attrs)

    _model = type(partition_model_name, model.__bases__, partition_model_attrs)

    return _model

Let's break it down:

def _create_partition_model(partition_info):
    class _Meta:
        db_table = partition_info.table_name

    model = get_model(partition_info.partitioned_table_name)

    partition_model_name = model.__name__ + _get_partition_model_suffix(partition_instance)

Since we're working with the same field definition, but a different database table, we need to set the Meta.db_table to the actual table partition name.

The partition_info object is simply an object instance that holds information regarding the target table partition.

Substitute get_model() with whatever means you would use to dynamically get model classes from django. It'll be some form of django.apps.apps.all_models probably.

This assumes that partitions are named as (partitioned_table_name + partition_suffix) In this function, the partition model name (ORM model class name) is derived from the partitioned table class name plus the db table partition name suffix. Substitute _get_partition_model_suffix with whatever means you would use to get the table partition suffix or with whatever name mangling you prefer.

    partition_model_attrs = {
        "Meta": _Meta,
        "__module__": model.__module__,
    }
    allowed_fields = {"null", "primary_key", "default", "editable", "serialize", "db_column", "base_field"}

Now we are creating the dict of class attributes for the new class. The allowed_fields is the base set of fields required (in our app, anyway) to create a field. Adjust these values as needed. You will need base_field for things like an array of another type.

    for field in model._meta.fields:
        partition_field_attrs = {
            k: v for k, v in field.db_type_parameters(transaction.get_connection()).items() if k in allowed_fields
        }
        if isinstance(field, RelatedField):
            partition_field_attrs["to"] = field.related_model.__name__
            partition_field_attrs["on_delete"] = models.DO_NOTHING
            for rel_obj in field.related_model._meta.related_objects:
                if rel_obj.remote_field == field:
                    partition_field_attrs["on_delete"] = rel_obj.on_delete

        partition_model_attrs[field.name] = type(field)(**partition_field_attrs)

Now we're building all of the fields for the new class from the partitioned table's ORM class fields. Note that we are constraining the field parameters to the allowed_fields set.

When we have detected a field type class derived from RelatedField we need to set to and on_delete properly. The to attribute is easy. The on_delete attribute is a bit more difficult. The solution I found that works well is to loop through the field's related objects to find the one relation that points back to the field in question. There may be a better way to do this.

Finally, we add the field to the model attributes using the field attributes we've collected.

    _model = type(partition_model_name, model.__bases__, partition_model_attrs)

    return _model

Finally, we create the model class using type(). It was important here to use the __bases__ from the partitioned table's model. Using the model directly creates the problematic subclass.

The cool part is that django will also register the new model class in its all_models dict due to the metaprogramming going on under the hood.

And that's how I did it. Now we can use the new ORM model class to perform CRUD on the partition directly. And we didn't have to taint the original model to do so.

So now, there is the possibility to process data in multiple threads or processes and access the partitions directly without having to write a bunch of permutations of table partition classes. So long as the structure is consistent between the partitioned table and its partitions, this should work seamlessly.

I hope those projects taking advantage of partitioning in PostgreSQL and using django for the webapp will find this useful. It would really surprise me if there was not some sort of way to also do this with SQLAlchemy.