Most of the time when we think of migrations in Django, we are referring to schema migrations. Django can generate these for you automatically because they describe a change to the structure of your database, not a change to the data itself. However, another type of migration you may find yourself using is a data migration. Data migrations are useful when you are loading new data, or when you want to transform existing data so that it fits a specific schema.
I came across this problem when I was building ickly, a search interface into NYC Restaurant Health Inspection Data. I wanted users of my app to be able to search for a restaurant by name and see all of its inspection data. The dataset was a CSV file whose rows corresponded to inspections; however, it did have a 'camis' field, which was a unique identifier for a business. I wanted to transform this data to match the data models I had in mind for Businesses and Inspections, which meant I needed to extract all of the unique businesses.
If you are just loading a fixture or some sample data that is already in the structure you need it to be in, you may not need a data migration, but can use the loaddata command provided by Django.
Creating a data migration
Django can't automatically generate data migrations for you, but you can write one yourself. You can run the following command to generate an empty migration file in which you will add operations.
python manage.py makemigrations --empty yourappname
The main operation that we will look at, and that you will use for your data migration, is RunPython. Here is what the auto-generated file will look like:
# Generated by Django A.B on YYYY-MM-DD HH:MM
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0001_initial'),
    ]

    operations = [
    ]
RunPython expects a callable as its argument. This function, which you will write, takes two arguments: an app registry and a schema editor. We then add the RunPython operation, passing in our function. This will cause it to be executed when we run ./manage.py migrate from the command line.
from django.db import migrations


def my_function(apps, schema_editor):
    # logic will go here
    pass


class Migration(migrations.Migration):

    dependencies = [
        ('yourappname', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(my_function),
    ]
The app registry maintains a list of the historical versions of all your available models. We want to use the app registry in our function to get the historical version by using apps.get_model('your_app_name', 'your_model_name') instead of just importing the model directly. We do this because we want to make sure we are using the version of the model that this migration expects. If you use a direct import, you may be importing a newer version.
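As a minimal sketch of what that lookup can look like inside a migration function: the app label yourappname, the Inspection model, its grade field, and the placeholder value are all hypothetical names chosen for illustration, not taken from a real project.

```python
def fill_missing_grades(apps, schema_editor):
    """Forward step: give ungraded inspections a placeholder grade."""
    # Ask the registry for the model as it existed at this point in the
    # migration history, instead of writing
    # `from yourappname.models import Inspection`.
    # "yourappname" and "Inspection" are hypothetical names.
    Inspection = apps.get_model("yourappname", "Inspection")

    # filter() and update() are ordinary QuerySet methods, so they work on
    # historical models. Custom methods defined on the model class do not.
    Inspection.objects.filter(grade="").update(grade="Not Yet Graded")
```

The function only touches the model through the registry it is handed, so it keeps working even if the Inspection class in models.py is later renamed or reshaped.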
The SchemaEditor can be used to manually effect database schema changes. With the exception of highly advanced cases, you most likely will not want to interact with this directly. The SchemaEditor exposes operations as methods and turns things like "create a model" or "alter a field" into SQL.
The RunPython operation can also take a second callable. This second function contains the logic you want to run when migrating backwards. If you do not provide one, attempting to migrate backwards will raise an exception. If you want to learn more about the RunPython operation and its other optional arguments, check out the Django documentation on migration operations.
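To make the forward and backward pairing concrete, here is a small sketch. The Cuisine model, the yourappname label, and the seed values are hypothetical; the point is only that the reverse function undoes exactly what the forward function did.

```python
# Hypothetical seed data that the forward step inserts.
SEED_CUISINES = ["Pizza", "Bakery"]


def add_cuisines(apps, schema_editor):
    """Forward step: insert the seed rows."""
    Cuisine = apps.get_model("yourappname", "Cuisine")  # hypothetical model
    Cuisine.objects.bulk_create([Cuisine(name=name) for name in SEED_CUISINES])


def remove_cuisines(apps, schema_editor):
    """Reverse step: delete only the rows the forward step created."""
    Cuisine = apps.get_model("yourappname", "Cuisine")
    Cuisine.objects.filter(name__in=SEED_CUISINES).delete()


# In the migration's operations list the pair would be wired up as:
#     migrations.RunPython(add_cuisines, remove_cuisines)
# When there is nothing to undo, migrations.RunPython.noop can be passed
# as the reverse callable so that migrating backwards does not raise.
```

Deleting only the seeded rows, rather than every row in the table, keeps the reverse step from wiping data that was added after the migration ran.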
Example
Let's look at an example of a migration taken directly from my code for ickly. I've added comments to point out all the relevant parts we went over in this post.
# -*- coding: utf-8 -*-
# Generated by Django 1.10.1 on 2017-04-20 21:02
from __future__ import unicode_literals

from django.db import migrations, models
import csv
from datetime import datetime


def load_initial_data(apps, schema_editor):
    # get the correct versions of models using the app registry
    Business = apps.get_model("api", "Business")
    Inspection = apps.get_model("api", "Inspection")

    # This is where your migration logic will go.
    # For my use case I needed to get unique businesses and
    # transform data from the CSV file into the schema I wanted
    with open('DOHMH_NYC_Restaurant_Inspection_Results.csv') as csv_file:
        reader = csv.reader(csv_file)
        header = next(reader)

        businesses = []
        inspections = []

        for row in reader:
            camis = row[0]
            business = next((b for b in businesses if b.camis == camis), None)
            if not business:
                business = Business(camis=row[0], name=row[1],
                                    address="{} {} {} {}".format(row[3], row[4], row[2], row[5]),
                                    phone=row[6], cuisine_description=row[7])
                businesses.append(business)

            inspection = Inspection(business=business,
                                    record_date=datetime.strptime(row[16], "%m/%d/%Y").date(),
                                    inspection_date=datetime.strptime(row[8], "%m/%d/%Y").date(),
                                    inspection_type=row[17], action=row[9], violation_code=row[10],
                                    violation_description=row[11], critical_flag=row[12],
                                    score=int(row[13]) if row[13] else None,
                                    grade=row[14],
                                    grade_date=datetime.strptime(row[15], "%m/%d/%Y").date() if row[15] else None)
            inspections.append(inspection)

    Business.objects.bulk_create(businesses)
    Inspection.objects.bulk_create(inspections)


# logic for migrating backwards
def reverse_func(apps, schema_editor):
    Business = apps.get_model("api", "Business")
    Inspection = apps.get_model("api", "Inspection")

    Business.objects.all().delete()
    Inspection.objects.all().delete()


class Migration(migrations.Migration):

    # Django automatically adds dependencies for your migration
    # when you generate the empty migration
    dependencies = [
        ('api', '0002_auto_20170420_2101'),
    ]

    # the RunPython operation with the two callables passed in
    operations = [
        migrations.RunPython(load_initial_data, reverse_func)
    ]
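One thing worth noting about the migration above is that it finds existing businesses by scanning the whole list for every row, which gets slow on a large file. A possible alternative, sketched here in plain Python rather than as the code ickly actually uses, is to key the businesses by camis in a dict. The three-column layout, the sample rows, and the helper names parse_date and split_rows are hypothetical simplifications.

```python
import csv
import io
from datetime import datetime


def parse_date(value):
    """Parse an MM/DD/YYYY string, returning None for an empty cell."""
    return datetime.strptime(value, "%m/%d/%Y").date() if value else None


def split_rows(csv_file):
    """Split inspection-level rows into unique businesses and inspections."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row

    businesses = {}   # camis -> business fields; first occurrence wins
    inspections = []  # one entry per CSV row

    for row in reader:
        camis = row[0]
        # Dict membership is a constant-time check, unlike a list scan.
        if camis not in businesses:
            businesses[camis] = {"camis": camis, "name": row[1]}
        inspections.append(
            {"camis": camis, "inspection_date": parse_date(row[2])}
        )
    return businesses, inspections


# Hypothetical, simplified sample with only three columns.
sample = io.StringIO(
    "CAMIS,DBA,INSPECTION DATE\n"
    "1,Cafe A,01/02/2017\n"
    "2,Deli B,\n"
    "1,Cafe A,03/04/2017\n"
)
businesses, inspections = split_rows(sample)
print(len(businesses), len(inspections))  # prints: 2 3
```

Inside a real migration, the dict values would still need to be turned into model instances obtained through apps.get_model before being passed to bulk_create.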
There is a lot more to know about Django data migrations, but you should now be able to tell whether you need to write one, and how to get started if you do. If you want to learn more about Django migrations in general, the documentation provides a great overview.
If you have any questions, comments, or feedback - please let me know. Follow for new weekly posts about JavaScript, React, Python, and Django!
Top comments (3)
Don't forget to test your data migrations! One can use the django-test-migrations package for this: wemake-services / django-test-migrations
Test django schema and data migrations, including migrations' order.
From the project README: the listed features include testing django schema and data migrations and being mypy and PEP561 compatible. Several django versions are supported (1.11, 2.1, 2.2); other versions might work too, but they are not officially supported. Testing migrations is not a frequent thing in django land, but sometimes it is totally required: when we do complex schema or data changes and want to be sure that existing data won't be corrupted, and when we want to be sure that all migrations can be safely rolled back.
Thanks! Will definitely look into this package when I get a chance!
Hello.
It would be great to wrap the entire code of the load_initial_data function into a transaction as a best practice.