As a Django developer, you are often asked for a CSV export of a database table.
Python's `csv` module is simple to use for this, but when the dataset is large it becomes so inefficient that it often leads to timeout errors. There are two problems with this method:
- Python is not very fast.
- Instantiating model objects through the Django ORM is resource-intensive.
As an alternative, I suggest using Postgres's CSV functions. This way, not only do we avoid creating thousands of model objects, we also generate the CSV rapidly thanks to Postgres's speed.
But do I need to write complex SQL queries? What should I do if I already have a filtered queryset, for example in the Django admin?
Don't worry, there is no need to write SQL. Here is a function that accepts a queryset and a filename and returns a CSV response:
```python
from django.db import connection
from django.http import HttpResponse
from django.utils import timezone


def qs_to_csv_response(qs, filename):
    # Extract the raw SQL and parameters from the queryset.
    sql, params = qs.query.sql_with_params()
    sql = f"COPY ({sql}) TO STDOUT WITH (FORMAT CSV, HEADER, DELIMITER E'\t')"
    filename = f'{filename}-{timezone.now():%Y-%m-%d_%H-%M-%S}.csv'
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = f'attachment; filename={filename}'
    with connection.cursor() as cur:
        # Interpolate the parameters into the query (psycopg2),
        # then stream the COPY output straight into the response.
        sql = cur.mogrify(sql, params)
        cur.copy_expert(sql, response)
    return response
```
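As a rough illustration of what the function builds, here is the COPY wrapper applied to a hand-written SELECT. The query text is a made-up stand-in for what `qs.query.sql_with_params()` would return; the wrapper itself mirrors the function above:

```python
# Hypothetical SELECT standing in for a queryset's generated SQL.
select_sql = 'SELECT "app_place"."id", "app_place"."name" FROM "app_place"'

# Wrap it in COPY exactly as qs_to_csv_response does: CSV format,
# a header row, and a tab delimiter.
copy_sql = f"COPY ({select_sql}) TO STDOUT WITH (FORMAT CSV, HEADER, DELIMITER E'\t')"
print(copy_sql)
```

Postgres then executes the inner SELECT and streams the formatted rows to the client, without Django ever materializing a single model instance.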
Suppose we have these models:
```python
from django.db import models
from django.utils.translation import ugettext_lazy as _


class City(models.Model):
    name = models.CharField(max_length=50)


class Place(models.Model):
    PLACE_TYPE_CHOICES = (
        (1, _('Park')),
        (2, _('Cafe')),
        (3, _('Restaurant')),
        (4, _('Cinema')),
    )
    name = models.CharField(max_length=50)
    city = models.ForeignKey(City, on_delete=models.CASCADE)
    place_type = models.PositiveSmallIntegerField(
        choices=PLACE_TYPE_CHOICES
    )
```
If we want a CSV export of the Place table containing id, name, place_type and city_name, we pass this queryset to the function above:
```python
from django.db.models import F

Place.objects.values(
    'id',
    'name',
    'place_type',
    city_name=F('city__name'),
)
```
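Note that because the COPY command uses a tab delimiter, the export is really tab-separated despite the `.csv` extension. A sketch of reading such a file back in Python, with entirely made-up sample rows:

```python
import csv
import io

# Made-up sample mimicking the export: a header row (produced by
# COPY ... HEADER) followed by tab-delimited data rows.
sample = (
    "id\tname\tplace_type\tcity_name\n"
    "1\tCentral Park\t1\tNew York\n"
)
rows = list(csv.reader(io.StringIO(sample), delimiter='\t'))
print(rows[0])  # the header columns
```

Passing `delimiter='\t'` to `csv.reader` is all it takes; spreadsheet applications also handle tab-delimited files without trouble.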
There is a caveat here. Postgres knows nothing about our pretty, human-readable, translated place types, so putting raw numbers in the place_type column can be completely useless.
The function below tells Postgres about this kind of mapping using Case and When:
```python
from django.db.models import Case, CharField, Value, When


def map_choices(field_name, choices):
    # Build a SQL CASE expression that maps each stored value
    # to its human-readable label.
    return Case(
        *[When(**{field_name: value, 'then': Value(str(representation))})
          for value, representation in choices],
        output_field=CharField(),
    )
```
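Conceptually, the CASE expression does on the database side what a plain dictionary lookup would do in Python. A minimal sketch of that mapping, using the choices from the models above with labels as plain strings:

```python
PLACE_TYPE_CHOICES = (
    (1, 'Park'),
    (2, 'Cafe'),
    (3, 'Restaurant'),
    (4, 'Cinema'),
)


def map_value(value, choices):
    # Python-side equivalent of the generated SQL CASE:
    # return the label matching the stored value, or None.
    return dict(choices).get(value)


print(map_value(3, PLACE_TYPE_CHOICES))
```

The difference is that with `map_choices`, this lookup happens inside Postgres while the CSV is being written, so no rows ever need to pass through Python.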
And then, we use it in the query:
```python
Place.objects.values(
    'id',
    'name',
    verbose_type=map_choices('place_type', Place.PLACE_TYPE_CHOICES),
    city_name=F('city__name'),
)
```
By using Django ORM tools like F, Func, ExpressionWrapper, RawSQL and others, you can easily write more complex queries in a performant way.