Generic CSV data exports in Python

Whether for sharing with third parties, outside analytics work, or simply content extraction, many sites need to provide data export functinality, and in a user friendly way to boot. Comma separated format fits the bill in most cases - it's portable and spreadsheet friendly, and its something developers and the non-tech savvy can work with right away.

Whether for sharing with third parties, outside analytics work, or simply content extraction, many sites need to provide data export functinality, and in a user friendly way to boot. Comma separated format fits the bill in most cases - it’s portable and spreadsheet friendly, and its something developers and the non-tech savvy can work with right away.

If you have multiple data types or simply variations on your export, you’ll end up duplicating a bunch of code to extract the right data columns. That’s neither pretty nor sensible.

This function instead handles [most] generic export scenarios. It takes a writeable object, the csvfile argument, into which to write the results. The objects argument should be an iterable and fields should be a dictionary for mapping CSV fields to data attributes of the objects in your iterable argument. The fields keys are expected to be the CSV column headers and the values the attribute names from which to extract the CSV column data.

The builtin function attrgetter (Python 2, Python 3) allows us to get chained attributes, e.g. someattr.subattr.furthersubattr.

import csv
from operator import attrgetter
 
 
def generic_csv(csvfile, objects, fields):
    writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    row_headers = fields.keys()
    writer.writerow(row_headers)
 
    for obj in objects:
        row = [attrgetter(fields[header])(obj) for header in row_headers]
        for count, column in enumerate(row):
            if callable(column):
                row[count] = column()
            if type(column) is unicode:
                row[count] = column.encode('utf8')
        writer.writerow(row)
 
    return csvfile

The csvfile argument can be any writable object. You could send a regular system file, but you can just as easily send something like an instance of a Django HttpResponse object.

Now presuming you have a utils.py module for the generic_csv function, a simple Django view function like the following will return the desired CSV file without having to know anything about CSV handling.

from django.contrib.auth.models import User
from django.http import HttpResponse

from myapp.models import UserProfile
from utils import generic_csv


def csv_view(request):
    response = HttpResponse(mimetype='text/csv')
    response['Content-Disposition'] = "attachment; filename=site-users.csv"
    user_list = User.objects.select_related('profile').all()
    fields = OrderedDict([
        ("First name", "first_name"),
        ("Last name", "last_name"),
        ("Email", "email"),
        ("City", "profile.city"),
        ("Avatar", "profile.avatar"),
    ])
    return generic_csv(response, user_list, fields)

Two important things to note here. If you’re going to make use of chained attributes with a QuerySet make sure to use select_related so that each subattribute request doesn’t generate a new database query. And if column header ordering matters you’ll need to use an OrderedDict (Python 2, Python 3) so that the key order is respected.

This can be further streamlined using a CsvResponse or even a CSV repsonse-based generic view, but this is representative of a clean and reusable solution.

blog comments powered by Disqus