Sushi, but for web apps: raw SQL & Django (This Old Pony #50)

The first time I heard the term “raw SQL” was simultaneously amusing and confusing. Amusing because, well, “raw”!? As if it’s such an odd thing to write SQL! Confusing because of the fear it seemed to conjure in many developers. Fear not of something that might pose challenges to maintain or secure - more on that shortly - but of something completely esoteric that only shamans with titles like “DBA” ever wrote.

It turns out that quite a few software developers, including web developers, _have not _written any non-trivial SQL and in fact querying the database directly is largely a dark art.

Raw SQL

The Django ORM does support raw SQL querying from within the ORM though, and you can always execute wholly custom SQL using the database connection.

So should you? What are the drawbacks? And how does this impact the maintainability of Django projects? All great questions, so let’s proceed with them.

From here on out “raw SQL” will reference SQL queries directly written by developer, whether using the raw() method or not.

When to use raw SQL

The first answer to “when should you use raw SQL in a Django project?” is it depends. The second answer is rarely.

There are two primary reasons to use raw SQL rather than the ORM’s API.

The first is query complexity. You can build some significantly complex and powerful queries in SQL. And modern SQL offers some features that have at least historically been missing from Django’s ORM. Django 1.11 added support for union queries, enabling results from multiple queries to be concatenated into one result. However there is not yet, to my knowledge, a good way of expressing many outer joins (contra an inner join, which is the intersection of two sets, an outer join is an intersection plus one or more of the original sets).

An example we found where raw SQL was the best choice was in a simple CMS feature. The requirement was simply the ability to “pin” a blog post to the top of a blog post feed. And in this requirement it also needs to maintain the same post pagination. And the toggle for pinning this lived inside a nested CMS plugin with draft and published versions. The SQL solution made sense: a common table expression, a query excluding the CTE results, and a union. Trying to do this in the Django ORM was painful, to say the least, and the raw SQL was actually simpler to understand… and of course, it worked, unlike the ORM attempts.[0]

The second time to use raw SQL? When there’s an obvious performance benefit. If what raw SQL affords you lets you simplify or speed up the querying on a major feature of your app, it may be worth using raw SQL.

The bonus third reason is when the cost of writing or maintaining the raw SQL is lower than that of the ORM queries.

A significant benefit of using the ORM is that you’ll get earlier feedback about changes to models, something you won’t find out with raw SQL until you run a query. This is on top of the benefits of having a consistent platform for other developers and Python model instance serialization - nothing to sniff at! However for one-off queries like data migrations or other data transformations where there’s unlikely to be much ongoing maintenance, speed of development and speed of execution often call for raw SQL.

Where and how to use raw SQL

Anyone who did web development with early PHP is probably scarred from dynamic web pages that mixed HTML, business logic, and database querying all in one. So “where” to use raw SQL is a reasonable question.

Queries using Django’s ‘raw’ queryset method should be added via custom queryset methods on your models’ manager. This keeps the interface clean for code that calls these queries, and keeps the raw and gory details out of the way.

It also maintains consistent organization for testing. It wasn’t entirely accurate to say that raw SQL hides changes in your codebase, insofar as running tests with these queries will show you breaking changes just as surely as tests on any other code which hasn’t caught up.

Even “totally” raw SQL, executed directly with the database connection, is most often best housed in a queryset/manager method. This is a great time to use type hinting though, since it won’t be returning a model queryset, and to make sure you identify your column headers in your response.

Why not to use raw SQL

It’s extraordinarily helpful to understand the SQL queries behind the ORM querysets, but that doesn’t mean you need to use it.

There are very basic features built into most database drivers (e.g. psycopg2) for sanitizing query inputs, but if you fail to use these you open yourself up to SQL injection attacks.

Your results will be query result sets, not Django querysets, which is to say (a) not chainable and (b) eagerly executed.

All things being equal raw SQL tends to be a bit harder to maintain.

And last but not least, with every release of Django you can do more and more in the ORM. Features for adding expressions and new PostgreSQL aggregations make a compelling case for sticking with the ORM for as long as possible.

SELECT author::char FROM newsletter;
“Ben”

[0] Written up in a blog post about managing content rivers: https://wellfire.co/learn/featured-items-content-rivers-django-cms/ Years later that queryset has never needed to be updated, so no regrets about the choice of execution model.

Wellfire Interactive // Expertise for established Django SaaS applications

Sushi, but for web apps: raw SQL & Django (This Old Pony #50)

When to use raw SQL

Where and how to use raw SQL

Why not to use raw SQL

More like this