Wellfire Interactive // Django consultants for your existing Django site

Not so fast - when your Django site is slow (This Old Pony #40)

Django powers some high wattage websites: Instagram, Disqus, Pinterest, numerous media properties. This works because despite Python’s relative speed deficit compared to, say, C or even Java, it’s more than fast enough for 99% of web applications. 

Those sites are also fast because of numerous optimizations those teams have made, including continuous performance optimization for both the application and supporting infrastructure[0].

All of which is to say when Django sites are running slow, it’s a solvable problem, and most often due to suboptimal use of Django’s own features.

 

Why does it matter?

It might seem obvious because we all know that fast websites are good, but you need to ask this question first. Debugging performance issues anywhere can take quite a bit of time. 

Before you start down this path you need to understand what the costs are and to whom. Is the app blocking requests or reaching its resource limits because it’s too slow? Is it failing to impress a handful of senior stakeholders? Or is the performance hampering the experience of paying customers?

The decisions you make about how you prioritize assessing problems and how you solve them depend on these answers.

Okay, on to some of the most “popular” issues.
 

It’s the database!

No, it’s probably not the database. True, it’s probably worth looking at your indexing strategy[1] but usually the database is not the bottleneck. That’s contrary to a lot of old common sense, but most often it’s _how the database is used_ that is the bottleneck.

Slow, poorly designed queries are slow. And making lots of individual queries is slow. Also, making lots of individual slow, poorly designed queries is slow.

The first step in identifying query usage, including count and time, is to start monitoring it. In production you’ll want to use a dedicated monitoring tool, but to start with in development django-debug-toolbar[2] will get you far. You’ll want to pay attention to the number of queries on a page as well as _what you expect this number to be (roughly)_ given what kind of data the page returns. Then start examining the queries themselves, which the toolbar will report.

As is often the case, you may see what look like duplicate queries, sometimes all over the place.

This is great because this is low hanging fruit. It probably means you need to make better use of queryset methods like select_related, which fetches related objects in the same query instead of issuing a separate query for each row, often dropping query counts - and times - to a fraction of what they were before.
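To make the duplicate-query pattern concrete, here is a minimal sketch using only Python's built-in sqlite3 module rather than Django itself. The author and book tables are made up for illustration. The first function does what an unoptimized loop over a queryset tends to do (one query per row), and the second uses a single JOIN, which is roughly the kind of SQL select_related produces for a foreign key.

```python
import sqlite3

# Hypothetical schema for illustration: books that each point at one author.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT,
                       author_id INTEGER REFERENCES author(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'A1', 1), (2, 'A2', 1), (3, 'B1', 2);
""")

# Record every SQL statement the connection runs from here on.
executed = []
conn.set_trace_callback(executed.append)


def titles_n_plus_one():
    """One query for the books, then one more query per book for its author."""
    books = conn.execute(
        "SELECT title, author_id FROM book ORDER BY id"
    ).fetchall()
    result = []
    for title, author_id in books:
        (name,) = conn.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()
        result.append((title, name))
    return result


def titles_joined():
    """The same data in a single query, with the join done by the database."""
    return conn.execute(
        "SELECT book.title, author.name FROM book "
        "JOIN author ON author.id = book.author_id ORDER BY book.id"
    ).fetchall()


def count_queries(fn):
    """Run fn and return (its result, how many SELECTs it issued)."""
    executed.clear()
    result = fn()
    selects = [s for s in executed if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


for fn in (titles_n_plus_one, titles_joined):
    rows, query_count = count_queries(fn)
    print(fn.__name__, "issued", query_count, "queries for", len(rows), "rows")
```

Both functions return the same rows. The difference is that the first one's query count grows with the number of books, which is exactly the shape you will spot in the toolbar's SQL panel.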

From here you can start deferring data that isn’t required so as to minimize the scope and size of the data the database must actually query for and return.
 

Python vs. the database

Another slow pattern is pulling data back from the database only to filter it in Python or otherwise perform some statistical calculation, when the database can do this much faster.

Occasionally there are Very Good Reasons (TM) for doing this, but only occasionally[3]. The modern Django ORM offers a huge amount of coverage of what a SQL database like PostgreSQL can do, so it behooves you to look for these non-database traps.
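Here is a small sketch of that trap, again using the standard library's sqlite3 module as a stand-in for your real database. The orders table and its columns are hypothetical. With the Django ORM, the database-side version would look something like Order.objects.filter(status="paid").aggregate(Avg("total")), assuming an Order model with those fields.

```python
import sqlite3

# Hypothetical table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (status, total) VALUES (?, ?)",
    [("paid", 10.0), ("paid", 30.0), ("refunded", 99.0), ("paid", 20.0)],
)


def average_paid_in_python():
    # Pulls every row back, then filters and averages in Python.
    rows = conn.execute("SELECT status, total FROM orders").fetchall()
    paid = [total for status, total in rows if status == "paid"]
    return sum(paid) / len(paid)


def average_paid_in_database():
    # The database filters and aggregates; a single row comes back.
    (average,) = conn.execute(
        "SELECT AVG(total) FROM orders WHERE status = 'paid'"
    ).fetchone()
    return average


print(average_paid_in_python(), average_paid_in_database())
```

With four rows nobody will notice the difference. With a few hundred thousand, the first version spends most of its time moving rows and building Python objects that are thrown away immediately.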
 

More common issues

That’s some low hanging fruit, but other common performance concerns crop up from:

  • Excessive template logic
  • Insufficient use of caching
  • Including blocking API requests _unnecessarily_ as part of the HTTP request cycle

There are good reasons to minimize the amount of logic required in your templates, but one reason is performance. Big, complex templates render slower, not to mention the logic is harder to debug in templates.

Caching is one of those magic pills that, when it works, it kind of just obviates the need for too many other solutions. It doesn’t work everywhere or for every part of every site, but for common data - whether common for one user or across many - caching can reduce response times for queries, templates, and everything in between.
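As a rough illustration of why caching helps so much, here is a tiny in-memory cache with a timeout. This is a sketch, not Django's implementation: the TTLCache class and expensive_report function are invented for the example, and the get_or_set method is only loosely modeled on the method of the same name in Django's cache API. In a real project you would use the cache framework with a backend such as Memcached or Redis.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after a timeout."""

    def __init__(self, clock=time.monotonic):
        self._store = {}      # key -> (value, expiry time)
        self._clock = clock   # injectable so expiry can be tested

    def get_or_set(self, key, compute, timeout):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]   # cache hit: skip the expensive work
        value = compute()     # cache miss or expired entry: recompute
        self._store[key] = (value, now + timeout)
        return value


calls = []


def expensive_report():
    # Hypothetical stand-in for slow queries plus template rendering.
    calls.append("computed")
    return {"total_users": 1234}


cache = TTLCache()
first = cache.get_or_set("report", expensive_report, timeout=60)
second = cache.get_or_set("report", expensive_report, timeout=60)
print(first == second, "computed", len(calls), "time(s)")
```

The second call never touches expensive_report. The hard part in practice is not the mechanism but choosing the key and the timeout so that users do not see stale data where it matters.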

Blocking requests, whether to a third-party API or to a data crunching process, are a significant source of response latency. Often these can be pulled out of the request cycle to asynchronous tasks and/or caching can be employed to reduce latency for non-critical requests.

A good rule of thumb for deciding to remove these from the Django request/response cycle is: “does the user absolutely need this result when the next page loads?” It generally makes a lot of sense to make the user wait for a payment to process, whereas sending a confirmation email or even executing a subscription payment cancellation can be handled outside of the request/response cycle.
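The idea can be sketched with nothing but the standard library: the request handler puts the slow, non-critical job on a queue and returns right away, and a worker picks the job up afterwards. Everything here (handle_checkout, send_confirmation_email, the in-process queue) is a simplified stand-in. A production Django site would normally hand this off to a dedicated task queue running in separate worker processes.

```python
import queue
import threading

task_queue = queue.Queue()
sent_emails = []


def send_confirmation_email(address):
    # Hypothetical slow call; a real one would talk to a mail provider.
    sent_emails.append(address)


def worker():
    """Run queued jobs until a None sentinel arrives."""
    while True:
        task = task_queue.get()
        try:
            if task is None:
                return
            func, args = task
            func(*args)
        finally:
            task_queue.task_done()


def handle_checkout(address):
    # The payment itself would be processed inline here, because the user
    # needs that result before the next page loads.
    task_queue.put((send_confirmation_email, (address,)))  # deferred
    return "order confirmed"


worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

response = handle_checkout("customer@example.com")
task_queue.join()        # demo only: wait so the result can be inspected
task_queue.put(None)     # tell the worker to stop
worker_thread.join(timeout=5)
print(response, sent_emails)
```

The response is ready as soon as the job is queued, so the time spent sending the email no longer counts against the page load.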
 

Diagnosing and prioritizing

As previously mentioned, django-debug-toolbar is one of the most effective tools for diagnosing performance issues. It’s not as granular as other tools but if you had to choose only one tool to work with, this should be it. Tool-wise, the browser inspector is also helpful, as are the various Python profiling tools, including cProfile.
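If you have not used Python's built-in profiler before, a minimal session looks something like the sketch below. The slow_view_logic function is a made-up stand-in for whatever Python work sits behind your slow page. cProfile and pstats are both part of the standard library.

```python
import cProfile
import io
import pstats


def slow_view_logic():
    # Hypothetical stand-in for the Python work behind a slow page.
    return sum(i * i for i in range(50_000))


profiler = cProfile.Profile()
profiler.enable()
result = slow_view_logic()
profiler.disable()

# Write the five most expensive entries, by cumulative time, to a string.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Sorting by cumulative time puts the functions whose whole call tree is expensive at the top, which is usually the quickest way to find where to look next.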

Your first goal - presuming you have already identified a slow page or URL - is to identify how the different _components_ of the response add up to the response time.

  1. Database queries and count
  2. Template rendering and templates used
  3. Outbound API requests
  4. All other Python execution

Not only are database queries one of the biggest sources of performance loss and gain, they’re one of the easiest to diagnose, so start here.
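When you don't have a tool that gives you this breakdown, a crude version is easy to build yourself. The sketch below is an assumption-laden toy: the timed helper is invented for the example, and the sleeps stand in for real queries, API calls, and template rendering. It shows how the four components above can be separated with a few timers.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)   # component name -> seconds spent


@contextmanager
def timed(component):
    """Add the wall-clock time of the wrapped block to timings[component]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[component] += time.perf_counter() - start


def handle_request():
    with timed("database"):
        time.sleep(0.02)   # stand-in for running queries
    with timed("api"):
        time.sleep(0.03)   # stand-in for an outbound API request
    with timed("template"):
        time.sleep(0.01)   # stand-in for rendering templates
    return "<html>...</html>"


with timed("total"):
    handle_request()

# Whatever is not accounted for is "all other Python execution".
other = timings["total"] - sum(
    seconds for name, seconds in timings.items() if name != "total"
)

for name, seconds in sorted(timings.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {seconds * 1000:7.1f} ms")
print(f"{'other':10s} {other * 1000:7.1f} ms")
```

Even rough numbers like these tell you which of the four areas deserves your attention first.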

Both database query performance and template rendering performance can be solved or ameliorated with caching solutions that are relatively easy to implement and manage, too.

After you’ve squeezed what you can there, you may find that you’ve improved performance enough that it’s no longer worthwhile to continue tuning.

We’ve really only scratched the surface on a big topic. So could I ask you a favor? What’s one performance issue you’ve struggled with or seen that you’d like to see me write about for another issue?

Quickly yours,
Ben

[0] And the same is true of major sites built on any other language or framework
[1] A lot of this comes down to field-based decisions (including field uniqueness) but you can also specify index classes
[2] https://django-debug-toolbar.readthedocs.io/en/stable/
[3] Okay, you want to know when, don’t you? Complex logic for small data sets is a good enough explanation for now :)
