No matter how fast your site is, you probably think it could be faster. Fast response times are great. They mean that pages load faster for customers and fewer server resources are required per request. That means your existing app servers can handle more requests and you can get by with less infrastructure spend.
So how do you get there? First, measure. And then, among other things, identify how expensive your developer time is relative to the cost of a bigger cache backend.
We’ll assume two things: (1) that we’re concerned only with a read endpoint and (2) that we have identified that it is consistently slow.
You can get very far by first analyzing the endpoint locally using the Django Debug Toolbar. This is true even for API endpoints. There are some good API-specific tools for measuring performance, but at this point they’re unnecessary. The trick is to ensure you render to an HTML response. Using Django REST Framework? Ensure you have the browsable API renderer enabled locally. Using a custom JSON endpoint? Consider amending it to allow a configurable HTML response of the serialized JSON for the sole purpose of using the debug toolbar.
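As a rough sketch, a local settings fragment for this setup might look like the following. The file name and the surrounding app and middleware lists are assumptions for illustration; the dotted paths are the ones Django REST Framework and the debug toolbar document.

```python
# settings/local.py (fragment; the file name is an assumption)

INSTALLED_APPS = [
    "django.contrib.staticfiles",  # the toolbar serves its assets as static files
    "rest_framework",
    "debug_toolbar",
    # ... your own apps ...
]

MIDDLEWARE = [
    # Place the toolbar early, but after anything that encodes the
    # response body (for example GZipMiddleware).
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware ...
]

# The toolbar is only shown to requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]

REST_FRAMEWORK = {
    "DEFAULT_RENDERER_CLASSES": [
        "rest_framework.renderers.JSONRenderer",
        # Renders API responses as HTML in the browser, which gives the
        # debug toolbar a page to attach itself to.
        "rest_framework.renderers.BrowsableAPIRenderer",
    ],
}
```

The toolbar’s URLs also need to be included in your URLconf; its installation docs cover that step.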
The debug toolbar will actually slow down your responses a little, but at this point the goal is less about gathering precise metrics and more about finding out where to start. With the right panels enabled you can break down query counts, query time, cache hits/misses, and template rendering time. These will point in the direction of the low-hanging fruit (hopefully!).
If you see a lot of queries, even if the query time is a fraction of the response time, that’s often the best place to start. Inefficient querying is very slow and ORMs make it very easy to do. That said, Django’s ORM provides many features to correct for inefficient queries, if you know what they are and where to use them.
We’re not going to dive into Django query optimization this week (let me know by responding if you’d like to see that in the future!). We’re talking cache strategies. Fixing more complex queries, e.g. those with nested relations including generic relationships, can cost a lot of developer time, with or without a clear finish point. If you have inefficient queries you should fix them! But even given the fastest, most efficient queries possible, there may be room for faster performance.
Simply caching the entire response is an aggressive strategy, warranted in some cases, lazy in others. If the response is the same regardless of outside variables like the time or the user, then it’s a brilliant strategy (provided you can invalidate the cache appropriately).
This is a good strategy for public-facing pages or CMS-type sites. As with most caching strategies, the most challenging part is figuring out how to invalidate the cache. But if your site is dog slow with a heavy request load and you can afford for information to be stale for a few minutes, then you can safely use response caching with a timeout and just let the content stay cached for that period.
The primary downside is that if you change one thing then the entire cached response is stale and the entire response must be regenerated.
Response caching can be handled at various levels, too, from a CDN to the webserver to Django.
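At the Django level, one way to get timeout-based response caching is the per-site cache middleware. Here is a minimal settings sketch, assuming a five-minute staleness budget and a local-memory backend chosen purely for illustration:

```python
# settings.py (fragment)

MIDDLEWARE = [
    # Stores responses on the way out; Django expects this one first.
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your middleware ...
    # Serves cached responses on the way in; Django expects this one last.
    "django.middleware.cache.FetchFromCacheMiddleware",
]

CACHE_MIDDLEWARE_ALIAS = "default"   # which entry in CACHES to use
CACHE_MIDDLEWARE_SECONDS = 60 * 5    # assumed budget: stale for up to 5 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = ""     # set this if several sites share a cache

CACHES = {
    "default": {
        # Illustration only: local memory is per-process, so a shared
        # backend such as Memcached or Redis is the usual production choice.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}
```

If only a handful of views qualify, the `cache_page` decorator from `django.views.decorators.cache` applies the same idea per view, e.g. `@cache_page(60 * 5)`.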
If you’ve decided that you do need to worry about more prompt cache invalidation and your site is read-heavy and a significant part of the performance lag is query-related, then one option is to cache and invalidate on database queries.
This probably sounds like a lot of work, and it is, just not for you. There is at least one specific tool, django-cachalot, which hooks into the ORM to cache SELECT queries and invalidate based on modification queries (INSERT, UPDATE, DELETE). It means that any change to a table will invalidate any cached results referring to that table, but - but! - for read-heavy sites this is a fine trade-off. Further, the developer cost of integration is fairly low.
As a bonus, this is a fine way - even for write-heavy sites - of [temporarily!] solving issues of duplicated queries. If 100 queries are fully duplicated within a single request then 99 of those will be cache hits.
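To give a sense of how low the integration cost is, enabling django-cachalot is mostly a settings change. The fragment below is a sketch: the app list, the backend choice, and the one-hour timeout are assumptions for illustration, so check the cachalot docs for the backends and settings your version supports.

```python
# settings.py (fragment)

INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # ... your own apps ...
    "cachalot",  # caches ORM SELECT results and invalidates them per table
]

CACHES = {
    "default": {
        # Illustration only: local memory is per-process, so writes made by
        # one worker will not invalidate another worker's cache. Use a
        # shared backend (Memcached, Redis) for anything beyond local use.
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}

# Optional safety net (assumed value): expire cached query results after
# an hour even if no write has invalidated them.
CACHALOT_TIMEOUT = 60 * 60
```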
If none of the above works by itself, it’s time to cache at a more granular level. This could be template fragments, partial calculations, or object representations. In any event you’re left with very specific cached items with individually controlled invalidation.
The downside is that it’s not as easy as caching (and invalidating) everything in one fell swoop like the previous suggestions. However, it can get you pretty far when the other strategies won’t, and even when they will, it can speed them up.
An API response on a site with frequent writes is a good example. Consider a response that consists of a list of rendered objects. If each object representation is cached individually, and the objects are not all updated at the same time, then an update to one need not invalidate the rest. This means a response with 1000 objects can be rendered with 999 cache hits and 1 potentially expensive serialization.
Further yet, this kind of caching lends itself to granular cache warming - regenerating cache values for individual representations when they’re changed.
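Here is a small, framework-free sketch of per-object caching with invalidation and warming. Everything in it is hypothetical: `DictCache` is an in-memory stand-in whose `get_many`, `set`, and `delete` methods mirror the ones on Django’s cache API, and `serialize` stands in for whatever expensive rendering your endpoint does.

```python
import json


class DictCache:
    """In-memory stand-in for a cache backend (hypothetical helper)."""

    def __init__(self):
        self._data = {}

    def get_many(self, keys):
        # Like Django's cache.get_many: returns only the keys that exist.
        return {k: self._data[k] for k in keys if k in self._data}

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


cache = DictCache()
serialize_calls = 0  # counts how often the "expensive" step actually runs


def cache_key(obj_id):
    return f"obj-repr:{obj_id}"


def serialize(obj):
    """Pretend this is the expensive part of the response."""
    global serialize_calls
    serialize_calls += 1
    return json.dumps(obj, sort_keys=True)


def render_list(objects):
    """Render every object, serializing only those missing from the cache."""
    keys = [cache_key(obj["id"]) for obj in objects]
    cached = cache.get_many(keys)
    rendered = []
    for obj, key in zip(objects, keys):
        if key not in cached:
            cached[key] = serialize(obj)
            cache.set(key, cached[key])
        rendered.append(cached[key])
    return rendered


def invalidate(obj_id):
    """Drop one representation; the next read regenerates it."""
    cache.delete(cache_key(obj_id))


def warm(obj):
    """Granular cache warming: regenerate one representation on change."""
    cache.set(cache_key(obj["id"]), serialize(obj))


objects = [{"id": i, "name": f"item {i}"} for i in range(1000)]
render_list(objects)            # cold cache: every object is serialized

objects[42]["name"] = "renamed"
invalidate(42)
result = render_list(objects)   # only object 42 is serialized again
```

In a real project the invalidate or warm call would typically hang off whatever code path saves the object, so that readers rarely pay the serialization cost themselves.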
 Our frequent guest, the Django Debug Toolbar https://django-debug-toolbar.readthedocs.io/en/stable/
 Know your renderers http://www.django-rest-framework.org/api-guide/renderers/#browsableapirenderer
 The esteemable django-cachalot https://django-cachalot.readthedocs.io/en/latest/