A customer hits your site, fills out a form, and hits the submit button. What happens?
Do they see the next page, or do they wait and wait through a chain of decisions, database commits, and maybe even timeouts?
Adding asynchronous execution to your Django app is one of the most effective ways of improving overall performance and customer satisfaction.
Async execution is not the same thing as AJAX, although the two can be used together. By asynchronous execution I mean outside of the request and response cycle. Using AJAX can, depending on the structure of your app and requests, improve the perception of performance, whether by breaking up requests or allowing for some kind of visual interstitial. However AJAX requests still require a request and response.
So why is this a problem? After all, HTTP requests and responses are the very means of interacting with a web application!
For one, web requests, long ones, that is, are prone to timing out. The default timeout in many configurations is 30 seconds (Heroku enforces this, and as I recall it's the default configuration in Nginx). This can be tuned if you own the web infrastructure, but even 30 seconds is an eternity for web users.
And when a request does fail, whether from a timeout or an error, the only means of retry is left to the end user (aka "the customer").
And requests soak up web processing capacity. Most people have finite deployed web resources: only so many processes, only so many workers. There's a scarcity factor here, and the longer any one request runs, the fewer processes are available for new requests.
Lastly, async execution is not the same thing as using asynchronous methods from within the request and response cycle. Making concurrent API requests, for example, would still be synchronous execution from the customer's perspective: the individual requests may be asynchronous, but the user must still wait on them for a response.
(And I hope you homed in on that phrase, "interacting with a web application", because that's going to be very important shortly.)
At the risk of generalizing here, a few guidelines will shape what to make async:
Execution of tasks directly involving revenue in a business should often be synchronous. If someone is subscribing to your site, you want to ensure you have actually captured their subscription information before giving them access. Aside from losing money over customers who haven’t actually subscribed, you’ll create unhappy customers when you try to rectify this.
Sending email is something of a canonical example of an async task. Few people want to wait hours for an email from a website, but most people expect that emails might take at least a few seconds to arrive. This goes for many other notifications as well, whether direct user communication or requests to third party APIs. If you rely on any kind of marketing automation integration, this can almost certainly be made async.
Lastly, anything calculation intensive is a good candidate. This includes complicated calculations (which change only when the committed data changes) and indexing content. I'm not aware of a way to make indexing async directly in your database; rather, this is an advantage of using a dedicated search index (e.g. Elasticsearch or Solr) if you have non-trivial search needs. Indexing on content updates can and should be async.
(Sorry about the earlier teaser - the key here is that lack of immediate interaction is a good indicator that something can be made async.)
Okay, first get RabbitMQ and Celery and… okay, let’s slow down.
Before you start building infrastructure you’d be well served to make sure you have the structure first.
If you want "things" done async, first make sure they're encapsulated by tasks (a function, for example) that work with only the absolute minimum of required data. They don't take the full request as an argument; they take just the user, or an email address. For reasons both general to async execution and specific to task queues, send even less information in some cases, e.g. not a user object but just the user's primary key in the database. We want lookups to be fresh when the task is executed, and we want to keep as little data in the queue as possible.
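As a sketch of that shape, here is a minimal task built around plain-Python stand-ins (the names `get_user` and `send_welcome_email` are hypothetical; in a real Django app the lookup would be `User.objects.get(pk=user_pk)` and the send would use `django.core.mail.send_mail`):

```python
# Stand-in for the database; in Django this would be the User table.
USERS = {1: {"email": "alice@example.com", "name": "Alice"}}

def get_user(user_pk):
    """Stand-in for User.objects.get(pk=user_pk)."""
    return USERS[user_pk]

def send_welcome_email(user_pk):
    """The task: it accepts only a primary key, never a request or a
    full user object, and looks up fresh data when it runs."""
    user = get_user(user_pk)  # fresh lookup at execution time, not enqueue time
    subject = f"Welcome, {user['name']}!"
    # django.core.mail.send_mail(subject, ..., [user["email"]]) would go here.
    return (user["email"], subject)
```

The view (or whatever enqueues the task) passes just `user.pk`, so the queue holds an integer rather than a stale serialized object.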
An acid test for your tasks is whether they can be run from a management command. Not only does this come in handy for testing (bonus) but it ensures that the task encapsulates what it needs to do - and maybe too that your management commands start taking their proper shape: thin command line interfaces to well architected logic.
The first thing you'll need for async execution is a queue of some kind. This could be an AMQP broker like RabbitMQ, a key value store like Redis, or your database. There are very good reasons that using your database is not a great idea, but for many use cases it's actually the best option, all things considered. If you have only a couple of types of tasks that can be run at intervals, then you can combine a cron task with a management command to poll your database for updates and process as needed.
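Here is a bare sketch of that polling pattern, with a list standing in for the table; in Django the query would be something along the lines of a filter on an unprocessed flag (the model and field names here are hypothetical):

```python
# Stand-in for a database table of pending work. A real implementation
# would query rows with an unprocessed flag, e.g. filtering on
# processed=False, inside a management command that cron runs.
PENDING = [
    {"id": 1, "processed": False},
    {"id": 2, "processed": False},
]

def process(row):
    # The real work (sending an email, reindexing, etc.) happens here.
    row["processed"] = True

def poll_once():
    """One cron-triggered pass: handle everything not yet processed."""
    handled = 0
    for row in PENDING:
        if not row["processed"]:
            process(row)
            handled += 1
    return handled
```

A cron entry running the wrapping management command every minute or two gives you async execution with no new infrastructure at all.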
For most Django based sites, especially SaaS sites, running a separate worker process with a Redis backend is the best option. Celery is the go-to workhorse, but we've developed a favorable impression of RQ. It's not quite as powerful as Celery, but its simplicity is a virtue. In either case, if you have your asynchronously executable tasks properly encapsulated, then turning them into async-ready tasks for your workers will be the easy part, and only minor changes in the main application code are required.
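To see how little the calling code changes, here is the queue-and-worker shape in-process with the standard library. This is purely illustrative: RQ and Celery do the same thing across processes, with Redis or an AMQP broker holding the queue, and the enqueue line would become something like RQ's `queue.enqueue(send_welcome_email, user.pk)`:

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def worker():
    """The worker process: pull tasks off the queue and execute them."""
    while True:
        func, args = task_queue.get()
        if func is None:  # sentinel to shut the worker down
            break
        results.append(func(*args))
        task_queue.task_done()

def send_welcome_email(user_pk):
    """A properly encapsulated task: only the pk crosses the queue."""
    return f"emailed user {user_pk}"

t = threading.Thread(target=worker)
t.start()

# In the view, the synchronous call send_welcome_email(user.pk)
# becomes a one-line enqueue; everything else stays the same.
task_queue.put((send_welcome_email, (42,)))

task_queue.put((None, ()))  # shut down the worker for this demo
t.join()
```

Because the task was already encapsulated, the "conversion" to async is just swapping a direct call for an enqueue.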
What about logging for async tasks? And what about using Channels with WebSockets? The former is a critical topic and the latter a very interesting one, but each requires at least one edition of This Old Pony to cover on its own.
Remember, async execution is not a magic bullet. It goes a long way but is only one arrow in the performance quiver.
Learn from more articles like this how to make the most out of your existing Django site.