Wellfire Interactive // Expertise for established Django SaaS applications

Triaging and squashing bugs in existing Django apps (This Old Pony #53)

We’re officially in the middle of an ongoing series on how to prioritize issues in your Django application.

This week, we’re going to talk about one of my favorite topics:

A very elusive bug

That’s right, bugs[0]. More specifically, strategies for triaging, identifying, and resolving bugs - in your web app, at least.
 

What is a bug?

A bug is an insect with piercing and sucking mouthparts… sorry, what I mean is that it’s an error, a defect, a thing that does not do the thing it was supposed to do.

Bugs can be of trivial concern and they can also cause industrial catastrophe. They can result in application errors and they can look like normal working software. They can be painfully obvious once someone has identified a problem and they can be subtle, based on factors no one imagined affecting the software. And perhaps most importantly, they can be as easy to fix as deleting - or adding - a delimiter, or require significant architectural changes to evade.

You won’t know how your bug or bugs fit into these spectrums initially, not for every category, but the first step in approaching any bug is to identify as much of the above as possible.
 

Triaging bugs

There are tools for debugging, including Python- and Django-specific tools, but - so that we can belabor the point - the most important tools are strategies. You need to understand who has identified the issue, when, doing what, and whether it’s repeated. A bug is a bit like a crime statistic: the evidence is likely based only on reports, not actual incidents.

Ultimately, your goal is to figure out how critical a bug is, whether it’s user facing, and what the costs of leaving it unsolved are. If it’s a less-than-a-nuisance for one staff user and affects no one else, perhaps a resolution can wait. But if you discover that it’s a data error, or even an application error, and it’s causing ripple effects throughout the application for customers? Then you should tackle it immediately.

This goes against the common wisdom that you always tackle bugs first. However, this common wisdom - while reasonably wise - misses the implicit economics of software development decisions. Weighing the costs and benefits of identifying and fixing a bug against the costs and benefits of working on a business-driving feature, it may make the most sense to put off going after the bug[1].

My own recommendation is to create a hybrid prioritization based on both criticality and perceived ease of identification/resolution. The latter is an element of a “snowball” strategy where you attack small things first and build momentum to the large issues. It tends to be more an issue of psychology (both individual and team) than technical, but code doesn’t write itself. You may be left starting with a giant ball of mud because it’s so serious and urgent, but most often you’ll end up rolling up at least a few good bugs before getting to something really hairy. Aside from generating happy feels from watching tickets close, you’ll also likely start clearing the field for additional updates and bug fixes.
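One way to picture that hybrid ordering is the rough sketch below. It is only an illustration: the Bug fields, the 1 to 5 scales, and the URGENT_THRESHOLD cutoff are all assumptions made up for this example, not a prescribed scoring system.

```python
from dataclasses import dataclass

# Hypothetical cutoff: anything at or above this is handled before everything else.
URGENT_THRESHOLD = 5


@dataclass
class Bug:
    title: str
    criticality: int  # assumed scale: 1 (nuisance) to 5 (data loss or outage)
    effort: int       # assumed scale: 1 (quick fix) to 5 (architectural change)


def triage_order(bugs):
    """Urgent bugs first; everything else easiest first ("snowball"),
    with the more critical bug winning ties on effort."""
    def sort_key(bug):
        is_urgent = bug.criticality >= URGENT_THRESHOLD
        return (0 if is_urgent else 1, bug.effort, -bug.criticality)

    return sorted(bugs, key=sort_key)


backlog = [
    Bug("typo in admin label", criticality=1, effort=1),
    Bug("orders double-charged", criticality=5, effort=4),
    Bug("slow report page", criticality=3, effort=3),
    Bug("broken CSV export", criticality=3, effort=1),
]
ordered_titles = [bug.title for bug in triage_order(backlog)]
```

The effort numbers are guesses by definition, so treat the resulting order as a starting point for discussion rather than a schedule.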
 

Quick wins: what to look for

When you’re hunting for bugs with a somewhat-known scope, there are a few things to keep an eye out for that tend to introduce or hide buggy behavior:

  • Giant try/except blocks: often swallow errors and hide what’s going on - if required these should log exceptions
  • “Naked” try/except blocks: like the above, and often just followed by “pass” and… nothing
  • Mutable default function arguments: the default object is created once, when the function is defined, and shared across calls, so changes made in one call show up as unexpected values in the next
  • Insufficient use of custom queryset methods: no, really! these are independently testable, sure, but they also allow for consistent querysets across different parts of the app
  • Huge views: so many bugs tend to accumulate in oversized views!

Just as there are more places to look out in the woods than under rocks, this is just a sampling of where to start.
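
Of the items in that list, the mutable default argument is the easiest to show in a few lines. A minimal sketch, with hypothetical function names chosen only for illustration:

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at function definition time,
    # so every call that omits `tags` appends to the same list.
    tags.append(tag)
    return tags


def add_tag(tag, tags=None):
    # Use None as the sentinel and build a fresh list on each call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags
```

Calling add_tag_buggy("a") and then add_tag_buggy("b") returns a list containing both tags the second time, while add_tag returns a fresh one-item list on each call.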
     

Tools of the trade

Debugging tools let us do several different things: observe, replicate, inspect, and test.

In the case of application errors, the first step is making sure you have a good system set up for tracking exceptions and errors. Logging is a good step here, and if you get the full stack trace included, that’s a significant step forward. In our own work we usually recommend a dedicated error tracking service, and when integrated into your project these can often get the full stack trace by default, including context for each frame, which is exceptionally helpful.
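
With only the standard library, getting the stack trace into your logs can look something like the sketch below. The parse_quantity function is a hypothetical stand-in for whatever code is failing; logger.exception is the standard logging call that records the message at ERROR level along with the current traceback.

```python
import logging

logger = logging.getLogger(__name__)


def parse_quantity(raw_value):
    """Hypothetical example: convert user input to an integer quantity."""
    try:
        return int(raw_value)
    except ValueError:
        # logger.exception must be called from inside an except block;
        # it appends the active traceback to the log record.
        logger.exception("Could not parse quantity from %r", raw_value)
        # Re-raise so the error is recorded but not swallowed.
        raise
```

Keeping the try block narrow and re-raising avoids the “giant” and “naked” try/except problems described earlier while still leaving a record of what happened.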

More generic logging of minor issues and general information is also helpful. However, advice to “use logging” is good but, more often than not, a bit weak on details. What should you log, when, how? This will vary from situation to situation within an app, but there are a few good places to start:

  • Calls and interactions with third-party services, e.g. HTTP APIs: at least know when these calls were made and whether they succeeded or not
  • Significant changes to state, especially when invoked by an app user
  • Significant user actions, including the above
  • Any asynchronous tasks, whether executed by a worker such as Celery or by cron

Once you’ve gotten close to identifying the bug, you’ll likely need to do further discovery locally. A debugger[2] is an obvious choice here, which will allow you to step through lines of code to examine the context at each step. These can be less handy when there are lots of steps involved, and sometimes you just want to inspect one or two points in the process. That’s where you can just drop in with IPython[3] and inspect, even modify, the state of the program at that point. Modified values will carry over after you exit the IPython process, allowing you to test how these changes affect the end outcome.
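
As a rough sketch of dropping into a debugger at a single point, the built-in breakpoint() call (Python 3.7+) opens pdb by default. The apply_discount function and its debug flag are hypothetical, used here only so the pause can be switched on when you need it.

```python
def apply_discount(price, percent, debug=False):
    """Hypothetical function under investigation."""
    discount = price * percent / 100
    if debug:
        # Pauses here in pdb so you can inspect `price`, `percent`,
        # and `discount`. To use IPython instead (if it is installed),
        # replace this line with:
        #     from IPython import embed; embed()
        breakpoint()
    return price - discount
```

Calling apply_discount(100, 10, debug=True) from a local shell stops at the marked line; remember to remove the hook before committing.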

Testing in production gets a bad rap, but you already do it and so does everyone else[4]. The referenced article is more about testing new changes, but you can also get data from production to identify your bugs, beyond logging. Provided you’re not changing their data or violating privacy agreements, you can assume the role of an end user and see the site as they actually see it. There are tools for this[5] which don’t require that you log in with someone else’s password. It can also be handy to include some data in your presentation layer, whether that’s an HTML template or a JSON API response. This might include version sentinels or other identifying information which will give you some information about the source of the current state.
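
For a JSON API response, one possible shape for that identifying data is sketched below. The with_debug_meta helper, the "_meta" key, and the version and source values are all assumptions invented for this example; use whatever fits your own API conventions.

```python
import json


def with_debug_meta(payload, version, source):
    """Return a copy of `payload` with identifying metadata attached."""
    return {**payload, "_meta": {"version": version, "source": source}}


# Hypothetical values: in a real app the version might come from an
# environment variable or the deployed git commit.
body = json.dumps(
    with_debug_meta({"order_id": 42}, version="1.8.3", source="cache")
)
```

When a customer reports odd data, the metadata in the response tells you which release produced it and where the values came from.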

Last but not least, tests (i.e. automated, code-based tests) are not just useful for ensuring bug fixes work, but can be handy for identifying bugs to start with. An underused strategy in testing Python projects of all kinds is property-based testing. Hypothesis[6] is the tool of choice in Python, and in essence it allows you to run highly parameterized, randomized tests. Imagine a parameterized test (or table test) with values for each provided argument that span the range of allowed values for each, hundreds or thousands of them, that allow you to test the range of a function given a specific domain. It can be tricky to wrap your head around at first because you don’t have recourse to testing for specific result values, but it’s invaluable for finding edge cases that are otherwise difficult to think of.
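
To make that concrete, here is a small sketch of a property-based test, assuming Hypothesis is installed. The split_evenly function is a hypothetical example; the point is that the assertions describe properties that must hold for any input rather than specific expected values.

```python
from hypothesis import given, strategies as st


def split_evenly(total_cents, parts):
    """Hypothetical function: split an amount into `parts` near-equal shares."""
    base, remainder = divmod(total_cents, parts)
    # The first `remainder` shares each get one extra cent.
    return [base + 1 if i < remainder else base for i in range(parts)]


@given(
    total_cents=st.integers(min_value=0, max_value=10**9),
    parts=st.integers(min_value=1, max_value=1000),
)
def test_split_evenly_properties(total_cents, parts):
    shares = split_evenly(total_cents, parts)
    assert len(shares) == parts            # one share per part
    assert sum(shares) == total_cents      # no cents lost or invented
    assert max(shares) - min(shares) <= 1  # shares differ by at most a cent
```

Run under pytest, Hypothesis generates many combinations of total_cents and parts within the given bounds and, when a property fails, reports a minimized failing example.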

Ever on the hunt,
Ben

[0] Technically Apheloria virginiensis isn’t a bug, seeing as how they’re Diplopoda and not Hemiptera, but colloquially, yeah.
[1] Yes, this is technical debt: not the creation of it, but the carrying forward of it. It’s like not paying down a loan when you have the money to do so because there’s another use for that cash that’s more critical or carries a greater return. If that sounds dangerous to you, think about it in the context of a business rather than personal finance. There can be long term benefits to tackling bugs first as a policy, as over time it may focus you (or your team) on producing fewer blocking bugs, but getting to the long term often requires slightly different decisions.
[2] pdb: https://docs.python.org/3/library/pdb.html, variants on PyPI: https://pypi.org/search/?q=pdb
[3] Embedding IPython: https://ipython.readthedocs.io/en/stable/interactive/reference.html#embedding
[4] “Testing in production: Yes, you can (and should)”: https://opensource.com/article/17/8/testing-production
[5] User hijacking: https://djangopackages.org/grids/g/user-switching/
[6] Hypothesis: https://hypothesis.readthedocs.io/en/latest/
