title: 'Aggressively testing Django form validation (This Old Pony #58)'
layout: newsletter
published: true
date: '2018-08-07T10:45:00.000Z'

This week I wanted to write a little bit about a library I’ve mentioned before, Hypothesis, as well as the property-based testing methodology behind it. It can be a little challenging figuring out how to adapt this to Django projects, or whether it’s even worth it, but I want to share one place where it really shines, and that’s testing form validation.

Property testing in a nutshell

I want to explain property-based testing by first explaining the mechanics. Property testing is kind of like table testing on steroids, but with randomized data.

Table testing, or parameterized testing, is the use of a table or list of records to parameterize a single test, i.e. where the mechanics of the test don’t change, only the values tried. For example, you might have a number of test methods that each test a different known value for a function against a known expected result. These tests all do the same thing, so they can be collapsed into one method that is parameterized on the data[0].
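
As a rough illustration of table testing, here is a minimal sketch using only the standard library’s `unittest` and its `subTest` helper. The `is_leap_year` function and the rows in the table are made up for the example; they are not from any real project.

```python
import unittest


def is_leap_year(year):
    """Hypothetical function under test: the Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# The "table": each row is (input, expected result).
LEAP_YEAR_CASES = [
    (1900, False),
    (1996, True),
    (2000, True),
    (2018, False),
]


class LeapYearTableTest(unittest.TestCase):
    def test_known_values(self):
        # One test method; only the data changes from row to row.
        for year, expected in LEAP_YEAR_CASES:
            with self.subTest(year=year):
                self.assertEqual(is_leap_year(year), expected)
```

Every row still needs a hand-written expected value, which is the part property testing relaxes.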

With property-based testing the input values are generated by the test framework. For our purposes right now they might as well be random. “Lot of good that does,” you might say, and if you’re testing _specific result values_ then you’re right, it’s not very good. E.g. if you wanted to test the various values (range) of a function across its domain you’d have to know the result of the function at each point.

But what if all you needed to test was a property of the function for certain parameters or combinations of parameters? Like, for any value in such and such a range, the result is _always_ positive? Or, if any of such and such selected values are included, then the result is _always_ False? In a very cursory nutshell, that’s how property testing works.

And for form validation, where we have a True/False result, it works marvelously.
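
To make the mechanics concrete, here is a hand-rolled sketch of a property check using only the standard library’s `random` module. It is a stand-in for what a framework like Hypothesis automates, not how Hypothesis is implemented, and `shipping_cost` and `check_property` are made-up names for illustration.

```python
import random


def shipping_cost(weight_kg):
    """Hypothetical function under test: flat fee plus a per-kilogram rate."""
    return 5.0 + 1.25 * weight_kg


def check_property(prop, generate, runs=100, seed=0):
    """Call prop() on `runs` generated inputs; return the first failing input, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


# Property: for any weight between 0 and 1000 kg, the cost is always positive.
# We never state what the cost *is* for a given weight, only what must hold.
counterexample = check_property(
    prop=lambda w: shipping_cost(w) > 0,
    generate=lambda rng: rng.uniform(0, 1000),
)
assert counterexample is None
```

The test never needs a table of expected results, only a statement that must be true for every generated input.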

Restricting viewers on an international brewery website

Here’s the scenario. You’re building a website for a brewery with an international audience and way too much investment in their outsourced legal department. They’ve got to have one of those little birthday prompts on their website and this one also includes a country and state/province selection. The user’s age is checked against the legal drinking age in their selected locale. The list of legal drinking ages is stored in the app in a Python dictionary, including country and states as nested values.

    legal_drinking_ages = {
        "CA": {
            "BC": 19,
            "QC": 18,
        },
        "US": 21,
        ...
    }

There are three fields in the form: (1) birthdate, a date field, then (2) country, a choices field, and (3) state/province. It’s not critical how the latter is populated, but we’ll assume it’s dynamically populated from data sourced from the app. The brewery’s legal department is _adamant_ that no one be allowed to sneak through to view the website because the form didn’t correctly match them. And they couldn’t get data for every state/province just yet either, so in the absence of specific data for a country the oldest legal drinking age has to be used.
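
The rule the form has to enforce can be sketched as plain functions, independent of Django. This is only one reading of the requirements: the function names are hypothetical, the sample data reuses the few entries shown above, and the fallback assumes “the oldest legal drinking age” means the oldest age on record for the selected country.

```python
from datetime import date

# Sample data in the same shape as the dictionary above.
legal_drinking_ages = {
    "CA": {"BC": 19, "QC": 18},
    "US": 21,
}


def legal_age_for(country, state):
    """Return the drinking age to enforce for a country/state pair."""
    entry = legal_drinking_ages[country]
    if isinstance(entry, dict):
        # Assumption: with no data for the state, use the oldest age known for the country.
        return entry.get(state, max(entry.values()))
    return entry  # a single country-wide age


def age_on(dob, today):
    """Whole years elapsed between dob and today."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def may_enter(dob, country, state, today):
    """True when the visitor has reached the legal age for their locale."""
    return age_on(dob, today) >= legal_age_for(country, state)


today = date(2018, 8, 7)
print(may_enter(date(2000, 8, 7), "CA", "QC", today))  # True: turns 18 today
print(may_enter(date(2000, 8, 8), "CA", "QC", today))  # False: one day short of 18
print(may_enter(date(1999, 8, 7), "CA", "AB", today))  # True: no AB data, so 19 applies
```

Passing `today` in explicitly keeps the functions deterministic, which makes the birthday boundary easy to pin down in a test.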

Now let’s be honest, this isn’t the world’s trickiest form to test. We can think of a few boundary values and test those[1]. Or… or we could just have the computer cleverly generate a bunch of data and throw it at the form, allowing us to test a few known conditions. This means we write one test, and even if in this case it looks like a little overkill, we can be very confident in the result.

Consider this pseudocode since I’m neither writing it in a proper editor nor writing with the benefit of getting the argument names correct.

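    from datetime import date

    from hypothesis import given
    from hypothesis import strategies as st

    # MyForm, legal_drinking_ages, master_countries_list and master_states_list
    # come from the app; the form field names used below are assumed.


    def age_in_years(dob, today):
        """Whole years elapsed between dob and today."""
        return today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))


    @given(
        st.dates(min_value=date(1890, 1, 1), max_value=date(2020, 1, 1)),
        st.sampled_from(master_countries_list),
        st.sampled_from(master_states_list),
    )
    def test_form_valid(dob, country, state):
        ages = legal_drinking_ages[country]
        if isinstance(ages, dict):
            # No data for this state: fall back to the oldest age on record for the country.
            legal_age = ages.get(state, max(ages.values()))
        else:
            legal_age = ages  # a single country-wide age

        form = MyForm(data={"birthdate": dob, "country": country, "state": state})
        too_old = dob < date(1900, 1, 1)  # arbitrary 'too old' date
        too_young = age_in_years(dob, date.today()) < legal_age
        if too_old or too_young:
            assert not form.is_valid()
        else:
            assert form.is_valid()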

This test will run using Hypothesis’ default number of examples, which is 100. That means this one test will be run 100 times, each with different values. And they’re not all random! Hypothesis will seek out edge values and remember failing examples across test runs. The upshot is that sometimes, even for what you think is a straightforward function, you find some weird but unfortunately possible scenario which you hadn’t previously accounted for.

The utility of this kind of testing is most obvious in forms with custom clean methods, especially those with clean methods for the entire form. Once you’re in the business of testing form validation logic across different combinations of data you’re basically in the business of writing tons of test methods. 

And to be clear, this kind of testing doesn’t solve every testing problem, and it’s not always clear how to implement it even when it could. But when you can implement it, and when it does make sense, it’s like showing up to a playground basketball tournament with the Dream Team[2].

Assertively yours,

P.S. My first pass at that test failed to check the “max age for a country” condition, but that’s what you get for writing and thinking about code in a WYSIWYG editor.

[0] The library formerly known as nose-parameterized: https://github.com/wolever/parameterized
[1] Boundary-value testing: https://wellfire.co/this-old-pony/a-fistful-of-testing-strategies–this-old-pony-54/
[2] Wow, 1992!