Making an app webscale, or, app schemas vs. DB schemas (This Old Pony #25)

Despite having authored an at-one-time-briefly-popular library[0] for working with Mongo in Django, I’ve done just about no work with Mongo, nor have Wellfire had any clients working with MongoDB. Ever. Really.

The excitement around Mongo, “back in the day”, was that as a schema less database it was simpler to make changes as you go, which is very important for fast-moving startups (or startup wannabes). And indeed it was really neat how you could add a field to a model without recourse to building a schema migration.

Discussing webscale databases

Except - except! - that you still had to update references to fields including potential issues if your model defined a required field which was previously empty. You could avoid this by using simple dictionaries I suppose, but then you’d lose out on validation, which was provided by… by having a schema.

See, the problem was and is that there is _always _a schema. The only question is the degree to which it is explicitly defined.

This is a valuable takeaway from which we can extract several lessons, but today we should look at what it means to have your application data defined in your database and how that might make making changes to your application easier.

Schematic ambiguity

When we’re working with Django models we almost always define the database schema via the model definition, using fields and Meta options. These classes provide a high degree of control over how data is validated at the model layer and what is ultimately written to the database.

Which is to say, they define the core data that makes up the application.

The problem lies in when these values do not align or do not _sufficiently _align with the data requirements of the app. The usual culprits are required [or not required] data, uniqueness, and value bounding.

It’s not uncommon for fields that are intended to have values to be defined with “blank=True” or even nullable. Usually this is due to historical reasons or because it’s desired to set the value programmatically before insertion, and it seems easier to prevent eager validation.

Uniqueness is too often left up to forms to validate, especially in the case of uniqueness across fields. Again, it may look simpler to perform the validation this way, and an unnecessary constraint to add to the database.

The case is similar - although not identical and less common - with value bounding.

A source of truth

All of the above become a problem for several reasons. One, they increase the surface area of values that you _must _check when performing custom validation and two they increase the surface area of weird data you must account for when pulling data _out _of the database.

The database should always be a source of truth.

This means if data is required, the onus should be on the functionality inserting data to ensure it is present. Any functionality extracting the data should be able to depend on the data there matching the application’s data requirements. And the clearest way to satisfy this requirement is with an explicitly enforced data schema.

Yes, you may lose some tolerance up front whereby you can just put data in and clean it up later. This may even mean exceptions if you don’t manage to update every part of the codebase that allows for data entry. However it consolidates this responsibility which is a huge win.

About Mongo and schemas

Sorry, this wasn’t really about Mongo. The key thing is that in a document database like Mongo[1] you can insert more or less arbitrary data. This flexibility puts the onus on the application to verify on read as well as write[2]. With a structured SQL database like PostgreSQL you can rely on the database to keep your data clean, even if as a last resort it means a failed insert. Now you can keep the validation in one level of the application and let the rest of the application worry about doing business logic.

Arbitrarily yours,
Ben

[0] https://github.com/bennylope/mongoengine-extras
[1] From the docs: “Unlike SQL databases, where you must determine and declare a table’s schema before inserting data, MongoDB’s collections do not enforce document structure.” Although it’s worth noting there’s now a newish feature to enforce document structure.
[2] y tho

Savvy readers will point out that this is essentially a discussion about relying on static typing in support of a dynamic programming language; these readers will please reference Emerson’s quote on foolish consistencies and little minds.

Wellfire Interactive // Expertise for established Django SaaS applications

Making an app webscale, or, app schemas vs. DB schemas (This Old Pony #25)

Schematic ambiguity

A source of truth

About Mongo and schemas

More like this