We’ve been looking at why people erroneously think they’ve reached the limits of their software stack:
Both generally and with regard to Django, in most of the cases where I’ve seen _people who think they’ve reached the limits of what their tools can do, they’re simply wrong, often by significant margins_. The reasons, though, vary a lot. They’re interesting in and of themselves, but identifying them is important unless you have piles of cash to burn and don’t mind risking your entire software stack.
This week we’re going to look at “suboptimal” architecture in Django apps and the problems it causes. Just like the refrain from that classic rock hit[0]:
Bad software architecture
And I can’t deny
Bad software architecture
Till the day I die, oh
Till the day I die
Till the day I die
What I mean by architecture
There are fairly specific definitions for things like “software architecture” and “systems architecture” but here I’m going to use it a bit more generally.
What we’re talking about is the design of the parts and how they interact. For a web application it’s fair to include higher level “architectures” like the relationship of the services you’re running, from the application server running Django to the web server, databases, task queue, etc. This also includes lower level “architectures” like how [Django] apps are structured, and even the interfaces available within the application for sharing or updating data (including middleware, template context processors, and so on).
If at any point you feel like substituting “design decisions” for “architecture” feel free.
What architectural problems look like
It’s story time, and I’m going to tell you about the worst architecture I ever created.
We started on what was supposed to be a fairly small project that the client originally described as “pretty much just a database”. It just needed some facility to fetch data from a third-party service and then update the data locally, with a dashboard for admin users.
Actually, it was an applicant tracking system (ATS) for managing assessment workflows for multinational HR departments that would use web services to integrate with a number of third-party assessment providers and then allow the client to create custom scored reports for each candidate.
Here’s what we didn’t know at the very beginning:
- The application was going to need to be multilingual
- It would be multi-tenant
- It would require workflow processing, managed by customer users, to guide other users
- These other users would be guided through steps involving third-party sites
We figured that all out pretty early, but here’s what we learned only once we were under way:
- The third parties, which we were promised all used the same industry standard, didn’t. Even the ones that did use the same standard used different versions of it.
- Some of those third parties were developing their solutions at the same time
- Unicode is hard
- Site usage was very spiky, but somewhat predictably so
- The third parties would send data to our API even when it wasn’t requested
- The site would have to support IE7 because the client’s bank customers upgrade their internal infrastructure once every 24 years
- The dashboards that the client gushed over would have a hard requirement of supporting IE6 because one important client had users based in their stores where the computers only supported IE6
The whys behind this are interesting, they’re important, and we’re going to skim them for now:
- Unknown requirements (due to project management or analysis failure[1])
- Changes in requirements (due to changes in the outside world[1])
- Insufficient understanding of the problem or the domain
- Inexperience (more generally)
So what did a bad architecture look like?
- The web application was broken into a few Django apps, but most of the functionality was in one main giant Django app.
- Everything went to one domain, including customer access and the API (which third parties used for updating candidates’ assessment data).
- The model for updating assessment data was focused on transforming all incoming data into a locally defined model before saving.
- There was stuff in the database that really should have been in source code, namely workflow models.
It would have been easy to look at the site getting sluggish when *every single applicant to be a bank teller at the bank’s China branches ran their assessments at the same time* and say, “Django isn’t up to the task” or blame PostgreSQL when the entire database became corrupted due to the failure of a networked drive, but in every case the real blame was in the design decisions (seriously, don’t use networked drives with PostgreSQL).
So what’s the problem?
Let’s do a drive by, point by point.
- Having everything in one giant Django app made testing difficult, not to mention subsequent changes.
- Putting at least the API on a different domain would have made it easier to split traffic, so that people and machines were hitting different servers.
- Transforming the data first not only slowed down the response time for API requests, it meant we lost out on a bunch of data that we had to figure out how to accommodate later. If we had matched the local models to the incoming data and just saved it as is in service-specific tables, we would have had performance plus flexibility. Concerns about database size were unfounded, and these tables could have even been moved to a different database.
- With the workflows, you could test the workflow engine but each workflow (for each position) had to be set up manually anyhow, and then tested manually. Getting it wrong, or at least far from optimal, in the “should it go in the database or should it not?” question can cause quite a bit of pain.
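To make the “save it as is” point a bit more concrete, here is a minimal sketch of the idea. It is not the code from the project: the table name, the payload fields and the function names are all made up for illustration, and it uses Python’s built-in `sqlite3` rather than Django models and PostgreSQL so that it runs on its own.

```python
import json
import sqlite3


def create_schema(conn):
    # One table per third-party service (hypothetical name), holding the
    # payload exactly as it arrived.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS provider_a_raw_result (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            received_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            payload TEXT NOT NULL,
            processed INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def receive_result(conn, payload_text):
    """API request path: store the payload untouched and return quickly."""
    cur = conn.execute(
        "INSERT INTO provider_a_raw_result (payload) VALUES (?)",
        (payload_text,),
    )
    conn.commit()
    return cur.lastrowid


def process_pending(conn):
    """Outside the request cycle: transform rows not yet processed."""
    rows = conn.execute(
        "SELECT id, payload FROM provider_a_raw_result "
        "WHERE processed = 0 ORDER BY id"
    ).fetchall()
    results = []
    for row_id, payload_text in rows:
        data = json.loads(payload_text)
        # Only the fields we understand today are extracted (assumed field
        # names); the full payload stays in the table for later use.
        results.append(
            {"candidate": data.get("candidate_id"), "score": data.get("score")}
        )
        conn.execute(
            "UPDATE provider_a_raw_result SET processed = 1 WHERE id = ?",
            (row_id,),
        )
    conn.commit()
    return results
```

The API call only does an insert, so the response stays fast, and any fields we didn’t anticipate are still sitting in the raw table when we eventually figure out what to do with them.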
And here’s the thing: you can get these kinds of questions wrong with any language or framework, just as you can get them right with pretty much any language or framework. Your tech stack matters, don’t get me wrong! But it won’t solve your problems by itself.
You won’t win a race in a Porsche if you leave the parking brake engaged[2].
Over-architectedly yours,
Ben
[0] Slightly modified, from Bad Company’s song Bad Company from their album called Bad Company (remember, naming is hard)
[1] Does agile solve for this? I think the answer is “it depends”. Agile (TM) probably can’t, but *agile* closer to the original manifesto will help. Ultimately if your target shifts you’re still going to have to backtrack; the hope is that your methodology reduces the need for backtracking, at least w/r/t latent requirements.
[2] Maybe. Years later the application above is still alive. It was a success, as far as the client and their customers were concerned. I’d tend to think it’d have been a bigger success and quicker given the alternative to some of the questionable choices enumerated above, but working software is working software. ¯_(ツ)_/¯