Back in the early aughts, before touch screen phones and Web 2.0, I had an internship at a web development firm. We worked on an in-house product - in ColdFusion, mind you - for selling travel insurance and we used the latest and greatest source control system available - Microsoft Visual SourceSafe. Think FTP but with file locks.
Fast forward a decade - skipping any mention of Subversion, not to mention CVS - and pretty much everyone is using Git. Git is superior to SourceSafe in nearly every way, the one exception being the complexity of its workflows. That’s probably an unfair comparison because they’re such radically different tools. The SourceSafe that I knew was a source control system, but not a version control system. There wasn’t much affordance for managing iterative changes in the code.
Hang on, we’re going to bring this back to working with Django projects, I promise.
One of the key benefits of a version control system is that you have a timeline of application state based on each committed change, including marked versions or releases (e.g. tags) and intermediate changes (e.g. commits). This allows you to see how an application behaved at a certain point, how it changed, when it changed, and (given well documented changes) why it changed. In short, it lets you piece together a compelling narrative about the history of a piece of software. That’s only the proximate benefit. At a deeper level, this makes understanding an application both simpler and faster for new developers. It makes for more accurate debugging forensics. And it provides context for both bugs and feature changes that can lead to process improvements, automated or otherwise.
So let’s address these suggested benefits and then what it takes to achieve them in your Django project.
First, a good Git history explains each atomic change, including why it was added and any other context that might be missing. Combined with the sequential log itself - and bonus for tagged major releases - a new developer can browse the history of the project to see where it was at its beginning and at various stages, and to get an idea of the cadence of change: what kinds of changes have been introduced, when, in what sequence, how other developers on the project have approached changes, etc.
This history is also useful for identifying bugs, not just the bug itself but from whence it sprang. First, a well composed and ordered history lets you make use of git bisect, which combined with tests and exploratory techniques lets you quickly identify the individual commit in which a bug was first introduced. Is this for the purposes of blame? No, I mean, if you want to run a dysfunctional team then yes, but the answer should be to identify the what first and then the why; the who is incidental.
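To make the idea behind git bisect concrete, here is a minimal sketch in Python of the binary search it performs. This is a simplified model, not Git's implementation: it assumes a strictly linear history ordered oldest to newest, a first commit known to be good, a last commit known to be bad, and a bug that stays present once introduced. The commit names and the is_bad check are hypothetical stand-ins for real commits and a real test run.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is True.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1  # indexes of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad]


# Hypothetical linear history, oldest first; the bug arrives in "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8"]
checked = []


def is_bad(commit: str) -> bool:
    # Stand-in for checking out the commit and running a test.
    checked.append(commit)
    return history.index(commit) >= history.index("e5")


culprit = first_bad_commit(history, is_bad)
print(culprit, "found after checking", len(checked), "commits")
```

The point of the sketch is the cost: eight commits take three checks rather than up to eight, and the gap widens as the history grows. It also shows why small, ordered commits matter, since the search can only narrow the bug down to a single commit, however large that commit is. In practice Git does this for you, and git bisect run can drive the search with a test command in place of the is_bad function assumed here.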
The adjacent changes can provide context about what was missed in the introduction of the bug, and well written commit messages can provide clues as to any organizational or process forces that influenced the decisions in the given commit.
Even for changes in the code that aren’t bugs, a helpful Git history tells a story of why a feature was added, what tradeoffs were required, and the course by which data models and functions changed. Even without documentation or helpful commit messages the changes alone can tell quite a story.
If you’re still wondering why this is relevant, it’s because your Django project most likely uses Git, or a not-terribly-dissimilar tool, and it’s a critical component in the health and maintenance of your Django project.
In executive brief fashion, here’s how - from this point on - you can make sure your Git history serves as an asset:
Commits that are broken up by function enable a clean view into the changes. Mixing formatting changes into commits with feature changes obfuscates the feature changes and makes it harder to understand the changes and then separately to identify bugs (if that’s an issue).
A more impactful set of strategies has to do with how changes from the master branch are reconciled with changes on feature branches. If you follow the school of thought that all feature branches are very short lived and development is pretty much conducted on master then this is more or less a moot point. But if you have feature branches that survive active development on parallel branches then it’s something you need to take into consideration.
This last suggestion is a bit more challenging and not without controversy, but by rebasing prior to merging, the changes in a feature branch stick together, providing an accurate sequence of how changes were applied. Imagine tracking changes on a building under construction, in which different components are worked on in parallel. Every step is logged and you want to get an understanding of how the building came together. Under a normal merge, you’d view the log in a purely chronological order based on when each step of each component was completed. That’s neat and all, but what we really care about is seeing these in order of when each component was completed, and the sub-steps can flow from completion step to completion step.
The benefits of a functionally accurate log include the ability to trace how features were built without noise from outside branches, and often the ability to step neatly through each layer (commit) in a feature to see how it changed, a handy tool for debugging.
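The difference between the two views of the log can be sketched with a toy model in Python. This only models the order in which commits are read; it does not call Git, and a real rebase also rewrites the feature commits with new hashes. The branch name feature/quotes, the day numbers, and the commit messages are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Commit:
    branch: str
    day: int       # hypothetical day on which the commit was made
    message: str


# Work on master and on a feature branch, happening in parallel.
commits = [
    Commit("feature/quotes", 1, "Add Quote model"),
    Commit("master", 2, "Fix typo in footer"),
    Commit("feature/quotes", 3, "Add quote form"),
    Commit("master", 4, "Bump Django version"),
    Commit("feature/quotes", 5, "Add quote view and URL"),
]


def chronological_view(history: List[Commit]) -> List[Commit]:
    """Purely chronological reading of the log: branches interleave."""
    return sorted(history, key=lambda c: c.day)


def rebased_view(history: List[Commit], feature: str) -> List[Commit]:
    """Reading after rebasing the feature onto master before merging:
    master's commits come first, then the feature's commits as one block."""
    base = [c for c in history if c.branch != feature]
    replayed = [c for c in history if c.branch == feature]
    return sorted(base, key=lambda c: c.day) + sorted(replayed, key=lambda c: c.day)


for commit in rebased_view(commits, "feature/quotes"):
    print(f"{commit.branch:15} {commit.message}")
```

In the chronological view the three feature commits are separated by unrelated work on master. In the rebased view they sit together, in the order they were written, which is what makes stepping through a feature commit by commit practical.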
That said, it’s a strong suggestion, and not a hill on which I want to die! The most important aspects of a healthy and useful Git (or VCS more generally) history are helpful commit messages and meaningful diffs.
Signed-off-by: Ben Lopatin <[email protected]>