Writing a Deployable Application — Database Edition
With rare exception, applications and databases need to be deployed to be used. Whether it’s installing/updating your application across multiple customers or deploying your changes across Dev/UAT/Prod environments, you need to be able to push your code in a way that is deterministic and requires no manual intervention.
You can’t/shouldn’t stand over your customer’s shoulder while they install the application and walk them through “now, alter this table and execute these 3 scripts” or “here, restore this backup”. Expecting to do it this way is unprofessional, usually not even feasible, and always prone to human error.
In order to deploy the database changes that go along with your application, you need to apply some discipline. The following guidelines are broadly applicable to almost every situation I’ve faced.
As a side note: throughout this article I’m using the term database, but for your use case the better terminology might be schema or something different. The point is that I’m not referring to the actual server software but to the encapsulated set of tables, views, indexes, etc.
Every developer needs their own instance
This may sound strange, wasteful, or even like a lot of extra overhead, but every developer needs to be operating on their own database. Usually this means finding a way to run it on localhost (containerized database server instances via something like Docker work well for this), but having a central dev server that runs a separate database for each developer is fine too.
The primary reason for this is that development often involves experimental changes, breaking changes, and concurrent changes. Any of these three, on a database shared by multiple people (or even by one person working across multiple branches), will often result in disrupted or broken work.
Having a separate database dedicated to each developer will allow the developer the freedom to explore changes without worrying about disrupting, or being disrupted by, other developers. It has the additional advantage that a developer can completely destroy/recreate their own instance without impacting anybody else.
Every database alteration needs to be codified
When a change to the database is needed, it’s important that the change be codified. In this way, the change can be replayed across developers and environments. Determining which changes need to be made and executing them is done via tools; examples include Entity Framework, Liquibase, and Flyway.
The codified changes that need to be applied are often called “migrations” or “evolutions”.
When I add a new feature that needs a change, I don’t send out an email stating, “Hey, everybody, add a new column to the Users table!”. Instead, an ALTER TABLE ... command is scripted into whichever tool is used to migrate the changes. Though I will probably communicate to the team what change I’m making and why, the actual change isn’t orchestrated by a human.
Codifying the changes and using a migration method of some sort allows the changes I make to my database to be carried across to other developers. It also means that I can deploy to a UAT/testing environment and then, after that is approved, be certain that when I deploy to prod the exact same changes will be made.
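Migration tools handle this bookkeeping for you, but the core mechanism is small enough to sketch. Below is a minimal Python sketch (the table name, migration names, and SQL are illustrative, not taken from any particular tool) that records applied migrations in a tracking table so that each one runs exactly once per database:

```python
import sqlite3

# Ordered, append-only list of migrations. In a real tool these would be
# version-controlled files; the names and SQL here are purely illustrative.
MIGRATIONS = [
    ("001_create_users", "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"),
    ("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT"),
]

def migrate(conn):
    """Apply every migration not yet recorded in the tracking table."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    for name, sql in MIGRATIONS:
        if name not in applied:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)  # applies both migrations
migrate(conn)  # no-op: both are already recorded in schema_migrations
```

Because the tracking table is consulted on every run, running the same migrate step against dev, UAT, and prod brings each database to the same state regardless of where it started.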
Codified changes need to be immutable
As a developer, you also need to treat your database changes as immutable. After you’ve integrated and your migration has been executed, it’s important that you not change it. Changing an existing migration means that anybody who has already executed it will never receive the change, because each migration is executed only once in each environment.
Instead, the alteration needs to be added as another migration.
This isn’t a hard-and-fast rule, more of a guideline. There are cases where a migration can be altered after the fact, but it should only be done with extreme caution. The guideline holds true the vast majority of the time.
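This run-once behavior is also how tools catch accidental edits: they typically record a checksum of each migration when it first runs and refuse to proceed if the text of an already-applied migration later changes (Liquibase, for example, stores checksums in its DATABASECHANGELOG table). Here is a minimal sketch of that idea in Python, with an in-memory dictionary standing in for the tracking table and the actual database execution elided:

```python
import hashlib

applied = {}  # migration name -> checksum recorded when it first ran

def checksum(sql):
    return hashlib.sha256(sql.encode()).hexdigest()

def apply_once(name, sql):
    """Run a migration once; refuse if an already-applied one was edited."""
    if name in applied:
        if applied[name] != checksum(sql):
            raise RuntimeError(f"migration {name!r} changed after being applied")
        return  # already applied and unchanged: nothing to do
    # ... execute sql against the real database here ...
    applied[name] = checksum(sql)

apply_once("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT")
apply_once("002_add_email", "ALTER TABLE users ADD COLUMN email TEXT")  # no-op
```

Editing the text of an applied migration would make the next run fail loudly, which is exactly the failure the guideline is trying to prevent; the fix is to add a new migration instead.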
Codified changes can be code-reviewed
Because database alterations are written in code, rather than some developer or DBA manually executing scripts in various environments, they can and should be reviewed in your pull requests. In this fashion, changes are more easily communicated and errors are caught sooner.
Integrate codified changes often
Having multiple database changes from multiple developers all land at the same time can be a huge hassle. Instead, changes should be merged often to a centralized, common branch in source control. Developers should merge their changes frequently (via pull requests, hopefully) and should frequently branch off the latest code to make future changes. In this way, conflicts and errors are found sooner, and their scope is kept to a minimum.
Integrating often leads to …
Use automation to deploy your migrations
This is where CI/CD comes into play. When code is integrated into a common branch, an automated build needs to run verification and testing, followed by immediate deployment of the changes to a central integration environment. This ensures that the deployment steps can be executed from start to finish, the application can be deployed, and the migrations can be run.
To reiterate, the deployment of your database, whether it’s an initial install or a migration to bring it up to date, needs to be automated. It needs to be done consistently across environments in a precise, logical manner. This can be done using some sort of release pipeline like Azure DevOps, TeamCity, GitHub Actions, etc.
In automating your deployment and migration process, you codify the steps taken, inject the secrets, and reduce the chance that human error is introduced into the equation.
In the case of using the same database across multiple applications, you probably don’t want each application to execute its own changes. Instead, the shared database should have an independent, centralized source of migrations. This ensures that the person writing an alteration has access to the previous alterations. Pull requests to this repository/project should cross team boundaries so that other affected applications are both notified AND have the ability to weigh in on the review.
Databases can still be seeded via scripts, but these seed scripts shouldn’t run on production, whether that means running them separately or skipping them in certain environments.
It’s better if these seed scripts can, in some way, be specified to run in some environments but not others. The reason is that seed data written as a migration continues to “migrate” along with later schema changes. In contrast, if you write your seed scripts separately and intend for developers to run them manually, you have to maintain them. For example, if your seed script inserts 10 (even worse, 100 or 1000) rows into a specific table, and you later drop a column, you then have to modify the seed script and remove that column from all of those INSERT statements. If instead you were able to conditionally execute the script as part of the migration process, the column drop would happen after the data was inserted and the seed script would never need to be corrected.
One example of this can be accomplished in Liquibase, which allows a “context” to be specified so that a migration (a “changeSet”) runs only in certain contexts. (To use this in Liquibase, you want to always provide a context to the update command, because if it is left blank, ALL contexts run.) A seed migration in Liquibase might therefore be given the context dev so that it runs only in dev and in no other environment; more complex expressions like !prod and !qa are also possible:
<changeSet id="Insert test data to table XYZ" author="your-name" context="dev">
    <sql>
        INSERT INTO ...
    </sql>
</changeSet>
Execution of the migration in dev would provide that context to the update command:
liquibase --changeLogFile=changelog.xml --contexts="dev" update
And if you executed in prod, you’d provide that context and it would not run the seed changeSet:
liquibase --changeLogFile=changelog.xml --contexts="prod" update
In architecting your process, consideration needs to be given to how the application will be installed and updated.
How you deploy these changes has a significant impact on the health of your project and your competence in the eyes of your customer and team.