Treating database as a code

Everything as a Code

Over the last two years I have been working lot on improving the tooling and processes related to database development at where I work. One concept born out of this work is what we have started calling a “Database-as-a-Code” model. Originally the idea was to introduce some of the good practices, such as version control, build automation and unit testing to database development. Over the time it has evolved to include even more, in an effort to break down traditional silos between software developers, database developers and the operations people.

In this post I’ll describe some of the decisions we made and the steps we have taken in our Database DevOps journey.

In the end databases are just code

This is probably the one of the more important realizations I have had over the years. Just like the applications and services, the SQL Server database objects are created with a programming language (one of the oldest ones in fact, with roots going back to 1970’s) and are usually stored in their own “source files”, the scripts. Because of this, version controlling was really one of the first things we had to figure out how to do right. We tried a few different things and ended up with a solution where every object on the database was stored as its own script. Combined with version control, this gives an incredibly good understanding of how the objects are changing over the time and different versions.

One of the drivers for renewing the database development was that we were taking our system architecture towards microservices. The big thing in microservice architecture is the loose coupling of different services, which can initially feel bit complicated (scary) from the database perspective. For the database schemas this ment that while they would need to be loosely coupled from other schemas, they’d need to be tightly coupled with its application. In fact, each schema designed for a microservice was now owned by the said service, and we took it as far as to include the database code into same repository as the rest of the application code.

As we’d then proceed to build and test the microservices the same build automation, code quality analysis, unit testing and deployments now included the database objects alongside the services.

Updating the Database

Where applications are generally simple to update by overwriting or fist removing the existing code, databases contain a state that needs to be maintained over the upgrade. For updating the databases in a Database-as-a-Code approach, you need to have a tooling that can create the code required for the update. These tools usually work in one of the two ways, making updates either state or migrations based. Let’s take a quick look at these two.

State based updates

In a state based update you have a source and target database, then some tooling which will do it’s best to create the script to update the target to match the source. The source doesn’t need to be a live database, it can also be a bunch of scripts created from the database, but it will still work the same. There are some limitations in this approach, f.ex. how to handle data when you remove or add columns. That is why the state based tooling usually provide a way to add and execute sets of migration scripts to handle the data transformations.

Migrations based updates

Migration based updates require bit more work, but provide the highest level of control on how and what happens during the database update. With migrations the developer creates the scripts, implicitly telling the tooling how to perform the update actions on both the schema and the data. While this sounds bit more work, a good tooling goes long way to make writing these scripts as ready as possible.

Hybrid approach

There is of course the possibility of combining the two approaches mentioned above. I think it’s a bit more uncommon approach and as far as I have seen, the only vendor providing this kind of capability is Redgate with their SQL Change Automation. Their solution bundles together their old state based update mechanism (DLM Automation) to migrations based ReadyRoll.

The hybrid solution is actually pretty interesting one, and we’ve looked if we should move to this one from the current state based approach we have adopted.

But what about the data?

Version controlling databases and objects is a one thing, but quite often the databases you end up deploying aren’t just the objects, but need to include some amount of data. This data can be anything from registers to initial configurations required by the applications. Handling data inserts is bit more complex and we eventually ended up building our own solution for handling the data feeds, or data seeds, which ever term you prefer. Data feeds are simply T-SQL scripts that can be either:

  • Initial data feeds which are run only once, when the table is empty. This can only contain INSERT clauses
  • Update data feeds which are run when ever needed, and the table has some rows in it. This can contain DELETE and UPDATE clauses

We also had to add support for handling CSV files for really large initial data inserts (for millions and millions of rows), as well as some logic to run some of these data feeds in parallel to optimize the time it would get everything set up.

Data for different use cases

There’s also one more difference in version controlling data when comparing to database objects. Depending on where you deploy, the data content might be different even if the actual database schema remains the same. In production you might deploy just a few configurations, where in testing environment you might need to deploy millions of rows of synthetically generated data (especially  if you’re testing performance). In demo environments you probably end up deploying less than in testing, but it has to look very similar to that you’d expect to find from the production.

Wrapping it up

If you are looking to really do DevOps with software development, don’t ignore the database. Most of the modern applications are data-centric requiring some database or another in the backend, and quite often it’s the one thing that the application really needs to work in the first place. Luckily for us there’s a lot of tooling around for database DevOps these days, the job is just find the one that’s right for you.

Hopefully you found this post useful!

Published by

Leave a Reply

%d bloggers like this: