Containers are a popular way to virtualize and package applications, but for database workloads, adoption has been noticeably slower. The technology has been around for a long while (at least in IT-industry or dog years), so I wondered whether we're ready to use it widely for databases. And if not, what is holding us back?
In this post, I'll take a look at the wonderful world of containers, starting with the current use cases and then moving on to what lies ahead.
Do I use containers for database workloads?
I do, for some specific use cases. I don't like installing database engines directly on my laptop, so when I've needed one for development work, I've used Docker. Most of the time that has been SQL Server, but occasionally also PostgreSQL.
Both have official images available from Docker Hub.
- postgres – Official Image | Docker Hub
- Microsoft SQL Server – Ubuntu based images by Microsoft | Docker Hub
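For local development, starting either engine from its official image is a single `docker run` call. The sketch below builds those commands in Python. The image tags, container names, ports, and password are placeholder assumptions. The environment variables (`ACCEPT_EULA`, `MSSQL_SA_PASSWORD`, `POSTGRES_PASSWORD`) are the ones the respective images expect.

```python
import subprocess

# Placeholder tags (assumptions); pin the versions you actually develop against.
SQLSERVER_IMAGE = "mcr.microsoft.com/mssql/server:2022-latest"
POSTGRES_IMAGE = "postgres:15"


def sqlserver_run_command(password: str, name: str = "dev-sqlserver",
                          port: int = 1433) -> list[str]:
    """Build a `docker run` command for the official SQL Server image."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        "--env", "ACCEPT_EULA=Y",                  # the image will not start without it
        "--env", f"MSSQL_SA_PASSWORD={password}",  # must meet SQL Server's complexity rules
        "--publish", f"{port}:1433",               # host port -> container port
        SQLSERVER_IMAGE,
    ]


def postgres_run_command(password: str, name: str = "dev-postgres",
                         port: int = 5432) -> list[str]:
    """Build a `docker run` command for the official postgres image."""
    return [
        "docker", "run", "--detach",
        "--name", name,
        # Required unless POSTGRES_HOST_AUTH_METHOD=trust is set instead.
        "--env", f"POSTGRES_PASSWORD={password}",
        "--publish", f"{port}:5432",
        POSTGRES_IMAGE,
    ]


def start_container(command: list[str]) -> None:
    """Run the command; needs a local Docker daemon. Raises on failure."""
    subprocess.run(command, check=True)
```

Calling `start_container(sqlserver_run_command("Dev-Passw0rd!"))` would start a SQL Server instance reachable on `localhost,1433`. Building the argument list separately from running it keeps the configuration easy to inspect and reuse.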
My other, more common use case relates to the topics of my recent posts. When I build Azure DevOps Pipelines, rather than using SQL Server in Azure as a target for my scripts, I just spin up a containerized SQL Server and create a test database on it.
In fewer than 30 lines of code, I can get SQL Server running, install the SQL Server client tools, and run sqlcmd to create a database that I can use further down the pipeline for testing. All of that happens in less than a minute.
Because a container keeps its data in its own writable layer unless I explicitly mount a volume, that data disappears when the container is removed, so I don't need to worry about cleaning up after myself when I'm done.
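The pipeline flow described above follows a simple pattern: start the container, wait until the engine accepts logins, then create the test database with sqlcmd. Below is a minimal Python sketch of that pattern, not my actual pipeline definition. It assumes `docker` and `sqlcmd` are on the agent's PATH, and the container name, password, database name, and retry settings are illustrative assumptions. The `run` parameter is injectable so the logic can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]  # takes argv, returns the exit code


def default_runner(argv: Sequence[str]) -> int:
    """Run a command and return its exit code (a non-zero exit does not raise)."""
    return subprocess.run(list(argv), capture_output=True).returncode


def sqlcmd(query: str, password: str) -> list[str]:
    # -C trusts the container's self-signed certificate; -b makes sqlcmd
    # exit with a non-zero code when the query fails.
    return ["sqlcmd", "-S", "localhost,1433", "-U", "sa", "-P", password,
            "-C", "-b", "-Q", query]


def create_test_database(password: str, database: str = "TestDb",
                         run: Runner = default_runner,
                         attempts: int = 30, delay_s: float = 2.0,
                         sleep: Callable[[float], None] = time.sleep) -> None:
    # --rm tells Docker to delete the container, and its data, once it stops.
    start = ["docker", "run", "--rm", "--detach", "--name", "ci-sqlserver",
             "--env", "ACCEPT_EULA=Y", "--env", f"MSSQL_SA_PASSWORD={password}",
             "--publish", "1433:1433",
             "mcr.microsoft.com/mssql/server:2022-latest"]
    if run(start) != 0:
        raise RuntimeError("could not start the SQL Server container")

    # The engine needs a few seconds after the container starts before it
    # accepts logins, so poll with a trivial query.
    for _ in range(attempts):
        if run(sqlcmd("SELECT 1", password)) == 0:
            break
        sleep(delay_s)
    else:
        raise TimeoutError("SQL Server did not become ready in time")

    if run(sqlcmd(f"CREATE DATABASE [{database}]", password)) != 0:
        raise RuntimeError(f"could not create database {database}")
```

When the tests are done, `docker stop ci-sqlserver` stops the container, and because it was started with `--rm`, Docker removes it as well. On a hosted agent that is discarded after the run, even that step is optional.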
Where else are database containers used?
If we look at SQL Server specifically, support for running it in containers has been around for a while already. It was introduced back in 2016 as a preview, alongside the option to run SQL Server on Linux, and became generally available with SQL Server 2017. We're still some way from containers being mainstream for SQL Server workloads, but several other cases come to mind.
The containerization of databases seems to be picking up speed more quickly in the open-source space. CockroachDB can run on Kubernetes, Vitess has a Kubernetes operator, and K8ssandra brings Cassandra to Kubernetes.
While open source seems to be the forerunner here (probably to no one's surprise), commercial databases are following the trend. Azure Arc-enabled data services is a great example of this. I was lucky enough to be invited to try it very early on, and it was also my first time working with containers in a more serious manner. Another commercial database offering that makes use of containers comes from IBM, with its Db2 Click to Containerize offering.
What is the potential for database workloads and containers?
Personally, I see containers becoming an integral part of building database services in the future, bringing capabilities typically associated with cloud-native platforms everywhere. A few scenarios where I could easily apply containers come to mind immediately:
- Creating a SQL Server Reporting Services container for Azure SQL Database.
- Debugging and testing production database issues and DR capabilities.
- Migrating databases (to the cloud, or between clouds).
And that is only from the infrastructure point of view. Going beyond that, other ideas for creating data products aren't too difficult to come up with. Containers also support the driving goals of platform engineering, enabling many self-service automation scenarios for database delivery.
What are the benefits of database containers?
Containerization brings many benefits, some of which are quite well understood: portability, speed of deployment, and so on. What many database people (like me) don't necessarily recognize are the other benefits. One concept I find intriguing is the service mesh.
A service mesh is a dedicated layer responsible for handling service-to-service communication. That doesn't sound too impressive, but what it really means is that a service mesh enables:
- Telemetry collection from the services
- Routing network traffic
- Enforcing networking policies
With these, you can start building database services that are more secure and more observable. For example, some service meshes, like Istio, can also provide metrics and performance information about individual queries.
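To make the three responsibilities concrete, here is a deliberately toy Python model of what a mesh sidecar does with each request to a database: enforce a policy, pick a route, and record telemetry. It is not how Istio or any other mesh is implemented (Istio does this work in Envoy proxies configured by a control plane), and every class, service name, and rule below is hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToySidecar:
    """Hypothetical model of a sidecar proxy in front of a database service."""
    allowed_callers: set[str]   # networking policy: which services may connect
    routes: dict[str, str]      # routing: logical name -> concrete backend
    metrics: Counter = field(default_factory=Counter)  # telemetry counters

    def handle(self, caller: str, target: str) -> str:
        # 1. Enforce the policy before any traffic reaches the database.
        if caller not in self.allowed_callers:
            self.metrics["denied"] += 1
            raise PermissionError(f"{caller} may not reach {target}")
        # 2. Route the logical name to a concrete backend.
        backend = self.routes[target]
        # 3. Record telemetry about the request.
        self.metrics[f"{caller}->{backend}"] += 1
        return backend
```

For example, `ToySidecar({"api"}, {"orders-db": "orders-primary", "orders-db-read": "orders-replica"})` would send the `api` service's read traffic to a replica and reject calls from any other service, counting both outcomes. Because the sidecar sits outside the database engine, none of this requires changes to the database itself, which is what makes the idea attractive.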
What are the main challenges with database workloads and containers?
This is just my opinion, but old habits, for one, play a big part. When virtualization in the form of VMs was introduced, no one moved their database workloads there first. Those of us who were early to adopt the new technology certainly got our fingers burned, and the occasional database corrupted. Yet, despite the early issues, VMs are an extremely popular way to host databases today, both on-premises and in the cloud.
The other main challenge I see is that you really need to know your container platform of choice well. It takes a very mature and skilled operations team to make any virtualization platform suitable for business-critical workloads (something we also learned in the early days of virtual machines). This applies even if you're considering one of the managed Kubernetes services, like AKS. They do take away some operational overhead, but you still must understand the technology you are using.
How do containerized databases compare to PaaS ones?
This is an interesting question. While containerization does bring cloud-native capabilities everywhere, it also introduces a new platform for us to master. When I participated in the early testing of Azure Arc-enabled data services, the first thing I created on the Kubernetes cluster was a SQL Managed Instance. It was quite similar to the Managed Instances I was already running in Azure, except that I now had to understand and manage the underlying cluster architecture.
This is why I still prefer PaaS to containerization. When I run databases as a service, I don't necessarily want to build my own Azure to do it. I am absolutely thrilled to let someone else deal with the complexities of providing me with a stable platform. The other reason I prefer PaaS databases is that my containers still run within a cluster I operate, whereas PaaS databases can draw on the provider's much larger pool of hardware and data centers.
If I were building a PaaS database service, though, I would build it on top of containers. I would also hire a good number of smart people to run those container environments for me.
In my opinion, the technology itself is mature and battle-tested. If you're planning to go down this path, just make sure your operations teams know what they're doing. And even then, start with small, less critical databases.
This is very likely my final post of 2022. Thanks for reading, and see you next year! I plan to get more hands-on with containers then, and that will hopefully result in at least one blog post on the topic!