Nevergreen SQL Server deployments and mitigating patching pains

Not green, but still pretty

Not too long ago, Microsoft released the final Service Pack for any version of SQL Server, as part of their move to the Modern Servicing Model. This (and the fall weather here in Finland) got me thinking again about the benefits of running your database workloads on PaaS instead of self-managed VMs. One of the very real benefits is that PaaS databases tend to run on up-to-date versions. This is what you might have seen Microsoft call “evergreen”. SQL Server running on VMs, on the other hand, tends to be more of a “nevergreen” type of deployment. At best, it’s somewhat yellow, but almost never green.

And this isn’t only an on-premises problem: patching self-managed database services in the cloud is just as unpleasant as it is on-premises. But unlike death and taxes, continuously patching your operating system and database software is something you can avoid, without having to fear cosmic powers or prison. It does require us to adjust how we think about managing these systems, though: the databases aren’t the same thing as the virtual machines, and your virtual machines should be treated as disposable, not as pets.

Think of your VMs as disposable, not as pets

Back on the ground, in our on-premises data centers, we have always put our database servers above everything else. We gave them names, the fastest storage connections with dedicated fibers, and the highest-performing disks from the SAN. Generally, we treated these servers like they were the most precious thing we had. We also spent a lot of time patching the operating systems and the database engines, and worrying about the downtime the patching would require. And a lot of time has been spent fixing things when something went sideways during patching or other planned maintenance.

Then we went to the cloud, where spinning up new servers with pre-installed SQL Server is pretty quick, and they can be deployed directly from the marketplace. And yet, we don’t really take advantage of this very often. What if, instead of patching the same VMs month after month, you just replaced them with new ones? It sounds somewhat crazy at first, but give it some thought. Without the need to patch servers running production workloads, you can actually minimize risks to your environment. You also don’t carry over all the garbage from dozens of previous patching rounds; instead, you get to start with a fresh environment each time.

What does disposable really mean?

Let’s take a really simple example: a SQL Server Availability Group running on two nodes. What you do is basically this:

  1. Create two new VMs with the latest versions of the OS and SQL Server
  2. Install any updates that might be necessary, such as cumulative updates
  3. Join the new nodes to the cluster and the Availability Group
  4. Synchronize server configurations, databases, and instance-level objects (logins, linked servers, etc.) to the new nodes
  5. When you’re happy with the results, fail over to one of the new nodes
  6. Remove the original nodes from the AG and evict them from the cluster
  7. Destroy the evicted nodes
Diagram of a disposable-VM-based SQL Server architecture.
So simple. At least in theory.
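In practice, most of steps 3–6 can be scripted. Here’s a minimal sketch using dbatools and the FailoverClusters module; all the server, cluster, and AG names are placeholders for your own environment, and in real life you’d add error handling and verify synchronization before the failover:

```powershell
# Sketch of steps 3-6; names below are placeholders, not real servers.
Import-Module dbatools, FailoverClusters

$newNodes = 'SQLNEW01', 'SQLNEW02'
$oldNodes = 'SQLOLD01', 'SQLOLD02'
$agName   = 'AG1'

# 3. Join the new nodes to the Windows cluster and the Availability Group
foreach ($node in $newNodes) {
    Add-ClusterNode -Cluster 'SQLCLUSTER' -Name $node
    Get-DbaAvailabilityGroup -SqlInstance $oldNodes[0] -AvailabilityGroup $agName |
        Add-DbaAgReplica -SqlInstance $node
}

# 4. Synchronize instance-level objects to the new nodes
Copy-DbaLogin        -Source $oldNodes[0] -Destination $newNodes
Copy-DbaAgentJob     -Source $oldNodes[0] -Destination $newNodes
Copy-DbaLinkedServer -Source $oldNodes[0] -Destination $newNodes

# 5. Fail over to one of the new nodes
Invoke-DbaAgFailover -SqlInstance $newNodes[0] -AvailabilityGroup $agName

# 6. Remove the original replicas from the AG and evict them from the cluster
foreach ($node in $oldNodes) {
    Remove-DbaAgReplica -SqlInstance $newNodes[0] -AvailabilityGroup $agName -Replica $node
    Remove-ClusterNode -Cluster 'SQLCLUSTER' -Name $node
}
```

Step 7, destroying the old VMs, then happens with whatever deployed them in the first place.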

This gets even easier if you’re dealing with dev or test environments that have only a single node. You create a new server, log ship the databases, and perform a cutover. The best part here is that you avoid the risks to production workloads by not patching the original nodes at all. At the end of this process, the only downtime comes from failing over to the new nodes. And if the newer versions caused any issues, you could always fail back to a fully working environment running the old versions.
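For that single-node scenario, dbatools can handle the log shipping and the cutover as well. A rough sketch, with the instance names, database, and share path as placeholders:

```powershell
# Ship the databases from the old dev/test server to its replacement.
Import-Module dbatools

Invoke-DbaDbLogShipping -SourceSqlInstance 'SQLDEV-OLD' `
    -DestinationSqlInstance 'SQLDEV-NEW' `
    -Database 'AppDb' `
    -SharedPath '\\fileserver\logship' `
    -CompressBackup

# At cutover time, restore the remaining logs and bring the databases online
Invoke-DbaDbLogShipRecovery -SqlInstance 'SQLDEV-NEW' -Database 'AppDb'
```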

You can also use this exact same approach to fix any issues you might have with your VMs. If one starts blue-screening or otherwise behaving badly, replace it.

How to get there?

You get there the same way you would eat an elephant: one bite at a time. For an existing environment, the easiest way is to first perform these steps manually. Not through a GUI or the Azure Portal, though, but using tools like PowerShell. Once you have done this once, from start to finish, you already have all the scripts and cmdlets you need to automate the whole process. These days, you can even export the ARM templates from the portal, which makes automating all of this much easier in the future.
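As an example of what that first manual run might look like, here’s a sketch of step 1 with the Az PowerShell module, deploying a marketplace image that has SQL Server pre-installed. The image URN, VM size, and resource names here are assumptions; check `Get-AzVMImageOffer` and `Get-AzVMImageSku` for what’s currently available in your region:

```powershell
# Deploy a new VM from a SQL Server marketplace image (names are examples).
Import-Module Az.Compute

$cred = Get-Credential   # local admin credentials for the new VM

New-AzVM -ResourceGroupName 'rg-sql' `
    -Location 'westeurope' `
    -Name 'sqlnode-a1b2c3' `
    -Image 'MicrosoftSQLServer:sql2022-ws2022:standard:latest' `
    -Size 'Standard_E8s_v5' `
    -Credential $cred
```

Run once interactively, this becomes the first building block of the replacement pipeline; the rest of the steps get scripted the same way.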

There are also some rather good open-source tools for automating most aspects of SQL Server management; personally, I prefer dbatools. It provides hundreds of PowerShell cmdlets for almost anything you want to do with SQL Server, like managing your Availability Groups. It can also streamline some rather complex tasks, like copying all your scheduled jobs, logins, and other instance-level objects between servers.

If you’re looking to do this for completely new deployments, then you have plenty of options. Depending on your cloud platform, you’ll typically find some tools for deploying infrastructure: ARM, Bicep, CloudFormation, Terraform, etc. And once you have the infrastructure deployment done, you can pick your favorite tool for configuring it all: PowerShell DSC, Ansible, Chef, or any of the dozen other products that do this.

What other benefits are there?

Glad that you asked. Once your VMs become disposable, you don’t need to back them up anymore. You do need to back up the databases, but the VMs themselves? Who cares. It’s not like you’re naming them after your favorite sci-fi starships or anime characters anymore. In fact, they should be named in a completely random way, and if you need to find them in the portal, that’s what tagging is for. Removing the need for backups provides some additional savings too, as backups aren’t exactly free in the cloud. The only backup you should need for the servers is the infrastructure-as-code scripts and configurations you have stored in your source control system.
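Finding those randomly-named VMs by tag is then a one-liner with the Az module; the tag name and value here are just an example convention:

```powershell
# List SQL Server VMs by tag instead of by (random) name.
Import-Module Az.Resources

Get-AzResource -ResourceType 'Microsoft.Compute/virtualMachines' `
    -TagName 'workload' -TagValue 'sql-production' |
    Select-Object Name, ResourceGroupName
```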

You will also improve the RTO of your database systems. If you’ve ever restored a VM from a cloud backup, you know it can be painfully slow. And after you’ve restored the VM, you will very likely need to follow up with a restore of the databases, unless you have a really, really relaxed RPO for that system. This whole process tends to take more time than just creating a new server and running the configuration scripts you have for it.

The VMs themselves will also accumulate less garbage when they get replaced regularly. And once you adopt the mindset that the server might not be there tomorrow, people will be less inclined to put extra stuff on it. You know what I am talking about: those scheduled tasks in Task Scheduler, or that one weird console app that has to run under someone’s login.

The End

Thank you for reading the post. Hopefully this gave you some ideas on how to improve your cloud-based database estate.
