The Managed Instance that couldn’t be deleted

I’ve written quite a few posts previously about creating Azure infrastructure, and it’s something that I do rather frequently. Typically, it’s to test the templates that I’ve created after changing them somehow. While I fully expect my code to sometimes (often) fail, I usually don’t expect the removal of the Azure resources to fail. That is, however, exactly what happened to me a while back, when my Managed Instance deletion failed.

It failed
Sometimes you break the cloud…

The never-ending deletion?

I know that it sometimes takes a while to delete resources from Azure, but I was quite surprised to see this status after coming back to work the following morning. The virtual cluster deletion for the Managed Instance was still showing “In Progress”.

Managed instance deletion showing in progress.
The never-ending cluster deletion.

As I was looking at the Resource Group for clues about what was going on, I noticed that there were quite a few errors visible in the deployments. When I started to dig in further, I got a pretty good look at what was happening.

Managed instance deployment keeps on failing.
Okay, this doesn’t look too good.
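Digging through the deployments in the portal works, but the same information is available from the Azure CLI. Here’s a minimal sketch, assuming the Azure CLI is installed and logged in; `mi-test-rg` is a hypothetical name for the resource group that held my test deployment:

```shell
# List deployments in the resource group that ended up in a failed state,
# along with their error messages (mi-test-rg is a placeholder name):
az deployment group list \
  --resource-group mi-test-rg \
  --query "[?properties.provisioningState=='Failed'].{name: name, state: properties.provisioningState}" \
  --output table

# A Managed Instance runs inside a virtual cluster; if the cluster is still
# listed long after the instance delete was issued, the cleanup is stuck:
az sql virtual-cluster list --resource-group mi-test-rg --output table
```

The `--query` parameter uses JMESPath filtering, which is handy for cutting straight to the failed entries instead of scrolling through the full deployment history.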

I did spend a reasonable amount of time looking at the logs, trying to figure out what I could do here. But nothing I tried seemed to make any difference. The deletion was stuck, and I was left with one last thing to do.

Which was to contact Microsoft Support.

They got back to me pretty quickly after I had submitted the issue. We then went back and forth with the support person for a couple of days over email. I wasn’t in a rush, since the deployment wasn’t generating costs and had been done just for testing purposes. However, we still had no luck in getting the deletion completed. Eventually, support brought in someone from the product engineering team to look at the deployment, and they were able to perform the mitigating steps that finally removed the resources.

Why did the Managed Instance delete fail?

According to the root cause analysis from Microsoft, the issue resulted from a failure to refresh the backend SQL component cache files. This was the first time I had run into this issue, but according to support, it is something that can occasionally happen with Managed Instance deletion.

While most of the time my deployments to Azure work just fine, and the issues are typically of my own making, this was a good reminder of the nature of public cloud platforms. When you build, always build with failure in mind. It’s not a matter of if it breaks, but when it breaks.

And when it does, eventually, you can always reach out to support. Generally speaking, Microsoft does an impressive job with Azure support: they’re fast to respond and helpful. Except those times when the first suggestion for fixing Azure SQL Database performance issues is to run an index rebuild.
