Database restore performance oddities of SQL Server RDS

One of the things I’ve learned over the years is, that sometimes the performance in cloud platforms can be unpredictable and you always need to expect the unexpected. Most of the times these issues are just minor annoyances, but they can certainly be confusing the first time you run into any of them.

A while back I had one of these experiences as I was working with moving databases from on premises datacenter to AWS using the native restore and backup capability in RDS. It’s a really nice feature giving you ability to easily move data in and out from the RDS, using familiar tools for DBAs. However this time I ran into a weirdest issue with the initial restore performance.

When restores go slow…

Not in a hurry…

The database I was restoring wasn’t too large, if I recall correctly the size of the backup was only couple hundred MBs. The environment I was restoring to was an RDS making use of the Availability Groups for High Availability. There were no other databases restored there, so this was the first one.

What happened was, that I started the execution of the stored procedure that does the restore and then.. Nothing happened. I could monitoring the restore and saw that it was in progress, and it stayed that way for 40 minutes. The problem with RDS is that there’s no way to actually monitor that restore, all I could see was the SQL Server errorlog which didn’t tell me anything and the stored procedure that showed that the restore was underway.

Then after 40 minutes it suddenly took off, database got restored and we were done. Since I was already digging for the instructions on how to open AWS support case I decided to do some testing to get more data into the ticket. I ran the same restore few times and the seconds restore after the 40 minute one took 17 minutes, next one took 11 minutes and then rest finally settled somewhere below 10 minutes.

And you don’t know why

I never figured out the exact reason for this and there were never any errors about to explain me why it took so long the first time. My guess is that maybe the first time you add a database in there, the Availability Group gets created. And maybe even the secondary server gets spun up after that? If someone knows how this works, I’d really like to hear the explanation how AG’s get setup in RDS.

I’ve tried to re-produce this slowness a couple times later, but I haven’t been able to. So maybe it is one of those unexplained things that sometimes happen in the cloud. Like the kids these days say, “cloud goes brrrrr”, and you don’t really get the details, just the end result, eventually. Hopefully.

Conclusion

I really liked how the process of migrating from on-prem can make use of familiar tools like the native backups and T-SQL commands, even if there’s some oddities there every now and then. However when you’re planning a cloud migrations, these are the types of things you need to always factor in. While public cloud is essentially someone elses state of the art datacenter, in the cloud you won’t have the same visiblity to performance and you can’t yell at storage or network admins like you’re used to in the on-premises.

Or you could, but yelling at the clouds..

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.