It took me a while to write the first post for 2017, but it has been rather hectic at the office since I moved inside the organization to quite a different role. However I noticed this story a while back and it has been haunting me ever since. Now if you are a DBA then you probably already understand just how important it is to have, not just backups, but valid backups. But what is the difference of a backup and a valid backup?
I believe a recent story about GitLabs loss of data highlights this difference rather well. It all started with GitLab system becoming under attack by spammers hammering their database, which eventually led to issues with database lockups as the problem escalated. During recovery, while trying to fix the replication, there was a humane error and production data was removed. Normally you would go to the backups and restore the accidentally deleted data, except when something like this happens. The following is directly from the GitLab report on the incident.
So in other words, out of five backup/replication techniques deployed none are working reliably or set up in the first place. We ended up restoring a six-hour-old backup.
And they had that six-hour-old backup only because the DBA had manually taken one, as he was working on load balancing the database. Normally this backup is run once every 24 hours.
A valid backup
I admit that I read that text above quite a few times. Five different backup and replication techniques used and not a single one of them working properly. That is some amazingly bad luck, but also an excellent reminder of how important it is to actually test your backups to make sure that they are valid. I believe that most people who maintain databases have some sort of backup routines in place, after all, they are relative simple to set up. Now I don’t have any exact numbers on this, but I also believe that not that many actually test their backups. However the thing with backups is, that only way to be 100% certain that you have a working backup, is to restore it somewhere.
Until you do that, it is just a file from which you might be able to restore something.
Testing a backup
Now that we know that the only reliably working backup is a backup you have restored data from, we can concentrate on the means on how to get there. Naturally no one wants to do a database restore manually in a daily basis and luckily we don’t have to. One very easy method is to do it by using Powershell. The example below uses Restore-SqlDatabase CmdLet to restore a database from a BAK-file to another name and placing both data- and logfiles to alternative locations.
$DataFileLoc = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile("TestDatabase", "C:\Databases\DBData\RestoreTestData.MDF") $LogFileLoc = New-Object Microsoft.SqlServer.Management.Smo.RelocateFile("TestDatabase_log", "C:\Databases\DBLog\RestoreTestLog.LDF") Restore-SqlDatabase -ServerInstance "LOCALHOST" -Database "TestRestoreDB" -BackupFile "C:\Databases\DBBackup\TestDatabase.BAK" -RelocateFile @($DataFileLoc,$LogFileLoc)
What you can do, for example, is to schedule a similar restore script to run every night after the backups are finished. If it fails, you’ll get an error that is picked up by your monitoring system. This makes it possible to react quickly and fix the issue before it becomes a huge problem, like when you need to restore something and you realize that backups haven’t been successfully run in a month. Naturally after the restore has been done successfully, you can drop the database in another job step.
Depending on the size of the backup and the space you have on your database server, you might need to do the restore elsewhere. Which would just require you to add a copy step to your restore job. Of course, it’s always nice to have enough free space on your database server to restore a copy alongside the production database, just in case.
Add some log shipping
I love log shipping because it is an old and extremely reliable feature. The ability to delay transaction log backup restores for several hours alone has been a life-safer on a number of occasions where an UPDATE or DELETE has gone bad. On top of that greatness, it also tests your transaction log backups for you! So basically, if you use a Powershell script similar to that above and combine it with log shipping, you will not only have a solution to test your database backups, but also every single log backup as well. You can’t really get much more thorough than that.