SAN migration in Windows Failover Cluster

I recently did a migration from one SAN to another and decided to write a quick blog post about the procedure I used. In this particular case the difficult part was handled by the SAN administrator, as we were moving from one manufacturer to another. He had the pleasure of presenting disks from two different storage systems to the same two nodes, which required dismantling quite a few features such as MPIO. We did have some problems with disks showing up multiple times, but nothing we couldn’t work around.

After preparing the disks in Windows with the usual initialize and format routine, we added them to the cluster through Failover Cluster Manager. After adding the disks we also ran the storage validation tests to make sure that they were clusterable.

Creating Dependency Report.
After the validation test we documented the dependencies in the group by saving the HTML page from Show Dependency Report. Since we had to remove the old disks, the dependencies on them would be lost, so having them documented is a good thing!
Something new and something old.
Having saved the dependency reports we proceeded to take all other resources Offline except the disks, after which we added the new disks to their planned resource groups. At this point we had both the old (P2*) and the new (EC2_*) disks sitting there nicely side by side.

I personally like to do things one at a time, so I just picked one disk pair to copy between. This one-disk-at-a-time approach is of course only good when you’re dealing with relatively small amounts of data and the downtime will be short. In this case I also always used the same drive letter for the target (K:), so I only needed to change the source drive and the log file name in the command.

[Screenshot: starting the Robocopy command]

If you ever need to perform more complex copying, Robust File Copy, or as it’s more commonly known, ROBOCOPY, is the tool to use. In the example above I’m copying drive H: to drive K:, mirroring the directory tree (/MIR) and taking all the security details with me (/SEC). In case of problems I retry (/R) 2 times with a wait time (/W) of 1 second between retries. I also log everything to a file (/LOG) and exclude the directory (/XD) called “System Volume Information”. Besides the log file I also want to see everything on the console (/TEE) along with the estimated time of arrival for the copied files (/ETA). Finally, I want all of the file information copied (/COPYALL), which already includes the security details that /SEC covers.
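The original command was shown in a screenshot, so the snippet below is only a rough reconstruction from the switches described above, not the exact command line I ran. It is a small Python sketch that builds the Robocopy argument list for one disk pair. The function name and the log file path are made up for illustration.

```python
import subprocess


def build_robocopy_command(source, target, log_file):
    """Build a Robocopy argument list from the switches described in the post.

    The log file location is a hypothetical example, not the original one.
    """
    return [
        "robocopy",
        source,
        target,
        "/MIR",      # mirror the directory tree (same as /E plus /PURGE)
        "/SEC",      # copy files with security (redundant next to /COPYALL)
        "/COPYALL",  # copy all file information
        "/R:2",      # retry failed copies 2 times
        "/W:1",      # wait 1 second between retries
        "/XD", "System Volume Information",  # exclude this directory
        f"/LOG:{log_file}",  # write the output to a log file
        "/TEE",      # also show the output on the console
        "/ETA",      # show the estimated time of arrival of copied files
    ]


# Only the source drive and the log file name change between disk pairs.
command = build_robocopy_command("H:\\", "K:\\", "C:\\Temp\\robocopy_H.log")

# Print the command line instead of running it, so it can be reviewed first.
print(subprocess.list2cmdline(command))
```

Keep in mind that /MIR also deletes anything on the target that does not exist on the source, so double-check the drive letters before running the printed command.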

When the files have been copied, you’ll see the following screen with a summary of how it went.

[Screenshot: Robocopy summary after the copy has finished]
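Besides reading the summary, you can also check the Robocopy exit code, which is a set of bit flags rather than a simple zero-or-error value. I didn't script this during the migration, so treat the following as a minimal sketch with made-up helper names. It assumes the documented meaning of the flags, where any value of 8 or higher means that at least some files failed to copy.

```python
# Documented Robocopy exit code bits and what they mean.
ROBOCOPY_FLAGS = {
    1: "one or more files were copied",
    2: "extra files or directories were found on the target",
    4: "mismatched files or directories were found",
    8: "some files or directories could not be copied",
    16: "serious error, no files were copied",
}


def describe_exit_code(code):
    """Return a list of human-readable messages for a Robocopy exit code."""
    if code == 0:
        return ["no files were copied, source and target already match"]
    return [text for bit, text in ROBOCOPY_FLAGS.items() if code & bit]


def copy_succeeded(code):
    """Exit codes below 8 mean that no copy failures were reported."""
    return code < 8


# Example: exit code 3 combines the flags 1 and 2.
for message in describe_exit_code(3):
    print(message)
print("OK to continue:", copy_succeeded(3))
```

If you run Robocopy from a script, this kind of check helps you avoid treating exit code 1 (files copied successfully) as a failure.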

After making sure that everything was copied properly, I removed the drive letter from the old disk and assigned it to the new disk (K:), added the required dependencies I had documented, and then moved the old disk out of the group back to Available Storage. After all the disks in the group had been copied and the dependencies were set, the resources could be brought back Online.

Some additional tips

I had originally planned to use a certain Failover Cluster workaround to replace the disks once the data had been copied, but in this particular case it didn’t work out of the box. As there weren’t that many resources to migrate, I decided not to spend time figuring it out and did the dependency mappings etc. by hand.

The workaround is to change the allowed failure count for the clustered disk resource to 1 and tell it not to initiate failover after failing. After you copy all the files to the new disk you can then initiate failure from the Failover Cluster Manager manually on the old disk and use the Repair function to replace the “faulty” disk with the new one.

Author: Mika Sutinen

Hi, my name is Mika Sutinen and I'm a Senior Database Administrator for a company called Tieto. I've been working in the IT industry for two decades and I've spent most of my career working with healthcare information systems. I've worked with SQL Server for most of my career, starting with version 6.5 a long, long time ago. My other interests are high availability, everything related to performance (testing, monitoring, etc.), Windows operating systems, and I'm currently learning more about Azure.