I've recently survived setting up 13 new clusters and moving 4 others to a new "secure" network zone, which required changing the IPs of the virtual SQL instances and the physical IPs of the machines. The mix: 3 SQL Server 2008 R2 clusters (with EMC clones / BCVs) and 10 SQL Server 2012 clusters with AlwaysOn.
TRUST NO ONE when you do this: not your SAN Admin, Network Admin, Firewall Admin, Active Directory Admin, Windows Admin, not even yourself. Get a dedicated conference room and plan to spend 3-4 weeks in that room. Cancel your life while you do it, as you will not have one. You will work many, many hours.
If you can, find yourself a Microsoft PFE (Premier Field Engineer). I can't name who came to our site, but these individuals are incredibly helpful and I've never been disappointed with their work. Definitely the best of the best.
Also, strangely, almost everything is documented in MSDN or Books Online. You may have to read.
Some things to look for when clustering goes bad and SQL either will not install, will not fail over, or will not start:
- Do not forget the ancient c:\windows\system32\drivers\etc\hosts file. If someone puts an entry in there and you later inherit the environment, it will cause you much grief. SQL Server really should check this file at startup and write an entry to the errorlog if it finds anything, just to make it known that there is "something" in there!
- Check the client aliases on the SQL boxes, both 32-bit and 64-bit. We found a client alias on a SQL server, and it was causing an issue.
- Ensure Active Directory is not overwriting local policy on the cluster nodes. We had a SQL install on a cluster where everything worked perfectly and then, after a bit of time, things would just go to hell. It turns out the SQL install would adjust the local policy during install (for something like "Log on as a service"), then the AD policy would re-sync, push down, and overwrite it. That only took a few days to find. Of course, your admins will tell you "that isn't happening!"
- Do not forget how to look at the cluster log. The event logs (Windows / cluster) are good, but they do not contain nearly as much information as dropping to a command prompt and typing "cluster log /g". Once you run this command, look in %SystemRoot%\Cluster\Reports and you will find a Cluster.log file that contains a lot more helpful information than any of the event logs. On newer versions of clustering (2012) this is done through PowerShell instead, with the Get-ClusterLog cmdlet.
- EMC TimeFinder BCV/clone pairs are still a pain to work with. EMC's solution is to upsell you to something else that costs even more money. Nice.
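
Since nothing warns you about the hosts file from the list above, here is a minimal Python sketch of the kind of check I'd want to run on every node. It is only an illustration: the function names and the default path constant are my own, and it simply lists any active entry that is not a plain localhost mapping so a human can look at it.

```python
from pathlib import Path

# Assumed default location on Windows; pass a different path if yours differs.
DEFAULT_HOSTS = Path(r"C:\Windows\System32\drivers\etc\hosts")


def parse_hosts(text):
    """Return (ip, [hostnames]) for every active (non-comment) line."""
    entries = []
    for raw in text.splitlines():
        # Everything after '#' is a comment in the hosts file format.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # an address with no name maps nothing; skip it
        entries.append((fields[0], fields[1:]))
    return entries


def suspicious_entries(entries):
    """Drop the usual loopback-to-localhost lines; the rest deserves a look."""
    loopback = {"127.0.0.1", "::1"}
    return [
        (ip, names)
        for ip, names in entries
        if not (ip in loopback and all(n.lower() == "localhost" for n in names))
    ]


if __name__ == "__main__":
    if DEFAULT_HOSTS.exists():
        text = DEFAULT_HOSTS.read_text(errors="replace")
        for ip, names in suspicious_entries(parse_hosts(text)):
            print(f"hosts override: {ip} -> {', '.join(names)}")
```

Run it on each node (and again after anyone "fixes" something); any line it prints is a name that will bypass DNS on that box.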
Clustering is still a headache, but if you need all those 9's and you need to back up terabytes of data with a nonexistent maintenance window, then you need it.