The postings on this site are my own and do not represent my Employer's positions, advice or strategies.

LifeAsBob - Blog

 

Home

No Ads ever, except search!
Thursday, March 28, 2024 Login
Public

SAN Patching, you'd better ! 5/15/2009 1:14:28 PM

Recently I had a nice experience of working an outage of a SQL Server caused by a SAN Issue.  Here is where clustering breaks down.  Fortunately I work in a big shop which uses Microsoft, Veritas, Polyserve and VM Ware clustering technologies; but all of them have a single point of failure, the SAN.

The official response to the problem was:

We are experiencing intermittent {vendor here} issues causing some SAN storage to become read only. Server team is closely monitoring for this condition and putting the setting back to read/write. A fix is available and being planned for Saturday night, unless the issue becomes more prevalent that it is now.

 

Lovely.  What is missing from the statement above is depending on which clustering technology you are using, it may require a reboot to bring the storage back for windows (sometimes all nodes !).  Veritas, Polyserve and VMWare seem to handle san / fiber hickups the best.

 

It may be time to research a stretch cluster with different sans and some type of replication or mirroring.  The uptime of 9's (pick your number) is a difficult task to reach and in my opinion not truly possible with one SAN.  I've seen too many SAN Failures.  SANS are supposed to be built in redundant everything, but somehow almost all my outages on High Availability SQL Implementations are the SAN.

 

Of course it has to be something, i'm not inferring that a SAN is no good or poorly designed, just that as every point of failure is addressed, another one appears. 

 

How the vendor could know about this issue and not let us know, is confusing in itself.  The vendor is responsible for maintenance and patching of the SAN, seems they wanted to keep this bug "close to the vest" and maybe just "roll" it in with some other firmware patching.....i'm not impressed.

 

Keep your vendors accountable and ask them how often they patch the san, and what patches are missing from your environment.  Work with the vendor so they know that you are willing to accept patches and get them applied, don't wait for the bug to affect you before applying it.

 

This may apply to SQL Server as well, how often do we patch to a specific level and try and stay steady there, not wanting to apply all the cumulative updates, unless it affects something.  It may be an affect you don't like.

 

Be more pro-active.

 


Blog Home