Horkay Blog
The postings on this site are my own and do not represent my Employer's positions, advice or strategies.
Wednesday, July 02, 2008

Our Polyserve cluster took a deep dive and crashed on every node. The root cause is still under investigation, but in short: we zoned some new storage to the cluster, and after the nodes were rebooted, the Polyserve software was unable to read from or write to the membership partitions. Of course, the error didn't say that, because that would have made troubleshooting easier. Instead, we received this error:


Event Type: Error
Event Source: sanpulse
Event Category: SAN Storage
Event ID: 17005
Date:  7/2/2008
Time:  9:35:40 AM
User:  N/A
Computer: BCPLYSQL03
Description:
This matrix is unable to take control of SAN because the servers are unable to perform fencing operations, possibly due to a networking or fencing hardware failure or misconfiguration. As a result, some or all filesystem operations may be paused throughout the matrix. In addition, filesystem mounts and unmounts and disk imports and deports cannot be performed.


We have zoned storage to Polyserve many times and never had a stability issue. We've had isolated issues with LUNs not showing up, miniport/Storport issues, and Emulex issues, but nothing that caused the cluster to become unstable.

We eventually de-zoned the new storage, rebooted the entire cluster, and everything worked fine. We're not sure whether the storage was zoned incorrectly (we have a new SAN administrator, so it's possible it wasn't done correctly), though I don't suspect this. Although new, our SAN administrator has successfully zoned storage to our clusters in the past with no issues, and understands what Polyserve is and how it works.

Rather, I suspect an internal issue in Windows, Emulex, PowerPath, or something similar that, when the new storage was zoned, caused the LUN IDs to change and map incorrectly.

 

Wednesday, July 02, 2008 9:47:50 AM (Central Standard Time, UTC-06:00) | Polyserve