Horkay Blog
The postings on this site are my own and do not represent my Employer's positions, advice or strategies.
Friday, March 21, 2008

We have been experiencing a Polyserve PAN Pulse error that was difficult to troubleshoot and explain.  Most perplexing was the lack of failover: the Matrix event log entries for the SQL instance indicate that it stopped communicating and then started communicating again, almost like a "stutter".

Had we not been carefully monitoring the SQL instances, we would not even have noticed this happening, because the SQL instance does not fail over; it is stopped and started by the cluster software.  We run a SQL Agent job at agent startup that sends an alert, so we are notified whenever the agent stops and starts, and that is what alerted us to this condition.

For some reason this is only happening on 3 of our newer machines: DL585 4-way dual-core machines and a DL580 4-way quad-core machine.

The event log entries are as follows; notice that they are 2 seconds apart:

--------------------------------------------

Event Type: Information
Event Source: PANPulse
Event Category: Interface
Event ID: 100
Date:  3/13/2008
Time:  1:21:26 PM
User:  N/A
Computer: BCPLYSQL07
Description:
10.10.50.48     2008-03-13 13:21:26 Interface 10.10.50.48 address 10.10.50.48 has gone down
-------------------------------------------------------

Event Type: Information
Event Source: PANPulse
Event Category: Interface
Event ID: 100
Date:  3/13/2008
Time:  1:21:28 PM
User:  N/A
Computer: BCPLYSQL07
Description:
10.10.50.48     2008-03-13 13:21:28 Interface 10.10.50.48 address 10.10.50.48 has come up because interface statistics indicate there is incoming traffic
-----------------------------------------------------------
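
Since the SQL instance never fails over, these down/up pairs are easy to miss. As a rough sketch only (not a Polyserve tool), the Python below scans PANPulse Description lines for an interface that goes down and comes back up within a few seconds. It assumes the descriptions have been exported to plain text in exactly the format shown above; the function names and the 5-second threshold are my own illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Matches a PANPulse Description line in the format shown in the entries above.
EVENT_RE = re.compile(
    r"^(?P<ip>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"Interface \S+ address \S+ has (?P<state>gone down|come up)"
)


def parse_event(description):
    """Return (ip, timestamp, 'down' or 'up'), or None if the line does not match."""
    m = EVENT_RE.match(description.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    state = "down" if m.group("state") == "gone down" else "up"
    return m.group("ip"), ts, state


def find_stutters(descriptions, max_gap=timedelta(seconds=5)):
    """Pair each 'gone down' with the next 'come up' for the same address.

    max_gap is an assumed threshold for what counts as a "stutter".
    """
    last_down = {}
    stutters = []
    for line in descriptions:
        event = parse_event(line)
        if event is None:
            continue  # not a PANPulse interface up/down line
        ip, ts, state = event
        if state == "down":
            last_down[ip] = ts
        elif ip in last_down:
            down_ts = last_down.pop(ip)
            if timedelta(0) <= ts - down_ts <= max_gap:
                stutters.append((ip, down_ts, ts))
    return stutters


# The two Description lines from the event log entries above.
sample = [
    "10.10.50.48     2008-03-13 13:21:26 Interface 10.10.50.48 address "
    "10.10.50.48 has gone down",
    "10.10.50.48     2008-03-13 13:21:28 Interface 10.10.50.48 address "
    "10.10.50.48 has come up because interface statistics indicate there "
    "is incoming traffic",
]

for ip, down, up in find_stutters(sample):
    gap = (up - down).total_seconds()
    print(f"{ip} stuttered: down {down.time()}, up {up.time()}, gap {gap:.0f}s")
```

Run against the two entries above, this reports a single stutter on 10.10.50.48 with a 2-second gap. How the descriptions get out of the event log and into a list of strings is left open here.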

What we stumbled across when reviewing this was Flow Control, which is a NIC setting.  These 3 machines were all set to Auto.  The HP Network Configuration Utility has an Information option that shows the currently selected Flow Control, and all 3 of these machines were somehow auto-selecting Rx Pause.  We reconfigured this property to Disabled.

We're hoping this resolves the issue, as we couldn't understand why a SQL instance would go down and come back up with a PAN Pulse error.

We have 9 servers clustered together and run SQL instances on all of them.  We only experienced the PAN Pulse error on these 3 machines, and all 3 had Flow Control set to Auto and were auto-selecting Rx Pause.

Friday, March 21, 2008 11:46:15 AM (Central Standard Time, UTC-06:00) | Polyserve