The postings on this site are my own and do not represent my Employer's positions, advice or strategies.

LifeAsBob - Blog

 


Polyserve Pan Pulse Error 3/21/2008 12:46:15 PM

We have been experiencing a Polyserve Pan Pulse error that was difficult to troubleshoot and explain.  Most perplexing was the lack of failover: the Matrix event log entry for the SQL instance indicates that it stopped communicating and then started communicating again, almost like a "stutter".

Had we not been carefully monitoring the SQL instances, we would not even have noticed this happening, because the SQL instance does not fail over; it is stopped and started by the cluster software.  We run a SQL Agent startup job that sends an alert whenever the agent stops and starts, and that is what alerts us to this condition.

For some reason this is only happening on 3 of our newer machines: DL585 4-way dual-core machines and a DL580 4-way quad-core machine.

The event log entries are as follows; notice that they are 2 seconds apart:

--------------------------------------------

Event Type: Information
Event Source: PANPulse
Event Category: Interface
Event ID: 100
Date:  3/13/2008
Time:  1:21:26 PM
User:  N/A
Computer: BCPLYSQL07
Description:
10.10.50.48     2008-03-13 13:21:26 Interface 10.10.50.48 address 10.10.50.48 has gone down
-------------------------------------------------------

Event Type: Information
Event Source: PANPulse
Event Category: Interface
Event ID: 100
Date:  3/13/2008
Time:  1:21:28 PM
User:  N/A
Computer: BCPLYSQL07
Description:
10.10.50.48     2008-03-13 13:21:28 Interface 10.10.50.48 address 10.10.50.48 has come up because interface statistics indicate there is incoming traffic
-----------------------------------------------------------
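Because the SQL instance does not fail over, a short down/up pair like the one above is easy to miss. Here is a minimal sketch, in Python, of how the Description lines could be scanned for such "stutters". It assumes the Description text has already been exported to plain lines in the format shown above; the function name find_stutters and the 10-second threshold are made up for illustration and are not part of Polyserve.

```python
import re
from datetime import datetime, timedelta

# Matches the Description text of the PANPulse events quoted above.
LINE_RE = re.compile(
    r"^(?P<ip>\S+)\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"Interface \S+ address \S+ has (?P<state>gone down|come up)"
)

def find_stutters(lines, max_gap=timedelta(seconds=10)):
    """Return (ip, down_time, up_time) for each down/up pair within max_gap.

    max_gap is an arbitrary illustrative threshold, not a Polyserve setting.
    """
    last_down = {}   # interface address -> time it was last reported down
    stutters = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore anything that is not a down/up message
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        ip = m["ip"]
        if m["state"] == "gone down":
            last_down[ip] = ts
        elif ip in last_down:
            down = last_down.pop(ip)
            if ts - down <= max_gap:
                stutters.append((ip, down, ts))
    return stutters

sample = [
    "10.10.50.48     2008-03-13 13:21:26 Interface 10.10.50.48 address 10.10.50.48 has gone down",
    "10.10.50.48     2008-03-13 13:21:28 Interface 10.10.50.48 address 10.10.50.48 has come up "
    "because interface statistics indicate there is incoming traffic",
]

for ip, down, up in find_stutters(sample):
    print(ip, "down for", (up - down).total_seconds(), "seconds")
```

Run against the two entries above, this reports a single 2-second outage on 10.10.50.48.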

What we stumbled across while reviewing this was Flow Control, which is a NIC setting.  All 3 of these machines were set to Auto.  The HP Network Configuration Utility has an Information option that shows the currently selected Flow Control, and all 3 of these machines were somehow auto-selecting Rx Pause.  We reconfigured this property to Disabled.

We're hoping this resolves the issue, as we couldn't understand why a SQL instance would go down and come back up with a Pan Pulse error.

We have 9 servers clustered together and run SQL instances on all of them.  We only experienced the Pan Pulse error on these 3 machines, and all 3 had the wrong flow control setting.

