The postings on this site are my own and do not represent my Employer's positions, advice or strategies.

LifeAsBob - Blog

TOE, Packet Loss, Blue Screen Crash, CPU Synchronization
3/4/2008 12:45:01 PM

Want fries with that?

We experienced a PolyServe failover today, and the root cause has been very difficult to flush out.

One lucky hit: our SolarWinds monitor was reporting 80% packet loss on the server!

Because we are using server-based fencing, we wouldn't expect a crash dump: the node is fenced (power-cycled) before the dump can be written. Unfortunately, there is really no way to determine the cause of the crash without a Memory.dmp. However, one of the side effects of having TOE enabled on the servers is a blue screen.

TOE, what's that?

TOE stands for TCP/IP Offload Engine. Modern NICs can take over the processing of network traffic, which frees the CPU to focus on its other work. The feature requires operating system support: SP2 supports TOE, and it is the first iteration of Windows to do so. It definitely has advantages, but we have seen issues with it in other areas, so turning it off may help solve the problem.

Of course, like any good new technology, it doesn't always work right, and here the side effect of TOE is a blue screen. Nice!

Review the attached HP KB article on TOE: TOE_KB.htm (5.76 KB)

At the same time, we're still working through the CPU synchronization errors appearing in the SQL Server error log:

The time stamp counter of CPU on scheduler id 14 is not synchronized with other CPUs.
The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.

See Microsoft KB article 931279: http://support.microsoft.com/default.aspx?scid=kb;EN-US;931279
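If you want to track which schedulers keep reporting this, a small script can tally the warnings from the error log. The sketch below is a minimal, hypothetical helper (`unsynced_scheduler_ids` is my own name, not part of any SQL Server tooling); it assumes the message text matches the lines quoted above and searches anywhere in each line, so the date/time prefixes in ERRORLOG don't get in the way.

```python
import re
from collections import Counter

# Hypothetical helper: assumes the warning text matches the messages quoted above.
# The scheduler id is captured as group 1.
TSC_WARNING = re.compile(
    r"The time stamp counter of CPU on scheduler id (\d+) "
    r"is not synchronized with other CPUs"
)


def unsynced_scheduler_ids(log_lines):
    """Count how often each scheduler id reported an unsynchronized time stamp counter."""
    counts = Counter()
    for line in log_lines:
        # search() rather than match(), so log timestamp prefixes are ignored.
        match = TSC_WARNING.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "The time stamp counter of CPU on scheduler id 14 is not synchronized with other CPUs.",
        "The time stamp counter of CPU on scheduler id 2 is not synchronized with other CPUs.",
        "The time stamp counter of CPU on scheduler id 14 is not synchronized with other CPUs.",
    ]
    print(unsynced_scheduler_ids(sample))  # Counter({14: 2, 2: 1})
```

To run it against a real instance, pass in the lines of the ERRORLOG file. That file is typically UTF-16 encoded, so opening it with `encoding="utf-16"` is an assumption worth checking on your server.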

I thought this error only affected AMD chipsets, but this particular machine is an Intel 4-way quad-core, so 16 processors. On the AMD chipsets we added the /usepmtimer switch to boot.ini, but on this Intel box we had to go into the BIOS settings.
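For the AMD boxes, the boot.ini change amounts to appending /usepmtimer to each operating system entry. As a sketch, assuming the standard pre-Vista boot.ini layout (a `[boot loader]` section followed by `[operating systems]` entries that begin with an ARC path such as `multi(...)`), a hypothetical helper like `add_usepmtimer` could apply the switch to the file's text. It only transforms a string; back up the real boot.ini (normally hidden, system and read-only) before writing anything back.

```python
# Hypothetical helper: appends /usepmtimer to boot.ini OS entries (pre-Vista layout assumed).
ARC_PREFIXES = ("multi(", "scsi(", "signature(")


def add_usepmtimer(boot_ini_text):
    """Return boot.ini text with /usepmtimer added to every OS entry that lacks it."""
    out = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Only entries under [operating systems] receive boot switches.
            in_os_section = stripped.lower() == "[operating systems]"
        elif (
            in_os_section
            and stripped.lower().startswith(ARC_PREFIXES)
            and "/usepmtimer" not in stripped.lower()
        ):
            # Append the switch after any existing ones (e.g. /fastdetect).
            line = line.rstrip() + " /usepmtimer"
        out.append(line)
    # boot.ini conventionally uses CRLF line endings.
    return "\r\n".join(out) + "\r\n"


if __name__ == "__main__":
    sample = (
        "[boot loader]\r\n"
        "timeout=30\r\n"
        "default=multi(0)disk(0)rdisk(0)partition(1)\\WINDOWS\r\n"
        "[operating systems]\r\n"
        'multi(0)disk(0)rdisk(0)partition(1)\\WINDOWS="Windows Server 2003" /fastdetect\r\n'
    )
    print(add_usepmtimer(sample))
```

Entries that already carry the switch are left alone, so running the helper twice is harmless, and non-ARC entries such as a Recovery Console line are skipped.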

Hopefully, after changing the BIOS power settings to maximum always, we'll see no more failovers and no more synchronization issues.

