Sun Fire[TM] 12K/15K/E20K/E25K: WDC Event on CPU### at TL=0

Asset ID:	1-72-1006038.1
Update Date:	2017-08-01
Keywords:

Solution Type:	Problem Resolution Sure

Solution 1006038.1 : Sun Fire[TM] 12K/15K/E20K/E25K: WDC Event on CPU### at TL=0

Applies to:

Sun Fire 15K Server - Version All Versions and later
Sun Fire 12K Server - Version All Versions and later
Sun Fire E20K Server - Version All Versions and later
Sun Fire E25K Server - Version All Versions and later
All Platforms

Symptoms

The following error messages appear in the domain messages file:

NOTICE: [AFT0] WDC Event on CPU419 at TL=0, errID 0x0000002f.517933f1
AFSR 0x00300440<ME,PRIV,UCC,WDC>.000000a8 AFAR 0x000001c1.fa859580 INVALID
Fault_PC 0x100596ec Esynd 0x00a8 AMBIGUOUS SB13/P3/E0 J7400
[AFT0] errID 0x0000002f.517933f1 Data Bit 95 was in error and corrected
Mar 14 15:52:12 2002 NOTICE: [AFT0] UCC Event on CPU419 in Privileged mode at TL=0, errID 0x0000002f.51e2a326
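
When monitoring for repeats, it can help to pull the identifying fields out of each message. The following is a minimal Python sketch, assuming the message layout matches the sample above. The regular expressions, the parse_event helper, and the field names are illustrative assumptions, not part of any Sun tool, and other message variants may need different patterns.

```python
import re

# Sample copied from the messages shown above.
SAMPLE = (
    "NOTICE: [AFT0] WDC Event on CPU419 at TL=0, errID 0x0000002f.517933f1\n"
    "AFSR 0x00300440<ME,PRIV,UCC,WDC>.000000a8 AFAR 0x000001c1.fa859580 INVALID\n"
    "Fault_PC 0x100596ec Esynd 0x00a8 AMBIGUOUS SB13/P3/E0 J7400\n"
    "[AFT0] errID 0x0000002f.517933f1 Data Bit 95 was in error and corrected\n"
)

# Hypothetical patterns, written against the sample layout only.
PATTERNS = {
    "type": re.compile(r"\[AFT0\] (\w+) Event on CPU\d+"),
    "cpu": re.compile(r"Event on CPU(\d+)"),
    "errid": re.compile(r"errID (0x[0-9a-f]+\.[0-9a-f]+)"),
    "flags": re.compile(r"AFSR 0x[0-9a-f]+<([A-Z,]+)>"),
    "unit": re.compile(r"\b(SB\d+/P\d+/E\d+)\b"),
    "bit": re.compile(r"Data Bit (\d+) was in error and corrected"),
}


def parse_event(text):
    """Return the identifying fields of one AFT0 message, or None per missing field."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    # Convert numeric and list-valued fields when they were found.
    if fields["cpu"] is not None:
        fields["cpu"] = int(fields["cpu"])
    if fields["bit"] is not None:
        fields["bit"] = int(fields["bit"])
    if fields["flags"] is not None:
        fields["flags"] = fields["flags"].split(",")
    return fields


event = parse_event(SAMPLE)
print(event)
```

The unit string (here SB13/P3/E0) together with the data bit is what needs to be compared across events to tell whether an error is repeating.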

Changes

Cause

A WDC event is a correctable Data Cache Parity Error detected during a writeback operation. Parity errors are detected only when data is read out of the data cache. A writeback is generated when a dirty (modified) cache line is evicted to make room for a new cache line, so the modified line has to be written back to memory. It is therefore safe to assume that the CPU was not directly involved in the parity error, other than that the error occurred in the Ecache connected to it.

The error is overwhelmingly likely to be in the Ecache SRAMs, although there is a slight possibility that the failure lies in the CPU itself. Either way, the same FRU (the System Board) is replaced to correct it.

Solution

For a single occurrence of a correctable Data Cache Parity Error, no action should be taken except to log the failure and monitor the domain to be sure it doesn't repeat on the same bit. This is similar to an ECC correctable memory error and should be treated as such.

For multiple occurrences of such an error on the same SB, gather Explorer output from the main System Controller and from the domain, then contact your hardware service provider.

See also <Document: 1019337.1> Introduction to cache-line retirement feature for Ultrasparc-IV+ (USIV+) processors

For multiple occurrences of such an error on the same Ecache SRAM and the same bit, replace the implicated SB. As stated above, the bad part is most likely the Ecache SRAM, but the SRAM is not a FRU, so the SB is the FRU to replace.
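
The three cases in this section can be summarized as a small decision routine. This is a minimal Python sketch, assuming each logged event has already been reduced to a (board, unit, data bit) tuple. The triage function, the tuple layout, and the action strings are hypothetical and only restate the guidance above; they are not a Sun-supplied procedure.

```python
from collections import Counter


def triage(events):
    """Map each System Board to a suggested action.

    events: list of (board, unit, data_bit) tuples, for example
    ("SB13", "P3/E0", 95). This layout is an assumption of the sketch.
    """
    per_board = Counter(board for board, _unit, _bit in events)
    per_bit = Counter(events)  # identical tuples = same board, unit, and bit

    actions = {}
    for board, count in per_board.items():
        same_bit_repeated = any(
            n > 1 for (b, _unit, _bit), n in per_bit.items() if b == board
        )
        if same_bit_repeated:
            # Repeats on the same Ecache SRAM and the same bit.
            actions[board] = "replace SB"
        elif count > 1:
            # Several events on the board, but not on the same bit.
            actions[board] = "gather Explorer data and contact service provider"
        else:
            # Single occurrence.
            actions[board] = "log and monitor"
    return actions


history = [
    ("SB13", "P3/E0", 95),
    ("SB13", "P3/E0", 95),
    ("SB4", "P1/E0", 12),
    ("SB7", "P0/E0", 3),
    ("SB7", "P2/E1", 40),
]
result = triage(history)
print(result)
```

In this example SB13 shows the same bit twice and is marked for replacement, SB7 has two unrelated events and calls for Explorer data, and SB4 has a single event that is only logged and monitored.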

Reference <Document: 1004903.1> for UltraSPARC[R]-III, UltraSPARC[R]-III+, UltraSPARC[R] IIIi, and UltraSPARC[R] IV Systems CPU error messages

Previously Published As 47291

Attachments

This solution has no attachment