Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1489176.1
Update Date: 2017-11-27

Solution Type  Sun Alert Sure

Solution  1489176.1 :   Sun SPARC T4-x Systems May Experience Memory and Power Faults Which can be Prevented by Upgrading to System Firmware 8.2.1.b (or later)  


Related Items
  • Netra SPARC T4-1 Server
  • SPARC T4-2
  • Sun Software - Generic
  • SPARC T4-1
  • SPARC T4-1B
  • Netra SPARC T4-2 Server
  • SPARC T4-4
  • Sun Hardware - Generic
  • Netra SPARC T4-1B

Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

SPARC T4-2
SPARC T4-4
Netra SPARC T4-1 Server
Netra SPARC T4-2 Server
SPARC T4-1
SPARC
_________________________________

BUG:15726304

Date of Resolved Release: 06-Sep-2012
_________________________________

Description

Sun SPARC T4-x systems running System Firmware earlier than 8.2.1.b may experience memory and power faults that can prompt unnecessary hardware replacement. These faults can be prevented by upgrading to System Firmware 8.2.1.b (or later).

Note: There are a number of CRs associated with this issue - please see "Symptoms" for complete details.

Occurrence

These issues can occur on the following platforms:

SPARC Platform

  • Sun SPARC T4-1, T4-1B, T4-2, T4-4 and Netra T4-1, T4-1B and T4-2 systems without firmware 8.2.1.b (or later)

Notes:

    1. Memory fault issues may appear on Sun SPARC T4-1, T4-1B, T4-2, T4-4 and Netra T4-1, T4-1B, and T4-2 servers. Power fault issues may appear on T4-2 and T4-4 servers.

    2. No other systems are affected by this issue.

    3. This issue does not exist for the x86 platform.

To determine the firmware version on one of these systems, use one of the following methods:

  A) Log in to the Service Processor and run:

-> show /HOST sysfw_version

    /HOST
     Properties:
        sysfw_version = Sun System Firmware 8.2.0.f 2012/07/09 22:11

  B) From Solaris:

    # prtdiag -v | grep Firmware
    Sun System Firmware 8.2.0.a 2012/05/11 07:34
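
The version check described above can be automated. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the version string always has the four-field form shown in the examples (three numbers and one lowercase letter) and that releases order numerically and then alphabetically by letter; both are assumptions, as are the names parse_sysfw_version and needs_upgrade.

```python
import re

# Minimum fixed release named in this alert.
MIN_FIXED = (8, 2, 1, "b")

# Matches strings such as "Sun System Firmware 8.2.0.f 2012/07/09 22:11".
# Assumption: the version always has three numeric fields and one letter.
_VERSION_RE = re.compile(r"Sun System Firmware (\d+)\.(\d+)\.(\d+)\.([a-z])")


def parse_sysfw_version(text):
    """Return (major, minor, micro, letter) found in text, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, micro, letter = match.groups()
    return (int(major), int(minor), int(micro), letter)


def needs_upgrade(text, minimum=MIN_FIXED):
    """Return True if the reported firmware is older than the minimum release."""
    version = parse_sysfw_version(text)
    if version is None:
        raise ValueError("no firmware version found in: %r" % (text,))
    # Tuple comparison: numeric fields first, then the letter suffix.
    return version < minimum
```

Either the "show /HOST sysfw_version" output or the "prtdiag -v | grep Firmware" output can be passed to needs_upgrade as a string.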

Symptoms

Symptoms for these issues will vary depending on the Bug/CR and system affected, as in the following examples:

A. Memory faults

FMA reports fault.component.disabled messages, with one or more DIMMs disabled or with an MCU disabled and the MB FRU faulted. The failure signatures seen in the hostconsole log are listed below for each CR.

CR 7062523:
    0:0:0>Setup POST Mailbox ....Done
    0:0:0>Decode of Disrupting Error Status Reg (DESR HW Corrected)  bits 00000000.00040000
    0:0:0>Decode of NCU Error Status Reg bits 00000000.10000000
    0:0:0>        1    NESR_MCU0SRE:     MCU0 issued a Software Recoverable Error Request
    0:0:0>Decode of Mem Error Status Reg Branch 0 bits 02040000.00000000
    0:0:0>        1      VEU 57     R/W1C Set to 1 on an UE, if VEF = 0 and no fatal error is detected in same cycle.
    0:0:0>        1      DAU 50     R/W1C Set to 1 if the error was a DRAM access UE.
    0:0:0>        DRAM Error Address Reg for Branch 0 = 00000000.11581100
    0:0:0>            Physical Address is 00000000.00410000

CR 7177943:
    2012-06-02 06:29:11.277 1:0:0>ERROR: TEST = Map to VA-ALL TSB
    2012-06-02 06:29:11.389 1:0:0>H/W under test = /SYS/PM0/CMP1/BOB1/CH1/D1 (J7101)
    2012-06-02 06:29:11.536 1:0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.
    2012-06-02 06:29:11.725 1:0:0>MSG = END_ERROR

CR 7185320:
    [CPU 1:0:0] ERROR:   MCU0.BoB1.Ch1.D0: Failed to set clock delay
    [CPU 1:0:0] ERROR:   set_clk_delay failed for MCU0, BoB1, Ch1, DIMM0
    [CPU 1:0:0] ERROR:   command_clk_training failed for MCU0
    [CPU 1:0:0] ERROR:   Calibrate DRAM interface failed for MCU0
    [CPU 1:0:0] ERROR:   MCU0: DRAM init failed
    [CPU 1:0:0] ERROR:   /SYS/PM0/CMP1/BOB1/CH1/D0 failed to initialize

CR 7177528:
    [CPU 1:0:0] ERROR:   Lane failures during DQS cleanup for MCU0
    [CPU 1:0:0] ERROR:   train_ddr_channels failed for MCU0
    [CPU 1:0:0] ERROR:   Calibrate DRAM interface failed for MCU0
    [CPU 1:0:0] ERROR:   MCU0: DRAM init failed

CR 7177481:
    2012-06-13 12:35:15.074 0:0:0>ERROR: TEST = Test Mailbox region
    2012-06-13 12:35:15.260 0:0:0>H/W under test = /SYS/PM0/CMP0/BOB3/CH1/D0 (J4301)
    2012-06-13 12:35:15.496 0:0:0>Repair Instructions: Replace items in order listed by 'H/W under test' above.
    2012-06-13 12:35:15.803 0:0:0>MSG = CE in critical POST code space.
    2012-06-13 12:35:16.002 0:0:0>END_ERROR

For any memory faults seen on systems with System FW 8.2.1.b or later, normal troubleshooting procedures should be followed.
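
As an illustration of matching a saved hostconsole log against the signatures above, here is a minimal Python sketch. The substrings are copied from the excerpts quoted in this document, but which fragment best identifies each CR is an assumption, and the names SIGNATURES and match_crs are hypothetical. A match only suggests a CR to review; it does not confirm a diagnosis.

```python
# Substrings taken from the failure signatures quoted in this alert, keyed by CR.
# Assumption: these fragments are distinctive enough to identify each CR.
SIGNATURES = {
    "7062523": ["issued a Software Recoverable Error Request"],
    "7177943": ["ERROR: TEST = Map to VA-ALL TSB"],
    "7185320": ["Failed to set clock delay", "command_clk_training failed"],
    "7177528": ["Lane failures during DQS cleanup", "train_ddr_channels failed"],
    "7177481": ["CE in critical POST code space"],
}


def match_crs(log_text):
    """Return the sorted CR numbers whose signature appears in log_text."""
    found = set()
    for line in log_text.splitlines():
        for cr, needles in SIGNATURES.items():
            if any(needle in line for needle in needles):
                found.add(cr)
    return sorted(found)
```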

B. Power faults

Some Emerson A239 power supplies on T4-4 and T4-2 platforms may place incorrect data on the I2C bus, triggering power faults. This may lead to false fault indications for other components on the same I2C bus segment. For example, on T4-4 the RIO/TGB are on the same I2C bus segment as the PSU. The updated firmware filters out the incorrect data.

FMA reports 'fault.chassis.voltage.fail', 'fault.chassis.power.fail', and 'fault.chassis.env.power.loss' messages, most commonly with the Power Supply Unit (PSU), MB (T4-2), or PM (T4-4) faulted, though other hardware components could also be faulted. The failure signature seen in the hostconsole log is cited below.

CR 7180196:
    Sensor | minor: Voltage : /SYS/RIO/VDD_+1V8 : Lower Non-critical going high : reading 1.82 >= threshold 1.71 Volts

For power faults seen on T4-2 and T4-4 systems with Emerson PSUs, the upgrade to System FW 8.2.1.b or later should be tried first. If power faults are seen on systems with System Firmware 8.2.1.b or later, then normal troubleshooting procedures should be followed.

Workaround

There are no workarounds for these issues.

These issues are addressed in the following releases:

SPARC Platform

System Firmware 8.2.1.b or later, as delivered in the following patches:

  • SPARC T4-1 Server with patch 148822-03 or later
  • SPARC T4-1B Server with patch 147287-01 or later
  • SPARC T4-2 Server with patch 148823-03 or later
  • SPARC T4-4 Server with patch 148824-03 or later
  • Netra T4-1 Server with patch 148826-03 or later
  • Netra T4-1B Server with patch 148828-02 or later
  • Netra T4-2 Server with patch 148827-02 or later

Note: Firmware 8.2.2.b, which is now available, fixes additional bugs that may be associated with this issue. It is recommended to upgrade the System Firmware to 8.2.2.b at the earliest opportunity.
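
The platform-to-patch list above lends itself to a simple lookup. The Python sketch below is illustrative only: the table is copied from this alert, while the function names and the assumption that a higher revision suffix of the same patch base is always "or later" are not from the source.

```python
# Minimum patch per platform, as listed in this alert.
MIN_PATCH = {
    "SPARC T4-1": "148822-03",
    "SPARC T4-1B": "147287-01",
    "SPARC T4-2": "148823-03",
    "SPARC T4-4": "148824-03",
    "Netra T4-1": "148826-03",
    "Netra T4-1B": "148828-02",
    "Netra T4-2": "148827-02",
}


def split_patch(patch_id):
    """Split '148823-03' into its base ('148823') and revision (3)."""
    base, rev = patch_id.split("-")
    return base, int(rev)


def patch_is_sufficient(platform, installed_patch):
    """Return True if installed_patch meets the minimum for the platform.

    Assumption: only revisions of the same patch base are comparable.
    """
    min_base, min_rev = split_patch(MIN_PATCH[platform])
    base, rev = split_patch(installed_patch)
    if base != min_base:
        raise ValueError("patch %s does not apply to %s" % (installed_patch, platform))
    return rev >= min_rev
```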

Patches

  • 148822-03
  • 148823-03
  • 148824-03
  • 148826-03
  • 148827-02
  • 148828-02
  • 147287-01

History

06-Sep-2012: Document released, issue Resolved
19-Sep-2012: Internal Maintenance update; no change in content
25-Oct-2012: Updated to include SPARC/Netra T4-1B and associated fix patches
29-Nov-2012: Updated to note that new FW release 8.2.2.b is now available
04-Dec-2012: Add "or later" to denote FW releases going forward - no other changes

The Emerson A239 PSU also induces a couple of additional error messages that are not currently addressed by FW but will be resolved in future FW version(s). These error messages are seen in the SC logs, and no FMA fault is triggered, so no customer or service action should be initiated:

1) Chassis | major: Hot removal of /SYS/SASBP/HDD#   

and

2) Chassis Log critical (##) /SYS/PS#/SEEPROM.FRU_PROM (#x##) Read Data Compare FAILED

where each # represents a non-negative integer.

These messages result from minor corruption of the I2C bus by the Emerson A239 power supply and do not indicate a real system issue. They should be ignored, with no action taken.
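
When reviewing SC logs, the two message shapes above can be recognized with regular expressions. The Python sketch below is one possible approach, not part of any Oracle tool. It follows the statement that each # is a non-negative integer, so the "(#x##)" field is matched as digits only; that reading, and the names BENIGN_PATTERNS and is_benign, are assumptions. It is meaningful only on systems with an Emerson A239 PSU where no FMA fault has been raised.

```python
import re

# Patterns for the two benign SC log messages described in this alert.
# Assumption: every '#' placeholder stands for one or more decimal digits.
BENIGN_PATTERNS = [
    re.compile(r"Chassis \| major: Hot removal of /SYS/SASBP/HDD\d+"),
    re.compile(
        r"Chassis Log critical \(\d+\) /SYS/PS\d+/SEEPROM\.FRU_PROM "
        r"\(\d+x\d+\) Read Data Compare FAILED"
    ),
]


def is_benign(line):
    """Return True if the log line matches one of the known benign messages."""
    return any(pattern.search(line) for pattern in BENIGN_PATTERNS)
```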

---------------------------------
Note that Emerson PSUs have also been reported as 'Astec', as in the following example:

fru_description = A239C_Power_Supply
fru_manufacturer = 10465 ASTEC INTERNATIONAL LTD SHEN ZHEN CITY CN
fru_version = 02
fru_part_number = 300-23xx

---------------------------------
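
Because the manufacturer field may read 'Astec' rather than 'Emerson', a script that identifies these supplies cannot rely on that field. The Python sketch below parses "key = value" FRU properties like those in the example above and keys on the description instead. Treating a fru_description that starts with "A239" as the identifying mark is an assumption drawn only from that single example, and the function names are hypothetical.

```python
def parse_fru(text):
    """Parse 'key = value' lines into a dictionary, skipping other lines."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def is_a239_psu(props):
    """Return True if the FRU description indicates an A239 power supply.

    Assumption: A239 supplies report a fru_description starting with 'A239',
    regardless of whether fru_manufacturer says Emerson or Astec.
    """
    return props.get("fru_description", "").startswith("A239")
```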

Note: Additional CRs associated with this issue, as listed below, are now addressed with System Firmware 8.2.2.b:

1) 7197312 Update the Write Leveling scheme for T3/T4 platforms
2) 7197319 Downgrade DDR nClamp/pClamp ERROR to a DEBUG message on T3/T4 platforms
3) 7201943 Enable DLL Staggering on T4 platforms
4) 7201944 Revert ODT settings on T3/T4 platforms

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
responsible engineer listed below.

Internal Contributor/Submitter: joe.carr@oracle.com
Internal Eng Responsible Engineer: joe.carr@oracle.com
Internal Services Knowledge Engineer: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems
Internal Escalation ID:
Internal Resolution Patches: 148822-03, 148823-03, 148824-03, 148826-03, 148827-02

References

SUNUBUG:7062523

  Copyright © 2018 Oracle, Inc.  All rights reserved.