Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2130478.1
Update Date: 2017-11-28
Keywords:

Solution Type  Sun Alert Sure

Solution  2130478.1 :   SPARC M5-32/M6-32 Systems With Sun System Firmware 9.5.3 or Earlier may Crash Unexpectedly While Running  


Related Items
  • SPARC M5-32
  • Sun Software - Generic
  • SPARC M6-32
  • Sun Hardware - Generic
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert




In this Document
Description
Occurrence
Symptoms
Workaround
Resolution
Patches
History
References


Applies to:

SPARC M6-32
SPARC M5-32
Sun Hardware - Generic
Sun Software - Generic
SPARC
___________________________________________



Date of Resolved Release: 25-Apr-2016
___________________________________________

Description

SPARC M5-32/M6-32 Systems with Sun System Firmware 9.5.3 (or earlier) configured with "expandable=true" for any HOST, and which have fewer than 32 CPUs installed and enabled, may crash unexpectedly while running.

Occurrence

This issue can occur on the following platform:

SPARC Platform

  • SPARC M5-32/M6-32 Systems With Sun System Firmware 9.5.3 or earlier

With the following configuration:

  • 'expandable' property set to 'true' for any HOST
  • Fewer than 32 CPUs installed and running

2.1) To determine the firmware version installed on the system, use the following ILOM command from the active SP:

      -> show /System sysfw_version
      /System
      Properties:
      sysfw_version = Sun System Firmware 9.5.3 2015/11/25 09:17
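
The version check in 2.1 can be scripted once the ILOM output has been captured as text. The following is a minimal Python sketch, not an Oracle-supplied tool: the function names parse_sysfw_version and is_affected_firmware are hypothetical, and the sketch assumes the sysfw_version line has the form shown in the example above.

```python
import re

# Last firmware release this alert lists as affected.
LAST_AFFECTED = (9, 5, 3)

def parse_sysfw_version(ilom_output):
    """Return the numeric firmware version as a tuple, or None if not found."""
    match = re.search(
        r"sysfw_version\s*=\s*Sun System Firmware\s+(\d+(?:\.\d+)*)",
        ilom_output,
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

def is_affected_firmware(ilom_output):
    """True if the reported firmware is 9.5.3 or earlier."""
    version = parse_sysfw_version(ilom_output)
    if version is None:
        raise ValueError("sysfw_version not found in output")
    return version <= LAST_AFFECTED

# Output copied from the example above.
sample = """\
/System
Properties:
sysfw_version = Sun System Firmware 9.5.3 2015/11/25 09:17
"""
print(parse_sysfw_version(sample), is_affected_firmware(sample))
```

Only the leading numeric components are compared, so a letter suffix such as the "b" in 9.5.4.b is ignored by this sketch.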

2.2) This failure will not be seen when all PDoms (physical domains) have their expandable property set to false. To determine the setting of the expandable property, run the following command from the ILOM command shell:

      -> show / -level 2 expandable
      /HOST0
      Properties:
      expandable = true

      /HOST1
      Properties:
      expandable = true

      /HOST2
      Properties:
      expandable = true

      /HOST3
      Properties:
      expandable = true
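
Where several systems have to be checked, the output from 2.2 can be reduced to a per-HOST summary. The sketch below is illustrative only and assumes the output layout shown above (a /HOSTn target line followed by an "expandable = ..." property line); the function name expandable_by_host is hypothetical.

```python
import re

def expandable_by_host(ilom_output):
    """Map each HOSTn target to True/False according to its expandable property."""
    settings = {}
    current = None
    for raw in ilom_output.splitlines():
        line = raw.strip()
        target = re.fullmatch(r"/(HOST\d+)", line)
        if target:
            current = target.group(1)
            continue
        prop = re.fullmatch(r"expandable\s*=\s*(\w+)", line)
        if prop and current is not None:
            settings[current] = prop.group(1).lower() == "true"
    return settings

# Abbreviated copy of the example output above (two of the four hosts).
sample = """\
/HOST0
Properties:
expandable = true

/HOST1
Properties:
expandable = true
"""
settings = expandable_by_host(sample)
exposed = sorted(host for host, value in settings.items() if value)
print(exposed)
```

An empty list means that no HOST has expandable set to true, which is the configuration in which this failure is not seen.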

2.3) This failure will not occur on systems with 32 enabled CPUs. To check which CPUs are active, run the following command:

      -> show /SYS/ -level 3 type=='Host Processor' requested_config_state

The following is example output for a half-populated machine:

      /SYS/CMU0/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU0/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU1/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU1/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU2/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU2/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU3/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU3/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU8/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU8/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU9/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU9/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU10/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU10/CMP1
      Properties:
      requested_config_state = Enabled

      /SYS/CMU11/CMP0
      Properties:
      requested_config_state = Enabled

      /SYS/CMU11/CMP1
      Properties:
      requested_config_state = Enabled
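
Counting the enabled entries by hand is error-prone on a large listing. As a rough sketch, assuming the output format shown above, the count can be taken from the captured text and compared against the 32-CPU threshold named in this alert. The names count_enabled_cmps and has_fewer_than_full are hypothetical.

```python
import re

FULL_CPU_COUNT = 32  # a fully populated system, per this alert
CMP_TARGET = re.compile(r"/SYS/CMU\d+/CMP\d+")
ENABLED = re.compile(r"requested_config_state\s*=\s*Enabled")

def count_enabled_cmps(ilom_output):
    """Count distinct CMP targets whose requested_config_state is Enabled."""
    enabled = set()
    current = None
    for raw in ilom_output.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # A new target starts; remember it only if it is a CMP.
            current = line if CMP_TARGET.fullmatch(line) else None
        elif current is not None and ENABLED.fullmatch(line):
            enabled.add(current)
    return len(enabled)

def has_fewer_than_full(ilom_output):
    """True if fewer than 32 CPUs are enabled."""
    return count_enabled_cmps(ilom_output) < FULL_CPU_COUNT

# Rebuild the half-populated example above: CMU0-3 and CMU8-11, two CMPs each.
sample = "\n".join(
    f"/SYS/CMU{cmu}/CMP{chip}\nProperties:\nrequested_config_state = Enabled\n"
    for cmu in (0, 1, 2, 3, 8, 9, 10, 11)
    for chip in (0, 1)
)
print(count_enabled_cmps(sample), has_fewer_than_full(sample))
```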

Symptoms

If the described issue occurs, the following symptoms will be seen:

Note: The following reference may prove helpful: <Document:1309092.1> - "How to use the Oracle ILOM 3.x Fault Management Shell"

3.1) A 'Fatal' event is logged on the HOST console, as in the following example:

      2016-03-04 22:53:20 8:00:0> NOTICE:
      Fatal handler Starting.

3.2) The Event log will show a 'fault.cpu.generic-sparc.chip-uc', followed by fatal error handling events, as in the following example:

      93689 Fri Mar 4 14:56:58 2016 HOST Log critical
      HOST1: cpu state data has been gathered
      93688 Fri Mar 4 14:55:54 2016 HOST Log critical
      HOST1: Fatal polled has occurred. cpu state data is being gathered.
      93687 Fri Mar 4 14:53:24 2016 System Log minor
      Host ID 1: Error Standby
      93686 Fri Mar 4 14:53:23 2016 Fault Fault critical
      Fault detected at time = Fri Mar 4 14:53:23 2016. The suspect component: /SYS/CMU8 has
      fault.cpu.generic-sparc.chip-uc with probability=100. Refer to
      http://support.oracle.com/msg/SPSUN4V-8000-84 for details.

3.3) FMA diagnoses fault code 'SPSUN4V-8001-CH', followed shortly thereafter by 'SPSUN4V-8000-84'. The FMA diagnosis fault code can be found in the SP event log (shown above) or by checking the fault diagnosis log.

  3.3a) Example:

      -> show -t -level 4 /SP/faultmgmt/ sunw-msg-id

      Target Property Value
      ---------------------------------------------------------------------------------------------------------------
      /SP/faultmgmt/0/faults/0 sunw-msg-id SPSUN4V-8000-84

OR:

  3.3b) Example:

      -> start -script /SP/faultmgmt/shell
      faultmgmtsp> fmadm faulty
      ----------------------------------------------------------------------------------------------------------------------------------
      Time                         UUID  msgid                                          Severity
      ----------------------------------------------------------------------------------------------------------------------------------
      2016-03-04/14:53:23 726bb987-767a-eced-cf58-f6ab931bdaea SPSUN4V-8000-84 Critical

      Problem Status : open
      Diag Engine : fdd 1.0
      System
      Manufacturer : Oracle Corporation
      Name : SPARC M5-32
      Part_Number : 7045605
      Serial_Number : AKxxxxxxxx
      ----------------------------------------
      Suspect 1 of 1
      Fault class : fault.cpu.generic-sparc.chip-uc
      Certainty : 100%
      Affects : /SYS/CMU8/CMP0
      Status : faulted

      FRU
      Status : faulty
      Location : /SYS/CMU8
      Manufacturer : Celestica Holdings PTE LTD
      Name : Assy CMU
      Part_Number : 7066443
      Revision : 03
      Serial_Number : xxxxxxxxxxxxxxxxxx
      Chassis
      Manufacturer : Oracle Corporation
      Name : SPARC M5-32
      Part_Number : 7045605
      Serial_Number : AKxxxxxxxx

      Description : This chip has encountered a chip-level uncorrectable error.

      Response : The system will attempt to retire affected resources.

      Impact : System performance may be affected.

      Action : Use 'fmadm faulty' to provide a more detailed view of this
      event. Please refer to the associated reference document at
      http://support.oracle.com/msg/SPSUN4V-8000-84 for the latest
      service procedures and policies regarding this diagnosis.

OR:

  3.3c) Example:

      -> start /SP/faultmgmt/shell

      faultmgmtsp> fmdump -av
      2016-03-04/14:51:21 c9e7dcbf-03dc-c925-fcef-87cb6ec173d7 SPSUN4V-8001-CH

      fault = fault.asic.switch.systemdir@/SYS/SSB0/SA/UNIT0/BANK9/INDEX0/WAY0
      certainty = 100.0 %
      FRU = /SYS/SSB0
      ASRU = /SYS/SSB0/SA/UNIT0/BANK9/INDEX0/WAY0
      resource = /SYS/SSB0/SA/UNIT0/BANK9/INDEX0/WAY0

      2016-03-04/14:53:23 726bb987-767a-eced-cf58-f6ab931bdaea SPSUN4V-8000-84

      fault = fault.cpu.generic-sparc.chip-uc@/SYS/CMU8/CMP0
      certainty = 100.0 %
      FRU = /SYS/CMU8
      ASRU = /SYS/CMU8/CMP0
      resource = /SYS/CMU8/CMP0
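
The fault sequence described in 3.3 (SPSUN4V-8001-CH, then SPSUN4V-8000-84) can be checked against captured fmdump output. The following Python sketch is an assumption-laden illustration: it relies on the event header layout shown above (timestamp, UUID, message ID), the function names parse_fmdump and matches_alert_signature are hypothetical, and no time window is applied because this alert does not quantify "shortly thereafter".

```python
import re
from datetime import datetime

# Event header line: timestamp, 36-character UUID, message ID.
EVENT = re.compile(
    r"(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-f-]{36})\s+(\S+)"
)

def parse_fmdump(text):
    """Return a list of (timestamp, msgid) tuples from fmdump output."""
    events = []
    for raw in text.splitlines():
        match = EVENT.fullmatch(raw.strip())
        if match:
            when = datetime.strptime(match.group(1), "%Y-%m-%d/%H:%M:%S")
            events.append((when, match.group(3)))
    return events

def matches_alert_signature(events):
    """True if an SPSUN4V-8001-CH event is followed by an SPSUN4V-8000-84 event."""
    first = [when for when, msgid in events if msgid == "SPSUN4V-8001-CH"]
    second = [when for when, msgid in events if msgid == "SPSUN4V-8000-84"]
    return any(a <= b for a in first for b in second)

# Event header lines copied from the example above.
sample = """\
2016-03-04/14:51:21 c9e7dcbf-03dc-c925-fcef-87cb6ec173d7 SPSUN4V-8001-CH
fault = fault.asic.switch.systemdir@/SYS/SSB0/SA/UNIT0/BANK9/INDEX0/WAY0
2016-03-04/14:53:23 726bb987-767a-eced-cf58-f6ab931bdaea SPSUN4V-8000-84
fault = fault.cpu.generic-sparc.chip-uc@/SYS/CMU8/CMP0
"""
print(matches_alert_signature(parse_fmdump(sample)))
```

A match on the message IDs alone does not confirm this issue; the firmware version, expandable setting, and CPU count described under Occurrence still have to apply.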

Workaround

There is no recommended workaround for this issue.

Resolution

This issue is addressed on the following platforms:

SPARC Platform

  • SPARC M5-32/M6-32 Servers with Firmware version 9.5.4.b (patch 22982110) or later

Patches

 <SUNPATCH:22982110>

History

25-Apr-2016: Document released, status Resolved

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
submitter/responsible engineer listed below.

Internal Contributor/Submitter: jack.hayward@oracle.com
Internal Eng Responsible Engineer: jack.hayward@oracle.com
Oracle Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems RPE
Internal Associated SRs: 3-11596782351, 3-11406486111, 3-11082178501, 3-10994472609,
3-10993938711, 3-11842625801, 3-11811326471, 3-12286920931, 3-11053559031, 3-11112281651,
3-12197824261, 3-10914393761, 3-11826613041, 3-11827886891
Internal Resolution Patches: System Firmware 9.5.4

References



  Copyright © 2018 Oracle, Inc.  All rights reserved.