Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2390985.1
Update Date:2018-04-25

Solution Type  Sun Alert Sure

Solution  2390985.1 :   SPARC T8 and M8 Servers May Fail to Record FMA Error Telemetry and Prevent Fault Diagnosis  


Related Items
  • SPARC T8-1
  • SPARC T8-4
  • SPARC M8-8
  • Sun Software - Generic
  • SPARC T8-2
  • Sun Hardware - Generic

Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert




In this Document
Description
Occurrence
Symptoms
Workaround
Resolution
History


Applies to:

SPARC T8-1
SPARC T8-2
SPARC T8-4
SPARC M8-8
Sun Hardware - Generic
Information in this document applies to any platform.
SPARC
_____________________________________________



Date of Resolved Release: 25-Apr-2018
_____________________________________________

Description

SPARC T8 and M8 servers with firmware version 9.8.5.a or earlier may fail to record Fault Management Architecture (FMA) error telemetry after a fatal error, which prevents fault diagnosis.

Occurrence

This issue can occur on the following platforms:

SPARC Platform

  • SPARC T8 and M8 servers with firmware version 9.8.5.a or earlier

Note: No other platforms are affected by this issue.

To determine the firmware version installed on the system, use the following ILOM command:

      -> show /HOST sysfw_version

      /HOST
      Properties:
      sysfw_version = Sun System Firmware 9.8.5.a 2018/03/29 18:19
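
Where many service processors must be audited, the version check above can be scripted. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the output of show /HOST sysfw_version has already been captured as text. The names parse_sysfw_version, version_key, and is_affected are hypothetical. The ordering rule (numeric fields compare numerically, the trailing letter compares alphabetically) is an assumption that is consistent with 9.8.5.a preceding 9.8.5.b but is not stated in this document.

```python
import re

# Last affected release per this alert; 9.8.5.b and later contain the fix.
LAST_AFFECTED = "9.8.5.a"


def parse_sysfw_version(text):
    """Extract the version token from 'show /HOST sysfw_version' output."""
    match = re.search(r"sysfw_version\s*=\s*Sun System Firmware\s+(\S+)", text)
    if match is None:
        raise ValueError("no sysfw_version line found")
    return match.group(1)


def version_key(version):
    """Turn '9.8.5.a' into a sortable key.

    Assumption: numeric fields order numerically and letter fields order
    alphabetically. Each field is tagged so that numbers and letters are
    never compared directly with each other.
    """
    key = []
    for part in version.split("."):
        if part.isdigit():
            key.append((0, int(part)))
        else:
            key.append((1, part))
    return tuple(key)


def is_affected(version):
    """True if the version is 9.8.5.a or earlier.

    Only meaningful for SPARC T8 and M8 servers; no other platforms
    are affected by this issue.
    """
    return version_key(version) <= version_key(LAST_AFFECTED)


if __name__ == "__main__":
    sample = """
    /HOST
    Properties:
    sysfw_version = Sun System Firmware 9.8.5.a 2018/03/29 18:19
    """
    version = parse_sysfw_version(sample)
    print(version, "affected" if is_affected(version) else "not affected")
```

The platform itself is not checked here, so the result should only be applied to systems already known to be T8 or M8 servers.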

Symptoms

If the system encounters a fatal error, FMA telemetry is not collected and fault diagnosis does not occur. The HOST restarts, but no hardware is identified as faulty or taken out of service. Fatal errors are caused by an internal CPU fault or a serious failure in the links that connect CPUs, memory, or I/O.

A fatal error has occurred if a NOTICE similar to the following appears in the HOST console log:

      2018-03-17 14:06:24 SP> NOTICE: Fatal error occurred. Collecting diagnostic information.
      2018-03-17 14:08:26 0:00:0> NOTICE:

          Fatal handler Starting.

      2018-03-17 14:08:28 0:00:0> NOTICE:

          Fatal handler finished.

A fatal error can also be recognized by an event log message similar to the following:

      -> show /SP/logs/event/list
      ...
      3656 Sat Mar 17 14:08:13 2018 HOST Log critical
      HOST0: cpu state data has been gathered
      ...
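
Saved console or event log text can be searched for the two indicators shown above. The following is a minimal Python sketch under stated assumptions: the log has already been exported to text, the two patterns are taken from the example messages in this document (real messages may differ in wording), and the function name find_fatal_error_lines is hypothetical.

```python
import re

# Patterns taken from the example messages in this document.
FATAL_NOTICE = re.compile(r"NOTICE:\s*Fatal error occurred", re.IGNORECASE)
STATE_GATHERED = re.compile(r"cpu state data has been gathered", re.IGNORECASE)


def find_fatal_error_lines(log_text):
    """Return (line_number, line) pairs that match either indicator."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if FATAL_NOTICE.search(line) or STATE_GATHERED.search(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = (
        "2018-03-17 14:06:24 SP> NOTICE: Fatal error occurred. "
        "Collecting diagnostic information.\n"
        "3656 Sat Mar 17 14:08:13 2018 HOST Log critical\n"
        "HOST0: cpu state data has been gathered\n"
    )
    for number, line in find_fatal_error_lines(sample):
        print(number, line)
```

A match only indicates that a fatal error was logged. It does not show whether telemetry was recorded.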

Workaround

To work around this issue, disable HOST state capture by using the following ILOM command as an administrative user:

For T8 servers:

      -> set /HOST state_capture_on_error=disabled

For each physical domain on M8 servers:

      -> set /Servers/PDomains/PDomain_n/HOST state_capture_on_error=disabled

This workaround takes effect immediately. Neither a domain reboot nor a HOST power cycle is required.
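
For sites that apply the setting across several systems, the command strings above can be generated rather than typed. The following is a minimal Python sketch that only builds the ILOM command text; it does not connect to a service processor. The function name state_capture_commands and its parameters are hypothetical, and it assumes that the n in PDomain_n stands for the physical domain number supplied by the caller.

```python
def state_capture_commands(platform, value="disabled", pdomains=()):
    """Build ILOM 'set' commands for the state_capture_on_error property.

    platform: "T8" or "M8".
    value:    "disabled" (workaround) or "enabled" (default value).
    pdomains: physical domain numbers, required for M8 servers.
              Assumption: each number replaces n in PDomain_n.
    """
    if value not in ("enabled", "disabled"):
        raise ValueError("value must be 'enabled' or 'disabled'")

    if platform == "T8":
        return ["set /HOST state_capture_on_error=" + value]

    if platform == "M8":
        if not pdomains:
            raise ValueError("M8 servers need at least one physical domain")
        return [
            "set /Servers/PDomains/PDomain_%d/HOST state_capture_on_error=%s"
            % (n, value)
            for n in pdomains
        ]

    raise ValueError("platform must be 'T8' or 'M8'")


if __name__ == "__main__":
    for command in state_capture_commands("M8", pdomains=[0, 1]):
        print("->", command)
```

Passing value="enabled" produces the commands that return the property to its default value.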

Note: Though fatal errors are rare, given the serviceability impact, customers are encouraged to implement the workaround or upgrade to the latest released firmware as soon as practical.

Resolution

This issue is addressed on the following platforms:

SPARC Platform

  • SPARC M8 and T8 Servers with firmware version 9.8.5.b or later

Once the updated firmware is installed, restore the state capture property to its default value (state_capture_on_error=enabled).

History

25-Apr-2018: Document released, status: Resolved

Questions regarding this document must be submitted to
sunalertpublication_us_grp@oracle.com, with a copy to the
contributor/responsible engineer listed below.

Internal Contributor/Submitter: david.lafko@oracle.com
Internal Eng Responsible Engineer: jack.hayward@oracle.com
Oracle Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems Server OS
Internal Resolution Patches: 9.8.5.b

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.