Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2027245.1
Update Date: 2017-01-03
Keywords:

Solution Type  Sun Alert Sure

Solution  2027245.1 :   SPARC T5 Series and SPARC M5-32/M6-32 Servers System Firmware Version 9.4.2.c and 9.4.2.d May Erroneously Disable CPUs  


Related Items
  • SPARC M5-32
  • SPARC T5-1B
  • Sun Software - Generic
  • Netra SPARC T5-1B Server Module
  • SPARC T5-2
  • SPARC M6-32
  • SPARC T5-4
  • SPARC T5-8
  • Sun Hardware - Generic
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert




In this Document
Description
Occurrence
Symptoms
Workaround
History
References


Applies to:

SPARC T5-8
SPARC T5-1B
Netra SPARC T5-1B Server Module
SPARC M5-32
SPARC M6-32
SPARC
______________________________________



Date of Preliminary Release: 02-Jul-2015

Date of Resolved Release: 13-Jul-2015
______________________________________

Description

On SPARC T5 Series and SPARC M5-32/M6-32 Servers, the memory retirement function of the system firmware can proactively take portions of memory DIMMs offline in response to certain error conditions. With system firmware Version 9.4.2.c and 9.4.2.d, both when a retirement occurs and during subsequent boots, the firmware also disables the fully functional CPUs with which that memory is associated. This can result in a loss of performance or the inability to bind resources to guest domains.

Note: Firmware 9.4.2.c and 9.4.2.d have been WITHDRAWN and are no longer available for download.

Occurrence

This issue can occur on the following platforms:

SPARC Platform

  • SPARC T5-2 Servers with Firmware version 9.4.2.c (patch 20214646)
  • SPARC T5-4/T5-8 Servers with Firmware version 9.4.2.c (patch 20214648)
  • SPARC T5-1B Servers with Firmware version 9.4.2.c (patch 20214649)
  • Netra SPARC T5-1B Servers with Firmware version 9.4.2.c (patch 20214650)
  • SPARC M5-32/M6-32 Servers with Firmware version 9.4.2.d (patch 20214652)

Note: No other systems or platforms are affected by this issue.

To determine the firmware version installed on the system, use the following ILOM command:

      -> show /HOST sysfw_version
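When many systems need to be checked, the version comparison can be scripted. The following Python sketch extracts a version string from captured output of the command above and tests it against the two withdrawn versions named in this alert. The function names are illustrative, and the layout of the output (a "sysfw_version = ..." line containing a dotted version such as 9.4.2.c) is an assumption that should be confirmed against real ILOM output.

```python
import re

# Withdrawn firmware versions named in this alert.
AFFECTED_VERSIONS = {"9.4.2.c", "9.4.2.d"}

# Assumption: the version is a dotted number with an optional trailing
# letter component, for example "9.4.2.c".
_VERSION_RE = re.compile(r"\d+(?:\.\d+)+(?:\.[a-z])?")


def extract_sysfw_version(ilom_output):
    """Return the first version found on a 'sysfw_version = ...' line, or None."""
    for line in ilom_output.splitlines():
        # Skip the echoed command line, which has no '=' sign.
        if "sysfw_version" in line and "=" in line:
            match = _VERSION_RE.search(line.split("=", 1)[1])
            if match:
                return match.group(0)
    return None


def is_affected(version):
    """True if the version is one of the withdrawn releases."""
    return version in AFFECTED_VERSIONS


if __name__ == "__main__":
    # Hypothetical captured output; the exact layout is assumed.
    sample = "sysfw_version = Sun System Firmware 9.4.2.c"
    version = extract_sysfw_version(sample)
    print(version, "affected" if is_affected(version) else "not affected")
```

The check looks at the version only. Which patch applies still depends on the platform, as listed above.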

Symptoms

If this issue occurs, the system reports missing CPUs. This can show up as a loss of performance or as the inability to bind resources to guest domains. Most obviously, when the affected system is rebooted, the host console shows messages similar to the following:

      WARNING: CPU 29d is not available to guest
      WARNING: CPU 29e is not available to guest
      WARNING: CPU 29f is not available to guest
      WARNING: 192 CPUs in MD are not available to guest
      NOTICE: Probing PCI devices.
      ERROR: /pci@300: Invalid hypervisor argument(s). function: b4
      ERROR: /pci@300: Invalid hypervisor argument(s). function: b4
      ERROR: /pci@300: Invalid hypervisor argument(s). function: b5
      NOTICE: Finished PCI probing
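A saved console log can be scanned for these messages. The Python sketch below is one possible approach, assuming the log contains lines worded exactly like the sample above. The function name is illustrative, and CPU identifiers are kept as opaque strings because the alert does not state their format.

```python
import re

# Patterns modeled on the sample console messages in this alert.
_CPU_LINE = re.compile(r"WARNING: CPU (\S+) is not available to guest")
_SUMMARY_LINE = re.compile(
    r"WARNING: (\d+) CPUs in MD are not available to guest"
)


def scan_console(log_text):
    """Return (cpu_ids, summary_count) found in a console log.

    cpu_ids is a list of CPU identifiers as strings.
    summary_count is the number from the summary line, or None if absent.
    """
    cpu_ids = _CPU_LINE.findall(log_text)
    summary = _SUMMARY_LINE.search(log_text)
    count = int(summary.group(1)) if summary else None
    return cpu_ids, count
```

An empty list together with a count of None means that none of the messages were found. That alone does not prove the system is unaffected, since the messages appear only when the host is rebooted.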

Workaround

There is no workaround for this issue.

This issue is addressed in the following releases:

SPARC Platform

  • SPARC T5-2 Servers with Firmware version 9.4.2.e (patch 21342652)
  • SPARC T5-4/T5-8 Servers with Firmware version 9.4.2.e (patch 21342653)
  • SPARC T5-1B Servers with Firmware version 9.4.2.e (patch 21342654)
  • Netra SPARC T5-1B Servers with Firmware version 9.4.2.e (patch 21342655)
  • SPARC M5-32/M6-32 Servers with Firmware version 9.4.2.e (patch 21342656)

History

02-Jul-2015: Document released, status Preliminary
13-Jul-2015: FW patches released, issue is Resolved

This regression was caused by the change for bug 20656570.

Downgrading the firmware is NOT advised here, as it is better to install the new patches.

Note that "fmadm repair <dimm>" will clear the retirement, but the error is likely to be
persistent, in which case the DIMM will be retired again and the CPU will be offlined again.

It is also possible to disable the affected DIMM in ILOM. This will prevent the CMU
from being offlined, but at the cost of the DIMM.

DIMM replacement will also prevent the issue from occurring.

None of these options is being presented directly to the customer, because of either the
incompleteness of the workaround or the cost involved.

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
submitter/responsible Engineer listed below.

Internal Contributor/Submitter: Grant.Gredvig@oracle.com
Internal Eng Responsible Engineer: Grant.Gredvig@oracle.com
Oracle Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems RPE
Internal Escalation ID: 3-10906712261

References

<BUG:21299503> - REBOOT OF HOST DECONFIGURES HARDWARE WITH INVALID HV ERRORS






  Copyright © 2018 Oracle, Inc.  All rights reserved.