Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1919457.1
Update Date: 2017-08-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1919457.1 :   "Energy Storage Module Is Approaching or has Reached End-of-life" ILOM error message.  


Related Items
  • SPARC SuperCluster T4-4 Full Rack
  • Sun Fire X4170 M2 Server
  • Exadata Database Machine X2-2 Qtr Rack
  • Sun Fire X4270 M2 Server
  • Exadata Database Machine X2-8
  • Sun Fire X4470 Server
  • Exadata Database Machine X2-2 Half Rack
  • Sun Fire X4540 Server
  • Exadata Database Machine X2-2 Hardware
  • SPARC SuperCluster T4-4
  • Exadata Database Machine V2
  • Sun Server X2-4
Related Categories
  • PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x86: Exadata ASR

In this Document
Symptoms
Cause
Solution
References


Applies to:

Sun Fire X4170 M2 Server - Version All Versions and later
Sun Fire X4540 Server - Version All Versions and later
Sun Fire X4470 Server - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Exadata Database Machine X2-8 - Version All Versions and later
x86

Symptoms

The Energy Storage Module (ESM) on a Sun Flash Accelerator F20 PCIe Card is approaching, or has reached, its end-of-life threshold.
This fault is driven by a software counter of the card's power-on hours, which identifies when the ESM has reached its rated lifetime. The counter does not measure the ESM's actual remaining life, nor whether the ESM is still functional.

Each flash card contains an ESM that ensures all flash write operations complete when power to the server is turned off.
The ESM charges and operates only while the card is powered on. Its service life specification sets a threshold on the number of power-on hours after which the ESM is considered to have degraded functionality.
ILOM tracks the power-on hours for the ESM on each flash card and reports them in the "UPTIME" property for the flash card. Once the ESM has been powered on beyond its service life threshold, it should be replaced.

Note: Due to a known issue with an SP reset, the system may receive SPX86-8002-RY and SPX86-8002-S3 events at the same time.
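
As a small illustration of how the two event IDs differ, the sketch below maps each ID to the condition it reports. The function name is hypothetical; the IDs and their meanings are taken from the notes listed under References.

```shell
# Map an ILOM fault message ID to the ESM condition it reports.
# esm_event_meaning is a hypothetical helper, not an Oracle tool.
esm_event_meaning() {
    case "$1" in
        SPX86-8002-RY) echo "approaching end-of-life" ;;
        SPX86-8002-S3) echo "exceeded end-of-life" ;;
        *)             echo "not an ESM end-of-life event" ;;
    esac
}

esm_event_meaning SPX86-8002-RY
esm_event_meaning SPX86-8002-S3
```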

Cause

This event is most likely caused by a known issue with the "power-on hours" thresholds in the current system firmware version.

Engineered Systems:

For Exadata X2-2 and X2-8 Storage Server nodes with Sun Flash Accelerator F20 PCIe Cards, the threshold is set incorrectly in ILOM 3.1.2.20.c and earlier versions.

For V2 nodes, the threshold is set incorrectly in ILOM 3.0.9.19.a and earlier versions.
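
One way to check an installed ILOM version string against the "and earlier" boundaries above is sketched here. The helper name is hypothetical, the installed version must be supplied by the reader, and the comparison assumes a "sort" that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Return success when INSTALLED is at or below LAST_AFFECTED.
# ilom_is_affected is a hypothetical helper; it relies on "sort -V",
# which is a GNU extension and not part of POSIX.
ilom_is_affected() {
    installed=$1
    last_affected=$2
    highest=$(printf '%s\n%s\n' "$installed" "$last_affected" | sort -V | tail -n 1)
    [ "$highest" = "$last_affected" ]
}

# Last affected versions as stated in this section.
X2_LAST_AFFECTED="3.1.2.20.c"
V2_LAST_AFFECTED="3.0.9.19.a"

if ilom_is_affected "3.1.2.20.b" "$X2_LAST_AFFECTED"; then
    echo "3.1.2.20.b: affected"
fi
if ! ilom_is_affected "3.1.2.20.e" "$X2_LAST_AFFECTED"; then
    echo "3.1.2.20.e: not affected"
fi
```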


Stand-Alone Systems:

For Servers that are not part of an Engineered System, please ensure the system has the latest firmware installed.

Solution

RESOLUTION:
-------------------

Engineered Systems:

Solution to resolve incorrect uptime firmware threshold:

For X2-2/X2-8 - Update to ILOM 3.1.2.20.e or later, contained in OS Image 12.1.2.1.0 or later.
"End-of-life" threshold is 4 years.

For V2 nodes - Update to ILOM 3.0.9.19.c or later, contained in OS Image 11.2.2.3.2 or later.
"End-of-life" threshold is 3 years.
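
Because ILOM reports UPTIME in hours while the thresholds above are given in years, a rough conversion can help when reading the property. This is a sketch that assumes 365-day years; the exact hour values used by the firmware are not stated in this document.

```shell
# Convert an end-of-life threshold in years to power-on hours.
# years_to_hours is a hypothetical helper and assumes 365-day years.
years_to_hours() {
    echo $(( $1 * 365 * 24 ))
}

years_to_hours 4   # X2-2/X2-8 threshold, prints 35040
years_to_hours 3   # V2 threshold, prints 26280
```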

Note: ESMs on Engineered Systems will be replaced proactively by Oracle, via a Preventive Maintenance SR, when the system is approaching the supported lifetime.

Please see "Exadata Database Machine and Exadata Storage Server Supported Versions (Doc ID 888828.1)" for version information.
Please contact the Exadata Enterprise Support Team via Oracle Support for any questions regarding OS Image updates.

Note: This solution applies to the PCI Express Flash Accelerator F20 card only. There is no firmware solution available for the PCI Express Flash Accelerator F20 M2 card at this time.

Stand-Alone Systems:

Please ensure the latest firmware is installed on the system.
See Where to Download a Firmware Patch & Upgrade BIOS / UEFI / SP for Oracle x86 Systems (Doc ID 1485873.1)
If your system firmware is at the latest release and the ESMs have not been replaced within the last three years, please update your Support Request with this information.

 

WORKAROUND:
----------------------

ILOM:

Connect to the ILOM CLI as root, and set clear_fault_action=true for any faulted Flash Accelerator F20 PCIe card.

Example:

-> show -d properties -level all /SYS/MB fault_state==Faulted
 
 /SYS/MB/RISER1/PCIE1/F20CARD

    Properties:
        type = F20 Card
        ipmi_name = PCIE1/F20CARD
        fru_name = ASY,BD,PCIE_FLASH,AURA1
        fru_part_number = 511-1500-02
        fru_serial_number =
        fru_extra_1 = 50 Aura1
        fault_state = Faulted
        clear_fault_action = (none)

-> set /SYS/MB/RISER1/PCIE1/F20CARD/ clear_fault_action=true
Are you sure you want to clear /SYS/MB/RISER1/PCIE1/F20CARD (y/n)?

Verify that the UPTIME value has been reset to 0:

-> show /SYS/MB/RISER1/PCIE1/F20CARD/UPTIME/

The value line in the output should show 0 Hours.

Repeat for each of the other three flash cards that is faulted:

-> set /SYS/MB/RISER1/PCIE4/F20CARD/ clear_fault_action=true
-> set /SYS/MB/RISER2/PCIE2/F20CARD/ clear_fault_action=true
-> set /SYS/MB/RISER2/PCIE5/F20CARD/ clear_fault_action=true

Note: ILOM versions 3.0.9.19.a and earlier do not report values for PCIE4.


O/S ipmitool:

Connect to the node OS as root, and set clear_fault_action=true for any faulted Flash Accelerator F20 PCIe card.

Example:

# ipmitool sunoem cli "set /SYS/MB/RISER1/PCIE1/F20CARD/ clear_fault_action=true" "y"

Connected. Use ^D to exit.
-> set /SYS/MB/RISER1/PCIE1/F20CARD/ clear_fault_action=true
Are you sure you want to clear /SYS/MB/RISER1/PCIE1/F20CARD (y/n)? y
Set 'clear_fault_action' to 'true'

-> Session closed
Disconnected

If all four cards in the node are faulted, you may choose to clear them all at once:

# for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "set /SYS/MB/$RISER/F20CARD/ clear_fault_action=true" "y"; done
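
The loop above can also be wrapped so that the commands are previewed before they are run, and so that each card's UPTIME is displayed after the fault is cleared. The following is a minimal sketch: the DRY_RUN variable and the function names are hypothetical, and it assumes the ILOM "show" command can be issued through "ipmitool sunoem cli" in the same way as "set".

```shell
# Clear the fault on each F20 card slot, then display its UPTIME.
# DRY_RUN and both function names are hypothetical. With DRY_RUN=1
# (the default here) the ipmitool commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}
F20_SLOTS="RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5"

run_ilom_cli() {
    # Arguments: the ILOM CLI command, then an optional answer to its
    # confirmation prompt.
    if [ "$DRY_RUN" = "1" ]; then
        printf 'ipmitool sunoem cli'
        printf ' "%s"' "$@"
        printf '\n'
    else
        ipmitool sunoem cli "$@"
    fi
}

clear_f20_faults() {
    for slot in $F20_SLOTS; do
        run_ilom_cli "set /SYS/MB/$slot/F20CARD/ clear_fault_action=true" "y"
        run_ilom_cli "show /SYS/MB/$slot/F20CARD/UPTIME/"
    done
}

clear_f20_faults
```

Running the script with DRY_RUN=0 set in the environment executes the commands instead of printing them.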

References

<NOTE:1306791.2> - Information Center: Oracle Exadata Database Machine
<NOTE:888828.1> - Exadata Database Machine and Exadata Storage Server Supported Versions
<NOTE:1180143.1> - SPX86-8002-S3 - Energy Storage Module has exceeded end-of-life.
<NOTE:1179934.1> - SPX86-8002-RY - Energy Storage Module is approaching end-of-life.
<NOTE:1505691.1> - Unable To Reset Flash F20 Accelerator ESM UPTIME
<NOTE:1485873.1> - Where to Download a Firmware Patch & Upgrade BIOS / UEFI / SP for Oracle x86 Systems
Integrated Lights Out Manager (ILOM) 2.0: http://docs.oracle.com/cd/E19720-01/
Oracle Integrated Lights Out Manager (ILOM) 3.0 Documentation: http://docs.oracle.com/cd/E19860-01/
Oracle Integrated Lights Out Manager (ILOM) 3.1 Documentation: http://docs.oracle.com/cd/E24707_01/
Oracle Integrated Lights Out Manager (ILOM) 3.2 Documentation: https://docs.oracle.com/cd/E37444_01

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.