Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1456741.1
Update Date:2018-04-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  1456741.1 :   Sun Storage 7000 Unified Storage System: Service Processor has stopped responding to requests  


Related Items
  • Sun ZFS Storage 7320
  • Exalogic Elastic Cloud X5-2 Hardware
  • Oracle Exalogic Elastic Cloud X2-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-5604756781>

Applies to:

Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 to X2 [Release X2]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Exalogic Elastic Cloud X5-2 Hardware - Version X5 to X5 [Release X5]
7000 Appliance OS (Fishworks)

Symptoms

An Exalogic or NAS 7000 Appliance may report the following alert:

SUNW-MSG-ID: AK-8001-GU, TYPE: alert, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Apr 18 05:55:05 2012
PLATFORM: i86pc, CSN: 1122334455, HOSTNAME: node01
SOURCE: svc:/appliance/kit/akd:default, REV: 1.0
EVENT-ID: 49cca6f4-3e56-c046-d441-825c88b33e2b
DESC: The service processor has stopped responding to requests.
AUTO-RESPONSE: None.
IMPACT: Features that depend on service processor functionality, including hardware inventory, LED control, and fault diagnosis, will not function correctly while the service processor is in this state.
REC-ACTION: Restart the service processor. Contact your service provider if the problem persists.
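
When scripting a check for this condition, it can help to pull the fields out of the alert text. The following is a minimal Python sketch, assuming the alert is available as plain text laid out exactly as in the example above (one FIELD: value per line, with some header lines carrying several comma-separated pairs). The names parse_alert and is_sp_unresponsive_alert are hypothetical helpers for illustration, not an appliance API.

```python
import re

# Copy of the example alert above; in practice this text would come from
# the appliance's alert log or an alert e-mail.
ALERT = """\
SUNW-MSG-ID: AK-8001-GU, TYPE: alert, VER: 1, SEVERITY: Minor
EVENT-TIME: Wed Apr 18 05:55:05 2012
PLATFORM: i86pc, CSN: 1122334455, HOSTNAME: node01
SOURCE: svc:/appliance/kit/akd:default, REV: 1.0
EVENT-ID: 49cca6f4-3e56-c046-d441-825c88b33e2b
DESC: The service processor has stopped responding to requests.
AUTO-RESPONSE: None.
IMPACT: Features that depend on service processor functionality, including hardware inventory, LED control, and fault diagnosis, will not function correctly while the service processor is in this state.
REC-ACTION: Restart the service processor. Contact your service provider if the problem persists.
"""

# Message ID of the "service processor has stopped responding" alert.
SP_UNRESPONSIVE_MSG_ID = "AK-8001-GU"

# One "FIELD: value" pair, e.g. "TYPE: alert" or "SUNW-MSG-ID: AK-8001-GU".
FIELD_RE = re.compile(r"^([A-Z][A-Z-]*): (.*)$")


def parse_alert(text):
    """Return a dict mapping each alert FIELD to its value (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        pieces = line.split(", ")
        # Header lines pack several "FIELD: value" pairs separated by ", ".
        # If any piece is not such a pair, treat the line as free text
        # (DESC, IMPACT, ...) and keep it whole.
        if not all(FIELD_RE.match(p) for p in pieces):
            pieces = [line]
        for piece in pieces:
            m = FIELD_RE.match(piece)
            if m:
                fields[m.group(1)] = m.group(2)
    return fields


def is_sp_unresponsive_alert(fields):
    """True if the parsed alert is the SP-unresponsive alert (AK-8001-GU)."""
    return fields.get("SUNW-MSG-ID") == SP_UNRESPONSIVE_MSG_ID


if __name__ == "__main__":
    f = parse_alert(ALERT)
    print(f["SUNW-MSG-ID"], f["HOSTNAME"], f["EVENT-TIME"])
    print(is_sp_unresponsive_alert(f))
```

Splitting header lines only when every piece is itself a FIELD: value pair keeps free-text fields such as IMPACT, which contain commas, intact.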

A few minutes after this alert is generated, the system may post another alert indicating that the SP is working again:

"The service processor has resumed responding to requests."

In this case the Service Processor has reset itself to recover from the condition.

A corresponding sysevent is recorded in the support bundle under fm/infolog_hival.txt, and can also be seen by running "fmdump -l":

 Apr 18 06:13:10.0638 resource.sysevent.EC_platform.ESC_platform_sp_reset
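
To check how often the SP has reset itself, the sysevent lines can be counted in fm/infolog_hival.txt. Below is a minimal Python sketch, assuming each relevant line has the same layout as the single example above (a timestamp followed by the event class); the real file may contain other columns or line types, which this sketch simply skips. find_sp_resets and find_sp_resets_in_file are hypothetical helper names.

```python
import re

# Event class of the SP-reset sysevent, as shown in the example line above.
SP_RESET_CLASS = "resource.sysevent.EC_platform.ESC_platform_sp_reset"

# Assumed layout, inferred from that single example line:
#   "<Mon> <DD> <HH:MM:SS.frac> <event class>"
LINE_RE = re.compile(
    r"^\s*([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+(\S+)\s*$"
)


def find_sp_resets(lines):
    """Return the timestamp of every SP-reset sysevent in `lines`.

    Lines that do not match the assumed layout are skipped.
    """
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) == SP_RESET_CLASS:
            stamps.append(m.group(1))
    return stamps


def find_sp_resets_in_file(path):
    """Convenience wrapper, e.g. for fm/infolog_hival.txt from a bundle."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_sp_resets(fh)


if __name__ == "__main__":
    sample = [
        " Apr 18 06:13:10.0638 resource.sysevent.EC_platform.ESC_platform_sp_reset",
        "a line in some other format is ignored",
    ]
    print(find_sp_resets(sample))
```

Several timestamps close together would suggest the SP keeps hanging and recovering, which is the situation where the follow-up steps in the Solution section apply.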

Cause

The Service Processor may have become unresponsive because of known software limitations, such as the SP being probed too frequently or its temporary filesystem becoming full.

Solution

As a workaround, if the Service Processor does not recover on its own, reset it using one of the following methods:

  • Log in to the ILOM console and reset the Service Processor manually:
    -> reset /SP

  • Log in to the CLI of the Sun ZFS Storage 7000 appliance and reset the Service Processor:
    S7000 > confirm maintenance hardware select chassis-000 select sp reset

  • Use the BUI to reset the Service Processor:
    Go to Maintenance -> Hardware -> Show Details -> SP and click the recycle button on the line labelled Service Processor.
    Hovering the mouse pointer over the recycle button shows which action it will perform. A similar button elsewhere on the screen reboots the Appliance, so make sure you click the correct one.
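
The workaround above applies only when the SP does not recover on its own. One way to express that check is sketched below in Python, assuming you already have the alert times and descriptions (for example, extracted from the alert log). The 30-minute grace period and the function name needs_manual_reset are hypothetical choices for illustration; this note does not specify how long to wait before resetting the SP.

```python
from datetime import datetime, timedelta

# Alert texts quoted in this note.
STOPPED = "The service processor has stopped responding to requests."
RESUMED = "The service processor has resumed responding to requests."


def needs_manual_reset(events, now, grace=timedelta(minutes=30)):
    """Return True if the SP looks stuck and should be reset by hand.

    events: iterable of (datetime, description) pairs, in any order.
    The SP is considered stuck when the latest STOPPED alert has no later
    RESUMED alert and is older than `grace`. The 30-minute default is an
    arbitrary illustrative value, not an Oracle recommendation.
    """
    events = list(events)
    last_stopped = max((t for t, d in events if d == STOPPED), default=None)
    if last_stopped is None:
        return False  # no SP-unresponsive alert at all
    resumed = any(d == RESUMED and t > last_stopped for t, d in events)
    return not resumed and now - last_stopped > grace


if __name__ == "__main__":
    t0 = datetime(2012, 4, 18, 5, 55, 5)  # EVENT-TIME of the example alert
    events = [(t0, STOPPED)]
    print(needs_manual_reset(events, now=t0 + timedelta(minutes=10)))  # still within grace
    print(needs_manual_reset(events, now=t0 + timedelta(hours=1)))
    # Hypothetical "resumed" alert five minutes after the first one.
    events.append((t0 + timedelta(minutes=5), RESUMED))
    print(needs_manual_reset(events, now=t0 + timedelta(hours=1)))
```

Only the most recent "stopped" alert is considered, so an older outage that was already followed by a "resumed" alert does not trigger a reset.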

Resetting the Service Processor does not reboot the Appliance; it resets only the Service Processor.

If the problem recurs, engage Oracle Support to investigate the issue.

Please also check Bug 20859787, in which the issue was caused not by an SP memory leak but by the customer environment: external requests from Oracle Ops Center, Oracle Enterprise Manager, or similar tools can keep the Service Processor busy, so that it cannot respond to akd poll requests.
If the issue is still seen after the SP has been reset, consider increasing the IPMI timeout parameter "ipmi_timeout" from 5 to 30 seconds, using the workflow attached to the bug.
NOTE: Running this workflow restarts akd.
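
The effect of raising ipmi_timeout can be illustrated with a toy model in which a poll fails if the busy SP takes longer than the timeout to answer. The Python sketch below uses made-up response times and does not model how akd actually issues, retries, or schedules its IPMI polls; timed_out_polls is a hypothetical helper that only shows why a 30-second timeout tolerates a busier SP than a 5-second one.

```python
# Toy model only: a poll "fails" when the SP answers later than the timeout.
def timed_out_polls(latencies_s, timeout_s):
    """Count polls whose response time exceeds timeout_s seconds."""
    return sum(1 for lat in latencies_s if lat > timeout_s)


# Made-up SP response times (seconds) while external tools keep it busy.
LATENCIES = [0.4, 1.2, 7.5, 12.0, 3.1, 22.8, 41.0]

if __name__ == "__main__":
    for timeout in (5, 30):  # old and proposed "ipmi_timeout" values
        failed = timed_out_polls(LATENCIES, timeout)
        print(f"ipmi_timeout={timeout}s: {failed} of {len(LATENCIES)} polls time out")
```

With these invented numbers, 4 of 7 polls exceed 5 seconds but only 1 of 7 exceeds 30 seconds. The trade-off is that a longer timeout presumably also delays detection of an SP that is genuinely hung.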

NOTE: The SP/BIOS version is fixed for a particular Appliance Firmware Release. Upgraded SP/BIOS firmware is available only as part of an Appliance Firmware Release upgrade, and only once Fishworks Engineering has mandated it. As a result, the SP/BIOS version may be down-rev relative to the latest version available for the underlying hardware/server platform.

References

<BUG:15631318> - SUNBT6937107 SP IS NOT RESET BASED ON SP_RESET_FATAL, SP_RESET_WARN, SP_KILL_TO,
<BUG:15672172> - SUNBT6988621-X64_3.0.14 CALLISTO: SP MEMORY LEAK OBSERVED WITH X64_3.0.14 R58793
<BUG:20859787> - SERVICE PROCESSORS CONTINUE TO STOP RESPONDING TO REQUESTS FOLLOWING SP RESET
<BUG:15708995> - SUNBT7036162-AK-8 FAN TRAYS AND POWER SUPPLIES: BOGUS ADD/REMOVE ALERTS

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.