Asset ID: 1-79-1505297.1
Update Date: 2016-02-16
Solution Type: Predictive Self-Healing Sure Solution
1505297.1: Sun 7000 Unified Storage System: ASR Service Processor (SP) Alarm Verification
Related Items
- Sun ZFS Storage 7420
- Sun Storage 7110 Unified Storage System
- Oracle ZFS Storage ZS3-2
- Sun Storage 7210 Unified Storage System
- Oracle ZFS Storage ZS4-4
- Sun Storage 7410 Unified Storage System
- Sun Storage 7310 Unified Storage System
- Sun ZFS Storage 7120
- Oracle ZFS Storage ZS3-4
- Sun ZFS Storage 7320
- Oracle ZFS Storage Appliance Racked System ZS4-4
- Oracle ZFS Storage ZS3-BA
Related Categories
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
Applies to:
Sun ZFS Storage 7320 - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)
Purpose
This article describes the steps a System Administrator takes to verify whether a Service Processor event is transient (ignorable) or actionable.
If the event is actionable, instructions are provided on how to proceed.
Details
Service Processor firmware versions below 2.0.2.16 can leak memory, which eventually results in a variety of issues on the 7110, 7310 and 7410, as listed below:
- Cannot connect to Service Processor via serial or network
- Service Processor absent from hardware details page in BUI
- Alert: Service Processor has stopped responding to requests
- Directories, such as /SYS, missing from the SP interface
- Fans in server node running continuously at full speed
- Slow throughput to system disks (due to fan vibration)
- Time out during software upgrade (due to system disks/fan vibration)
The same applies to the 7210 for Service Processor firmware versions below 2.0.2.15.
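The version thresholds above can be expressed as a small check. The following Python is a minimal sketch, not an Oracle-supplied tool: the names MIN_FIXED_FIRMWARE, parse_version and is_leak_prone are hypothetical, and it assumes the SP firmware version is available as a dotted string such as 2.0.2.16.

```python
# Lowest SP firmware version NOT affected by the leak, per model.
# Values are taken from the text above; the mapping itself is illustrative.
MIN_FIXED_FIRMWARE = {
    "7110": (2, 0, 2, 16),
    "7310": (2, 0, 2, 16),
    "7410": (2, 0, 2, 16),
    "7210": (2, 0, 2, 15),
}


def parse_version(text):
    """Turn a dotted version string such as '2.0.2.16' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_leak_prone(model, firmware):
    """Return True if the firmware is below the threshold listed for the model.

    Raises KeyError for a model this article gives no threshold for.
    """
    return parse_version(firmware) < MIN_FIXED_FIRMWARE[model]
```

For example, is_leak_prone("7410", "2.0.2.6") is True, because the version is compared numerically component by component rather than as text. A plain string comparison would wrongly rank 2.0.2.6 above 2.0.2.16.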
Alert Example:
---------------------------------------------------------------------------------------------------
SUNW-MSG-ID: AK-8000-86, TYPE: Defect, VER: 1, SEVERITY: Major
EVENT-TIME: Tue Aug 4 10:32:13 2009
PLATFORM: i86pc, CSN: 0810QAS002, HOSTNAME: XXXX
SOURCE: appliance/kit/akd:default, REV: 1.0
EVENT-ID: 8b942adb-4213-4cf5-df69-d567f6ecab1b
DESC: The service processor needs to be reset to ensure proper functioning.
AUTO-RESPONSE: None.
IMPACT: Service Processor-controlled functionality, including LEDs, fault management, and the serial console, may not work correctly.
REC-ACTION: Click the initiate repair button.
---------------------------------------------------------------------------------------------------
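If alerts in this format are collected as text, they can be split into fields for filtering. The following Python is a rough sketch under stated assumptions: the function names are hypothetical, the alert is assumed to arrive exactly in the "KEY: value" layout shown above, and treating AK-8000-86 as the "SP needs reset" message is inferred only from this single example.

```python
import re

# Split only at ", " that is followed by an upper-case KEY and ": ", so that
# commas inside free-text values (for example the IMPACT line) are left alone.
_FIELD_BOUNDARY = re.compile(r",\s+(?=[A-Z][A-Z-]*: )")


def parse_alert(text):
    """Return a dict of the KEY: value pairs found in an alert block."""
    fields = {}
    for line in text.splitlines():
        for chunk in _FIELD_BOUNDARY.split(line.strip()):
            key, sep, value = chunk.partition(": ")
            if sep:  # skip separator lines and anything without "KEY: "
                fields[key] = value.strip()
    return fields


def needs_sp_reset(fields):
    """Assumption: AK-8000-86 identifies the 'SP needs reset' defect."""
    return fields.get("SUNW-MSG-ID") == "AK-8000-86"
```

Applied to the example above, parse_alert yields separate entries such as SEVERITY, CSN and REV even though they share a line with other fields, while the multi-comma IMPACT sentence stays in one piece.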
There are a number of CRs for memory leaks on the Service Processor. Over time memory becomes depleted and the Service Processor becomes unresponsive and/or hangs.
When present, the issues surface somewhere between 30 and 60 days of uptime. There is some variation in the time between failures, their severity, and even whether or not they occur at a particular site. The reasons for these variations are not known at this time.
The appliance software, as of version 2009.Q3, has a mechanism to reset the Service Processor every 60 days, or sooner if it becomes unresponsive. This is sufficient to prevent the issues on the majority of systems.
For systems that experience the problems described above, use the following procedure:
First, ensure the Service Processor is responding. This is best done by resetting the Service Processor. Use one of the following methods:
* Reset the SP from the BUI or the CLI.
BUI : https://hostname:215/#maintenance/hardware -- click the NAS head image on the left -- click SP (in the line listing: Disk Slot CPU DIMM Fan PSU SP)
(to the right of Service Processor are two arrows forming a circle; click this icon to reset the SP)
CLI:> maintenance hardware select chassis-000 select sp reset
* If you have an alert, mark the problem as repaired.
BUI : https://hostname:215/#maintenance/problems -- click the 'Marked Repaired' button
CLI:> maintenance problems ls
CLI:> maintenance problems select problem-### markrepaired
* Connect an RJ-45 cable to the SER MGT port so that the Service Processor (SP) prompt can be accessed remotely and the SP reset from there. This is useful for remote support should hardware failures make the BUI and CLI inaccessible.
https://sp-ip-address -- log in as user 'root' with your BUI root password -- click the 'Maintenance' tab, then the 'Reset SP' tab, then the 'Reset SP' button
* From the appliance kit shell (performed by Oracle NAS TSC only), enter:
maintenance hardware select chassis-000 select sp reset
This process takes some time, on the order of five minutes. (If you are affected by the high-speed fan issue caused by the SP memory leak, the main external indication that the reset has completed is that the fans spin down to normal speed.) You can also monitor the progress of any of these operations via a serial connection to the SP.
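Waiting for the SP to come back can be scripted from an administration host. The following Python is a minimal sketch and not part of the appliance software: tcp_probe and wait_for_sp are hypothetical names, and probing TCP port 443 assumes the SP web interface is reachable over HTTPS from where the script runs. The 600-second limit is simply a margin over the roughly five minutes quoted above.

```python
import socket
import time


def tcp_probe(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_sp(probe, max_wait=600.0, interval=15.0,
                sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it returns True or max_wait seconds have elapsed.

    Returns True if the probe succeeded, False if the wait timed out.
    sleep and clock are injectable so the loop can be tested without delays.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A call such as wait_for_sp(lambda: tcp_probe("sp-ip-address")) would poll every 15 seconds. Note that this only detects the SP answering again; if it is started before the SP has actually gone down for the reset, the first probe may succeed immediately, so allow a short delay after issuing the reset.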
Next, verify that the Service Processor has been reset, via the Alert Log. You should see that the service processor either stopped, then resumed responding to requests, or simply resumed, in the case of a Service Processor that was previously unresponsive.
PLEASE NOTE: Procedures are now available for remote upgrade of the SP BIOS/ILOM firmware.
Please engage a Support Engineer to permit a remote upgrade of the SP/BIOS.
Engaging a Support Engineer
1) Please update your SR if the proposed solution did not fix the issue; a support engineer will then be assigned to assist.
OR
2) You may also phone in to have the SR picked up by the next available engineer.
Without further updates we will close this SR in 14 days.