Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1991693.1
Update Date:2016-12-01
Keywords:

Solution Type  Technical Instruction Sure

Solution  1991693.1 :   FS System: How to verify the temperature and the health of the components of a Disk Enclosure  


Related Items
  • Oracle FS1-2 Flash Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Flash Storage>SN-EStor: FSx




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Unable to run the fscli command as administrator

Applies to:

Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

This document explains what information to look for in the IOM Dumps in order to verify the temperature and health of the components in a Disk Enclosure (DE2-24P or DE2-24C). It should be followed when the following symptoms are present:

  • There is a fault LED on a Disk Enclosure.
  • No error is reported on the Flash Storage GUI.
  • One of the following events has been reported:
    • CM_EVT_BRICK_THERMAL_STATUS_CONSERVATIVE
    • CM_EVT_BRICK_THERMAL_STATUS_POWERED_DOWN
    • CM_EVT_BRICK_THERMAL_STATUS_NORMAL

 

Solution

The first step is to collect the IOM Dumps of the Enclosure that needs to be investigated.

  • Log in as the user "pillar":
# fscli login -u pillar -oracleFs <FS1 IP address>
  • Collect the IOM dumps (one per IOM):
# fscli enclosure -download -enclosure <enclosure id> -iom [0 | 1] -ddump <target filename> -o text

Examples:

# ./fscli enclosure -download -enclosure /ENCLOSURE-09 -iom 0 -ddump dump_iom_0.txt -o text
Command Succeeded
# ./fscli enclosure -download -enclosure /ENCLOSURE-09 -iom 1 -ddump dump_iom_1.txt -o text
Command Succeeded
# 
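
The two download commands above can be scripted so that both IOMs are always collected together. The following is a minimal sketch, not an Oracle-provided tool: the function names and the `dump_iom_<n>.txt` naming are assumptions taken from the examples above, it uses only the fscli options shown in this document, and it assumes that the `fscli login` step has already been performed in the same environment.

```python
import subprocess


def build_download_command(enclosure_id, iom, target, fscli="fscli"):
    """Build the argument list for one IOM dump download.

    Only the options shown in this document are used. The helper name
    and its parameters are hypothetical.
    """
    if iom not in (0, 1):
        raise ValueError("iom must be 0 or 1")
    return [
        fscli, "enclosure", "-download",
        "-enclosure", enclosure_id,
        "-iom", str(iom),
        "-ddump", target,
        "-o", "text",
    ]


def collect_iom_dumps(enclosure_id, fscli="fscli", run=subprocess.run):
    """Download one dump per IOM and return {iom number: target filename}.

    Assumes an fscli login session already exists. `run` can be replaced
    with a stub to check the commands without contacting the array.
    """
    targets = {}
    for iom in (0, 1):
        target = f"dump_iom_{iom}.txt"  # naming borrowed from the examples above
        command = build_download_command(enclosure_id, iom, target, fscli)
        run(command, check=True)  # raises CalledProcessError on a non-zero exit
        targets[iom] = target
    return targets
```

For example, `collect_iom_dumps("/ENCLOSURE-09", fscli="./fscli")` would issue the same two commands as the examples above.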


Note: If your array runs a release earlier than 060105-033400, you may encounter the error "UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR" with one of the IOMs.

 

For more details, see Document 1954866.1 FS System: How to Collect the System-Wide Diagnostic Dump From Drive Enclosure (DE2-24C or DE2-24P) IO Module (IOM).

All the examples in this document are from a healthy enclosure. Compare these values against those from a Disk Enclosure that reports a fault.

Open the Dump file with a text editor.

Search for the following keyword: ddump_drvmgr

The following output provides details about the state of the drives in the enclosure.

----------------------------------------------------------------------
ddump_drvmgr
Diagnostic dump for the Drive Manager service
**** Drive Manager diagnostic dump ****
 HA mode: slave                                     <-- this indicates that the IOM is not the Master
 Drives spinning up: unknown (this is the slave)
 Drive bays: 24
 Drive Index Base: 0
 Allowed drives: SAS OR SATA
 Drive power control: supported
 Enclosure power loss: no
 Pending power loss update: no

BEGIN RSync ddump for "DrvMgr":
  Device role:                        SLAVE
  Instance run state:                 RUNNING
  Sync to Slave status:               Completed
  This instance's next UID will be:   0x10000086 (slot=1 val=134)
  Total expanded transactions:        0x0 (0)
  Transaction pool capacity:          0x40 (64)
  Transaction pool free count:        0x40 (64)
  Num concurrent ACKS:                0x48 (72)
  WI Store info - UIDs of stored transactions:
...

**** Drive Bay 0 status ****                        <-- details about the 1st drive of the Enclosure
 present       : yes
 SES_info_bit  : not set
 RAID_info_byte: 0x0
 spin up time  : 0+00:00:26.641
 drive_type    : SAS
 WWN           : 5000CCA0227B3BFE
 faults        : none
 fault LED     : OFF
 array LED     : OFF
 inject        : NONE
 pending       : ONLINE                             <-- there is no fault and the drive is Online
 current       : ONLINE
 SlotA
  bypass       : 0x00
 SlotB
  bypass       : 0x00
 force off     : no

**** Drive Bay 1 status ****                        <-- next drive
 present       : yes
 SES_info_bit  : not set
 RAID_info_byte: 0x0
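
With 24 drive bays per dump, reading every block by eye is error-prone. The sketch below parses the "Drive Bay N status" blocks and lists the bays whose values differ from those of the healthy example above. It is an illustration only: the function names are hypothetical, and treating `faults: none`, `fault LED: OFF`, `pending: ONLINE` and `current: ONLINE` as the healthy values is an assumption based solely on the excerpt in this document.

```python
import re

BAY_HEADER = re.compile(r"\*{4} Drive Bay (\d+) status \*{4}")
# "key : value", ignoring any trailing "<-- ..." annotation as used in this document
FIELD = re.compile(r"^\s*([A-Za-z_ ]+?)\s*:\s*(.*?)\s*(?:<--.*)?$")

# Assumption: values seen in the healthy example are taken as the reference.
HEALTHY = {"faults": "none", "fault LED": "OFF",
           "pending": "ONLINE", "current": "ONLINE"}


def parse_drive_bays(text):
    """Return {bay number: {field: value}} from the ddump_drvmgr output."""
    bays = {}
    current = None
    for line in text.splitlines():
        if line.startswith("-----"):
            current = None  # a rule line starts the next dump section
            continue
        header = BAY_HEADER.search(line)
        if header:
            current = bays.setdefault(int(header.group(1)), {})
            continue
        if current is None:
            continue
        field = FIELD.match(line)
        if field:
            # Keep the first occurrence, so SlotA's "bypass" wins over SlotB's.
            current.setdefault(field.group(1), field.group(2))
    return bays


def suspect_bays(bays):
    """Return {bay number: {field: value}} for fields that differ from HEALTHY."""
    suspects = {}
    for number, fields in sorted(bays.items()):
        if fields.get("present") != "yes":
            continue  # empty bays are not evaluated
        differing = {key: fields[key] for key, expected in HEALTHY.items()
                     if key in fields and fields[key] != expected}
        if differing:
            suspects[number] = differing
    return suspects
```

An empty result from `suspect_bays(parse_drive_bays(dump_text))` only means that no bay deviates from the reference values above; it does not replace the RAID-level check.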

 

Next, search for the following keyword: ddump_envctrl

This output provides health and temperature details for the other components of the Disk Enclosure with a hardware health summary at the end.

----------------------------------------------------------------------
ddump_envctrl
Diagnostic dump for the Environmental Control service
BEGIN RSync ddump for "env_control":
  Device role:                        SLAVE
  Instance run state:                 RUNNING
  Sync to Slave status:               Completed
  This instance's next UID will be:   0x10177065 (slot=1 val=1536101)
  Total expanded transactions:        0x0 (0)
  Transaction pool capacity:          0x10 (16)
  Transaction pool free count:        0x10 (16)
  Num concurrent ACKS:                0x18 (24)
  WI Store info - UIDs of stored transactions:
   -                      Not stored:
   -                           ERROR:
   -        Syncing M->S (new trans):
   -            Pending ack to slave:
   -          Awaiting worker thread:
   -    In pfnMaster_PerformAction():
   -     Awaiting M_ActionComplete():
   -       Syncing M->S (completion):
   - Retry Syncing M->S (completion):
   -        Syncing S->M (new trans):
   -   In pfnSlave_ActionCompleted():
   -    In RSync_SendTransToClient():
END RSync ddump for "env_control"


max num zones: 8

zone 0
  name                              : Ambient          <-- part of the Midplane
  location                          : Mp0:0
  currentTemperature                : 21.417
  faultStates.generatedFault        : 0
  faultStates.detectedFault         : 0
  faultStates.generatedPredictedFail: 0
  faultStates.detectedPredictedFail : 0
  faultStates.elementSpecificFaults : 0x0
  defaultCriticalColdTemperature    : 3
  defaultCriticalHotTemperature     : 42
  modifiedWarningColdTemperature    : 5
  modifiedNormalTemperature         : 20
  modifiedWarningHotTemperature     : 40

zone 1
  name                              : Midplane
  location                          : Mp0:1
  currentTemperature                : 30.750
  faultStates.generatedFault        : 0
  faultStates.detectedFault         : 0
  faultStates.generatedPredictedFail: 0
  faultStates.detectedPredictedFail : 0
  faultStates.elementSpecificFaults : 0x0
  defaultCriticalColdTemperature    : 5
  defaultCriticalHotTemperature     : 55
  modifiedWarningColdTemperature    : 10
  modifiedNormalTemperature         : 45
  modifiedWarningHotTemperature     : 50

zone 2
  name                              : PCM 0 inlet             <-- Power Supply 0
  location                          : PCM0:0
  currentTemperature                : 28.984
  faultStates.generatedFault        : 0
  faultStates.detectedFault         : 0
  faultStates.generatedPredictedFail: 0
  faultStates.detectedPredictedFail : 0
  faultStates.elementSpecificFaults : 0x0
  defaultCriticalColdTemperature    : 5
  defaultCriticalHotTemperature     : 55
  modifiedWarningColdTemperature    : 10
  modifiedNormalTemperature         : 45
  modifiedWarningHotTemperature     : 50
...
zone 6
  name                              : SBB Canister 0          <-- IOM 0
  location                          : SBB0:0
  currentTemperature                : 39.312
  faultStates.generatedFault        : 0
  faultStates.detectedFault         : 0
  faultStates.generatedPredictedFail: 0
  faultStates.detectedPredictedFail : 0
  faultStates.elementSpecificFaults : 0x0
  defaultCriticalColdTemperature    : 5
  defaultCriticalHotTemperature     : 62
  modifiedWarningColdTemperature    : 10
  modifiedNormalTemperature         : 52
  modifiedWarningHotTemperature     : 57
...
max num fans: 4

fan 0
  name                              : PCM 0 Fan 0             <-- Fan 0 in Power Supply 0
  currentFanSpeedRPM                : 3525
  currentFanSpeedLevel              : 1
  faultStates.generatedFault        : 0
  faultStates.detectedFault         : 0
  faultStates.generatedPredictedFail: 0
  faultStates.detectedPredictedFail : 0
  faultStates.elementSpecificFaults : 0x0
...
Summary:
--------

PCM 0 zones   : OK
PCM 0 fans    : OK

PCM 1 zones   : OK
PCM 1 fans    : OK

overall config: OK
overall zones : OK
overall fans  : OK

lastFanSpeedPID: 0
extFanCtrl: DISABLED
CurrentFanSpeedOverride: INVALID

enableCoolingBoost   : FALSE
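
The per-zone fields above can also be checked programmatically. The following sketch parses the "zone N" blocks and reports zones whose current temperature has reached one of the listed thresholds or whose faultStates values are non-zero. The function names are hypothetical, and the way the thresholds are applied (inclusive comparison, critical checked before warning) is an assumption made for illustration; this document does not describe how the firmware itself evaluates them.

```python
import re

ZONE_HEADER = re.compile(r"^zone (\d+)\s*$")
# "  key : value", ignoring any trailing "<-- ..." annotation
FIELD = re.compile(r"^\s+([\w.]+)\s*:\s*(.*?)\s*(?:<--.*)?$")


def parse_zones(text):
    """Return {zone number: {field: value}} from the ddump_envctrl output."""
    zones = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = zones.setdefault(int(header.group(1)), {})
        elif not line.startswith(" "):
            current = None  # blank or unindented line ends the zone block
        elif current is not None:
            field = FIELD.match(line)
            if field:
                current[field.group(1)] = field.group(2)
    return zones


def check_zone(zone):
    """Return a list of findings for one zone; an empty list means none."""
    findings = []
    temperature = float(zone["currentTemperature"])
    # Assumption: thresholds are compared inclusively, critical first.
    if temperature >= float(zone["defaultCriticalHotTemperature"]):
        findings.append("critical hot")
    elif temperature >= float(zone["modifiedWarningHotTemperature"]):
        findings.append("warning hot")
    elif temperature <= float(zone["defaultCriticalColdTemperature"]):
        findings.append("critical cold")
    elif temperature <= float(zone["modifiedWarningColdTemperature"]):
        findings.append("warning cold")
    for key, value in zone.items():
        # Values appear as "0" or "0x0"; base 0 accepts both notations.
        if key.startswith("faultStates.") and int(value, 0) != 0:
            findings.append(f"{key} = {value}")
    return findings
```

Calling `check_zone` on each value returned by `parse_zones(dump_text)` gives one list of findings per zone, which can then be compared with the "Summary" block at the end of the same output.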

 

Also check for the following keyword: envctrl_zone

This section summarizes the temperatures from the previous output and compares the current temperature of each zone against its Warning Hot Temperature threshold.

----------------------------------------------------------------------
envctrl_zone
Environmental Control temperature zones
Zone    Card    Name                Location    Temperature           Threshold State
0       Common  Ambient             Mp0:0         21.417                40      OK
1       Common  Midplane            Mp0:1         30.750                50      OK
2       Common  PCM 0 inlet         PCM0:0        29.109                50      OK
3       Common  PCM 0 hotspot       PCM0:1        36.984                65      OK
4       Common  PCM 1 inlet         PCM1:0        29.109                50      OK
5       Common  PCM 1 hotspot       PCM1:1        37.734                65      OK
6       Local   SBB Canister 0      SBB0:0        39.312                57      OK
7       Remote  SBB Canister 1      SBB1:0        44.062                57      OK

 

Please note that the logical state of the drives also needs to be verified at the RAID level (scsi> chk from the RAID console); see Document 1991213.1 FS System: How to SSH to Drive Enclosure RAID Console for more details. Also compare the physical state reported in the IOM Dump with the state shown on the RAID console (see the PhySt and LogSt columns).

Finally, check the Dump of the other IOM generated at the beginning of the procedure and compare the results.
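
One way to compare the two dumps is to extract the same section from each file and produce a line-by-line diff. The sketch below does this with Python's standard difflib module. The function names are hypothetical, and it assumes that every section starts with a rule of dashes followed by the keyword on its own line, as in the excerpts above. Some lines are expected to differ between the two IOMs (for example the HA role or the RSync UIDs), so the diff is meant as a reading aid, not as a pass/fail check.

```python
import difflib
import re

# Section separator: a long rule of dashes (the short underline below
# "Summary:" in ddump_envctrl is deliberately not matched).
RULE = re.compile(r"^-{20,}\s*$")


def extract_section(text, keyword):
    """Return the lines of the section that starts with `keyword`.

    The section runs from the keyword line up to the next rule line.
    An empty list is returned if the section is not found.
    """
    lines = text.splitlines()
    for index in range(1, len(lines)):
        if lines[index].strip() == keyword and RULE.match(lines[index - 1]):
            end = index + 1
            while end < len(lines) and not RULE.match(lines[end]):
                end += 1
            return lines[index:end]
    return []


def diff_sections(dump_iom0, dump_iom1, keyword):
    """Return a unified diff of one section between the two IOM dumps."""
    section0 = extract_section(dump_iom0, keyword)
    section1 = extract_section(dump_iom1, keyword)
    return list(difflib.unified_diff(
        section0, section1,
        fromfile="iom0", tofile="iom1", lineterm="",
    ))
```

For example, after reading `dump_iom_0.txt` and `dump_iom_1.txt` into two strings, `diff_sections(text0, text1, "envctrl_zone")` lists only the zone rows that differ between the two IOMs.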

If you have any doubts about the output, please open a Service Request and attach the IOM Dumps of any Drive Enclosure reporting an error.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.