Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
1991693.1: FS System: How to verify the temperature and the health of the components of a Disk Enclosure
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:
Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal
The following document explains what information to look for in order to verify the temperature and health of the components in a Disk Enclosure (DE2-24P and DE2-24C). It should be followed when the following symptoms are present:

Solution
The first step is to collect the IOM Dumps of the Enclosure that needs to be investigated.
# fscli login -u pillar -oracleFs <FS1 IP address>
# fscli enclosure -download -enclosure <enclosure id> -iom [0 | 1] -ddump <target filename> -o text

Examples:
# ./fscli enclosure -download -enclosure /ENCLOSURE-09 -iom 0 -ddump dump_iom_0.txt -o text
Command Succeeded
# ./fscli enclosure -download -enclosure /ENCLOSURE-09 -iom 1 -ddump dump_iom_1.txt -o text
Command Succeeded
#

Note: If your array is running a release below 060105-033400, you may encounter the error "UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR" with one of the IOMs.
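
The two downloads above differ only in the IOM number, so they can be scripted. The following is a minimal Python sketch, not an Oracle-provided tool: it only assembles the same fscli arguments shown in the examples, once per IOM. The helper names (build_download_commands, collect_dumps), the dump_iom_<n>.txt file naming borrowed from the example, and the assumptions that ./fscli is in the current directory, that an fscli login has already succeeded, and that fscli exits non-zero on failure are all illustrative.

```python
import subprocess


def build_download_commands(enclosure, fscli="./fscli", prefix="dump_iom_"):
    """Return one fscli argument list per IOM (0 and 1).

    The flags mirror the examples in this document; the output file
    naming (prefix + IOM number + ".txt") is an assumption.
    """
    commands = []
    for iom in (0, 1):
        commands.append([
            fscli, "enclosure", "-download",
            "-enclosure", enclosure,
            "-iom", str(iom),
            "-ddump", f"{prefix}{iom}.txt",
            "-o", "text",
        ])
    return commands


def collect_dumps(enclosure):
    """Run both downloads; assumes an fscli login session already exists."""
    for argv in build_download_commands(enclosure):
        # check=False: on older releases one IOM may fail, so report each
        # result instead of aborting. Assumes fscli exits non-zero on failure.
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        print(" ".join(argv), "->", "ok" if result.returncode == 0 else "failed")


if __name__ == "__main__":
    # Print the commands only; call collect_dumps() to actually run them.
    for argv in build_download_commands("/ENCLOSURE-09"):
        print(" ".join(argv))
```

Printing the commands first makes it easy to confirm the enclosure ID before anything is run against the array.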
For more details, please follow Document 1954866.1 FS System: How to Collect the System-Wide Diagnostic Dump From Drive Enclosure (DE2-24C or DE2-24P) IO Module (IOM).

All the examples in this document are from a healthy enclosure. Values should be compared against a Disk Enclosure reporting a fault.

Open the Dump file with a text editor. Search for the following keyword:
ddump_drvmgr

The following output provides details about the state of the Drives in the enclosure.
----------------------------------------------------------------------
ddump_drvmgr Diagnostic dump for the Drive Manager service
**** Drive Manager diagnostic dump ****
HA mode: slave    <-- this indicates that the IOM is not the Master
Drives spinning up: unknown (this is the slave)
Drive bays: 24
Drive Index Base: 0
Allowed drives: SAS OR SATA
Drive power control: supported
Enclosure power loss: no
Pending power loss update: no
BEGIN RSync ddump for "DrvMgr":
Device role: SLAVE
Instance run state: RUNNING
Sync to Slave status: Completed
This instance's next UID will be: 0x10000086 (slot=1 val=134)
Total expanded transactions: 0x0 (0)
Transaction pool capacity: 0x40 (64)
Transaction pool free count: 0x40 (64)
Num concurrent ACKS: 0x48 (72)
WI Store info - UIDs of stored transactions:
...
**** Drive Bay 0 status ****    <-- details about the 1st drive of the Enclosure
present : yes
SES_info_bit : not set
RAID_info_byte: 0x0
spin up time : 0+00:00:26.641
drive_type : SAS
WWN : 5000CCA0227B3BFE
faults : none
fault LED : OFF
array LED : OFF
inject : NONE
pending : ONLINE    <-- there is no fault and the drive is Online
current : ONLINE
SlotA bypass : 0x00
SlotB bypass : 0x00
force off : no
**** Drive Bay 1 status ****    <-- next drive
present : yes
SES_info_bit : not set
RAID_info_byte: 0x0
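
With 24 bays per enclosure, reading every "Drive Bay" block by eye is error-prone. The following is a minimal Python sketch that extracts the present, faults, pending and current fields of each bay and lists the bays that differ from the healthy values shown above (faults none, pending and current ONLINE). The function names are hypothetical, and the set of values a faulty drive can report is not documented here, so the check simply flags anything that is not the healthy value.

```python
import re

BAY_HEADER = re.compile(r"\*{4} Drive Bay (\d+) status \*{4}")

# Healthy values as they appear in the example dump of this document.
HEALTHY = {"faults": "none", "pending": "ONLINE", "current": "ONLINE"}


def parse_drive_bays(dump_text):
    """Return {bay_number: {"present", "faults", "pending", "current"}}."""
    # split() with one capture group yields [preamble, bay, body, bay, body, ...]
    parts = BAY_HEADER.split(dump_text)
    bays = {}
    for i in range(1, len(parts) - 1, 2):
        body = parts[i + 1]
        fields = {}
        for key in ("present", "faults", "pending", "current"):
            match = re.search(rf"\b{key}\s*:\s*(\S+)", body)
            fields[key] = match.group(1) if match else None
        bays[int(parts[i])] = fields
    return bays


def suspect_bays(bays):
    """List bays whose reported fields differ from the healthy example."""
    suspects = []
    for bay, fields in sorted(bays.items()):
        if fields["present"] != "yes":
            continue  # assumption: an empty bay has nothing to check
        # Fields missing from a truncated excerpt (None) are not flagged.
        if any(fields[k] is not None and fields[k] != v for k, v in HEALTHY.items()):
            suspects.append(bay)
    return suspects


if __name__ == "__main__":
    with open("dump_iom_0.txt") as handle:
        print("Bays to review:", suspect_bays(parse_drive_bays(handle.read())))
```

Any bay listed should still be checked manually in the dump and on the RAID console; the script only narrows down where to look.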

Use the following keyword:
ddump_envctrl

This output provides health and temperature details for the other components of the Disk Enclosure, with a hardware health summary at the end.
----------------------------------------------------------------------
ddump_envctrl Diagnostic dump for the Environmental Control service
BEGIN RSync ddump for "env_control":
Device role: SLAVE
Instance run state: RUNNING
Sync to Slave status: Completed
This instance's next UID will be: 0x10177065 (slot=1 val=1536101)
Total expanded transactions: 0x0 (0)
Transaction pool capacity: 0x10 (16)
Transaction pool free count: 0x10 (16)
Num concurrent ACKS: 0x18 (24)
WI Store info - UIDs of stored transactions:
- Not stored:
- ERROR:
- Syncing M->S (new trans):
- Pending ack to slave:
- Awaiting worker thread:
- In pfnMaster_PerformAction():
- Awaiting M_ActionComplete():
- Syncing M->S (completion):
- Retry Syncing M->S (completion):
- Syncing S->M (new trans):
- In pfnSlave_ActionCompleted():
- In RSync_SendTransToClient():
END RSync ddump for "env_control"
max num zones: 8
zone 0
name : Ambient    <-- part of the Midplane
location : Mp0:0
currentTemperature : 21.417
faultStates.generatedFault : 0
faultStates.detectedFault : 0
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x0
defaultCriticalColdTemperature : 3
defaultCriticalHotTemperature : 42
modifiedWarningColdTemperature : 5
modifiedNormalTemperature : 20
modifiedWarningHotTemperature : 40
zone 1
name : Midplane
location : Mp0:1
currentTemperature : 30.750
faultStates.generatedFault : 0
faultStates.detectedFault : 0
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x0
defaultCriticalColdTemperature : 5
defaultCriticalHotTemperature : 55
modifiedWarningColdTemperature : 10
modifiedNormalTemperature : 45
modifiedWarningHotTemperature : 50
zone 2
name : PCM 0 inlet    <-- Power Supply 0
location : PCM0:0
currentTemperature : 28.984
faultStates.generatedFault : 0
faultStates.detectedFault : 0
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x0
defaultCriticalColdTemperature : 5
defaultCriticalHotTemperature : 55
modifiedWarningColdTemperature : 10
modifiedNormalTemperature : 45
modifiedWarningHotTemperature : 50
...
zone 6
name : SBB Canister 0    <-- IOM 0
location : SBB0:0
currentTemperature : 39.312
faultStates.generatedFault : 0
faultStates.detectedFault : 0
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x0
defaultCriticalColdTemperature : 5
defaultCriticalHotTemperature : 62
modifiedWarningColdTemperature : 10
modifiedNormalTemperature : 52
modifiedWarningHotTemperature : 57
...
max num fans: 4
fan 0
name : PCM 0 Fan 0    <-- Fan 0 in Power Supply 0
currentFanSpeedRPM : 3525
currentFanSpeedLevel : 1
faultStates.generatedFault : 0
faultStates.detectedFault : 0
faultStates.generatedPredictedFail: 0
faultStates.detectedPredictedFail : 0
faultStates.elementSpecificFaults : 0x0
...
Summary:
--------
PCM 0 zones : OK
PCM 0 fans : OK
PCM 1 zones : OK
PCM 1 fans : OK
overall config: OK
overall zones : OK
overall fans : OK
lastFanSpeedPID: 0
extFanCtrl: DISABLED
CurrentFanSpeedOverride: INVALID
enableCoolingBoost : FALSE
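
Each zone block carries its own thresholds, so the current temperature can be checked against them mechanically. The following is a minimal Python sketch that reads every "zone N" block and labels it OK, WARNING or CRITICAL. The function names are hypothetical, and the comparison rule (at or beyond a warning or critical value) is an assumption for illustration; the enclosure firmware may apply different logic, so the Summary section of the dump remains the authoritative result.

```python
import re

ZONE_START = re.compile(r"\bzone\s+(\d+)\s+name\s*:")
FIELDS = (
    "currentTemperature",
    "defaultCriticalColdTemperature",
    "defaultCriticalHotTemperature",
    "modifiedWarningColdTemperature",
    "modifiedWarningHotTemperature",
)


def parse_zones(dump_text):
    """Return one dict per 'zone N' block found in the ddump_envctrl text."""
    starts = list(ZONE_START.finditer(dump_text))
    zones = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(dump_text)
        chunk = dump_text[match.end():end]
        # The name runs up to the "location" field; drop any "<--" annotation.
        name = re.split(r"<--|\blocation\s*:", chunk, maxsplit=1)[0].strip()
        zone = {"zone": int(match.group(1)), "name": name}
        for field in FIELDS:
            found = re.search(rf"\b{field}\s*:\s*(-?\d+(?:\.\d+)?)", chunk)
            zone[field] = float(found.group(1)) if found else None
        zones.append(zone)
    return zones


def classify(zone):
    """Assumed rule: at or beyond a threshold counts as exceeding it."""
    temp = zone["currentTemperature"]
    if temp is None:
        return "UNKNOWN"

    def beyond(cold_key, hot_key):
        cold, hot = zone[cold_key], zone[hot_key]
        return (hot is not None and temp >= hot) or (cold is not None and temp <= cold)

    if beyond("defaultCriticalColdTemperature", "defaultCriticalHotTemperature"):
        return "CRITICAL"
    if beyond("modifiedWarningColdTemperature", "modifiedWarningHotTemperature"):
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    with open("dump_iom_0.txt") as handle:
        for zone in parse_zones(handle.read()):
            print(zone["zone"], zone["name"], zone["currentTemperature"], classify(zone))
```

For the healthy example above, zone 0 (Ambient, 21.417 against a warning range of 5 to 40) and zone 6 (SBB Canister 0, 39.312 against 10 to 57) both fall inside their warning range.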

Also check for the following keyword:
envctrl_zone

This is a useful temperature summary of the previous output, with a comparison of the current temperature against the Warning Hot Temperature threshold.
----------------------------------------------------------------------
envctrl_zone Environmental Control temperature zones
Zone  Card    Name            Location  Temperature  Threshold  State
0     Common  Ambient         Mp0:0     21.417       40         OK
1     Common  Midplane        Mp0:1     30.750       50         OK
2     Common  PCM 0 inlet     PCM0:0    29.109       50         OK
3     Common  PCM 0 hotspot   PCM0:1    36.984       65         OK
4     Common  PCM 1 inlet     PCM1:0    29.109       50         OK
5     Common  PCM 1 hotspot   PCM1:1    37.734       65         OK
6     Local   SBB Canister 0  SBB0:0    39.312       57         OK
7     Remote  SBB Canister 1  SBB1:0    44.062       57         OK
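
The summary table also shows how far each zone is from its threshold. The following is a minimal Python sketch that parses the table rows and computes that headroom (Threshold minus Temperature) for each zone. The function names are hypothetical, the 5-degree default margin is an arbitrary illustration rather than an Oracle recommendation, and the row pattern assumes the column layout shown above; pass only the envctrl_zone section of the dump to it.

```python
import re

# zone, card, name (may contain spaces), location, temperature, threshold, state
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(.+?)\s+(\S+:\d+)\s+(-?\d+(?:\.\d+)?)\s+(\d+)\s+(\S+)\s*$"
)


def parse_zone_table(text):
    """Parse envctrl_zone rows; the header and unrelated lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        zone, card, name, location, temp, threshold, state = match.groups()
        rows.append({
            "zone": int(zone),
            "card": card,
            "name": name,
            "location": location,
            "temperature": float(temp),
            "threshold": float(threshold),
            "state": state,
            # Degrees left before the Warning Hot threshold is reached.
            "headroom": float(threshold) - float(temp),
        })
    return rows


def needs_attention(rows, min_headroom=5.0):
    """Names of zones not reported OK or closer than min_headroom (assumed)."""
    return [
        row["name"]
        for row in rows
        if row["state"] != "OK" or row["headroom"] < min_headroom
    ]


if __name__ == "__main__":
    with open("envctrl_zone.txt") as handle:  # hypothetical file holding the table
        table = parse_zone_table(handle.read())
    for row in table:
        print(f'{row["name"]:<16} headroom {row["headroom"]:.3f}')
    print("Review:", needs_attention(table))
```

In the healthy example, the smallest headroom is on SBB Canister 1 (57 minus 44.062), so nothing would be listed with the default margin.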

Please note that the logical state of the drives also needs to be verified at the RAID level (scsi> chk from the RAID console); please see Document 1991213.1 FS System: How to SSH to Drive Enclosure RAID Console for more details. Also compare the physical state reported by the IOM Dump with the state shown on the RAID console (see the PhySt and LogSt columns).

Finally, check the Dump of the other IOM generated at the beginning of the procedure and compare the results.

If you have any doubts about the output, please open a Service Request and attach the IOM Dumps of any Drive Enclosure reporting an error.

Attachments
This solution has no attachment