Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure

Solution 2111921.1 : How to clear AMBER LED on Exadata Compute nodes when Compute Node disks report offline but no fault detected by Raid controller
Created from <SR 3-12226051943>

Applies to:
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Exadata X5-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal
All the disks in the database node have their service and OK2RM LEDs turned on. Issue reported: Amber LED in the ILOM.
===================================================
HDD0/SVC   | ON
HDD0/OK2RM | ON
HDD1/SVC   | ON
HDD1/OK2RM | ON
HDD2/SVC   | ON
HDD2/OK2RM | ON
HDD3/SVC   | ON
HDD3/OK2RM | ON
===================================================

However, the RAID controller reports that all the disks are online:

Slot 00 Device 08 (HITACHI H109060SESUN600GA6901516BUEMHX ) status is: Online,
Slot 01 Device 09 (HITACHI H109060SESUN600GA6901516BURE4X ) status is: Online,
Slot 02 Device 11 (HITACHI H109060SESUN600GA6901516BUYJAX ) status is: Online,
Slot 03 Device 10 (HITACHI H109060SESUN600GA6901516BURE8X ) status is: Online,

Solution
A workaround is provided below:

1) Reset the SP first to clear the service and OK2RM LEDs. Run the following at the ILOM command line:

-> reset /SP

In the ILOM snapshot output below, the service and OK2RM LEDs are all off after the SP is reset:

===================================================
HDD0/SVC   | OFF
HDD0/OK2RM | OFF
HDD1/SVC   | OFF
HDD1/OK2RM | OFF
HDD2/SVC   | OFF
HDD2/OK2RM | OFF
HDD3/SVC   | OFF
HDD3/OK2RM | OFF
===================================================

2) However, the dbmcli -e list physicaldisk output still shows the disks as failed:

# dbmcli -e list physicaldisk
===========================================================================
252:0 BUEMHX failed
252:1 BURE4X failed
252:2 BUYJAX failed
252:3 BURE8X failed
===========================================================================

3) To clear the failed status in the above dbmcli output, you will need to edit the cellinit.ora file. Be sure to edit the correct file. To find the correct cellinit.ora file, issue the following command to identify the image version:

# imageinfo -ver

Then use the imageinfo -ver output in place of "version" in the following path to locate the file to edit:

/opt/oracle/dbserver_version/dbms/deploy/config/cellinit.ora

In this example imageinfo -ver reported 12.1.2.1.2.150617.1, so the path to the file is:

/opt/oracle/dbserver_12.1.2.1.2.150617.1/dbms/deploy/config/cellinit.ora

Add the following line to cellinit.ora: "_cell_allow_reenable_predfail=true"

4) Restart MS:

# dbmcli -e alter dbserver restart services all

5) Re-enable the physical disks that are marked as failed:

# dbmcli -e alter physicaldisk <pdid> reenable force

To identify the failed disks and their <pdid>, use:

# dbmcli -e list physicaldisk

For example:

# dbmcli -e alter physicaldisk 252:0 reenable force

Once all failed disks are re-enabled, check the dbmcli -e list physicaldisk output to confirm that all disks are normal:

# dbmcli -e list physicaldisk
==========================================================================
252:0 BUEMHX normal
252:1 BURE4X normal
252:2 BUYJAX normal
252:3 BURE8X normal
==========================================================================

6) Check the dbmcli -e list alerthistory output to make sure the disk statuses have all changed to normal:

# dbmcli -e list alerthistory
9_2 2016-02-26T14:20:38+08:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BUEMHX Firmware : A690 Slot Number : 0"
10_1 2016-02-02T19:20:43+08:00 critical "Hard disk failed. Status : FAILED Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BURE4X Firmware : A690 Slot Number : 1"
10_2 2016-02-26T14:20:58+08:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BURE4X Firmware : A690 Slot Number : 1"
11_1 2016-02-02T19:20:47+08:00 critical "Hard disk failed. Status : FAILED Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BUYJAX Firmware : A690 Slot Number : 2"
11_2 2016-02-26T14:21:09+08:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BUYJAX Firmware : A690 Slot Number : 2"
12_1 2016-02-02T19:20:51+08:00 critical "Hard disk failed. Status : FAILED Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BURE8X Firmware : A690 Slot Number : 3"
12_2 2016-02-26T14:21:19+08:00 clear "Hard disk status changed to normal. Status : NORMAL Manufacturer : HITACHI Model Number : H109060SESUN600G Size : 600GB Serial Number : 1516BURE8X Firmware : A690 Slot Number : 3"

Note: Leaving "_cell_allow_reenable_predfail=true" in cellinit.ora after the disks are re-enabled will not cause a problem. It is just a parameter that allows MS to re-enable the disks. By default (without this parameter), MS throws an error when you try to change a disk that is in CRITICAL / Failed status back to normal.

You can remove "_cell_allow_reenable_predfail=true" from cellinit.ora after running "dbmcli -e alter physicaldisk <pdid> reenable force" and confirming that the disks are back to normal using "dbmcli -e list physicaldisk".
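
The path construction in step 3 and the lookup of failed disk IDs in step 5 can be sketched as two small shell helpers. This is a minimal sketch and not part of the original solution: the function names cellinit_path and failed_disk_ids are hypothetical, and the second helper assumes that list physicaldisk prints the disk ID in the first column and the status in the last column, as in the sample output above. Nothing here edits cellinit.ora or re-enables a disk; the helpers only print values for review.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; defining them runs no system command.

# Print the cellinit.ora path for a given image version string
# (the value reported by `imageinfo -ver`).
cellinit_path() {
    printf '/opt/oracle/dbserver_%s/dbms/deploy/config/cellinit.ora\n' "$1"
}

# Read `dbmcli -e list physicaldisk` output on stdin and print the ID
# (first column) of every disk whose status (last column) is "failed".
# Assumption: whitespace-separated columns, as in the sample output.
failed_disk_ids() {
    awk '$NF == "failed" { print $1 }'
}

# Example usage on a database node (shown as comments, not executed here):
#   cellinit_path "$(imageinfo -ver)"
#   dbmcli -e list physicaldisk | failed_disk_ids
```

Printing the IDs rather than piping them straight into "alter physicaldisk ... reenable force" keeps a manual check between identifying the disks and forcing the status change.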

References:
This problem is due to BUG 22079518. This is a workaround to fix the disks showing an amber LED on an Exadata Database Node.

Attachments
This solution has no attachment