Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1584408.1
Update Date:2015-03-19
Keywords:

Solution Type  Technical Instruction Sure

Solution  1584408.1 :   How to rebuild a SSD disk in Firmware state: Unconfigured (bad) manually  


Related Items
  • Exalogic Elastic Cloud X3-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exalogic/OVCA>Oracle Exalogic>MW: Exalogic Core

In this Document
Goal
Solution
References


Created from <SR 3-7793708041>

Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X5 [Release X3 to X5]
Linux x86-64

Goal

A customer accidentally pulled one SSD out of a compute node in an Exalogic rack and then pushed it back in. Although both the SSD and the LSI MegaRAID controller support hot swapping, customers should not perform this operation themselves for any reason.

The event log file records the details of this operation: cnxl06xxx_megacli64-GetEvents-all_2013_09_11_13_51.out

Time: Mon Sep 9 09:48:22 2013 Event Description: Removed: PD 08(e0xfc/s1) <-- Remove
Time: Mon Sep 9 09:48:22 2013 Event Description: VD 00/0 is now DEGRADED
Time: Mon Sep 9 09:48:22 2013 Event Description: State change on PD 08(e0xfc/s1) from FAILED(11) to UNCONFIGURED_BAD(1)
Time: Mon Sep 9 09:48:49 2013 Event Description: Inserted: PD 08(e0xfc/s1) Info: enclPd=fc, scsiType=0, portMap=00, sasAddr=4433221102000000,0000000000000000 <-- Pushed back 27s later
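
To pull only these entries out of a saved event dump, a filter along the following lines may help. This is a minimal sketch, not part of the original note: the function name pd_events is hypothetical, and it assumes the dump has already been flattened to one event per line in the "Time: ... Event Description: ..." form shown above (raw MegaCli event output may spread each event over several lines).

```shell
# pd_events FILE SLOT_TAG   (hypothetical helper, read-only)
# Print the removal, insertion, and state-change events for one physical drive.
# SLOT_TAG is the "(e0xfc/s1)" style tag that appears in the event descriptions.
pd_events() {
    file=$1
    tag=$2
    # -F matches the tag as a fixed string, so its parentheses are not regex syntax.
    grep -F -- "$tag" "$file" | grep -E 'Removed:|Inserted:|State change'
}
```

For example: pd_events cnxl06xxx_megacli64-GetEvents-all_2013_09_11_13_51.out '(e0xfc/s1)'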

As a result, the disk ends up in Firmware state: Unconfigured(bad) and its amber light is on. However, no fault can be identified in ILOM:

Oracle(R) Integrated Lights Out Manager

Version 3.1.2.10.a r76304

Copyright (c) 2012, Oracle and/or its affiliates. All rights reserved.

-> show faulty

Target              | Property               | Value                          
--------------------+------------------------+---------------------------------

->

The disk stays in this state even though AutoRebuild is enabled on the adapter (the default value):

[root@cnxl06cn16 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Dsply -a0
Adapter 0: AutoRebuild is Enabled.

Exit Code: 0x00

 

Solution

There are two ways to recover from this situation:

1. Use the procedure described in the Oracle documentation "NEM Removal Causes Virtual Disk Members to Be Displayed as (FOREIGN) Unconfigured Bad". Note that it requires a reboot into the WebBIOS utility.

2. Contact Oracle Support for assistance with the manual steps to recover the situation.

The following INTERNAL ONLY section of this note describes those steps, which must be performed under the supervision of Exalogic Support and a Hardware Engineer.

For help with specific command-line options, refer to the MegaRAID SAS Software User Guide from LSI.

Note: Internal Note 1383285.1 mentions that this fault needs additional manual steps, but provides no details.

Manual Recovery Procedure:

1. Find the affected disk and clear the foreign configuration

[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL

Adapter #0 <-- Adapter 0 (command line option -a0)

Enclosure Device ID: 252 <-- Enclosure
Slot Number: 1 <-- Slot
...
PD Type: SATA
Raw Size: 93.160 GB [0xba52230 Sectors]
Non Coerced Size: 92.660 GB [0xb952230 Sectors]
Coerced Size: 92.200 GB [0xb866800 Sectors]
Firmware state: Unconfigured(bad) <-- Firmware state
...
Foreign State: Foreign <-- Foreign State
Foreign Secure: Drive is not secured by a foreign lock key

[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -PDMakeGood -PhysDrv[252:1] -a0

Adapter: 0: EnclId-252 SlotId-1 state changed to Unconfigured-Good.

Exit Code: 0x00
[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Scan -a0

There are 1 foreign configuration(s) on controller 0.

Exit Code: 0x00
[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgForeign -Clear -a0

Foreign configuration 0 is cleared on controller 0.

Exit Code: 0x00 
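
On a node with more than one drive, the enclosure and slot of the affected disk can be picked out of the -PDList output with a small read-only filter. The following is a sketch under stated assumptions: the function name bad_drives is hypothetical, and it assumes the "Enclosure Device ID", "Slot Number" and "Firmware state" lines appear in that order for each drive, as in the listing above.

```shell
# bad_drives   (hypothetical helper, read-only)
# Read `MegaCli64 -PDList -aALL` output on stdin and print "enclosure:slot"
# for every drive whose firmware state is Unconfigured(bad).
bad_drives() {
    awk -F': *' '
        # Strip trailing whitespace or carriage returns from the value field.
        { sub(/[ \t\r]+$/, "", $2) }
        /^Enclosure Device ID:/ { encl = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Firmware state:/ && $2 ~ /^Unconfigured\(bad\)/ { print encl ":" slot }
    '
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | bad_drives
The printed pair corresponds to the value used in the -PhysDrv[252:1] option above.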

2. Replace the missing disk & rebuild the RAID

[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -a0

==============================================================================
Adapter: 0
Product Name: LSI MegaRAID SAS 9261-8i
Memory: 512MB
BBU: Present
Serial No: SV25304945
==============================================================================
Number of DISK GROUPS: 1

DISK GROUP: 0
Number of Spans: 1
SPAN: 0
Span Reference: 0x00 <-- Span Reference: 0x00 is the number of the array (strip the 0x0 part, option -array0).
Number of PDs: 2
Number of VDs: 1
Number of dedicated Hotspares: 0
Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 92.200 GB
Mirror Data         : 92.200 GB
State               : Degraded <-- RAID status Degraded
...
Physical Disk: 1 <-- Disk 1 has issue (option -row1)
<-- Information for disk 1 is blank

Exit Code: 0x00
[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[252:1] -array0 -row1 -a0

Adapter: 0: Missing PD at Array 0, Row 1 is replaced.

Exit Code: 0x00
[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Start -PhysDrv[252:1] -a0

Started rebuild progress on device(Encl-252 Slot-1)

Exit Code: 0x00

[root@cnxl06cn16 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ProgDsply -PhysDrv [252:1] -a0
This displays the progress of the ongoing rebuild. The display continues until the rebuild completes or a key is pressed.
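
Because the -PhysDrv, -array and -row values must all match the affected disk, it can help to print the two commands of this step for review before running them. The sketch below only prints text and runs nothing. The function name step2_commands is hypothetical, and it assumes the array number is the numeric value of the Span Reference (0x00 gives 0, as in the example above).

```shell
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# step2_commands ENCL SLOT SPAN_REF ROW ADAPTER   (hypothetical helper)
# Print, without running them, the replace-missing and rebuild commands.
# SPAN_REF is the "Span Reference" value from -CfgDsply, for example 0x00.
step2_commands() {
    encl=$1
    slot=$2
    span_ref=$3
    row=$4
    adapter=$5
    # Assumption: the array number is the span reference read as a hex number.
    array=$(printf '%d' "$span_ref")
    echo "$MEGACLI -PdReplaceMissing -PhysDrv[$encl:$slot] -array$array -row$row -a$adapter"
    echo "$MEGACLI -PDRbld -Start -PhysDrv[$encl:$slot] -a$adapter"
}
```

For example, step2_commands 252 1 0x00 1 0 prints the two commands used in this step.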

3. Final verification

[root@cnxl06cn16 /]# /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -a0
...
State  : Optimal
...
Firmware state: Online, Spun Up <-- Both SSDs should show this state
...
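
The same check can be scripted. The following is a minimal sketch: the function name raid_healthy is hypothetical, and it assumes the virtual drive state appears on a line beginning with "State" and each disk on a line beginning with "Firmware state", as in the -CfgDsply output above.

```shell
# raid_healthy   (hypothetical helper, read-only)
# Read `MegaCli64 -CfgDsply -a0` output on stdin. Exit status is 0 only if at
# least one virtual drive "State" line was seen, every such line is Optimal,
# and every "Firmware state" line starts with Online.
raid_healthy() {
    awk -F': *' '
        /^State[ \t]*:/    { vd++; if ($2 !~ /^Optimal/) bad++ }
        /^Firmware state:/ { if ($2 !~ /^Online/) bad++ }
        END { if (vd > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -a0 | raid_healthy && echo "RAID is optimal"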
  

References

<NOTE:1383285.1> - How to Remove and Replace a Failed Solid State Disk Drive on an Exalogic Compute Node
<NOTE:1364081.1> - Exalogic Compute node Tools

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.