Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1671764.1
Update Date: 2018-02-26
Keywords:

Solution Type: Problem Resolution

Solution  1671764.1 :   Replacement disk managed by LSI not recognised by OS  


Related Items
  • Sun Blade X3-2B
  • Sun Blade X6270 M2 Server Module
  • Sun SPARC Enterprise T5220 Server
  • Sun Fire X4170 M2 Server
  • Netra Server X3-2
  • Linux OS
  • SPARC T4-2
  • Sun Fire X4270 M2 Server
  • Windows Server
  • Sun Netra X4270 Server
  • Sun Netra T5440 Server
  • Solaris Operating System
  • Sun Fire X4470 Server
  • Sun SPARC Enterprise T5120 Server
  • Sun Server X3-2L
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Sun SPARC Enterprise T5220 Server - Version All Versions and later
Windows Server - Version 2003 x64 and later
Solaris Operating System - Version 10 3/05 and later
Linux OS - Version Enterprise Linux 3.0 and later
Sun Fire X4170 M2 Server - Version All Versions and later
Information in this document applies to any platform.

Symptoms

After inserting the newly received disk into the system, the green status LED is lit, yet the disk is not seen or cannot be used by the O/S. Rebooting the system does not bring the disk back to the O/S.

If the faulted disk is on a Solaris system, the format output may omit the new disk altogether.

 

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <LSI cyl 36348 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
       1. c0t1d0 <LSI cyl 36348 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@1,0
       2. c0t2d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@2,0
       3. c0t3d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@3,0
       4. c0t4d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@4,0
       5. c0t5d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@5,0
       6. c0t6d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@6,0
       7. c0t7d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@7,0
       8. c0t8d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@8,0
       9. c0t9d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@9,0
      10. c0t10d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@a,0
      11. c0t12d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@d,0

 

Note:  c0t11d0 is missing from the format output.

 

 

Changes

A disk failure occurred on the system. The failed drive was replaced with a known-good one.

Cause

There is no JBOD option for this host bus adapter (HBA), so each disk needs to be configured as an individual RAID 0 simple volume.
This allows the O/S to pick up each individual drive, and thus allows software RAID schemes to be implemented.

 

Under Solaris, this is clearly visible in the format output:

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <LSI cyl 36348 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
       1. c0t1d0 <LSI cyl 36348 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@1,0
       2. c0t2d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@2,0
       3. c0t3d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@3,0
       4. c0t4d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@4,0
       5. c0t5d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@5,0
       6. c0t6d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@6,0
       7. c0t7d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@7,0
       8. c0t8d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@8,0
       9. c0t9d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@9,0
      10. c0t10d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@a,0
      11. c0t12d0 <LSI-MR9261-8i-2.12-278.46GB>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@d,0

 

Each disk drive is listed under the vendor string LSI rather than the disk drive model.

 

Solution

For the disk to be seen by the O/S and brought back under software control, the steps below need to be performed:

  1. Ensure the new disk is seen by the controller
  2. Create a RAID0 simple volume from the new disk
  3. The disk should now be ready to be managed by the preferred software utility (depending on the O/S, a reboot may be required if the procedure is not done at the adapter's BIOS level)

 

1. Once the new disk is inserted in the system, use the MegaCli utility to ensure the disk is seen by the adapter. For more details concerning the utility, see MegaCli and sas2ircu - utility to manage Internal Raid HBA (LSI-Niwot/Erie) (Doc ID 1513610.1).

# ./MegaCli -PDList -aAll

Adapter #0

Enclosure Device ID: 32
Slot Number: 0
...........................

Enclosure Device ID: 32
Slot Number: 11
Drive's position: DiskGroup: 22, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 31
WWN: 5000CCA025372E57
Sequence Number: 2
Media Error Count: 0
Other Error Count: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.464 GB [0x22cee000 Sectors]
Sector Size:  0
Firmware state: Unconfigured(good), Spun down
Device Firmware Level: A2B0
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000cca025372e55
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: HITACHI H106030SDSUN300GA2B01208NZAT1B          
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :27C (80.60 F)
PI Eligibility:  No
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No

...........................
Drive has flagged a S.M.A.R.T alert : No

 

The new disk will have the firmware state "Unconfigured(good), Spun down". The "Spun down" state is due to the disk not being used in any configuration, not even as a hot spare.

The following details are required from the above output:

  • Adapter No
  • Enclosure Device ID
  • Slot Number
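Assuming the output format shown above, the enclosure/slot pair of any Unconfigured(good) disk can be pulled out with standard text tools. This is only a sketch: the sample lines are inlined for illustration; on a live system, pipe ./MegaCli -PDList -aAll into the awk filter instead.

```shell
# Sketch: print "Enclosure:Slot" for every disk whose firmware state is
# Unconfigured(good). The printf lines stand in for real -PDList output.
printf '%s\n' \
  'Enclosure Device ID: 32' \
  'Slot Number: 11' \
  'Firmware state: Unconfigured(good), Spun down' |
awk -F': ' '
  /^Enclosure Device ID/                  { enc  = $2 }
  /^Slot Number/                          { slot = $2 }
  /^Firmware state: Unconfigured\(good\)/ { print enc ":" slot }
'
# → 32:11
```

The resulting 32:11 pair plugs straight into the -CfgLdAdd step that follows.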

 

2. Next, the RAID0 simple volume should be created. The general command is "MegaCli -CfgLdAdd -r(0|1|5) [E:S, E:S, ...] -aN".

./MegaCli -CfgLdAdd -r0 [32:11] -a0

Adapter 0: Created VD 11
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

 

The system may instead respond with:

Adapter 0: Configure Adapter Failed

Exit Code: 0x54

0x54 means that the controller has LD cache pinned. When a virtual disk goes offline or is deleted because of missing physical disks, the controller may preserve the dirty cache from that virtual disk. To check for preserved cache, the general command is "MegaCli -GetPreservedCacheList -a(0|1|..|ALL)".

./MegaCli -GetPreservedCacheList -a0

Adapter #0

Virtual Drive(Target ID 00): Missing.

Exit Code: 0x00

The general command to discard the cache is "MegaCli -DiscardPreservedCache -L(0|1|...|ALL) -a(0|1|...|ALL)".

# ./MegaCli -DiscardPreservedCache -L0 -a0

Adapter #0

Virtual Drive(Target ID 00): Preserved Cache Data Cleared.

Exit Code: 0x00
Once this is completed, the volume can be created.

Should this fail, there are two last-resort options:
1. Clear the preserved cache from all VDs:
./MegaCli -DiscardPreservedCache -Lall -a0

2. Remove the BBU to forcibly clear the cache.
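The create-and-recover sequence above can be scripted. This is a sketch only: the MEGACLI path, the E:S pair, and the unconditional -Lall discard (last-resort option 1 above, which clears preserved cache for all VDs) are assumptions to adapt per system.

```shell
# Sketch: try to create the RAID0 volume; on failure assume pinned
# preserved cache (the 0x54 case above), discard it, and retry once.
# MEGACLI defaults to ./MegaCli; override via the environment.
MEGACLI=${MEGACLI:-./MegaCli}

create_vd() {   # usage: create_vd "E:S" adapter-number
  "$MEGACLI" -CfgLdAdd -r0 "[$1]" -a"$2" && return 0
  # Creation failed: show, then discard, any preserved cache
  "$MEGACLI" -GetPreservedCacheList -a"$2"
  "$MEGACLI" -DiscardPreservedCache -Lall -a"$2"
  # Retry the volume creation once
  "$MEGACLI" -CfgLdAdd -r0 "[$1]" -a"$2"
}
```

For the example in this document the call would be create_vd "32:11" 0.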

 

3. Depending on the O/S, a system reboot may be required before the disk is visible to the O/S.

The disk should now be ready to be brought back under the preferred RAID software.
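Whether a reboot is actually needed depends on how the O/S rescans devices. The sketch below prints the typical rescan step for the running O/S rather than executing it, so the command can be reviewed first; "host0" on Linux is a placeholder for the HBA's actual SCSI host number.

```shell
# Sketch: suggest the usual OS-side rescan step after the RAID0 volume
# is created. Commands are echoed, not run.
rescan_hint() {
  case "$(uname -s)" in
    SunOS) echo 'devfsadm -Cv && format' ;;   # rebuild /dev links, then re-check format
    Linux) echo "echo '- - -' > /sys/class/scsi_host/host0/scan" ;;   # force a SCSI bus rescan
    *)     echo 'reboot required' ;;
  esac
}
rescan_hint
```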

References

<NOTE:1395234.1> - How to replace an internal disk in a volume under LSI RAID controller
<NOTE:1362952.1> - How to Replace a Disk in a rpool for an x86 System
<NOTE:1002753.1> - How to Replace a Drive in Solaris[TM] ZFS
<NOTE:1386502.1> - Replacement disk not recognized on X4540

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.