Sun StorEdge 3310/3320/3510/3511: How To Replace A Failed Disk In A Logical Drive:ATR:1013:3

Asset ID:	1-71-1003692.1
Update Date:	2018-01-08
Keywords:

Solution Type Technical Instruction Sure

Solution 1003692.1 : Sun StorEdge 3310/3320/3510/3511: How To Replace A Failed Disk In A Logical Drive:ATR:1013:3

Applies to:

Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
All Platforms

Goal

This document describes the procedure for disk replacement in a RAID array where logical drive status is "Degraded" (no global/local spare configured) or "Good" (global/local spare replaced the bad drive).

Note: The TSC Engineer needs to specify: a) the failed disk location, b) the RAID level of the Logical Drive (RAID 0, 1, 5 or NRAID), c) whether a global/local spare disk is configured, d) any special replacement instructions.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
FIELD ENGINEER/ADMINISTRATOR must be trained in the Sun StorEdge 3xxx array platform or have previous product experience.

TASK COMPLEXITY: 3

TIME ESTIMATE: 30 minutes

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS

PROBLEM OVERVIEW:
There is a failed disk in the array.

CONFIRM THE FAULT BY CHECKING:
Drive status as well as the logical drive status.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
The array should be up and running.

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

(Tools needed)

Key for the front panel to access the drives.
DB-9M to DB-25M Adapter, 35-00000109
Screwdriver (if the thumbscrew has been tightened).
A "host" connected via one or more of the following:

telnet access to the array Firmware Application Menu through the IP network.
in-band with sccli through the data path.
out-of-band with sccli through the IP network.
serial console, such as a laptop connected to the serial port.

Steps to Follow:

Step 1 - Checks and preparation.

Check and confirm that the status of the Logical Drive is NOT Failed or Incomplete.
From Firmware Application Main Menu:
1. Choose "view and edit Logical drives".
2. Under the "#FL" column, the number of failed drives is listed with the associated Logical drive.
From command line:

sccli -o <ip address> # via out-of-band connection
# or
sccli /dev/dsk/<c#t#d#s2> # via inband connection

sccli> show logical-drives

If the status of the Logical Drive is Failed or Incomplete, or if more than one drive has failed in a RAID 5 Logical Drive, DO NOT ATTEMPT this procedure; contact Oracle Support.
Save the NVRAM configuration to disk for safekeeping.
From Firmware Application Main Menu:
1. Choose "system Functions".
2. Using the arrow keys to scroll down, select "Controller maintenance".
3. Select "save NVRAM to disks".
4. Choose "Yes" to confirm when prompted. An "NVRAM Saved" message will be displayed, indicating success.
5. After the "NVRAM Saved" message has been displayed, press Escape to exit back to the main menu.
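
The go/no-go check at the start of Step 1 can be written down as a small rule. The following is only a sketch: the function name and its inputs are hypothetical, the values are assumed to have been read by hand from the "view and edit Logical drives" menu or from "show logical-drives", and the exact status strings shown by a given firmware release may differ from the wording used in this document.

```python
# Hypothetical helper: it does not talk to the array or parse sccli output.
# The caller supplies values read from the firmware menu or "show logical-drives".

# Status wording taken from this document; real firmware strings may differ.
BLOCKING_STATUSES = {"FAILED", "INCOMPLETE"}


def safe_to_replace(ld_status: str, raid_level: str, failed_drives: int) -> bool:
    """Return True when the Step 1 check allows the replacement to proceed."""
    if ld_status.strip().upper() in BLOCKING_STATUSES:
        return False
    # More than one failed member in a RAID 5 Logical Drive: stop.
    level = raid_level.strip().upper().replace(" ", "")
    if level == "RAID5" and failed_drives > 1:
        return False
    return True
```

For example, a Degraded RAID 5 Logical Drive with one failed drive passes the check, while any Logical Drive reported as Failed or Incomplete does not.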

Step 2 - Locate the defective disk drive.

1. Find the SCSI Channel Number and SCSI Target ID of the disk to be replaced.
From Firmware Application Main Menu:
1. Select "view and edit scsi Drives".
From command line:

sccli -o <ip address> # via out-of-band connection
# or
sccli /dev/dsk/<c#t#d#s2> # via inband connection

sccli> show disks

Locate the disk drive that has a status of BAD or FAILED in the status column and take note of the Channel Number, SCSI target ID and the associated Logical Drive number.
If the disk is being proactively replaced (it has not yet hard failed), see <Document 1500079.1> Sun Storage 3000 Arrays: How to Proactively Fail and Remove a Failing Disk.
Physically locate the defective disk using the channel and ID numbers. Note: Failure to identify the correct disk drive might result in replacing the wrong disk drive and could cause a loss of data. Be sure that you have identified the correct disk drive.
Additional information on ID locations can be found in 816-7326 "Sun StorEdge Array Installation, Operation and Service Manual".
If you are uncertain of the drive location, you can flash the LED(s) to help in identifying the disk. This method only works if there is no I/O activity.
From Firmware Application Main Menu:
1. Select "view and edit scsi Drives".
2. Highlight the drive in question and press Return.
3. Choose "Identifying scsi drive".
4. Select "flash all But selected drive" to flash the activity LEDs of all the drives on this channel EXCEPT the problem drive.
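
The selection rule used in Step 2 (pick the disks whose status is BAD or FAILED and note their channel, target ID and Logical Drive) can be sketched as below. The Disk record and its field names are hypothetical; they stand for values copied by hand from "view and edit scsi Drives" or "show disks", and the sketch makes no assumption about the actual sccli output layout.

```python
from typing import List, NamedTuple


class Disk(NamedTuple):
    """Hypothetical record for one row of the drive listing."""
    channel: int
    target_id: int
    status: str
    logical_drive: str


# Status values named in this document for a drive that needs replacing.
FAILED_STATUSES = {"BAD", "FAILED"}


def failed_disks(disks: List[Disk]) -> List[Disk]:
    """Return the disks whose status marks them as BAD or FAILED."""
    return [d for d in disks if d.status.strip().upper() in FAILED_STATUSES]
```

Each returned record carries the Channel Number, SCSI Target ID and Logical Drive that should be written down before physically locating the disk.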

Step 3 - Disk removal and replacement.

Unseat the failed disk identified above, let it spin down for 20 seconds and remove it from the array.
Install the replacement disk.
Drive removal and installation instructions can be found in the 816-7326 "Sun StorEdge 3000 Family FRU Installation Guide".
Check to see if the replaced drive was automatically scanned onto the bus:
From Firmware Application Main Menu:
1. Select "view and edit scsi Drives".
From command line:

sccli -o <ip address> # via out-of-band connection
# or
sccli /dev/dsk/<c#t#d#s2> # via inband connection

sccli> show disks

The drive status should show NEW_DRV or USED_DRV.
In a 3310 or 3320 array, if the replaced drive was not automatically scanned onto the bus, you may need to manually scan the replacement disk into the configuration. (This does not apply to 3510 and 3511 arrays.)
From Firmware Application Main Menu:
1. Select "view and edit scsi Drives".
2. Select any disk in the list, then select "Scan scsi drive".
3. Select the SCSI Channel Number, then the SCSI Target ID of the replaced disk drive, and confirm "Yes" when prompted.
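
The check at the end of Step 3 reduces to two conditions: the replacement drive is not yet listed as NEW_DRV or USED_DRV, and the array is a 3310 or 3320. A minimal sketch follows; the function name and arguments are hypothetical, and a status of None is used here (as an assumption) to mean that the drive does not appear in the listing at all.

```python
from typing import Optional

# Statuses this document expects for a replacement drive that was scanned in.
SCANNED_STATUSES = {"NEW_DRV", "USED_DRV"}

# Models for which this document describes the manual "Scan scsi drive" step.
MANUAL_SCAN_MODELS = {"3310", "3320"}


def manual_scan_applies(model: str, status: Optional[str]) -> bool:
    """Return True when the manual scan described in Step 3 is the next action."""
    already_scanned = status is not None and status.strip().upper() in SCANNED_STATUSES
    return (not already_scanned) and model.strip() in MANUAL_SCAN_MODELS
```

The function returns False for 3510 and 3511 arrays in every case, matching the note that the manual scan does not apply to them.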

Step 4 - Decide how the newly replaced drive should be configured.

If the drive replaced was configured in a Logical Drive in:

RAID 1 or 5, with global or local spare, please proceed to Step 5A.
RAID 1 or 5, without global or local spare, please proceed to Step 5B.
RAID 0 or NRAID, please proceed to Step 5C.
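
The branching in Step 4 can be summarised as a small lookup. This is an illustrative sketch only; the function name and argument names are hypothetical, and the returned labels are the step names used in this document.

```python
def next_step(raid_level: str, has_spare: bool) -> str:
    """Map the RAID level and spare configuration to the step to follow."""
    level = raid_level.strip().upper().replace(" ", "")
    if level in {"RAID1", "RAID5"}:
        # Redundant levels: the path depends on whether a spare was configured.
        return "5A" if has_spare else "5B"
    if level in {"RAID0", "NRAID"}:
        # No redundancy: a spare makes no difference to the path.
        return "5C"
    raise ValueError(f"RAID level not covered by this procedure: {raid_level!r}")
```

For example, a RAID 5 Logical Drive with a global spare leads to Step 5A, and an NRAID Logical Drive leads to Step 5C regardless of spares.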

Step 5A - Logical Drive in RAID 1 or 5 with Global/Local Spare.

When a spare takes the place of a failed drive, it is no longer considered a spare, because it has become part of the Logical Drive. You can decide what to do with the newly replaced drive.

The newly replaced drive may be assigned as a new local or global spare. See Step 5A - As Spare.
The newly replaced drive may be re-integrated into the Logical Drive. See Step 5A - Reintegrate.

If the physical slot location of the disks comprising the Logical Drive is not important, it is recommended to configure the newly replaced drive as a new spare; doing so avoids a copy and replace operation.

Step 5A - As Spare - Assign the newly replaced drive as a local or global spare.

From Firmware Application Main Menu:

Select "view and edit scsi Drives".
Select the newly replaced disk drive.
Select "add Global spare drive" or "add Local spare drive", and confirm "Yes" when prompted.

From command line:

sccli> configure global-spare <ch>.<drive id>
# or
sccli> configure local-spare <ch>.<drive id> <logical drive>

Step 5A - Reintegrate - Re-integrate the newly replaced disk back into the RAID 1 or 5 Logical Drive.

Identify the previous spare disk which is now part of the Logical Drive and is going to be removed in favour of the newly replaced disk.
From Firmware Application Main Menu:
1. Select "view and edit Logical drives".
2. Select the target Logical Drive, then "copy and replace drive".
3. Select the previous spare drive, then select the replaced disk drive, and confirm "Yes" when prompted.
(There is no sccli command line equivalent for "copy and replace drive".)
Wait for the copy and replace to complete.
Configure the original spare disk back to a local or global spare, as described in Step 5A - As Spare.

Step 5B - Logical Drive in RAID 1 or 5 with no Global/Local Spare.

The Logical Drive in RAID 1 or 5 would be in "Degraded" status and you will need to manually initiate a rebuild into the newly replaced drive. Refer to <Document 1601641.1> Sun Storage 3000 Arrays: How to Resolve the Degraded Status of a Redundant (RAID-5 or RAID-1) Logical Drive.

Step 5C - Logical Drive in RAID 0 or NRAID.

A RAID 0 or NRAID configuration provides no data redundancy, so any drive failure results in loss of data. Recovery means deleting and re-creating the Logical Drive as per normal array administration procedure, after which the data can be restored from backup.

JBOD ARRAY

To replace a failed disk in a JBOD, identify the failed drive by following the steps in "Identifying the Defective Disk Drive in a JBOD Array" in the:

Sun StorEdge 3000 Family FRU Installation Guide

http://docs.oracle.com/cd/E19673-01/816-7326-23/index.html

Replace the identified failed drive, following the instructions in:

Removing a Defective Disk Drive in a RAID or JBOD Array

http://docs.oracle.com/cd/E19673-01/816-7326-23/ch02_drives.html#pgfId-1001110

followed by:

Installing a New Disk Drive in a RAID or JBOD Array

http://docs.oracle.com/cd/E19673-01/816-7326-23/ch02_drives.html#pgfId-999621

OBTAIN CUSTOMER ACCEPTANCE, CONFIRM THE FIX BY:
Checking the status of the disk as well as the status of the logical drive if applicable.

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
If a rebuild is necessary, it will run in the background without impact to I/O. If a rebuild is not necessary, as in the case of a drive failure in a JBOD, NRAID or RAID 0, either the data will need to be restored or the replacement disk handed back to the OS layer for use with disk management software (e.g., Solaris Volume Manager or Veritas Volume Manager).

REFERENCE INFORMATION:

Sun StorEdge 3000 Family FRU Installation Guides - Section 2.2 "Replacing a Disk Drive"
3310: http://download.oracle.com/docs/cd/E19673-01/
3320: http://download.oracle.com/docs/cd/E19168-01/
3510/3511: http://download.oracle.com/docs/cd/E19487-01/

References

<NOTE:1601641.1> - Sun Storage 3000 Arrays: How to Resolve the Degraded Status of a Redundant (RAID-5 or RAID-1) Logical Drive

Attachments

This solution has no attachment