Sun Storage 7000 Unified Storage System: How to perform FCO 328 ( 600GB Hitachi Drives )

Asset ID:	1-75-1532677.1
Update Date:	2013-09-25
Keywords:

Solution Type Troubleshooting Sure

Solution 1532677.1 : Sun Storage 7000 Unified Storage System: How to perform FCO 328 ( 600GB Hitachi Drives )

Applies to:

Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun Storage Disk Shelf Array - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Purpose

Some HDDs may not perform write operations adequately under certain specific conditions.

Condition given:

i, Low environmental temperature (5-15 deg C, per FCO A0328-1), combined with the period immediately following power-on (before the HDD has warmed up)

Troubleshooting Steps

This issue may affect the following platforms

TA 7000 (RW2)
ZFS SA 7420
ZFS SA 7120

Overall Strategy

TA 7000 (RW2)	To proactively replace all affected drives in order of potential impact
ZFS SA 7420	To proactively replace all affected drives in order of potential impact
ZFS SA 7120	To proactively replace all affected drives in order of potential impact

Affected Parts

Manufacturing Part #	Description
7047035 (CRU) 390-0483 350-1508	600GB - 15000 RPM SAS Disk Assembly with 1 bracket and 1 of the following disks: Hitachi HUS156060VLS600 [HUS1560SCSUN600G] (600GB - 15000 RPM - SAS Disk) Stingray 3.5" Mounting Bracket

FAQ

1. Are the drives to be replaced hot-swappable?

A, Yes, the drives are hot-swappable.

2. Are there any precautions to be taken or possible errors that could occur that would create an outage?

A, No precautions are required and there are no expected errors that can cause an outage.

3. Is it recommended to replace one disk per pool at a time, wait for the resilver to complete, and then continue to the next drive?

A, Yes, Oracle recommends replacing one disk per pool at a time and waiting for the resilver to finish before replacing the next drive.

4. Is there an estimate of how long an average resilver should take?

A, It is not possible to give an estimate, as the time depends on the load on the system and the pool layout (mirror, RAIDZ, etc.).

5. Can firmware levels be mixed?

A, It is possible to insert a drive with a lower firmware revision, as the AR will automatically upgrade the disk firmware.

Procedure for Sun ZFS Unified Storage Appliance Hitachi 600GB Hard Disk Drive Replacement

The steps below refer to the Canned Action Plan "How to Replace Sun ZFS Unified Storage Appliance 600GB Hitachi Drives FCO 328 :ATR [ID 1553539.1]".

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?

1, The system can be in normal production.

2, There should be no fault lights on the appliance and no disk locator lights flashing.

4, There should be no problems shown in the BUI, Maintenance->Problems.

5, If this is a clustered appliance, check both heads.

6, Deactivate the ASR-enabled asset for the NAS before performing this procedure. This prevents additional Service Requests from being created improperly.

See:- ASR Deactivation / Reactivation via My Oracle Support [ID 1508403.1]

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Run the script (disk_sn_locator.aksh) on the affected appliance/NAS.
Instructions are included in the script. (You will find the script in this document's attachment section.)

Run using an ssh session:

$ ssh root@zfssa-hostname < disk_sn_locator.aksh

The script will do the following:

Locate affected disks based on serial number
Turn on locator LED when an affected disk is discovered

If there is no output after running this script, then there are no affected disks to replace, and the rest of this document does not apply to the NAS/appliance.

If affected disks are found, the script produces output listing them, and these disks need to be replaced.

Script example output:

# ssh root@xx.xx.x.xxx < disk_sn_locator.aksh
Pseudo-terminal will not be allocated because stdin is not a terminal.
The authenticity of host 'xx.xx.x.xxx (xx.xx.x.xxx )' can't be established.
RSA key fingerprint is 77:8d:84:3d:b9:10:01:4a:21:c2:fd:9f:c1:6c:71:01.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'xx.xx.x.xxx ' (RSA) to the list of known hosts.
Password:
host: S7320, chassis: 1243FMD02R, disk: disk-001/1238K304AL/c4t5000CCA02AAE9098d0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-006/1238K40MYN/c4t5000CCA02AB069A4d0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-007/1238K30E7L/c4t5000CCA02AAE94E8d0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-008/1238K7SZBN/c4t5000CCA02AB73C80d0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-011/1238K7T1DN/c4t5000CCA02AB73D7Cd0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-015/1238K868JN/c4t5000CCA02AB80424d0, locate: on
host: S7320, chassis: 1243FMD02R, disk: disk-018/1238K80A7N/c4t5000CCA02AB7AAE0d0, locate: on
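
The output lines above follow a regular pattern (host, chassis, then disk slot/serial number/device name, then locator state), so they can be turned into a replacement worklist. The following is a minimal sketch and not part of the FCO tooling: the function name affected_disks is hypothetical, and the field layout is assumed from the example output only.

```shell
# Read disk_sn_locator.aksh output on stdin and print one tab-separated
# record per affected disk: chassis, slot, serial number, device name.
# Lines that do not start with "host: " (ssh prompts, warnings) are ignored.
# NOTE: the field layout is an assumption based on the example output.
affected_disks() {
    awk -F', ' '
        /^host: / {
            chassis = $2; sub(/^chassis: /, "", chassis)
            disk = $3;    sub(/^disk: /, "", disk)
            # The disk field has the form slot/serial/device.
            split(disk, part, "/")
            printf "%s\t%s\t%s\t%s\n", chassis, part[1], part[2], part[3]
        }'
}
```

For example, if the script output has been saved to a file, running affected_disks with that file on standard input would print one line per affected disk, which can be ticked off as each drive is replaced.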

The following steps should be followed in replacing affected disks.
Take ESD Precautions.

1. Pull the disk and wait for the resilver to complete.

Physically locate the drive by its flashing locator LED.
On the drive, push the storage drive release button to open the latch.
Grasp the latch and pull the drive out of the drive slot.
Wait at least 30 seconds before continuing with the next step.
Ensure the disk ejection lever is in the fully extended position.
After the disk has been pulled, verify that the resilver process has started by checking the BUI.

CLI - method to check the re-silver:

>configuration storage show

Wait for the resilver to complete, then go to step 2.

2. Insert the replacement disk and wait for resilver to complete.

Slide the new drive into the empty drive slot until it is fully seated.
Close the latch to lock the drive in place.
Wait for resilver to complete.

CLI - method to check the re-silver:

>configuration storage show
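
Steps 1 and 2 both end with waiting for a resilver to complete. As a rough sketch, that wait can be expressed as a generic polling loop. The function name wait_until and its arguments are hypothetical, and the check command is deliberately left to the caller, because this document does not show what "configuration storage show" prints during a resilver.

```shell
# wait_until INTERVAL_SECONDS MAX_TRIES COMMAND [ARGS...]
# Run COMMAND repeatedly until it succeeds. Returns 0 on the first
# success, or 1 if COMMAND still fails after MAX_TRIES attempts.
# COMMAND is supplied by the caller; nothing here assumes any
# particular appliance output format.
wait_until() {
    interval=$1
    max_tries=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

A call such as "wait_until 300 288 resilver_done" would check every five minutes for up to 24 hours, where resilver_done stands for a site-specific check (not defined here) that inspects the storage status and succeeds once the resilver has finished. Only one disk per pool should be replaced per completed wait.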

3. Confirm appliance is in normal state

Confirm that the appliance is in a normal state: client data access is working, the pool is online, and
no disk-related faults are reported in the BUI. Mark any prior related disk faults as repaired.
Deactivate the locator LED on the new drive via the BUI or CLI.

CLI - method to check the Device Status:

>configuration storage show

If there is an UNAVAIL/faulted device present in the "zpool status" after replacement, then contact the Technical Solution Center before you replace the next disk.
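
The check described above amounts to scanning the pool status text for devices in the UNAVAIL or FAULTED state. A minimal sketch, assuming the status text has already been captured and is supplied on standard input (the function name has_bad_device is hypothetical):

```shell
# Read pool status text on stdin. Print every line that mentions a
# device state of UNAVAIL or FAULTED and return 0 if at least one such
# line was found; print nothing and return 1 otherwise.
has_bad_device() {
    grep -E 'UNAVAIL|FAULTED'
}
```

If the function succeeds, the lines it prints identify the devices to report to the Technical Solution Center before the next disk is replaced.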

4. Run the script (disk_sn_locator.aksh) again as a final check.

5. When maintenance is finished, reactivate ASR.

6. For the Parts Return Process, please refer to "FCO A0328-1: Main Doc 1534269.1" ( References )

References

<NOTE:1534269.1> - FCO A0328-1: Proactive: Hitachi Viper C 600GB Drives experience poor Over Write of HDD heads at cold temperature (5-15 Deg C).
<NOTE:1366035.1> - Sun Storage 7000 Unified Storage System: Troubleshooting Disk Drive Failures
<NOTE:1019887.1> - Sun Storage 7000 Unified Storage System: How to collect a supportbundle using the BUI or CLI
<NOTE:1194226.1> - Oracle Shared Shell
<NOTE:1508403.1> - ASR Deactivation / Reactivation via My Oracle Support
<NOTE:1553539.1> - How to Replace Sun ZFS Unified Storage Appliance 600GB Hitachi Drives FCO 328 :ATR
<NOTE:1427034.1> - Sun Storage 7000 Unified Storage System: How to check resilvering

Attachments

This solution has no attachment