Asset ID: 1-71-1506324.1
Update Date: 2018-05-17
Solution Type: Technical Instruction Sure Solution

1506324.1: How to Replace a Sun Fire X4540 HDD (Predictive Failure)
Applies to:
Sun Fire X4540 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Goal
How to Replace a Sun Fire X4540 HDD (Predictive Failure).
Solution
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
No special skills required; this is a Customer Replaceable Unit (CRU) procedure.
TIME ESTIMATE: 30 minutes
TASK COMPLEXITY: 0
FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: A Sun Fire X4540 hard disk drive (HDD) reporting a predictive failure needs to be replaced.
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?
Caution: To avoid overheating a powered-on server, do not leave a drive out of its bay for longer than 60 seconds at a time. Remove and replace only one HDD at a time, and replace the HDD access cover as soon as the service tasks are complete. Before removing a drive, have the replacement drive ready to be installed.
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
1. Remove the HDD access cover.
2. Identify the drive to be removed by checking its LEDs. If the middle (amber) LED is lit, the drive is faulty and should be replaced.
3. Use the operating system or management software to take the HDD offline before replacing it; otherwise data loss or unexpected error messages may result. Instructions for Solaris with ZFS follow in steps 4 through 8. If the customer is running Linux or Microsoft Windows, verify with the customer that the disk has been taken offline before you hot-plug replace it. Once the drive has been taken offline (on Solaris, once it has also been unconfigured in step 8), the left (blue) LED should light. This means the drive is ready to be removed and the service action is allowed.
Caution: Pulling a drive that has not been prepared for removal can cause loss of the drive cell memory map or loss of data in its I/O buffers.
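4. Check the status of the faulted disk in ZFS. In this example the disk to be replaced is c3t3d0; because the failure is only predicted, it still shows as ONLINE: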
# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
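Reading the STATE column by eye is error-prone on a 48-drive X4540. As a minimal sketch only (not part of the Oracle procedure), the hypothetical shell function `dev_state` below reads `zpool status` output on standard input and prints the STATE column for a single device. It assumes the column layout shown in the example above, with NAME in the first field and STATE in the second.

```shell
# dev_state DEVICE
# Hypothetical helper (not an Oracle-supplied tool): reads `zpool status`
# output on stdin and prints the STATE column for DEVICE.
# Assumes NAME is the first field and STATE the second, as in the example.
dev_state() {
    awk -v dev="$1" '
        /^config:/ { in_cfg = 1; next }   # device table starts after "config:"
        /^errors:/ { in_cfg = 0 }         # and ends at the "errors:" line
        in_cfg && $1 == dev { print $2; found = 1; exit }
        END { if (!found) exit 1 }        # non-zero status if DEVICE is absent
    '
}
```

For example, `zpool status POOLNAME | dev_state c3t3d0` prints the state of the disk to be replaced, and returns a non-zero status if the name is mistyped.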
5. Take disk cXtYd0 offline:
# zpool offline POOLNAME cXtYd0
6. Confirm that ZFS shows the disk as OFFLINE:
# zpool status POOLNAME
  pool: POOLNAME
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  OFFLINE      0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
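Steps 5 and 6 can be combined so that the procedure stops if ZFS does not actually report the disk as OFFLINE. The hypothetical function `offline_and_verify` below is a sketch under that assumption, not an Oracle tool; it runs the same `zpool offline` and `zpool status` commands shown above and assumes the device line has NAME in the first field and STATE in the second.

```shell
# offline_and_verify POOL DEVICE
# Hypothetical wrapper (not part of the original procedure): runs the
# `zpool offline` from step 5, then checks that `zpool status` reports
# DEVICE as OFFLINE, as in step 6. Returns non-zero otherwise, so the
# disk is not unconfigured or pulled by mistake.
offline_and_verify() {
    pool=$1
    dev=$2
    zpool offline "$pool" "$dev" || return 1
    # Pick the STATE field from the line whose first field is DEVICE.
    state=$(zpool status "$pool" | awk -v dev="$dev" '$1 == dev { print $2; exit }')
    if [ "$state" != "OFFLINE" ]; then
        echo "ERROR: $dev is '$state', not OFFLINE; do not remove it" >&2
        return 1
    fi
    echo "$dev is OFFLINE in $pool"
}
```

Usage for this example would be `offline_and_verify POOLNAME c3t3d0`.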
7. Find the cfgadm attachment point ID (cN::dsk/cXtYd0) of the disk:
# cfgadm -al | grep cXtYd0
c3::dsk/c3t3d0 disk connected configured unknown
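The `grep` above prints the whole `cfgadm -al` line. To capture only the attachment point ID for use in step 8, a hypothetical helper such as `ap_id_for` (a sketch, not an Oracle tool) can match the first column exactly against the `::dsk/DEVICE` suffix shown in the example output.

```shell
# ap_id_for DEVICE
# Hypothetical helper: reads `cfgadm -al` output on stdin and prints the
# attachment point ID (first column) that ends in "::dsk/DEVICE".
# An exact suffix comparison is used instead of a regex, so characters in
# DEVICE are never treated as pattern metacharacters.
ap_id_for() {
    awk -v dev="$1" '
        {
            suffix = "::dsk/" dev
            n = length($1) - length(suffix)
            if (n > 0 && substr($1, n + 1) == suffix) { print $1; found = 1; exit }
        }
        END { if (!found) exit 1 }   # non-zero status if no match
    '
}
```

For example, `AP=$(cfgadm -al | ap_id_for c3t3d0)` stores the ID, which can then be passed to `cfgadm -c unconfigure "$AP"`.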
8. Unconfigure the disk using its attachment point ID (cN::dsk/cXtYd0), then confirm that it shows as "unconfigured":
# cfgadm -c unconfigure cN::dsk/cXtYd0
# cfgadm -al | grep cXtYd0
c3::dsk/c3t3d0 disk connected unconfigured unknown
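Before the drive is pulled, the Occupant column (the fourth field of `cfgadm -al` output, as in the lines above) should read "unconfigured". The hypothetical wrapper `unconfigure_and_verify` below sketches that check; it is an illustration under the column layout shown in this example, not an Oracle tool.

```shell
# unconfigure_and_verify AP_ID
# Hypothetical wrapper: runs step 8's `cfgadm -c unconfigure`, then reads
# the Occupant column (4th field of `cfgadm -al`) for AP_ID and succeeds
# only if it reads "unconfigured". Only then should the drive be pulled.
unconfigure_and_verify() {
    ap=$1
    cfgadm -c unconfigure "$ap" || return 1
    occ=$(cfgadm -al | awk -v ap="$ap" '$1 == ap { print $4; exit }')
    if [ "$occ" != "unconfigured" ]; then
        echo "ERROR: $ap occupant is '$occ'; do not remove the drive" >&2
        return 1
    fi
    echo "$ap is unconfigured; check that the left (blue) LED is lit before removing the drive"
}
```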
9. Remove the drive. Lift the metal latch and pull the drive out of the drive bay, as shown on the service label.
10. Install the new drive. Push the drive into the bay until it stops, and make sure the drive is fully engaged with the connector on the drive backplane.
11. Make sure the metal handle is properly seated.
12. Replace the HDD access cover.
After about 5 to 20 seconds, the disk should reconfigure automatically, changing from "unconfigured" to "configured".
13. Check whether the disk has been reconfigured automatically (if not, see step 15):
# cfgadm -al | grep c3t3d0
c3::dsk/c3t3d0 disk connected configured unknown
The disk may also come ONLINE in ZFS automatically (if not, see step 15).
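Instead of re-running the `cfgadm` check by hand, the wait in step 13 can be scripted. The hypothetical function `wait_for_configured` below is a sketch, not an Oracle tool: it polls every 5 seconds, and its 30-second default timeout is an assumption chosen to sit slightly above the 5 to 20 seconds quoted above. It assumes a POSIX-compatible shell (for example ksh or bash) for the `$(( ))` arithmetic.

```shell
# wait_for_configured AP_ID [TIMEOUT_SECONDS]
# Hypothetical helper: polls `cfgadm -al` every 5 seconds until the Occupant
# column (4th field) for AP_ID reads "configured", or gives up after
# TIMEOUT_SECONDS (default 30, an assumed value).
# Returns 0 if the disk reconfigured automatically, 1 if step 15 is needed.
wait_for_configured() {
    ap=$1
    timeout=${2:-30}
    waited=0
    while :; do
        occ=$(cfgadm -al | awk -v ap="$ap" '$1 == ap { print $4; exit }')
        [ "$occ" = "configured" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 5
        waited=$((waited + 5))
    done
}
```

For example, `wait_for_configured c3::dsk/c3t3d0 || echo "not reconfigured; see step 15"`.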
14. Check the ZFS pool status and confirm that the resilver has completed:
# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Aug 17 11:22:50 2012
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0  3K resilvered
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
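The three things to confirm in step 14 (pool ONLINE, resilver completed with 0 errors, replaced disk ONLINE) can be checked together. The hypothetical function `resilver_ok` below is a sketch that matches the message text of the example output above; other ZFS releases may word or label the scrub/scan line differently, so treat the patterns as assumptions to adjust.

```shell
# resilver_ok DEVICE
# Hypothetical helper: reads `zpool status POOLNAME` output on stdin and
# succeeds only if (a) the pool "state:" line reads ONLINE, (b) a
# "resilver completed ... with 0 errors" message is present, and (c) DEVICE
# itself is ONLINE. Patterns follow the example output; adjust as needed.
resilver_ok() {
    awk -v dev="$1" '
        $1 == "state:" && $2 == "ONLINE"         { pool_ok = 1 }
        /resilver completed/ && /with 0 errors/  { rs_ok = 1 }
        $1 == dev && $2 == "ONLINE"              { dev_ok = 1 }
        END { exit !(pool_ok && rs_ok && dev_ok) }
    '
}
```

For example, `zpool status POOLNAME | resilver_ok c3t3d0 && echo "replacement complete"`.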
15. If the disk was not reconfigured automatically, configure it and bring it online manually. In this example, the two commands needed are:
# cfgadm -c configure c3::dsk/c3t3d0
# zpool online POOLNAME c3t3d0
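The two manual commands of step 15 can be wrapped so that ZFS is not touched if `cfgadm` fails. The hypothetical function `manual_recover` below is a sketch, not an Oracle tool. If `zpool online` does not return the new disk to service, the `zpool status` output in step 6 already names `zpool replace` as the alternative; see also the referenced note 1002753.1.

```shell
# manual_recover POOL DEVICE AP_ID
# Hypothetical wrapper around the two commands in step 15: configure the
# attachment point, then bring the device online in ZFS, then show status.
# It stops at the first failure so the administrator can investigate.
manual_recover() {
    pool=$1; dev=$2; ap=$3
    cfgadm -c configure "$ap" || { echo "cfgadm configure failed for $ap" >&2; return 1; }
    zpool online "$pool" "$dev" || { echo "zpool online failed for $dev" >&2; return 1; }
    zpool status "$pool"
}
```

For this example: `manual_recover POOLNAME c3t3d0 c3::dsk/c3t3d0`.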
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
For a hot-plug replacement, configure the drive and verify that it is available.
If manual intervention is required, use the appropriate software commands to reactivate or resynchronize the mirror (or other redundant volume).
PARTS NOTE:
Note: Before removing a drive, have the replacement drive ready to be installed.
REFERENCE INFORMATION:
See the section "To replace a hard drive (CRU)" in the Sun Fire X4500/X4540 Server Service Manual:
Service Manual: http://download.oracle.com/docs/cd/E19121-01/sf.x4500/819-4359-19/index.html
"To replace a hard drive (CRU)" section: http://docs.oracle.com/cd/E19121-01/sf.x4500/819-4359-19/CH3-maint.html#50647083_22785
References
<NOTE:1002753.1> - How to Replace a Drive in Solaris[TM] ZFS