Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1506324.1
Update Date:2018-05-17
Keywords:

Solution Type  Technical Instruction Sure

Solution 1506324.1: How to Replace a Sun Fire X4540 HDD (Predictive Failure)


Related Items
  • Sun Fire X4540 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Applies to:

Sun Fire X4540 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

 How to Replace a Sun Fire X4540 HDD (Predictive Failure).

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
No special skills are required; this is a Customer Replaceable Unit (CRU) procedure.

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: A Sun Fire X4540 hard disk drive (HDD) that is reporting a predictive failure needs to be replaced.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

Caution: To avoid overheating the server, if the server is powered on, do not leave an HDD out of its bay for longer than 60 seconds at a time. Remove and replace only one HDD at a time. Replace the HDD access cover as soon as the service tasks are completed. Before removing a drive, have the replacement drive ready to be installed.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

1. Remove the drive access cover.

2. Identify the drive to be removed by checking its LEDs. If the middle LED is on (amber), the drive is faulty and should be replaced.

3. Use the operating system or management software to take the HDD offline before you replace it. Failing to do so can cause data loss or unexpected error messages. Instructions for Solaris with ZFS follow. If the customer is running Linux or Microsoft Windows, confirm with the customer that the disk has been taken offline before you hot-plug replace it. Once the drive has been taken offline, the left (blue) LED should turn on. This indicates that the drive is ready to be removed and that the service action is allowed.

Caution: Pulling a drive that has not been prepared for removal can cause a loss of the drive cell memory map or loss of data in its in/out buffers.

4. Check the status of the faulted disk in ZFS:

# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
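Reading the status table by eye works for eight disks, but a pool on a 48-drive X4540 may list many more. As a sketch only, the hypothetical shell function below filters zpool status output for leaf disks that are not ONLINE or that show non-zero READ/WRITE/CKSUM counters. It assumes the column layout shown above, cXtYdZ device names, a POSIX shell (ksh or bash rather than the legacy Solaris 10 /bin/sh) and a POSIX-compatible awk (on Solaris, nawk may be needed in place of /usr/bin/awk). A disk under predictive failure can still be ONLINE with zero counters, so an empty result does not clear a disk; the amber LED from step 2 remains the deciding signal.

```shell
# Hypothetical helper (not part of Solaris): print "NAME STATE READ WRITE CKSUM"
# for every leaf disk in `zpool status` output (read on stdin) that is not
# ONLINE or has a non-zero error counter. Assumes the layout shown above.
zpool_suspect_disks() {
    awk '
        # Only look at the device table between "config:" and "errors:"
        /^config:/ { in_config = 1; next }
        /^errors:/ { in_config = 0 }
        # Leaf disks are named c<controller>t<target>d<lun>
        in_config && $1 ~ /^c[0-9]+t[0-9]+d[0-9]+$/ &&
            ($2 != "ONLINE" || $3 != 0 || $4 != 0 || $5 != 0) {
            print $1, $2, $3, $4, $5
        }
    '
}
```

Usage, with POOLNAME as in the example above:
# zpool status POOLNAME | zpool_suspect_disks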

 
5. Take disk cXtYd0 offline:

# zpool offline POOLNAME cXtYd0

6. Confirm that ZFS shows the disk as OFFLINE:

# zpool status POOLNAME
  pool: POOLNAME
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  OFFLINE      0     0     0
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
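Before unconfiguring anything, it is worth confirming that the disk about to be pulled is the one ZFS reports as OFFLINE. The hypothetical helper below is a sketch that reads zpool status text on stdin and exits 0 only if the named disk's STATE column is OFFLINE. It assumes the column layout shown above, a POSIX shell (ksh or bash on Solaris 10) and a POSIX-compatible awk.

```shell
# Hypothetical helper (not part of Solaris): succeed only if DISK ($1) is
# listed with STATE "OFFLINE" in the `zpool status` output read on stdin.
zpool_disk_is_offline() {
    disk=$1
    awk -v d="$disk" '
        # First column is the device name, second is its STATE
        $1 == d { found = 1; state = $2 }
        # Exit 0 when found and OFFLINE, otherwise exit 1
        END { exit !(found && state == "OFFLINE") }
    '
}
```

Usage:
# zpool status POOLNAME | zpool_disk_is_offline c3t3d0 && echo "c3t3d0 is offline"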

 
7. Get the cN::dsk/cXtYd0 attachment point (Ap_Id) for the disk:

# cfgadm -al | grep cXtYd0
c3::dsk/c3t3d0                 disk         connected    configured   unknown
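The grep above prints the whole line, while step 8 needs only the first (Ap_Id) column. A minimal sketch of an exact match on that column follows; the function name is hypothetical, and it assumes the Ap_Id has the cN::dsk/cXtYd0 form shown above, a POSIX shell (ksh or bash on Solaris 10) and a POSIX-compatible awk.

```shell
# Hypothetical helper (not part of Solaris): print the cfgadm Ap_Id for a
# disk, e.g. c3t3d0 -> c3::dsk/c3t3d0, from `cfgadm -al` output on stdin.
# Exits 1 if no matching attachment point is found.
cfgadm_apid_for_disk() {
    disk=$1
    awk -v d="$disk" '
        # Split the Ap_Id column on "dsk/"; disk entries yield two parts
        { n = split($1, parts, "dsk/") }
        n == 2 && parts[2] == d { print $1; found = 1 }
        END { exit !found }
    '
}
```

Usage, feeding the result into the unconfigure command from step 8:
# apid=$(cfgadm -al | cfgadm_apid_for_disk c3t3d0) && cfgadm -c unconfigure "$apid"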

8. Unconfigure the disk using its cN::dsk/cXtYd0 attachment point, then confirm that its Occupant state is "unconfigured":

# cfgadm -c unconfigure cN::dsk/cXtYd0

# cfgadm -al | grep cXtYd0
c3::dsk/c3t3d0                 disk         connected    unconfigured unknown
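The check in step 8 relies on reading the Occupant column by eye. The hypothetical function below prints that column for one disk so the check can be scripted. It is a sketch assuming the five-column cfgadm -al layout (Ap_Id, Type, Receptacle, Occupant, Condition) shown above, a POSIX shell (ksh or bash on Solaris 10) and a POSIX-compatible awk. The blue LED from step 3 remains the physical confirmation that the drive may be pulled.

```shell
# Hypothetical helper (not part of Solaris): print the Occupant state
# ("configured" or "unconfigured") for DISK ($1) from `cfgadm -al` output
# on stdin. Exits 1 if the disk's attachment point is not listed.
cfgadm_occupant_state() {
    disk=$1
    awk -v d="$disk" '
        # Match the Ap_Id exactly on the part after "dsk/"
        { n = split($1, parts, "dsk/") }
        # Column 4 is Occupant
        n == 2 && parts[2] == d { print $4; found = 1 }
        END { exit !found }
    '
}
```

Usage:
# [ "$(cfgadm -al | cfgadm_occupant_state c3t3d0)" = unconfigured ] && echo "c3t3d0 is unconfigured"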

9. Remove the drive. Lift the metal latch and remove the drive from the drive bay as shown on the service label.

10. Install the new drive. Push the drive into the bay until it stops, and make sure the drive is fully engaged with the connector on the drive backplane.

11. Make sure the metal handle is properly seated.

12. Replace the HDD access cover.

After about 5 to 20 seconds, the disk should automatically reconfigure from "unconfigured" to "configured".

13. Check whether the disk has been reconfigured automatically (if not, see step 15):

# cfgadm -al | grep c3t3d0
c3::dsk/c3t3d0                 disk         connected    configured   unknown

The disk may also come ONLINE in ZFS automatically (if not, see step 15).
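Rather than re-running cfgadm by hand during the 5 to 20 second window, the wait can be expressed as a polling loop. This is a sketch: the function name and the 30-second default timeout are assumptions (chosen to sit a little above the 20 seconds quoted above), and it expects cfgadm -al output in the layout shown earlier, a POSIX shell (ksh or bash on Solaris 10) and a POSIX-compatible awk. A non-zero exit means the disk never reached "configured" and the manual commands in step 15 apply.

```shell
# Hypothetical helper (not part of Solaris): poll `cfgadm -al` once per
# second until DISK ($1) shows Occupant "configured", or give up after
# TIMEOUT ($2, default 30) seconds. Returns 0 if configured, 1 on timeout.
wait_for_configured() {
    disk=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Column 4 of the matching Ap_Id line is the Occupant state
        state=$(cfgadm -al | awk -v d="$disk" '
            { n = split($1, parts, "dsk/") }
            n == 2 && parts[2] == d { print $4 }')
        if [ "$state" = "configured" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Usage:
# wait_for_configured c3t3d0 30 || echo "c3t3d0 not configured, see step 15"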

14. Check the ZFS pool status:

# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Aug 17 11:22:50 2012
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME    ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t2d0  ONLINE       0     0     0
            c3t3d0  ONLINE       0     0     0  3K resilvered
            c3t4d0  ONLINE       0     0     0
            c3t5d0  ONLINE       0     0     0
            c3t6d0  ONLINE       0     0     0
            c3t7d0  ONLINE       0     0     0

errors: No known data errors
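Once the disk is back, the pool should return to ONLINE with a completed resilver, as in the output above. The sketch below turns that into a pass/fail check; the function name is hypothetical. It assumes the "scrub:" line format shown above and also accepts a "scan: resilvered ..." line, which later ZFS releases print instead; treat that second format as an assumption to verify on the system at hand. It also assumes a POSIX shell (ksh or bash on Solaris 10) and a POSIX-compatible awk.

```shell
# Hypothetical helper (not part of Solaris): exit 0 only if the `zpool status`
# output on stdin shows pool state ONLINE and a finished resilver with 0 errors.
zpool_resilver_clean() {
    awk '
        # " state: ONLINE" -> remember the pool state
        $1 == "state:" { state = $2 }
        # "scrub: resilver completed ... with 0 errors" (older format) or
        # "scan: resilvered ... with 0 errors" (assumed newer format)
        ($1 == "scrub:" || $1 == "scan:") &&
            ($0 ~ /resilver completed/ || $0 ~ /resilvered/) &&
            $0 ~ /with 0 errors/ { resilvered = 1 }
        END { exit !(state == "ONLINE" && resilvered) }
    '
}
```

Usage:
# zpool status POOLNAME | zpool_resilver_clean && echo "resilver finished cleanly"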

 
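15. If the disk was not reconfigured automatically, run the following two commands (shown here for c3t3d0):
# cfgadm -c configure c3::dsk/c3t3d0
# zpool online POOLNAME c3t3d0
If ZFS does not start resilvering onto the new disk (for example, when the pool's autoreplace property is off), a zpool replace may also be needed; see <NOTE:1002753.1>:
# zpool replace POOLNAME c3t3d0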
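The two step-15 commands can be wrapped so that the attachment point is derived from the disk name. This is a sketch, not a supported tool: the function names and the DRY_RUN flag are hypothetical, and it assumes the Ap_Id follows the cX::dsk/cXtYd0 pattern seen in the cfgadm output above (if it does not, use the name reported in step 7). It also assumes a POSIX shell (ksh or bash on Solaris 10). Setting DRY_RUN=1 prints the commands instead of running them, which is a cheap way to check the derived names before acting.

```shell
# Hypothetical convenience wrapper: print the command when DRY_RUN=1,
# otherwise execute it.
_run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: configure the replaced disk and bring it online in
# the pool. Arguments: POOL ($1) and DISK ($2), e.g. POOLNAME c3t3d0.
manual_bring_online() {
    pool=$1
    disk=$2
    ctrl=${disk%%t*}             # c3t3d0 -> c3 (assumes cXtYdZ naming)
    apid="${ctrl}::dsk/${disk}"  # -> c3::dsk/c3t3d0
    _run cfgadm -c configure "$apid" &&
        _run zpool online "$pool" "$disk"
}
```

Usage, first as a dry run:
# DRY_RUN=1 manual_bring_online POOLNAME c3t3d0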


WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
For a hot-plug replacement, configure the drive and verify that it is available.
If manual intervention is required, use the appropriate software commands to reactivate or resynchronize the mirror.

PARTS NOTE:
Note: Before removing a drive, have the replacement drive ready to be installed.

REFERENCE INFORMATION:

See the section "To replace a hard drive (CRU)" in the Sun Fire X4500/X4540 Server Service Manual:
http://download.oracle.com/docs/cd/E19121-01/sf.x4500/819-4359-19/index.html

"To replace a hard drive (CRU)" section
http://docs.oracle.com/cd/E19121-01/sf.x4500/819-4359-19/CH3-maint.html#50647083_22785

References

<NOTE:1002753.1> - How to Replace a Drive in Solaris[TM] ZFS

  Copyright © 2018 Oracle, Inc.  All rights reserved.