Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1506299.1
Update Date:2018-05-17

Solution Type  Technical Instruction Sure

Solution  1506299.1 :   How to Replace a Sun Fire X4500 HDD (Predictive Failure)  


Related Items
  • Sun Fire X4500 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP


How to Replace a Hard Drive in an x4500 (Predictive Failure)

In this Document
Goal
Solution
References


Created from <SR HOW>

Applies to:

Sun Fire X4500 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace a Sun Fire X4500 HDD (Predictive Failure).

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
No special skills required, Customer Replaceable Unit (CRU) procedure

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: A Sun Fire X4500 HDD (Predictive Failure) needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

Caution: To avoid overheating the server while it is powered on, do not leave an HDD slot empty for longer than 60 seconds at a time. Remove and replace only one HDD at a time, and have the replacement drive ready to install before you remove the failed one. Replace the HDD access cover as soon as the service tasks are completed.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

1. Remove the HDD access cover.

2. Identify the drive to be removed by checking its LEDs. If the middle LED is on (amber), the drive is faulty and should be replaced.

3. Use the operating system or management software to take the HDD offline before you replace it. Not doing so could cause data loss or unexpected error messages. Instructions for Solaris ZFS follow. For Linux or MS Windows, confirm with the customer that the disk has been offlined in the OS before the hot-plug replacement is performed. Once the drive has been taken offline, the left (blue) LED should turn on. This means the drive is ready to be removed and service action is allowed.

Caution: Pulling a drive that has not been prepared for removal can cause a loss of the drive cell memory map or loss of data in its in/out buffers.

4. Check faulted disk status in zfs

# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c2t3d0  FAULTED      0     0     0

errors: No known data errors
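On a server with many drives, the config table in step 4 can be long. The following is a minimal sketch, not part of the original procedure, that filters `zpool status` output down to the entries that are not ONLINE. The function name `list_unhealthy` is hypothetical, and the sketch assumes the table layout shown above (a NAME/STATE header row, ended by a blank line).

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print
# "name state" for every config-table entry whose STATE is not ONLINE.
# Assumes the table starts at the NAME/STATE header and ends at a blank line.
list_unhealthy() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # table header
        in_config && NF == 0          { in_config = 0 }        # blank line ends the table
        in_config && $2 != "ONLINE"   { print $1, $2 }
    '
}

# Usage on a live system (POOLNAME is a placeholder, as in the steps above):
#   zpool status POOLNAME | list_unhealthy
```

Note that in a degraded pool the pool and vdev rows are also reported, because their own state is not ONLINE.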

5. Bring disk cXtYd0 offline

# zpool offline POOLNAME cXtYd0

6. Confirm zfs shows the disk is offline

# zpool status POOLNAME
  pool: POOLNAME
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME      DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c2t3d0  OFFLINE      0     0     0

errors: No known data errors

7. Get sataB/C device name

# cfgadm | grep sata | grep cXtYd0
sata2/3::dsk/c2t3d0            disk         connected    configured   ok
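The mapping in step 7 from a disk name (cXtYd0) to its attachment point (sataB/C) can also be extracted with an exact match rather than a substring `grep`. This is a sketch under the assumption that configured disks appear in `cfgadm` output with an Ap_Id of the form `sataB/C::dsk/cXtYd0`, as in the sample line above. The function name `sata_ap_for_disk` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given a disk name such as c2t3d0 as the first
# argument, read `cfgadm` output on stdin and print the matching SATA
# attachment point (the sataB/C part of the Ap_Id column).
sata_ap_for_disk() {
    awk -v disk="$1" '
        {
            # A configured Ap_Id looks like sata2/3::dsk/c2t3d0
            n = split($1, part, "::dsk/")
            if (n == 2 && part[2] == disk) print part[1]
        }
    '
}

# Usage on a live system:
#   cfgadm | sata_ap_for_disk c2t3d0
```

The exact comparison keeps a name such as c2t3d0 from also matching a longer name that merely contains it.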

8. Unconfigure the disk using sataB/C name

# cfgadm -c unconfigure sataB/C
Unconfigure the device at: /devices/pci@1,0/pci1022,7458@3/pci11ab,11ab@1:3
This operation will suspend activity on the SATA device
Continue (yes/no)? yes

9. Confirm disk is unconfigured and ready for replacement

# cfgadm | grep sata | grep sataB/C
sata2/3                        disk         connected    unconfigured ok
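Because pulling a drive that is still configured risks data loss, the check in step 9 can be turned into a pass/fail guard. The sketch below is an illustration only. It assumes the five-column `cfgadm` layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition), and `is_unconfigured` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard: read `cfgadm` output on stdin and exit with status 0
# only if the attachment point given as the first argument is listed with
# an Occupant state of "unconfigured" (fourth column).
is_unconfigured() {
    awk -v ap="$1" '
        $1 == ap && $4 == "unconfigured" { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}

# Usage on a live system (sata2/3 is the sample attachment point above):
#   if cfgadm | is_unconfigured sata2/3; then
#       echo "sata2/3 is unconfigured"
#   else
#       echo "sata2/3 is NOT unconfigured; do not remove the drive"
#   fi
```

The guard only reflects what `cfgadm` reports. The blue OK to Remove LED should still be checked at the drive before it is pulled.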

10. The blue OK to Remove (OK2RM) LED will now be on.

11. Remove the drive. Lift the metal latch and remove the drive from the drive bay, as shown on the service label.

12. Install the new drive. Push the drive into the bay until it stops, and make sure the drive is fully engaged with the connector on the drive backplane.

13. Make sure the metal handle is properly seated.

14. Replace HDD access cover.

15. Reconfigure and check status

# cfgadm -c configure sataB/C

# cfgadm | grep sataB/C
sata2/3::dsk/c2t3d0            disk         connected    configured   ok

16. The disk may still be offline in ZFS

# zpool status POOLNAME
  pool: POOLNAME
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME      DEGRADED     0     0     0
          raidz1    DEGRADED     0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c2t3d0  OFFLINE      0     0     0

errors: No known data errors

17. Bring the disk online in ZFS

# zpool online POOLNAME cXtYd0
Bringing device c2t3d0 online

18.  Confirm disk is back online

# zpool status POOLNAME
  pool: POOLNAME
 state: ONLINE
 scrub: resilver completed with 0 errors on Fri Aug 17 07:33:10 2012
config:

        NAME        STATE     READ WRITE CKSUM
        POOLNAME      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0

errors: No known data errors

# exit
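The final confirmation in step 18 can be scripted in the same way. This is a minimal sketch, assuming the `zpool status` layout shown in the steps above. The function name `pool_and_disk_online` is hypothetical. It checks only the pool-level "state:" line and the device row, and it does not inspect the "scrub:" line, so resilver progress still has to be read from the full output.

```shell
#!/bin/sh
# Hypothetical check: read `zpool status` output on stdin and exit with
# status 0 only if the pool-level "state:" line and the device named in
# the first argument both report ONLINE.
pool_and_disk_online() {
    awk -v disk="$1" '
        $1 == "state:" && $2 == "ONLINE" { pool_ok = 1 }
        $1 == disk     && $2 == "ONLINE" { disk_ok = 1 }
        END { exit((pool_ok && disk_ok) ? 0 : 1) }
    '
}

# Usage on a live system (POOLNAME and c2t3d0 as in the sample output):
#   if zpool status POOLNAME | pool_and_disk_online c2t3d0; then
#       echo "pool and disk report ONLINE"
#   else
#       echo "pool or disk is not ONLINE yet; review zpool status"
#   fi
```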

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
For hot plug, configure the drive and verify drive availability.
Use the appropriate software commands to re-activate or re-sync the mirror if manual intervention is required.

PARTS NOTE:
Note: Before removing a drive, have the replacement drive ready to be installed.

REFERENCE INFORMATION:

See the section "To replace a hard drive (CRU)" in the

Sun Fire X4500/X4540 Server Service Manual
http://download.oracle.com/docs/cd/E19121-01/sf.x4500/819-4359-19/index.html

"To replace a hard drive (CRU)" section
http://docs.oracle.com/cd/E19121-01/sf.x4500/819-4359-19/CH3-maint.html#50647083_22785

References

<NOTE:1002753.1> - How to Replace a Drive in Solaris[TM] ZFS

  Copyright © 2018 Oracle, Inc.  All rights reserved.