Asset ID: 1-71-1533969.1
Update Date: 2018-04-05
Solution Type: Technical Instruction
Solution 1533969.1: How to Replace a VSM6 Mirrored rpool Drive (Internal Server Drive)
Related Items:
- StorageTek Virtual Storage Manager System 6 (VSM6)
Related Categories:
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: TAPE-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Internal field process
Applies to:
StorageTek Virtual Storage Manager System 6 (VSM6) - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
Goal
Field procedure to replace a VSM6 Mirrored rpool Drive (internal server drive).
Solution
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: VSM6 trained, T4 server, Solaris 11
TIME ESTIMATE: 90 minutes
TASK COMPLEXITY: 3
FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
PROBLEM OVERVIEW: How to replace VSM6 Mirrored rpool Drive.
1. Connect to the server that requires maintenance.
Refer to the VSM6 Installation, Configuration and Service Guide.
2. Log on as vsmadm (the default password is changeme!; on 6.1 and higher the password is vsm6admin).
3. Switch to the root user by entering the following:
# su
When prompted for the password, enter the default password changeme!; on 6.1 or higher the password is vsm6root.
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:
VSM6 Mirrored rpool Drive Steps Overview
For complete details on replacing Mirrored rpool Drive, refer to VSM6 Installation, Configuration and Service Guide.
Summary of steps:
- Determine which drive is faulted
- Replace faulted component
- Resilver new drive
- Verify that the mirror has been re-established.
- Verify that the new device is listed in the EEPROM boot-device variable.
The example outputs are shortened to keep this document to a minimum size.
For more detail, if needed, see the VSM6 Installation, Configuration and Service Guide referenced above.
1. Check the zpool status to identify the failed or failing drive.
# zpool status rpool
Output will be similar to the following:
pool: rpool
state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
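The config section of the same output (omitted above for brevity) identifies which device is faulted. The following illustration uses the device IDs that appear elsewhere in this document; your device names and states will differ:
        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c0t5000CCA0253657A8d0s0  UNAVAIL      0     0     0
            c0t5000CCA0254634DCd0s0  ONLINE       0     0     0
The UNAVAIL (or FAULTED) device is the drive to replace.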
2. Find the device using the format command to identify the disk and which slot (top or bottom) it occupies.
# sudo bash
# echo|format|head -10
Output will be similar to the following:
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t5000CCA0253657A8d0 <drive not available>
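The lines below each format entry (shown in full in step 6) include a /dev/chassis path that maps the disk to its physical slot. An illustrative entry for the failed disk, with the slot number as an assumption:
0. c0t5000CCA0253657A8d0 <drive not available>
/scsi_vhci/disk@g5000cca0253657a8
/dev/chassis//SYS/SASBP/HDD1/disk
Here HDD1 would be the second slot; confirm the top/bottom slot mapping for your server in the VSM6 Installation, Configuration and Service Guide.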
3. Detach the disk from the rpool. This breaks the mirroring.
# zpool detach rpool <bad disk name>
For example:
# zpool detach rpool c0t5000CCA0253657A8d0s0
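To confirm the detach took effect, you can run zpool status rpool again; the pool should now show a single device and no mirror-0 entry (output shortened, device ID illustrative):
# zpool status rpool
pool: rpool
state: ONLINE
config:
        NAME                       STATE     READ WRITE CKSUM
        rpool                      ONLINE       0     0     0
          c0t5000CCA0254634DCd0s0  ONLINE       0     0     0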
4. Use cfgadm to locate the disk and then unconfigure it.
This removes the device links and turns on the blue "ready to remove" LED.
# cfgadm -al|head -20
Output will be similar to the following:
.
.
c3 scsi-sas connected configured unknown
c3::w5000cca0253657a9,0 disk-path connected configured unknown
.
.
# cfgadm -c unconfigure c3::w5000cca0253657a9,0
NOTE: The drive's blue LED should now be on, indicating that it is safe to remove and replace.
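If you want to confirm the state before pulling the drive, re-run cfgadm and check the occupant column; the disk-path should now report unconfigured (WWN illustrative, matching the example above):
# cfgadm -al|grep w5000cca0253657a9
c3::w5000cca0253657a9,0 disk-path connected unconfigured unknown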
5. Physically remove (do NOT replace yet) the suspect drive.
6. List the server hard drives (only one drive should report at this time).
# echo|format|head -10
Example output:
root@vsmpriv2:/home/vsmadm# echo|format|head -10
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t5000CCA0254634DCd0 <HITACHI-H106030SDSUN300G-A2B0 cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca0254634dc
/dev/chassis//SYS/SASBP/HDD0/disk
1. c0t5000CCA012B754FCd0 <drive not available>
/scsi_vhci/disk@g5000cca012b754fc
/dev/chassis//SYS/SASBP/HDD1/disk
Note: The unplugged hard drive shows as <drive not available>
7. Capture the formatting details of the working (good) hard drive into a file that may be used later if the replacement drive formatting is not correct:
# prtvtoc /dev/rdsk/XXXXXXXXXXXXXXXXXXXXXs2 > /var/tmp/disk_vtoc1
where XXXXXXXXXXXXXXXXXXXX = the disk ID of the working (good) disk.
In this example the working disk is: c0t5000CCA0254634DCd0
Please note the need to add ‘s2’ (slice two) to the disk ID value.
Example: prtvtoc /dev/rdsk/c0t5000CCA0254634DCd0s2 > /var/tmp/disk_vtoc1
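For reference, the captured file is a plain-text VTOC map. Using the drive geometry shown in step 6 (cyl 46873, hd 20, sec 625), an illustrative capture, shortened, might look like the following:
# cat /var/tmp/disk_vtoc1
* /dev/rdsk/c0t5000CCA0254634DCd0s2 partition map
*
* Dimensions:
*     512 bytes/sector
*     625 sectors/track
*      20 tracks/cylinder
*   12500 sectors/cylinder
*   46873 accessible cylinders
.
.
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      2    00          0 585912500 585912499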
8. Physically plug in the replacement drive.
9. Run the following script to rebuild the device links for the new disk:
# /opt/vsm/bin/reset_cluster.pl
Output will be similar to the following:
devfsadm -C
/usr/cluster/bin/cldev clear
Updating shared devices on node 1
Updating shared devices on node 2
devfsadm
/usr/cluster/bin/cldev populate
Configuring DID devices
did instance 50 created.
did subpath vsmpriv1:/dev/rdsk/c0t5000CCA03C437E04d0 created
for instance 50.
Configuring the /dev/global directory (global devices)
obtaining access to all attached disks
/usr/cluster/bin/cldev repair
/usr/cluster/bin/cldev refresh
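Before continuing, you can list the drives again to confirm the replacement disk is now visible with a model string rather than <drive not available>:
# echo|format|head -10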
10. Run the following script to set up the disk and mirror it back to the current running boot disk.
# /opt/vsm/bin/vsm6mirror_setup
Output will be similar to the following:
pool: rpool
state: ONLINE
.
.
scan: resilvered 96.6G in 0h17m with 0 errors on Fri Oct 4 11:44:13 2013
.
.
Drives in rpool ...
c0t5000CCA03C42F3BCd0s0
c0t5000CCA03C42F3BCd0s0
Drive candidate: c0t5000CCA03C437E04d0
Drive c0t5000CCA03C437E04d0 available for mirroring...
Drive candidate: c0t5000CCA03C42F3BCd0
.
.
c0t5000CCA03C42F3BCd0s0 ONLINE 0 0 0
c0t5000CCA03C437E04d0s0 ONLINE 0 0 0
errors: No known data errors
Resilver completed. Executing installboot command...
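While the script runs, you can watch the resilver from a second session; the scan: line of zpool status reports progress:
# zpool status rpool | grep scan: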
NOTE:
See the TROUBLESHOOTING portion of these instructions if the script fails because the disk is not formatted correctly.
This may show up with a failure message similar to the following (will have different drive ID values):
cannot attach c0t5000CCA06E05B16Cd0s0 to c0t5000CCA03C6F75C4d0s0: device is too small
2nd NOTE:
This script resilvers the new drive and can take over 30 minutes. An error occurs if the installed disk is unsupported or not recognized.
If you have performed the steps in the TROUBLESHOOTING portion of these instructions and still get the same failure, remove the unsupported disk and install a supported disk.
Then start again from step 9 with the reset_cluster.pl script.
11. Verify that the correct devices are now listed in the OpenBoot PROM boot-device variable.
The output should correlate with the devices shown by the zpool status rpool command.
# eeprom boot-device
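Output will be similar to the following. The PCI path and WWN values below are placeholders only; the actual values depend on the server:
boot-device=/pci@400/pci@2/pci@0/pci@4/scsi@0/disk@w5000cca03c42f3bd,0:a /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@w5000cca03c437e05,0:a
The w... value in each boot path is the drive's port WWN; in the examples in this document it differs from the drive's c0t... WWN by one (compare the cfgadm output in step 4).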
TROUBLESHOOTING:
If the replacement drive is not formatted correctly, the script will not be able to attach the disk to the rpool.
This failure message will look similar to the following (but with different drive ID information):
cannot attach c0t5000CCA06E05B16Cd0s0 to c0t5000CCA03C6F75C4d0s0: device is too small
To resolve this type of problem perform the following steps:
1. List the drives now in the server:
# echo|format|head -10
Expect to see something like the following:
root@vsmpriv2:/home/vsmadm# echo|format|head -10
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t5000CCA0254634DCd0 <HITACHI-H106030SDSUN300G-A2B0 cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca0254634dc
/dev/chassis//SYS/SASBP/HDD0/disk
1. c0t5000CCA03C42F3BCd0 <HITACHI-H106030SDSUN300G-A2B0 cyl 46873 alt 2 hd 20 sec 625>
/scsi_vhci/disk@g5000cca03c42f3bc
/dev/chassis//SYS/SASBP/HDD1/disk
2. Identify the ID of the new drive in the output from the command above.
NOTE:
You can scroll up in your PuTTY session to the point where the drives were listed with only one drive plugged in.
The new drive will be the one in the location that previously reported as: <drive not available>
3. Format the new drive using the formatting information that was captured previously.
# fmthard -s /var/tmp/disk_vtoc1 /dev/rdsk/XXXXXXXXXXXXXXXXXXXXXs2
where XXXXXXXXXXXXXXXXXXXX = the disk ID of the new disk.
Note the requirement to again add ‘s2’ (slice two) to the end of the new disk ID. The raw device is passed to fmthard as an argument, not via shell redirection.
Example: fmthard -s /var/tmp/disk_vtoc1 /dev/rdsk/c0t5000CCA03C42F3BCd0s2
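To confirm the new label was applied, you can print the VTOC of the new disk and compare it against the captured file; apart from the device name in the first comment line, the tables should match:
# prtvtoc /dev/rdsk/c0t5000CCA03C42F3BCd0s2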
4. Return to step 10 of the original replacement instructions above, repeat that step and continue.