
Asset ID: 1-71-1492368.1
Update Date:2015-03-25
Keywords:

Solution Type: Technical Instruction

Solution 1492368.1: SuperCluster T4-4 and T5-8 Disk Replacement Guide


Related Items
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Solaris 11 OS
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Solaris 10 OS


Replacement of internal hard drives in SPARC SuperCluster T4-4

In this Document
Goal
Solution
 1) Confirm zpool has faulted device:
 2) Detach the failed drive from zpool:
 3) Determine Primary LDOM location:
 4) Check if vdisk device exists on this drive from PRIMARY LDOM:
 5) Turn on locator beacon from PRIMARY LDOM:
 6) Determine the LDOM using the vdisk:
 7) Log into Logical domain and detach disk:
 8) Remove the vdisk and vdisk service device:
 9) Find device path of drive to use in cfgadm:
 10) Unconfigure failed drive:
 11) Remove failed drive and insert replacement
 12) Configure in new device:
 13) Update the new disk's VTOC from an existing zpool member. Ensure you use slice 2 in this command:
 14) Attach new disk slice 0 to the rpool:
 15) Confirm zpool resilvered and drive status is online:
 16) Clear Fault Management:
 17) If the LDOM is running Solaris 10 and the pool is a root pool such as "rpool" or "BIpool", install the bootblock on the new slice:
 18) Update boot-device with WWN from new drive
 19) Reconfigure virtual disk:
 20) Connect to Guest LDOM and reattach disk to zpool, mark vdev as repaired, and reinstall bootblock
 21) Turn off locator beacon from PRIMARY LDOM:
 22) Example workflow of a 3 LDOM configuration and the loss of HDD5


Applies to:

SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Goal

Steps to replace an internal hard drive in SuperCluster configurations.

Please note that this note will be updated as additional considerations are discovered in various LDom configurations, so do not rely on a static hard copy of it.

 

Solution

1) Confirm zpool has faulted device:

As seen from `zpool status` in the LDOM where the failure is being reported:

root@orlscdb02:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 86.5K in 0h0m with 0 errors on Tue Apr 10 10:50:08 2012
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c0t5000CCA0125099FCd0s0  UNAVAIL      0     0     0  cannot open
            c0t5000CCA0124C1244d0s0  ONLINE       0     0     0

The drive shows as unavailable in `format`:

format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t5000CCA0123FE614d0 <SUN600G cyl 64986 alt 2 hd 27 sec 668>  solaris
          /scsi_vhci/disk@g5000cca0123fe614
          /dev/chassis//SYS/MB/HDD0/disk
       1. c0t5000CCA012507088d0 <SUN600G cyl 64986 alt 2 hd 27 sec 668>  solaris
          /scsi_vhci/disk@g5000cca012507088
          /dev/chassis//SYS/MB/HDD1/disk
       2. c0t5000CCA0124C1244d0 <SUN600G cyl 64986 alt 2 hd 27 sec 668>  solaris
          /scsi_vhci/disk@g5000cca0124c1244
          /dev/chassis//SYS/MB/HDD2/disk
       3. c0t5000CCA0125099FCd0 <drive not available>   <-- device unavailable
          /scsi_vhci/disk@g5000cca0125099fc
          /dev/chassis//SYS/MB/HDD3/disk
       4. c0t5000C5003BE98F4Bd0 <SUN600G cyl 64986 alt 2 hd 27 sec 668>  solaris
          /scsi_vhci/disk@g5000c5003be98f4b
       5. c0t600144F0A6B3046D00005033D9520001d0 <SUN-ZFS Storage 7320-1.0 cyl 19501 alt 2 hd 254 sec 254>
          /scsi_vhci/ssd@g600144f0a6b3046d00005033d9520001
Specify disk (enter its number):
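
Optionally, `zpool status -x` limits the output to pools that are not healthy. This is a generic ZFS check (not SuperCluster-specific) and can be quicker when several pools are imported in the domain:

zpool status -x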

2) Detach the failed drive from zpool:

Remove the drive from the zpool in the LDOM where the failure is being reported:

 zpool detach rpool c0t5000CCA0125099FCd0s0
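
Note: the ZFS-8000-2Q action text in step 1 suggests that a device which dropped out only transiently can sometimes be brought back with `zpool online` rather than replaced. A hedged example using the device name from this guide, worth trying only if no hardware fault has been logged against the drive:

zpool online rpool c0t5000CCA0125099FCd0s0

If the device remains UNAVAIL, detach it as shown above and continue with the physical replacement.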

3) Determine Primary LDOM location:

virtinfo -a
Domain role: LDoms guest I/O service root
Domain name: ssccn1-app2
Domain UUID: cac06c45-fb37-e0e7-d3cc-8e988be3f16b
Control domain: orlscdb01   <-- PRIMARY LDOM
Chassis serial#: 1139BDY8C1


4) Check if vdisk device exists on this drive from PRIMARY LDOM:

In the example below we see that slice 1 of the same drive is also being used as a virtual disk device:

ldm list-services | grep 5000CCA0125099FC
                                    vdisk2                                         /dev/dsk/c0t5000CCA0125099FCd0s1

5) Turn on locator beacon from PRIMARY LDOM:

/opt/ipmitool/bin/ipmitool sunoem cli "set /SYS/LOCATE value=fast_blink"

 

If no vdisk device exists, skip to step 9.

 

6) Determine the LDOM using the vdisk:

root@orlscdb01:~# ldm list -o disk
NAME
primary

VDS
    NAME             VOLUME         OPTIONS          MPGROUP        DEVICE
    primary-vds0     vol1                                           /dev/dsk/c0t5000C5003BE98F4Bd0s1

------------------------------------------------------------------------------
NAME
ssccn1-app1

DISK
    NAME             VOLUME                      TOUT ID   DEVICE  SERVER         MPGROUP
    vdisk1           vol1@primary-vds0                0    disk@0  primary
    vdisk2           vol1@service-vds0                1    disk@1  ssccn1-app2

------------------------------------------------------------------------------
NAME
ssccn1-app2

VDS
    NAME             VOLUME         OPTIONS          MPGROUP        DEVICE
    service-vds0     vol1                                           /dev/dsk/c0t5000CCA0125099FCd0s1
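
If the guest console is needed for the next step, `ldm list` on the PRIMARY LDOM shows the virtual console port in the CONS column (this is the lookup referred to as item 6 in the example workflow at the end of this note); the domain name below is taken from the example above:

ldm list ssccn1-app1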

7) Log into Logical domain and detach disk:

orlsccldm01:~# zpool status
  pool: rpool
 state: DEGRADED
  scan: resilvered 1.33M in 0h0m with 0 errors on Mon Sep 17 15:32:09 2012
config:

        NAME                                         STATE     READ WRITE CKSUM
        rpool                                        DEGRADED     0     0     0
          mirror-0                                   DEGRADED     0     0     0
            c2d0s0                                   ONLINE       0     0     0
            c2d1s0                                   UNAVAIL      0     0     0  cannot open

orlsccldm01:~# zpool detach rpool c2d1s0

8) Remove the vdisk and vdisk service device:

ldm remove-vdisk vdisk2 ssccn1-app1

ldm remove-vdsdev vol1@service-vds0
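
To confirm the volume was removed, the same check used in step 4 can be repeated; it should now return nothing:

ldm list-services | grep 5000CCA0125099FC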

9) Find device path of drive to use in cfgadm:

cfgadm -als "match=partial,select=class(scsi):ap_id(c):type(disk)" -v
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
c3::w5000c5003be90f9d,0        connected    configured   unknown    Client Device: /dev/dsk/c0t5000C5003BE90F9Fd0s0(sd4)
unavailable  disk-path    n        /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@1:scsi::w5000c5003be90f9d,0
c4::w5000c5003bea1185,0        connected    configured   unknown    Client Device: /dev/dsk/c0t5000C5003BEA1187d0s0(sd5)
unavailable  disk-path    n        /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@2:scsi::w5000c5003bea1185,0
c5::w5000c5003be9f9ed,0        connected    configured   unknown    Client Device: /dev/dsk/c0t5000C5003BE9F9EFd0s0(sd6)
unavailable  disk-path    n        /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@4:scsi::w5000c5003be9f9ed,0
c6::w5000cca0125099fd,0        connected    configured   unknown    Client Device: /dev/dsk/c0t5000CCA0125099FCd0s0(sd7)   <-- failed drive
unavailable  disk-path    n        /devices/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/iport@8:scsi::w5000cca0125099fd,0

 

10) Unconfigure failed drive:

cfgadm -c unconfigure c6::w5000cca0125099fd,0
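
Before physically pulling the drive, you can re-run the listing from step 9 (also shown in step 12) and verify that the Occupant state for this attachment point now reads unconfigured:

cfgadm -als "match=partial,select=class(scsi):ap_id(c):type(disk)"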

11) Remove failed drive and insert replacement

12) Configure in new device:

 

root@orlscdb02:~# cfgadm -als "match=partial,select=class(scsi):ap_id(c):type(disk)"
Ap_Id                          Type         Receptacle   Occupant     Condition
c3::w5000cca0123fe615,0        disk-path    connected    configured   unknown
c4::w5000cca012507089,0        disk-path    connected    configured   unknown
c5::w5000cca0124c1245,0        disk-path    connected    configured   unknown
c6::w5000c5003be98f49,0        disk-path    connected    unconfigured   unknown

root@orlscdb02:~# cfgadm -c configure c6::w5000c5003be98f49,0

root@orlscdb02:~# cfgadm -als "match=partial,select=class(scsi):ap_id(c):type(disk)"
Ap_Id                          Type         Receptacle   Occupant     Condition
c3::w5000cca0123fe615,0        disk-path    connected    configured   unknown
c4::w5000cca012507089,0        disk-path    connected    configured   unknown
c5::w5000cca0124c1245,0        disk-path    connected    configured   unknown
c6::w5000c5003be98f49,0        disk-path    connected    configured   unknown

13) Update the new disk's VTOC from an existing zpool member. Ensure you use slice 2 in this command:

root@orlscdb02:~# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: resilvered 86.5K in 0h0m with 0 errors on Tue Apr 10 10:50:08 2012
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        DEGRADED     0     0     0
          mirror-0                   DEGRADED     0     0     0
            c0t5000CCA0124C1244d0s0  ONLINE       0     0     0

 

13a) format -L vtoc -d c0t5000C5003BE98F4Bd0

13b) prtvtoc /dev/rdsk/c0t5000CCA0124C1244d0s2 | fmthard -s - /dev/rdsk/c0t5000C5003BE98F4Bd0s2
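
To verify the label copy, `prtvtoc` can be run against the new drive; its partition table should now match that of the existing zpool member used as the source above:

prtvtoc /dev/rdsk/c0t5000C5003BE98F4Bd0s2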

14) Attach new disk slice 0 to the rpool:

  zpool attach rpool c0t5000CCA0124C1244d0s0 c0t5000C5003BE98F4Bd0s0

15) Confirm zpool resilvered and drive status is online:

 

root@orlscdb02:/var/tmp# zpool status rpool
  pool: rpool
 state: ONLINE
  scan: resilvered 4.51G in 0h2m with 0 errors on Mon Sep 17 17:41:47 2012
config:

        NAME                         STATE     READ WRITE CKSUM
        rpool                        ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            c0t5000CCA0124C1244d0s0  ONLINE       0     0     0
            c0t5000C5003BE98F4Bd0s0  ONLINE       0     0     0

errors: No known data errors

16) Clear Fault Management:

Review the outstanding fault with `fmadm faulty`, then mark the FRU repaired:

TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Sep 17 17:11:26 898c4ff2-a2f3-c10e-e320-dfa2bea031f2  ZFS-8000-FD    Major

Host        : orlscdb01
Platform    : ORCL,SPARC-T4-4   Chassis_id  :
Product_sn  :

Fault class : fault.fs.zfs.vdev.io
Affects     : zfs://pool=data_pool/vdev=6cd04585cf229e6
                  faulted and taken out of service
FRU         : "/SYS/MB/HDD3" (hc://:product-id=ORCL,SPARC-T4-4:product-sn=1139BDY8C1:server-id=orlscdb01:chassis-id=1139BDY8C1:serial=001129P17DWE--------6XR17DWE:devid=id1,sd@n5000c5003be98f4b:part=SEAGATE-ST960005SSUN600G:revision=0606/chassis=0/motherboard=0/hba=0/bay=3/disk=0)
                  faulty

Description : The number of I/O errors associated with a ZFS device exceeded
                     acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
                     will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

 

fmadm repaired /SYS/MB/HDD3
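
Running `fmadm faulty` again afterwards should show no remaining entry for this drive; if the fault is still listed, re-check the FRU string used above against the FRU line in the fault record:

fmadm faulty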

17) If the LDOM is running Solaris 10 and the pool is a root pool such as "rpool" or "BIpool", install the bootblock on the new slice:

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t5000C5003BE98F4Bd0s0

 18) Update boot-device with WWN from new drive

 

eeprom boot-device="/pci@400/pci@1/pci@0/pci@0/LSI,sas@0/disk@w5000cca0124c1244,0:a /pci@400/pci@1/pci@0/pci@0/LSI,sas@0/disk@w5000c5003be98f4b,0:a disk net"
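
The setting can be read back to confirm it was stored and to catch typos in the long device paths:

eeprom boot-device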

 

If a vdisk was present on the replaced device and step 6 was followed, you must now recreate the vdisk and add it back to the guest LDOM.

 

19) Reconfigure virtual disk:

   ldm add-vdsdev /dev/dsk/c0t5000C5003BE98F4Bd0s1 vol1@service-vds0

   ldm add-vdisk timeout=1 id=0 vdisk2 vol1@service-vds0 ssccn1-app1
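
To confirm the vdisk is bound to the guest again, the disk view from step 6 can be re-run for that domain:

ldm list -o disk ssccn1-app1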

 

20) Connect to Guest LDOM and reattach disk to zpool, mark vdev as repaired, and reinstall bootblock

zpool attach rpool c2d0s0 c2d1s0

orlsccldm01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 1.33M in 0h0m with 0 errors on Mon Sep 17 15:32:09 2012
config:

        NAME                                         STATE     READ WRITE CKSUM
        rpool                                        ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            c2d0s0                                   ONLINE       0     0     0  
            c2d1s0                                   ONLINE       0     0     0

fmadm repaired zfs://pool=rpool/vdev=2cd55479f321bdae

installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c2d1s0

21) Turn off locator beacon from PRIMARY LDOM:

/opt/ipmitool/bin/ipmitool sunoem cli "set /SYS/LOCATE value=off"
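
To confirm the beacon state from the same host, the locator property can be read back through the same ipmitool pass-through (assuming the ILOM CLI `show` form is available on the service processor, as it is on current ILOM releases):

/opt/ipmitool/bin/ipmitool sunoem cli "show /SYS/LOCATE"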

 

22) Example workflow of a 3 LDOM configuration and the loss of HDD5

PRIMARY                              APP1                                   APP2
(4)  Find vdisk device               (7)  Log into LDOM and detach zpool    (1)  Confirm failure
(5)  Find vdisk / LDOM                                                      (2)  zpool detach
(6)  ldm list to find console port                                          (3)  Find Primary Domain
(8)  Remove vdisk                                                           (10) Find disk WWN
(9)  Remove vdisk device                                                    (11) cfgadm unconfigure bad disk
(17) Create new vdisk device                                                (12) cfgadm configure new disk
(18) Add vdisk to LDOM                                                      (13) Update VTOC with fmthard
                                     (19) zpool attach                      (14) zpool attach
                                     (20) Update bootblock if Solaris 10    (15) Clear fault with fmadm
                                     (21) Clear fault with fmadm            (16) Update boot device with eeprom

Attachments
This solution has no attachment