Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1011391.1: Replacing a drive on a Sun Fire[TM] X4500 that has not been explicitly failed by ZFS
Previously Published As: 215625

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Sun x86 Systems.

Applies to:

Sun Fire X4500 Server - Version Not Applicable and later
Solaris Operating System - Version 10 3/05 HW1 and later
Sun Fire X4540 Server - Version Not Applicable and later
Oracle Solaris on x86-64 (64-bit)
All Platforms

Symptoms

There are instances when the SMART (Self-Monitoring, Analysis and Reporting Technology) firmware on a Sun Fire[TM] X4500 drive predictively fails a disk and reports it to fmadm. The service manual recommends running cfgadm -c unconfigure to replace the drive. However, since the drive is still healthy according to ZFS, the command fails with the following:

root@th12 # cfgadm -c unconfigure sata1/7::dsk/c1t7d0
Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:7
This operation will suspend activity on the SATA device
Continue (yes/no) yes
cfgadm: Hardware specific failure: Failed to unconfig device at ap_id: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:7
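This failure is expected whenever ZFS still holds the device open. A quick pre-check (a sketch using this example's disk name; substitute your own c#t#d#) shows whether the disk is still an active pool member:

# Non-empty output means the disk is still an active pool member and must
# be offlined with 'zpool offline' before cfgadm can unconfigure it.
zpool status | grep c1t7d0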
For patched Solaris 10, ZFS now recognizes the predictive failure reported via SMART and faults the disk automatically. The fix is in the Solaris 10 patch for <Bug: 15662359> "activate ZPOOL_CONFIG_FRU in ON", delivered in patch 149637-04 (x86). Solaris 11.x already implements this feature.

Changes

N/A

Cause

N/A

Solution

Since the drive is still healthy according to ZFS, it needs to be offlined, unconfigured with cfgadm, physically replaced, configured with cfgadm, and finally replaced with zpool replace.
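If you are unsure whether a Solaris 10 system already has the automatic fault handling described above, a quick patch check can be run first (a sketch; showrev -p lists installed patches on Solaris 10):

# No output means patch 149637 (any revision) is not installed, so the
# manual procedure below is required.
showrev -p | grep 149637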
1. Prior to replacing the drive, cfgadm -alv will show the following output:

root@th12 # cfgadm -alv
Ap_Id                          Receptacle   Occupant     Condition  Information
When         Type         Busy     Phys_Id
sata0/0::dsk/c0t0d0            connected    configured   ok         Mod: HITACHI HDS7250SASUN500G 0627K7KP8F FRev: K2AOAJ0A SN: KRVN67ZAJ7KP8F
unavailable  disk         n        /devices/pci@0,0/pci1022,7458@1/pci11ab,11ab@1:0
sata0/1::dsk/c0t1d0            connected    configured   ok         Mod: HITACHI HDS7250SASUN500G 0628KB06EF FRev: K2AOAJ0A SN: KRVN65ZAJB06EF
unavailable  disk         n        /devices/pci@0,0/pci1022,7458@1/pci11ab,11ab@1:1
(output omitted)
sata1/7::dsk/c1t7d0            connected    configured   ok         Mod: HITACHI HDS7250SASUN500G 0628K8RH1D FRev: K2AOAJ0A SN: KRVN63ZAJ8RH1D
unavailable  disk         n        /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:7
2. fmadm and fmdump will show the drive as faulty:

root@th12 # fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded hc:///:serial=KRVN63ZAJ8RH1D/component=sata1/7
         665c1b1a-7405-6f8a-adc5-be4e32dc9232
-------- ----------------------------------------------------------------------

root@th12 # fmdump
TIME                 UUID                                 SUNW-MSG-ID
Dec 01 00:23:09.5984 665c1b1a-7405-6f8a-adc5-be4e32dc9232 DISK-8000-0X

root@th12 # fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
Dec 01 00:23:09.5984 665c1b1a-7405-6f8a-adc5-be4e32dc9232 DISK-8000-0X
  100%  fault.io.disk.predictive-failure
        Problem in: hc:///:serial=KRVN63ZAJ8RH1D:part=HITACHI-HDS7250SASUN500G-628K8RH1D:revision=K2AOAJ0A/motherboard=0/hostbridge=0/pcibus=0/pcidev=2/pcifn=0/pcibus=2/pcidev=1/pcifn=0/sata-port=7/disk=0
           Affects: hc:///:serial=KRVN63ZAJ8RH1D/component=sata1/7
               FRU: hc:///component=HD_ID_45
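fmadm identifies the disk by serial number rather than by its c#t#d# name. If the mapping is not obvious, iostat -En prints both; a minimal sketch (the serial number is the one reported above; output layout can vary by driver):

# Remember each device-name line from iostat -En, then print the saved
# name when the matching serial number appears on a following line.
iostat -En | nawk '/^c[0-9]/ {dev = $1} /KRVN63ZAJ8RH1D/ {print dev}'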
3. format will show the following:

root@th12 # format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0
          /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@0,0
       1. c0t1d0
          /pci@0,0/pci1022,7458@1/pci11ab,11ab@1/disk@1,0
(output omitted)
      14. c1t6d0
          /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@6,0
      15. c1t7d0
          /pci@0,0/pci1022,7458@2/pci11ab,11ab@1/disk@7,0
Specify disk (enter its number): 15
selecting c1t7d0
[disk formatted]
/dev/dsk/c1t7d0s0 is part of active ZFS pool zpool1. Please see zpool(1M).

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
format> p
PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
partition> p
Current partition table (original):

4. The zpool commands will show that the pool is healthy and online:

root@th12 # zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
zpool1                 20.8T   1.14M   20.8T     0%  ONLINE     -
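Before reading the full device listing that follows, zpool status -x gives a one-line health summary (a sketch):

# Prints "all pools are healthy" when no pool has a problem.
zpool status -x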
root@th12 # zpool status zpool1
  pool: zpool1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0

errors: No known data errors

5. In order to replace the drive, first offline it in ZFS:

root@th12 # zpool offline zpool1 c1t7d0
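zpool offline succeeds here only because c1t7d0 sits in a redundant raidz vdev; ZFS refuses to offline a device when no valid replicas would remain. To confirm which vdev a disk belongs to first, a minimal sketch (pool and disk names from this example; vdev labels can differ by release):

# Track the most recent raidz/mirror line in the status listing and
# report it when the target disk appears beneath it.
zpool status zpool1 | nawk '/raidz|mirror/ {vdev = $1} /c1t7d0/ {print "c1t7d0 is in vdev: " vdev}'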
6. After the drive has been offlined, zpool status will show the following:

root@th12 # zpool status zpool1
  pool: zpool1
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      DEGRADED     0     0     0
          raidz     ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  OFFLINE      0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0

errors: No known data errors
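Before unconfiguring the device in the next step, the state of that one attachment point can be double-checked (a sketch; cfgadm accepts a specific Ap_Id):

# Show only the sata1/7 attachment point; its Occupant column should
# still read "configured" until the unconfigure step below is run.
cfgadm -al sata1/7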
7. Now that the drive has been offlined from the ZFS pool, it can be dynamically removed from OS control by running the following command:

root@th12 # cfgadm -c unconfigure sata1/7::dsk/c1t7d0
Unconfigure the device at: /devices/pci@0,0/pci1022,7458@2/pci11ab,11ab@1:7
This operation will suspend activity on the SATA device
Continue (yes/no) yes
Dec 5 14:20:02 th12 sata: NOTICE: /pci@0,0/pci1022,7458@2/pci11ab,11ab@1:

8. Notice that the drive no longer shows up in cfgadm:

root@th12 # cfgadm -al | grep t7

9. At this point the drive is safe to remove. The drive's blue LED should be lit, indicating that it can safely be removed.

Physically replace the drive.

10. Once the drive has been physically replaced, configure it back into OS control by running the following command:

root@th12 # cfgadm -c configure sata1/7::dsk/c1t7d0

11. Notice that cfgadm now shows the drive again:

root@th12 # cfgadm -al | grep t7
sata1/7::dsk/c1t7d0            disk         connected    configured   ok

12. You can now return the drive to ZFS control by running the command below, substituting your drive's c#t#d# for c1t7d0:

root@th12 # zpool replace zpool1 c1t7d0 c1t7d0
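The zpool replace command starts a resilver. On a nearly empty pool it completes almost immediately, as the next step shows; on a pool with data it can take much longer, and progress can be polled (a small sketch using this example's pool name):

# Report resilver progress every 30 seconds until the scrub line no
# longer says "resilver in progress".
while zpool status zpool1 | grep "resilver in progress" > /dev/null
do
    zpool status zpool1 | grep "scrub:"
    sleep 30
done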
13. The pool is now healthy again:

root@th12 # zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
zpool1                 20.8T   1.38M   20.8T     0%  ONLINE     -

Note the message under "scrub:" in the zpool status output below:

root@th12 # zpool status zpool1
  pool: zpool1
 state: ONLINE
 scrub: resilver completed with 0 errors on Tue Dec 5 14:22:46 2006
config:

        NAME        STATE     READ WRITE CKSUM
        zpool1      ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c4t0d0  ONLINE       0     0     0
            c6t0d0  ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c4t1d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c4t2d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
            c7t2d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c4t3d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
            c4t4d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c7t4d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
            c4t5d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c7t5d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c4t6d0  ONLINE       0     0     0
            c5t6d0  ONLINE       0     0     0
            c6t6d0  ONLINE       0     0     0
            c7t6d0  ONLINE       0     0     0
          raidz     ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0
            c1t7d0  ONLINE       0     0     0
            c4t7d0  ONLINE       0     0     0
            c5t7d0  ONLINE       0     0     0
            c6t7d0  ONLINE       0     0     0
            c7t7d0  ONLINE       0     0     0

errors: No known data errors

14. Use the fmadm command to repair the status of the drive in the fault management service:

root@th12 # fmadm repair 665c1b1a-7405-6f8a-adc5-be4e32dc9232
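To confirm the repair took effect, the fault record can be reviewed by UUID and the faulty list re-checked (a sketch using the UUID from this event):

# Review the original fault record, then confirm fmadm no longer lists
# the drive; an empty faulty list means the repair was accepted.
fmdump -v -u 665c1b1a-7405-6f8a-adc5-be4e32dc9232
fmadm faulty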
Previously Published As: 88150

Attachments
This solution has no attachment