Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 2077709.1: Oracle ZFS Storage Appliance: HGST 900GB (A600) ASC/ASCQ 44/0b cause 'fault.io.scsi.cmd.disk.dev.rqs.derr' faults
In this Document
  Applies to
  Symptoms
  Cause
  Solution
Applies to:

Sun ZFS Storage 7120 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

FMA events:

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Aug 13 10:53:12 5af9817e-1552-ecd9-f396-bb9368f5f346 DISK-8000-3E   Critical

Fault class  : fault.io.scsi.cmd.disk.dev.rqs.derr
Certainty    : 100%
Affects      : dev:///:devid=id1,sd@n5000cca043aa5380//scsi_vhci/disk@g5000cca043aa5380
Status       : faulted and taken out of service
FRU Location : "DISK 0"

Aug 13 14:56:34 14d0d208-a638-6c8d-b66a-b4057f7c5979 DISK-8000-3E   Critical

Fault class  : fault.io.scsi.cmd.disk.dev.rqs.derr
Certainty    : 100%
Affects      : dev:///:devid=id1,sd@n5000cca043aa4690//scsi_vhci/disk@g5000cca043aa4690
Status       : faulted and taken out of service
FRU Location : "DISK 2"

Aug 13 15:00:38 445b22c6-8ace-e176-80e0-bb8814e080ca ZFS-8000-8A    Critical

Name         : "zfs://pool=21717533f56d1783/pool_name=Nonprod_FS_Pool"
Status       : faulty
Description  : A file or directory in pool 'Nonprod_FS_Pool' could not be read due to corrupt data.
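These summaries follow the layout of standard Solaris 'fmadm faulty' output. On a live system the same diagnosis can be pulled with the stock FMA tools (a minimal sketch; shell access to the appliance is normally arranged through Oracle Support, and the UUID below is simply the second event above):

# fmadm faulty
# fmdump -v -u 14d0d208-a638-6c8d-b66a-b4057f7c5979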
alert.ak.txt:

Thu Aug 13 10:53:59 2015
nvlist version: 0
    class = alert.fs.zfs.pool.spare.activated
    source = svc:/appliance/kit/akd:default
    zpool_name = Nonprod_FS_Pool
    zpool_guid = 2409836141641275267
    link =
    label = 1416NM4022/HDD 5
    uuid = 078cb9f9-6a8b-6575-b7df-bfe6bd636d2b

Thu Aug 13 10:54:00 2015
nvlist version: 0
    class = alert.fs.zfs.pool.resilver.start
    source = svc:/appliance/kit/akd:default
    zpool_name = Nonprod_FS_Pool
    zpool_guid = 2409836141641275267
    link =
    uuid = 9e1bd787-996c-6f77-fc44-8c31269651b5

Thu Aug 13 14:56:34 2015
nvlist version: 0
    version = 0x0
    class = list.suspect
    code = DISK-8000-3E
    diag-time = 1439477794 297194
    de = (embedded nvlist)
    nvlist version: 0
        version = 0x1
        scheme = fmd
        authority = (embedded nvlist)
        nvlist version: 0
            version = 0x1
            system-mfg = unknown
            system-name = unknown
            system-part = unknown
            system-serial = unknown
            sys-comp-mfg = Oracle-Corporation
            sys-comp-name = SUN-FIRE-X4470-M2-SERVER
            sys-comp-part = 32656808+69+1
            sys-comp-serial = 1416NMJ00N
            server-name = SOMZSPPRD001
            host-id = 00000000
        (end authority)
        mod-name = eft
        mod-version = 1.16
    (end de)
    fault-list-sz = 0x1
    __case_state = 0x1
    topo-uuid = 5e6077f8-2593-ed01-e92e-bfd7f08a2eec
    fault-list = (array of embedded nvlists)
    (start fault-list[0])
    nvlist version: 0
        version = 0x0
        class = fault.io.scsi.cmd.disk.dev.rqs.derr
        certainty = 0x64
        resource = (embedded nvlist)
        nvlist version: 0
            version = 0x1
            scheme = hc
            hc-root =
            fru-serial = 001406A0N0ER--------KVK0N0ER
            devid = id1,sd@n5000cca043aa4690
            fru-part = HITACHI-H109090SESUN900G
            fru-revision = A600
            authority = (embedded nvlist)
            nvlist version: 0
                chassis-mfg = Oracle-Corporation
                chassis-name = ORACLE-DE2-24P
                chassis-part = 32656808+73+1
                chassis-serial = 1416NM4022
            (end authority)
            hc-list-sz = 0x3
            hc-list = (array of embedded nvlists)
            (start hc-list[0])
            nvlist version: 0
                hc-name = ses-enclosure
                hc-id = 0
            (end hc-list[0])
            (start hc-list[1])
            nvlist version: 0
                hc-name = bay
                hc-id = 2
            (end hc-list[1])
            (start hc-list[2])
            nvlist version: 0
                hc-name = disk
                hc-id = 0
            (end hc-list[2])
            hc-specific = (embedded nvlist)
            nvlist version: 0
                ascq = 0xb      ########
                asc = 0x44      ########
                ena = 0xb9ffad2e42a11c09        ########
                key = 0x4       ########
            (end hc-specific)
        (end resource)
        asru = (embedded nvlist)
        nvlist version: 0
            scheme = dev
            version = 0x0
            device-path = /scsi_vhci/disk@g5000cca043aa4690
            devid = id1,sd@n5000cca043aa4690
        (end asru)
        fru = (embedded nvlist)
        nvlist version: 0
            version = 0x1
            scheme = hc
            hc-root =
            fru-part = HITACHI-H109090SESUN900G
            fru-revision = A600
            authority = (embedded nvlist)
            nvlist version: 0
                chassis-mfg = Oracle-Corporation
                chassis-name = ORACLE-DE2-24P
                chassis-part = 32656808+73+1
                chassis-serial = 1416NM4022
            (end authority)
            hc-list = (array of embedded nvlists)
            (start hc-list[0])
            nvlist version: 0
                hc-name = ses-enclosure
                hc-id = 0
            (end hc-list[0])
            (start hc-list[1])
            nvlist version: 0
                hc-name = bay
                hc-id = 2
            (end hc-list[1])
            (start hc-list[2])
            nvlist version: 0
                hc-name = disk
                hc-id = 0
            (end hc-list[2])
            hc-list-sz = 0x3
            devid = id1,sd@n5000cca043aa4690
            fru-serial = 001406A0N0ER--------KVK0N0ER
        (end fru)
        ident_node = (embedded nvlist)
        nvlist version: 0
            version = 0x1
            scheme = hc
            hc-root =
            fru-serial = 001406A0N0ER--------KVK0N0ER
            fru-part = HITACHI-H109090SESUN900G
            fru-revision = A600
            authority = (embedded nvlist)
            nvlist version: 0
                chassis-mfg = Oracle-Corporation
                chassis-name = ORACLE-DE2-24P
                chassis-part = 32656808+73+1
                chassis-serial = 1416NM4022
            (end authority)
            hc-list = (array of embedded nvlists)
            (start hc-list[0])
            nvlist version: 0
                hc-name = ses-enclosure
                hc-id = 0
            (end hc-list[0])
            (start hc-list[1])
            nvlist version: 0
                hc-name = bay
                hc-id = 2
            (end hc-list[1])
            (start hc-list[2])
            nvlist version: 0
                hc-name = disk
                hc-id = 0
            (end hc-list[2])
            hc-list-sz = 0x3
            devid = id1,sd@n5000cca043aa4690
        (end ident_node)
        location = 1416NM4022/HDD 2
    (end fault-list[0])
    fault-status = 0x1
    severity = Critical
    source = appliance/kit/akd:default
    uuid = 14d0d208-a638-6c8d-b66a-b4057f7c5979
    link =

Thu Aug 13 15:00:38 2015
nvlist version: 0
    version = 0x0
    class = list.suspect
    code = ZFS-8000-8A
    diag-time = 1439478038 544780
    de = (embedded nvlist)
    nvlist version: 0
        version = 0x1
        scheme = fmd
        authority = (embedded nvlist)
        nvlist version: 0
            version = 0x1
            system-mfg = unknown
            system-name = unknown
            system-part = unknown
            system-serial = unknown
            sys-comp-mfg = Oracle-Corporation
            sys-comp-name = SUN-FIRE-X4470-M2-SERVER
            sys-comp-part = 32656808+69+1
            sys-comp-serial = 1416NMJ00N
            server-name = SOMZSPPRD001
            host-id = 00000000
        (end authority)
        mod-name = zfs-diagnosis
        mod-version = 1.0
    (end de)
    fault-list-sz = 0x1
    __case_state = 0x1
    topo-uuid = 5e6077f8-2593-ed01-e92e-bfd7f08a2eec
    fault-list = (array of embedded nvlists)
    (start fault-list[0])
    nvlist version: 0
        version = 0x0
        class = fault.fs.zfs.object.corrupt_data
        certainty = 0x64
        asru = (embedded nvlist)
        nvlist version: 0
            version = 0x0
            scheme = zfs
            pool = 0x21717533f56d1783
            pool_name = Nonprod_FS_Pool
        (end asru)
        fru = (embedded nvlist)
        nvlist version: 0
            version = 0x0
            scheme = zfs
            pool = 0x21717533f56d1783
            pool_name = Nonprod_FS_Pool
        (end fru)
        resource = (embedded nvlist)
        nvlist version: 0
            version = 0x0
            scheme = zfs
            pool = 0x21717533f56d1783
            pool_name = Nonprod_FS_Pool
        (end resource)
    (end fault-list[0])

Fri Aug 14 03:16:57 2015
nvlist version: 0
    class = alert.ak.xmlrpc.hardware.disk.removed
    source = svc:/appliance/kit/akd:default
    chassis_uuid = c4a8de40-3e3d-4e90-a8de-c31c5ab52ca6
    chassis_label = 1416NM4022
    fru = hc://:chassis-serial=1416NM4022/ses-enclosure=0/bay=0/disk=0
    fru_label = HDD 0
    uuid = cb043694-6583-ccae-a9df-ae1b4867b75d
    link =

Fri Aug 14 14:42:48 2015
nvlist version: 0
    class = alert.fs.zfs.pool.resilver.finish
    source = svc:/appliance/kit/akd:default
    zpool_name = Nonprod_FS_Pool
    zpool_guid = 2409836141641275267
    link = 9e1bd787-996c-6f77-fc44-8c31269651b5
    uuid = 72f82deb-2dc8-6ba1-c804-90d1ce36c6bd

Fri Aug 14 17:09:30 2015
nvlist version: 0
    class = alert.ak.xmlrpc.hardware.disk.added
    source = svc:/appliance/kit/akd:default
    chassis_uuid = c4a8de40-3e3d-4e90-a8de-c31c5ab52ca6
    chassis_label = 1416NM4022
    fru = hc://:chassis-serial=1416NM4022/ses-enclosure=0/bay=0/disk=0
    fru_label = HDD 0
    uuid = ddf7539c-9ad3-ea1c-aae4-9ad7531e4492
    link =

Fri Aug 14 17:09:38 2015
nvlist version: 0
    class = alert.fs.zfs.pool.resilver.start
    source = svc:/appliance/kit/akd:default
    zpool_name = Nonprod_FS_Pool
    zpool_guid = 2409836141641275267
    link =
    uuid = bab11faf-053b-4032-ad7b-e2b5fda77cc0
NOTE: The indicative symptoms for this issue are:
class = fault.io.scsi.cmd.disk.dev.rqs.derr
ascq = 0xb
asc = 0x44
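To confirm this signature in a support bundle, both values can be grepped straight out of the alert log shown above (a minimal sketch; paths inside the bundle may vary):

NAS:<bundle>$ grep -c "fault.io.scsi.cmd.disk.dev.rqs.derr" alert.ak.txt
NAS:<bundle>$ grep -B1 "asc = 0x44" alert.ak.txt

The -B1 on the second grep also prints the 'ascq = 0xb' line that immediately precedes each 'asc = 0x44' entry.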
Storage-TSC analysis:

NAS:<bundle>/hw$ grep fault hw.aksh
chassis-001  1416NM4022  faulted  Oracle   Oracle Storage DE2-24P    1416NM4022    10000
disk-000     HDD 0       faulted  HITACHI  H109090SESUN900G          001406A0NW4R  KVK0NW4R  10000
disk-002     HDD 2       faulted  HITACHI  H109090SESUN900G          001406A0N0ER  KVK0N0ER  10000
Aug 13 10:53:24 SOMZSPPRD001 /scsi_vhci/disk@g5000cca043aa5380 (sd63): Command Timeout on path pmcs4/disk@w5000cca043aa5382,0
Aug 13 14:56:46 SOMZSPPRD001 /scsi_vhci/disk@g5000cca043aa4690 (sd62): Command Timeout on path pmcs4/disk@w5000cca043aa4692,0
  pool: Nonprod_FS_Pool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Aug 13 10:53:34 2015
        5.42T scanned
        182G resilvered at 15.6M/s, 26.61% done, 37h1m to go
config:

        NAME                         STATE     READ WRITE CKSUM
        Nonprod_FS_Pool              DEGRADED     0     0 5.86M
          raidz1-0                   DEGRADED     0     0 11.7M
            c0t5000CCA043AA5B88d0    ONLINE       0     0 11.7M
            c0t5000CCA043AA4690d0    UNAVAIL     30 3.45K     2
            spare-2                  DEGRADED     0     0     0
              c0t5000CCA043AA5380d0  UNAVAIL     13   273     0
              c0t5000CCA0439F85F8d0  DEGRADED     0     0     0  (resilvering)
            c0t5000CCA043AC7444d0    ONLINE       0     0 11.7M
          raidz1-1                   ONLINE       0     0     0
            c0t5000CCA043ADC3B8d0    ONLINE       0     0     0
            c0t5000CCA043ADE770d0    ONLINE       0     0     0
            c0t5000CCA043B5AE74d0    ONLINE       0     0     0
            c0t5000CCA043B7AA6Cd0    ONLINE       0     0     0
        logs
          mirror-2                   ONLINE       0     0     0
            c0t5000A7203009967Ed0    ONLINE       0     0     0
            c0t5000A72030099692d0    ONLINE       0     0     0
        cache
          c0t5001E8200272AA64d0      ONLINE       0     0     0
        spares
          c0t5000CCA0439F85F8d0      INUSE

device details:

        c0t5000CCA043AA4690d0  UNAVAIL  external device fault
        status: FMA has faulted this device.
        action: Run 'fmadm faulty' for more information.
                Clear the errors using 'fmadm repaired'.
           see: http://support.oracle.com/msg/FMD-8000-58 for recovery

        c0t5000CCA043AA5380d0  UNAVAIL  external device fault
        status: FMA has faulted this device.
        action: Run 'fmadm faulty' for more information.
                Clear the errors using 'fmadm repaired'.
           see: http://support.oracle.com/msg/FMD-8000-58 for recovery

        c0t5000CCA0439F85F8d0  DEGRADED  scrub/resilver needed
        status: ZFS detected errors on this device.
                The device is missing some data that is recoverable.

errors: Permanent errors have been detected in the following files:
        ........
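The device details above point at the standard Solaris recovery flow. A hedged sketch of that flow, to be run only after the physical drive has actually been replaced and verified healthy (the FMRI is copied from the 'Affects' line in the FMA events above; on an appliance this is normally driven through Oracle Support):

# fmadm faulty
# fmadm repaired dev:///:devid=id1,sd@n5000cca043aa4690//scsi_vhci/disk@g5000cca043aa4690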
Right after HDD 2 was faulted, FMD began logging ZFS checksum ereports.

NAS:<bundle>/fm$ grep checksum errlog.txt | head
Aug 13 14:56:36.2844 ereport.fs.zfs.checksum
Aug 13 14:56:36.2844 ereport.fs.zfs.checksum
Aug 13 14:56:36.7195 ereport.fs.zfs.checksum
Aug 13 14:56:36.7195 ereport.fs.zfs.checksum
Aug 13 14:56:36.7048 ereport.fs.zfs.checksum
Aug 13 14:56:36.7047 ereport.fs.zfs.checksum
Aug 13 14:56:36.7521 ereport.fs.zfs.checksum
Aug 13 14:56:36.7520 ereport.fs.zfs.checksum
Aug 13 14:56:36.7366 ereport.fs.zfs.checksum
Aug 13 14:56:36.7366 ereport.fs.zfs.checksum
NAS:<bundle>/fm$ grep checksum errlog.txt | tail
Aug 14 00:21:25.1626 ereport.fs.zfs.checksum
Aug 14 00:21:25.1820 ereport.fs.zfs.checksum
Aug 14 00:21:25.2035 ereport.fs.zfs.checksum
Aug 14 00:21:25.2035 ereport.fs.zfs.checksum
Aug 14 00:21:25.1466 ereport.fs.zfs.checksum
Aug 14 00:21:25.2708 ereport.fs.zfs.checksum
Aug 14 00:21:25.2759 ereport.fs.zfs.checksum
Aug 14 00:21:25.2708 ereport.fs.zfs.checksum
Aug 14 00:21:25.2759 ereport.fs.zfs.checksum
Aug 14 00:21:25.2162 ereport.fs.zfs.checksum
NOTE: The checksum errors were the result of losing two drives in the same vdev.
NAS:<bundle>/fm$ grep "Aug 1[3-4]" errlog.txt | grep checksum | wc -l
2921713
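To see when the storm started and stopped, the ereports can be bucketed by hour (a small sketch against the errlog.txt line format shown above):

NAS:<bundle>/fm$ grep checksum errlog.txt | awk '{split($3,t,":"); print $1, $2, t[1]":00"}' | sort | uniq -c

Each output line is then a per-hour count of checksum ereports, which makes the gap before the HDD 0 replacement easy to spot.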
The errors stopped several hours before HDD 0 was replaced, so, at some point during the resilver, the system must have unfaulted HDD 2.

Tail of debug.sys:

Aug 13 06:01:12 SOMZSPPRD001 smbsrv: [ID 421734 kern.notice] NOTICE: [SOMZSPPRD001\kamwai.chui]: export share not found
Aug 13 06:01:12 SOMZSPPRD001 klmmod: [ID 710424 kern.notice] NOTICE: Received NLM_FREE_ALL (BNLHQADEV001) from 123.456.070.209
Aug 13 06:16:51 SOMZSPPRD001 klmmod: [ID 710424 kern.notice] NOTICE: Received NLM_FREE_ALL (BNLHQADEV001) from 123.456.070.209
Aug 13 06:47:05 SOMZSPPRD001 smbsrv: [ID 421734 kern.notice] NOTICE: [SOMZSPPRD001\kamwai.chui]: export share not found
Aug 13 06:52:51 SOMZSPPRD001 smbsrv: [ID 421734 kern.notice] NOTICE: [SOMZSPPRD001\kamwai.chui]: export share not found
Aug 13 10:53:10 SOMZSPPRD001 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Aug 13 10:53:24 SOMZSPPRD001 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Aug 13 10:53:24 SOMZSPPRD001    /scsi_vhci/disk@g5000cca043aa5380 (sd63): Command Timeout on path pmcs4/disk@w5000cca043aa5382,0
Aug 13 10:53:24 SOMZSPPRD001 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Aug 13 10:53:26 SOMZSPPRD001 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000cca043aa5380 (sd63):
Aug 13 10:53:26 SOMZSPPRD001    SYNCHRONIZE CACHE command failed (5)
Aug 13 10:54:05 SOMZSPPRD001 /usr/lib/ndmp/ndmpd[2167]: [ID 828132 daemon.error] [0] No device attached.
Aug 13 10:54:24 SOMZSPPRD001 genunix: [ID 631017 kern.notice] NOTICE: Device: already retired: /scsi_vhci/disk@g5000cca043aa5380
Aug 13 14:56:33 SOMZSPPRD001 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Aug 13 14:56:46 SOMZSPPRD001 scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Aug 13 14:56:46 SOMZSPPRD001    /scsi_vhci/disk@g5000cca043aa4690 (sd62): Command Timeout on path pmcs4/disk@w5000cca043aa4692,0
Aug 13 14:56:46 SOMZSPPRD001 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Aug 13 14:56:49 SOMZSPPRD001 last message repeated 3 times
Aug 13 14:56:50 SOMZSPPRD001 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000cca043aa4690 (sd62):
Aug 13 14:56:50 SOMZSPPRD001    SYNCHRONIZE CACHE command failed (5)
Aug 13 14:57:02 SOMZSPPRD001 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
Aug 13 14:58:55 SOMZSPPRD001 genunix: [ID 631017 kern.notice] NOTICE: Device: already retired: /scsi_vhci/disk@g5000cca043aa4690
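The reset/timeout/retire timeline can be pulled out of debug.sys in one pass (a hedged one-liner; file locations inside the bundle may differ):

NAS:<bundle>$ egrep "Command Timeout|vhci_scsi_reset|already retired" debug.sys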
Cause

The 04/44/0B error (UEC # F33E) was not seen in the drive's Event log, but was recorded in its Flash log. The 04/44/0B error (sense key 0x4, ASC/ASCQ 0x44/0x0B, matching the hc-specific values in the fault above) is a combo driver watchdog error. The combo driver is a chip that drives both the spindle motor and the servo voice coil motor.
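Because the fix is delivered as drive firmware, it is worth checking how many drives in a bundle still report the affected A600 revision. A minimal sketch against the alert log above, which records a fru-revision field per FRU:

NAS:<bundle>$ grep "fru-revision" alert.ak.txt | sort | uniq -c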
Solution

The drive vendor has confirmed they have fixes for this issue in place in current A720 code. Please engage Oracle Support via a Service Request for access to Appliance Firmware Micro Release AK 2013.1 Update 4.11, which contains the fix for this issue.
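After the micro release is applied, the drive revision can be spot-checked from the appliance's Solaris shell (a hedged sketch, assuming shell access arranged through Support; 'iostat -En' prints a Vendor/Product/Revision line per device):

# iostat -En | grep H109090SESUN900G

The affected HITACHI H109090SESUN900G drives should then report revision A720 rather than A600.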
*** Checked for relevance on 30-MAY-2018 ***

Attachments: This solution has no attachment.