Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2000097.1: Oracle ZFS Storage Appliance: AKD Hung After Reboot in 'mt_config_fini()'
Created from <SR 3-10392143441>
Applies to:
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
7000 Appliance OS (Fishworks)
Symptoms
During normal operation, the passive head was rebooted. After the reboot, it hung while trying to join the cluster. The active head does not respond on either the BUI or the CLI, and the AKD daemon is unresponsive. The AKD core shows no threads, and the user is not able to disable AKD.
If you take an akd core:
> ::akx_class
> 0x1a::findstack -v
This means that the process is blocked in the kernel, so it is necessary to take an NMI to identify the issue. From the NMI:
/cores_data/pool-1/data5/3-10392143441/tds-2015-03-10/zfs.3-10392143441_ak.d2040702-c23f-c956-b55a-9554c234f1e3/core

2* threads trying to get a mutex (1 user, 1 kernel)
longest sleeping 1 days 14 minutes 48.004423684 seconds earlier
137 threads sleeping on a shuttle (door) (137 user, 0 kernel)
1* stopped threads holding locks (1 user, 0 kernel)

2* threads in biowait() (0 user, 2 kernel)
3* procs with SIGKILL posted (see "tlist killed")
6* threads with procs with SIGKILL posted (6 user, 0 kernel)

CAT(vmcore.1/11X)> proc |grep akd
0xfffff600410d2020 452 449 0 26497024 25460736 6668288 2 /usr/lib/ak/akd
0xfffff6004e348000 449 1 0 487575552 244240384 216395776 3 /usr/lib/ak/akd
CAT(vmcore.1/11X)>
CAT(vmcore.1/11X)> proc 449
addr PID PPID RUID/UID size RSS swresv lwpcnt command
================== ====== ====== ========== ========== ======== ======== ====== =====
0xfffff6004e348000 449 1 0 487575552 244240384 216395776 3 /usr/lib/ak/akd
thread: 0xfffff6005064bc00 state: slp wchan: 0xfffff6004e3480b6
sobj: condition var (from genunix:exitlwps+0x139)
thread: 0xfffff600433f3880 state: slp wchan: 0xfffff6002c246a4c
sobj: condition var (from genunix:ndi_devi_enter+0x6f)
thread: 0xfffff600507313c0 state: slp wchan: 0xfffff6009d684048
sobj: condition var (from genunix:mt_config_fini+0x2f)

## oldest thread is
cmd: /usr/lib/ak/akd
fmri: svc:/appliance/kit/akd:default
t_wchan: 0xfffff6009d684048 sobj: condition var (from genunix:mt_config_fini+0x2f)
t_procp: 0xfffff6004e348000
p_as: 0xfffff600415209c8 size: 487575552 RSS: 244240384
a_hat: 0xfffff600410d4910
p_zone: 0xfffffffffbd0b240 (global)
t_stk: 0xffffff00b9ee3f10 sp: 0xffffff00b9ee3940 t_stkbase: 0xffffff00b9edf000
t_pri: 100 (RT) pctcpu: 0.000000
t_transience: 0 t_wkld_flags: 0
t_lwp: 0xfffff6006653e8c0 t_tid: 178
lwp_regs: 0xffffff00b9ee3f10
lwp_ap: 0xffffff00b9ee3ed0
t_mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
ms_state_start: 1 days 14 minutes 48.000371650 seconds earlier
ms_start: 1 days 16 minutes 41.854947479 seconds earlier
t_cpupart: 0xfffffffffbc8bb90(0) last CPU: 13
idle: 87288000370338 hrticks (1d14m48.000370338s)
start: Mon Mar 9 11:13:32 2015
age: 87378 seconds (1 days 16 minutes 18 seconds)
t_state: TS_SLEEP
t_flag: 0x1000 (T_LWPREUSE)
t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
t_acflag: 1 (TA_NO_PROCESS_LOCK)
p_flag: 0x42300902 (SEXITING|SKILLED|SEXTKILLED|SEXITLWPS|SHOLDFORK|SMSACCT|SMSFORK)

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp
unix:_resume_from_idle+0xf5 resume_return()
unix:swtch - frame recycled
void genunix:cv_wait+0x60((kcondvar_t *)0xfffff6009d684048, (kmutex_t *)0xfffff6009d684040)
int genunix:mt_config_fini+0x2f((struct mt_config_handle *)0xfffff6009d684040)
int genunix:config_grand_children+0x3c((dev_info_t *)0xfffff6000014ec28, (uint_t)0x4004048, (major_t)0xffffffff)
int genunix:devi_config_common+0xdf((dev_info_t *)0xfffff6000014ec28, (int)0x4004048, (major_t)0xffffffff)
int genunix:ndi_devi_config+0x1a((dev_info_t *)0xfffff6000014ec28, (int)0x4004048)
di_off_t devinfo:di_copytree+0x64((struct dev_info *)0xfffff6000014ec28, (di_off_t *)0xfffff6002c6d9028, (struct di_state *)0xfffff6009d34b928)
di_off_t devinfo:di_snapshot+0x1e6((struct di_state *)0xfffff6009d34b928)
di_off_t devinfo:di_snapshot_and_clean+0x23((struct di_state *)0xfffff6009d34b928)
int devinfo:di_cache_update+0x3b((struct di_state *)0xfffff6009d34b928)
int devinfo:di_cache_lookup+0x8b((struct di_state *)0xfffff6009d34b928)
int devinfo:di_ioctl+0x4c8((dev_t)0x5800000002, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4)
int genunix:cdev_ioctl+0x6e((dev_t)0x5800000002, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4)
int specfs:spec_ioctl+0x5d((struct vnode *)0xfffff600718ee180, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4, (caller_context_t *)0)
int genunix:fop_ioctl+0xd6((vnode_t *)0xfffff600718ee180, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4, (caller_context_t *)0)
int genunix:ioctl+0x188((int)4, (int)0x10df00, (intptr_t)0xf149dcb0)
unix:_sys_sysenter_post_swapgs+0x149()
-- switch to user thread's user stack --
CAT(vmcore.1/11X)> sdump 0xfffff6009d684040 mt_config_handle
struct mt_config_handle {
  kmutex_t mtc_lock = {
    void *[1] _opaque = [ NULL ]
  }
  kcondvar_t mtc_cv = {
    ushort_t _opaque = 1
  }
  int mtc_thr_count = 1 ----------------------> awaiting one thread to finish
  dev_info_t *mtc_pdip = 0xfffff6000014ec28 (*genunix(bss):top_devinfo)
  rootnex#-1 /i86pc
  dev_info_t **mtc_fdip = NULL
  major_t mtc_parmajor = 0xffffffff
  major_t mtc_major = 0xffffffff
  int mtc_flags = 0x4004048
  int mtc_op = 0
  int mtc_error = 0
  struct brevq_node **mtc_brevqp = NULL
}
CAT(vmcore.1/11X)>

Looking at the information about the device:

CAT(vmcore.1/11X)> sdump -ot 0xfffff6000014ec28 dev_info devi_flags,devi_busy_thread,devi_cv,devi_lock,devi_node_name
0x68 kmutex_t devi_lock = {
0x68   void *[1] _opaque = [ NULL ]
}
0xc8 char *devi_node_name = 0xfffff600004e92d0 "i86pc"
0x154 kcondvar_t devi_cv = {
0x154   ushort_t _opaque = 2
}
0x1a0 uint_t devi_flags = 3
0x1a8 void *devi_busy_thread = 0xfffff600433f3880
CAT(vmcore.1/11X)>

## busy thread 0xfffff600433f3880
unix:_resume_from_idle+0xf5 resume_return()
unix:swtch - frame recycled
void genunix:cv_wait+0x60((kcondvar_t *)0xfffff6002c246a4c, (kmutex_t *)0xfffff6002c246960)
void genunix:ndi_devi_enter+0x6f((dev_info_t *)0xfffff6002c2468f8, (int *)0xfffff60072fdb410)
di_off_t devinfo:di_copynode+0x7e7((struct dev_info *)0xfffff6002c246bc8, (struct di_stack *)0xfffff60072fdb000, (struct di_state *)0xfffff60073cdc128)
di_off_t devinfo:di_copytree+0xc8((struct dev_info *)0xfffff6000014ec28, (di_off_t *)0xfffff6002c6d5028, (struct di_state *)0xfffff60073cdc128)
di_off_t devinfo:di_snapshot+0x1e6((struct di_state *)0xfffff60073cdc128)
di_off_t devinfo:di_snapshot_and_clean+0x23((struct di_state *)0xfffff60073cdc128)
int devinfo:di_ioctl+0x491((dev_t)0x5800000006, (int)0xdf07, (intptr_t)0xfb94abc0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int *)0xffffff00b9c30de4)
int genunix:cdev_ioctl+0x6e((dev_t)0x5800000006, (int)0xdf07, (intptr_t)0xfb94abc0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9c30de4)
int specfs:spec_ioctl+0x5d((struct vnode *)0xfffff60075140500, (int)0xdf07, (intptr_t)0xfb94abc0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9c30de4, (caller_context_t *)0)
int genunix:fop_ioctl+0xd6((vnode_t *)0xfffff60075140500, (int)0xdf07, (intptr_t)0xfb94abc0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int *)0xffffff00b9c30de4, (caller_context_t *)0)
int genunix:ioctl+0x188((int)0x18a, (int)0xdf07, (intptr_t)0xfb94abc0)
unix:_sys_sysenter_post_swapgs+0x149()
-- switch to user thread's user stack --
CAT(vmcore.1/11X)> sdump 0xfffff6002c2468f8 dev_info devi_flags,devi_busy_thread,devi_cv,devi_lock,devi_node_name
kmutex_t devi_lock = {
  void *[1] _opaque = [ NULL ]
}
char *devi_node_name = 0xfffff6002be410d0 "storage"
kcondvar_t devi_cv = {
  ushort_t _opaque = 1
}
uint_t devi_flags = 3
void *devi_busy_thread = 0xffffff00bae18c20
CAT(vmcore.1/11X)>

Looking at this thread:
CAT(vmcore.1/11X)> thread 0xffffff00bae18c20
==== kernel thread: 0xffffff00bae18c20 PID: 0 ====
cmd: sched(genunix:mt_config_thread)
t_wchan: 0xfffff6004157ebc0 sobj: semaphore (from genunix:biowait+0x7a)
t_procp: 0xfffffffffbc36270 (proc_sched)
p_as: 0xfffffffffbc38240 (kas)
p_zone: 0xfffffffffbd0b240 (global)
t_stk: 0xffffff00bae18c20 sp: 0xffffff00bae182c0 t_stkbase: 0xffffff00bae14000
t_pri: 60 (SYS) pctcpu: 0.000000
t_transience: 10 (TRANSIENT) t_wkld_flags: 0
t_cpupart: 0xfffffffffbc8bb90(0) last CPU: 9
idle: 87287850152729 hrticks (1d14m47.850152729s)
start: Mon Mar 9 11:15:26 2015
age: 87264 seconds (1 days 14 minutes 24 seconds)
t_state: TS_SLEEP
t_flag: 0x10008 (T_TALLOCSTK|T_PUSHPAGE)
t_proc_flag: 0 (none set)
t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
t_acflag: 0 (none set)
p_flag: 1 (SSYS)

pc: unix:_resume_from_idle+0xf5 resume_return: addq $0x8,%rsp
unix:_resume_from_idle+0xf5 resume_return()
unix:swtch - frame recycled
void genunix:sema_p+0x1d6((ksema_t *)0xfffff6004157ebc0)
int genunix:biowait+0x7a((struct buf *)0xfffff6004157eb00)
int genunix:default_physio+0x33b((int (*)())0xfffffffff7b74a30, (struct buf *)0xfffff6004157eb00, (dev_t)0x6d00000700, (int)0x40, (int (*)())0xfffffffff79af484, (struct uio *)0xffffff00bae18490)
int genunix:physio+0x25((int (*)())0xfffffffff7b74a30, (struct buf *)0xfffff6004157eb00, (dev_t)0x6d00000700, (int)0x40, (int (*)())0xfffffffff79af484, (struct uio *)0xffffff00bae18490)
int scsi:scsi_uscsi_handle_cmd+0x2b4((dev_t)0x6d00000700, (enum uio_seg)1, (struct uscsi_cmd *)0xfffff60050094c00, (int (*)())0xfffffffff7b74a30, (struct buf *)0, (void *)0xfffff6007971f880, (int)0)
int sd:sd_ssc_send+0x2a1((sd_ssc_t *)0xfffff6007971f800, (struct uscsi_cmd *)0xffffff00bae185f0, (int)0x80000000, (enum uio_seg)1, (int)1)
int sd:sd_send_scsi_MODE_SENSE+0x1c1((sd_ssc_t *)0xfffff6007971f800, (int)0xa, (uchar_t *)0xfffff60078c9c450, (size_t)0x2a, (uchar_t)0, (uchar_t)0x2a, (int)1)
void sd:sd_set_mmc_caps+0x85((sd_ssc_t *)0xfffff6007971f800)
int sd:sd_unit_attach+0xceb((dev_info_t *)0xfffff6002c246358)
int sd:sdattach+0x19((dev_info_t *)0xfffff6002c246358, (ddi_attach_cmd_t)0)
int genunix:devi_attach+0xa1((dev_info_t *)0xfffff6002c246358, (ddi_attach_cmd_t)0)
int genunix:attach_node+0xaa((dev_info_t *)0xfffff6002c246358)
int genunix:i_ndi_config_node+0xcf((dev_info_t *)0xfffff6002c246358, (ddi_node_state_t)6, (uint_t)0)
int genunix:i_ddi_attachchild+0x3e((dev_info_t *)0xfffff6002c246358)
int genunix:devi_attach_node+0xda((dev_info_t *)0xfffff6002c246358, (uint_t)0x4004048, (int *)0xffffff00bae189cc)
int genunix:config_immediate_children+0xda((dev_info_t *)0xfffff6002c2468f8, (uint_t)0x4004048, (major_t)0xffffffffffffffff)
int genunix:ndi_busop_bus_config+0x126((dev_info_t *)0xfffff6002c2468f8, (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t **)0, (clock_t)0)
int scsa2usb:scsa2usb_scsi_bus_config+0xb3((dev_info_t *)0xfffff6002c2468f8, (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t **)0)
int scsi:scsi_hba_bus_config+0xbc((dev_info_t *)0xfffff6002c2468f8, (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t **)0)
int genunix:devi_config_common+0x99((dev_info_t *)0xfffff6002c2468f8, (int)0x4004048, (major_t)0xffffffff)
void genunix:mt_config_thread+0x53((void *)0xfffff60046e9ea78)
unix:thread_start+8()
-- end of kernel thread's stack --

CAT(vmcore.1/11X)>

## biowait

CAT(vmcore.1/11X)> buf 0xfffff6004157eb00
buf @ 0xfffff6004157eb00
b_edev: 109,1792(sd28,0) b_blkno: 0x0
b_flags: 0x200063 (BUSY|DONE|PHYS|READ|SHADOW)
b_addr: 0xfffff60078c9c450
b_bcount: 42 b_bufsize: 0
b_dip: 0xfffff6002c246358 sd#28
/i86pc/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0
b_shadow: 0xfffff60050041308 (struct page **)
CAT(vmcore.1/11X)>

CAT(vmcore.1/11X)> dev busy

Scanning for busy devices:
No busy/hanging devices found
Scanning for threads in biowait:

2 matching threads found in biowait()

threads in biowait() by device:
count device (thread: max idle time)
1 109,1792(sd28,0) (0xffffff00bae18c20: 1 days 5 minutes 29.792139485 seconds)
1 109,1728(sd27,0) (0xffffff00bae1ec20: 1 days 5 minutes 29.792104307 seconds)

Scanning for procs with aio:
CAT(vmcore.1/11X)>

## so it seems that akd was hung waiting on a SCSI command sent to a USB device
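The `(dev_t)0x6d00000700` argument in the `physio` frames and the `b_edev: 109,1792(sd28,0)` field printed by `buf` describe the same device. The following is a minimal Python sketch of that decoding, assuming a 64-bit kernel in which `dev_t` carries the major number in the upper 32 bits and the minor number in the lower 32 bits. The helper names are hypothetical, and the shift of 6 used to recover the `sd` instance is only inferred from the two devices listed in this dump (1792 gives sd28, 1728 gives sd27).

```python
# Assumption: 64-bit dev_t layout, 32-bit major in the high half and
# 32-bit minor in the low half. Helper names are illustrative only.
MINOR_BITS = 32
MINOR_MASK = (1 << MINOR_BITS) - 1

# Assumption inferred from this dump: sd minor = (instance << 6) | partition.
SD_UNIT_SHIFT = 6


def split_dev(dev):
    """Return (major, minor) for a 64-bit dev_t value."""
    return dev >> MINOR_BITS, dev & MINOR_MASK


def sd_instance(minor):
    """Return the sd instance number encoded in an sd minor number."""
    return minor >> SD_UNIT_SHIFT


if __name__ == "__main__":
    major, minor = split_dev(0x6D00000700)
    print(f"major={major} minor={minor} instance=sd{sd_instance(minor)}")
```

Applied to the value in the stack, this yields major 109 and minor 1792, which matches the `b_edev` field and the `sd28` instance reported by `dev busy`.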
CAT(vmcore.1/11X)> dev state sd 28
sd_state: 0xfffff60029c4ece8
n_items: 32 array: 0xfffff60012a02580 item size: 1520, next: 0x0
adaptive mutex: owner: MUTEX_NO_OWNER waiters: false
sd28 @ 0xfffff60045e56080(sd_lun)
/i86pc/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0 @ 0xfffff6002c246358 name: sd@0,0 instance #: 28
scsi device @ 0xfffff6002c03ca80 hba_tran: 0xfffff6002c377c80
target: 0 lun: 0 sub_lun: 0
scsi inquiry data @ 0xfffff6002b673e70 :
dtype: 5 qualifier: 0 removable: 1
ANSI version: 0 ECMA version: 0 ISO version: 0 length: 31
response format: 2 TERM I/O Proc msg: 0 async event notification: 0
scsi support:
soft rst: 0 cmdque: 0 linked cmds: 0 sync xfer: 0
16 bit xfers: 0 32 bit xfers: 0 relative addr: 0
vendor id: KVM product id: vmDisk-CD
revision: 0.01 serial #:
sd_lun @ 0xfffff60045e56080, un_sd: 0xfffff6002c03ca80
throttle: 3, saved_throttle: 3, busy_throttle: 0

un_rqs_bp: 0x0, un_rqs_pktp: 0x0 un_sense_isbusy: 0
Last Request Sense Packet (using un_rqs_pktp):
un_ncmds_in_driver: 0, un_ncmds_in_transport: 0
open counts:
layered (none)
regular (none)
Geometry is NOT valid
Packet Flags for Tagged Queueing:

Last pkt reason:
CMD_TRAN_ERR - unspecified transport error

State:
SD_STATE_NORMAL
Last state:
SD_STATE_NORMAL
SCSI State Change Translation:
No state change

Reservation status:
SD_RELEASE
CAT(vmcore.1/11X)>

vendor id: KVM product id: vmDisk-CD

## problem with the USB connection (KVM device)

"/pci@0,0/pci108e,484c@1d,2/device@1/storage@2" 1 "scsa2usb"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0" 28 "sd"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@3" 2 "scsa2usb"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@3/disk@0,0" 27 "sd"
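The dump describes a wait chain. The akd thread 0xfffff600507313c0 sleeps in `cv_wait` under `mt_config_fini` while `mtc_thr_count` is still 1. The remaining `mt_config_thread` (0xffffff00bae18c20) sleeps in `biowait` on sd28 and also holds the "storage" node that the second akd thread is waiting to enter. The sketch below is a user-space Python analogue of that pattern only, not the kernel source: `ConfigHandle`, `config_thread` and `config_fini` are hypothetical stand-ins, and the timeout exists solely so the demonstration terminates (the `cv_wait` in the stack has none).

```python
import threading


class ConfigHandle:
    """Hypothetical stand-in for struct mt_config_handle."""

    def __init__(self, thr_count):
        self.cv = threading.Condition()
        self.thr_count = thr_count


def config_thread(hdl, io_done):
    # Stand-in for mt_config_thread: block until the "I/O" completes,
    # then decrement the outstanding-thread count and wake the waiter.
    io_done.wait()
    with hdl.cv:
        hdl.thr_count -= 1
        hdl.cv.notify_all()


def config_fini(hdl, timeout):
    # Stand-in for the wait seen under mt_config_fini: sleep until no
    # worker threads remain. Returns False if the timeout expires first.
    with hdl.cv:
        return hdl.cv.wait_for(lambda: hdl.thr_count == 0, timeout=timeout)


def demo():
    io_done = threading.Event()
    hdl = ConfigHandle(thr_count=1)
    worker = threading.Thread(target=config_thread, args=(hdl, io_done))
    worker.start()
    stuck = not config_fini(hdl, timeout=0.2)  # I/O has not completed
    io_done.set()                              # simulate I/O completion
    finished = config_fini(hdl, timeout=5.0)
    worker.join()
    return stuck, finished


if __name__ == "__main__":
    print(demo())
```

As long as the simulated I/O never completes, the waiter cannot make progress, which mirrors how a single stalled command to the KVM virtual device keeps akd asleep in the kernel.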
Changes
Cause
The console is redirected to a KVM switch, which is not supported.
Solution
Remote administration should be done initially via a serial connection to the SP (to configure the SP IP address) and then via ssh to the ILOM. Oracle does not support attaching a KVM to administer the ZFSSA, nor does it support the JavaRconsole connection via the ILOM BUI.
Checked for Currency 09-OCT-2017
References
<BUG:20836598> - AKD HUNG AFTER REBOOT ON MT_CONFIG_FINI
Attachments
This solution has no attachment