Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2000097.1
Update Date:2018-03-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  2000097.1 :   Oracle ZFS Storage Appliance: AKD Hung After Reboot in 'mt_config_fini()'  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-10392143441>

Applies to:

Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

During normal operation, the passive head was rebooted. After the reboot, it hung while trying to rejoin the cluster.

The active head does not respond through either the BUI or the CLI, and the AKD daemon is unresponsive.

The AKD core shows no threads, and the user is not able to disable AKD.

 

Taking a core of akd and examining it shows:

> ::akx_class
@ ADDR     NAME           DATA     LOCKED
@ .
@ fd7010b8 fct            98dc088  0x1a
@ .
@ fd82b200 workflow       98eb0c0  0xcb

> 0x1a::findstack -v
@ stack pointer for thread 1a: 0
@ .

 

The empty stack means that the process is blocked in the kernel, so it is necessary to take an NMI to identify the issue.
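
As a rough illustration of the triage step above, the LOCKED column of the ::akx_class output can be scanned for thread IDs to pass to ::findstack. The sketch below is in Python and is not part of the appliance tooling. The function name locked_classes and the reading of LOCKED as the ID of the holding thread are assumptions based only on the output shown here.

```python
import re

# The two data rows from the ::akx_class output above, "@" prefixes included.
AKX_CLASS_OUTPUT = """\
@ ADDR     NAME           DATA     LOCKED
@ .
@ fd7010b8 fct            98dc088  0x1a
@ .
@ fd82b200 workflow       98eb0c0  0xcb
"""

def locked_classes(text):
    """Return {class_name: thread_id} for rows whose LOCKED field is a hex value."""
    result = {}
    for line in text.splitlines():
        # Drop the "@ " transcript prefix, then split into columns.
        fields = line.lstrip("@ ").split()
        if len(fields) != 4 or fields[0] == "ADDR":
            continue  # skip the header and the "." filler rows
        name, locked = fields[1], fields[3]
        if re.fullmatch(r"0x[0-9a-fA-F]+", locked):
            result[name] = int(locked, 16)
    return result

holders = locked_classes(AKX_CLASS_OUTPUT)
for name, tid in holders.items():
    # Print the debugger command to run next for each holder (assumed workflow).
    print(f"{name}: held by thread {tid:#x}, next: {tid:#x}::findstack -v")
```

If ::findstack then reports a stack pointer of 0 for such a thread, as it does for thread 1a here, the user-level core has nothing more to offer and a kernel dump is the next step.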

From the NMI core:

@ .
@ /cores_data/pool-1/data5/3-10392143441/tds-2015-03-10/zfs.3-10392143441_ak.d2040702-c23f-c956-b55a-9554c234f1e3/core
@ .
@ .
@    2*  threads trying to get a mutex (1 user, 1 kernel)
@           longest sleeping 1 days 14 minutes 48.004423684 seconds earlier
@   137   threads sleeping on a shuttle (door) (137 user, 0 kernel)
@     1*  stopped threads holding locks (1 user, 0 kernel)
@ .
@     2*  threads in biowait() (0 user, 2 kernel)
@     3*  procs with SIGKILL posted (see "tlist killed")
@     6*  threads with procs with SIGKILL posted (6 user, 0 kernel)
@ .
@ .
@ .
@ CAT(vmcore.1/11X)> proc |grep akd
@ 0xfffff600410d2020    452    449          0   26497024 25460736  6668288        2 /usr/lib/ak/akd
@ 0xfffff6004e348000    449      1          0  487575552 244240384 216395776        3 /usr/lib/ak/akd
@ CAT(vmcore.1/11X)>
@ CAT(vmcore.1/11X)> proc 449
@        addr         PID    PPID   RUID/UID     size      RSS        swresv    lwpcnt command
@ ================== ====== ====== ========== ========== ========   ========    ====== =====
@ 0xfffff6004e348000    449      1          0  487575552 244240384 216395776        3 /usr/lib/ak/akd
@         thread: 0xfffff6005064bc00  state: slp   wchan: 0xfffff6004e3480b6  
@ sobj: condition var (from genunix:exitlwps+0x139)
@         thread: 0xfffff600433f3880  state: slp   wchan: 0xfffff6002c246a4c  
@ sobj: condition var (from genunix:ndi_devi_enter+0x6f)
@         thread: 0xfffff600507313c0  state: slp   wchan: 0xfffff6009d684048  
@ sobj: condition var (from genunix:mt_config_fini+0x2f)
@ .
@ ## oldest thread is
@ .
@ cmd: /usr/lib/ak/akd
@ fmri: svc:/appliance/kit/akd:default
@ t_wchan: 0xfffff6009d684048  sobj: condition var (from genunix:mt_config_fini+0x2f)
@ t_procp: 0xfffff6004e348000
@    p_as: 0xfffff600415209c8  size: 487575552  RSS: 244240384
@       a_hat: 0xfffff600410d4910
@    p_zone: 0xfffffffffbd0b240 (global)
@ t_stk: 0xffffff00b9ee3f10  sp: 0xffffff00b9ee3940  t_stkbase:
@ 0xffffff00b9edf000
@ t_pri: 100 (RT)  pctcpu: 0.000000
@ t_transience: 0  t_wkld_flags: 0
@ t_lwp: 0xfffff6006653e8c0  t_tid: 178
@    lwp_regs: 0xffffff00b9ee3f10
@    lwp_ap:   0xffffff00b9ee3ed0
@    t_mstate: LMS_SLEEP  ms_prev: LMS_SYSTEM
@    ms_state_start: 1 days 14 minutes 48.000371650 seconds earlier
@    ms_start: 1 days 16 minutes 41.854947479 seconds earlier
@ t_cpupart: 0xfffffffffbc8bb90(0)  last CPU: 13
@ idle: 87288000370338 hrticks (1d14m48.000370338s)
@ start: Mon Mar  9 11:13:32 2015
@ age: 87378 seconds (1 days 16 minutes 18 seconds)
@ t_state:     TS_SLEEP
@ t_flag:      0x1000 (T_LWPREUSE)
@ t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
@ t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
@ t_acflag:    1 (TA_NO_PROCESS_LOCK)
@ p_flag:      0x42300902
@ (SEXITING|SKILLED|SEXTKILLED|SEXITLWPS|SHOLDFORK|SMSACCT|SMSFORK)
@ .
@ pc:      unix:_resume_from_idle+0xf5 resume_return:  addq   $0x8,%rsp
@ .
@ unix:_resume_from_idle+0xf5 resume_return()
@ unix:swtch - frame recycled
@ void genunix:cv_wait+0x60((kcondvar_t *)0xfffff6009d684048, (kmutex_t *)0xfffff6009d684040)
@ int genunix:mt_config_fini+0x2f((struct mt_config_handle *)0xfffff6009d684040)
@ int genunix:config_grand_children+0x3c((dev_info_t *)0xfffff6000014ec28, (uint_t)0x4004048, (major_t)0xffffffff)
@ int genunix:devi_config_common+0xdf((dev_info_t *)0xfffff6000014ec28, (int)0x4004048, (major_t)0xffffffff)
@ int genunix:ndi_devi_config+0x1a((dev_info_t *)0xfffff6000014ec28, (int)0x4004048)
@ di_off_t devinfo:di_copytree+0x64((struct dev_info *)0xfffff6000014ec28, (di_off_t *)0xfffff6002c6d9028, (struct di_state *)0xfffff6009d34b928)
@ di_off_t devinfo:di_snapshot+0x1e6((struct di_state *)0xfffff6009d34b928)
@ di_off_t devinfo:di_snapshot_and_clean+0x23((struct di_state *)0xfffff6009d34b928)
@ int devinfo:di_cache_update+0x3b((struct di_state *)0xfffff6009d34b928)
@ int devinfo:di_cache_lookup+0x8b((struct di_state *)0xfffff6009d34b928)
@ int devinfo:di_ioctl+0x4c8((dev_t)0x5800000002, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4)
@ int genunix:cdev_ioctl+0x6e((dev_t)0x5800000002, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4)
@ int specfs:spec_ioctl+0x5d((struct vnode *)0xfffff600718ee180, (int)0x10df00, (intptr_t)0xf149dcb0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int *)0xffffff00b9ee3de4, (caller_context_t *)0)
@ int genunix:fop_ioctl+0xd6((vnode_t *)0xfffff600718ee180, (int)0x10df00,
@ (intptr_t)0xf149dcb0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int
@ *)0xffffff00b9ee3de4, (caller_context_t *)0)
@ int genunix:ioctl+0x188((int)4, (int)0x10df00, (intptr_t)0xf149dcb0)
@ unix:_sys_sysenter_post_swapgs+0x149()
@ -- switch to user thread's user stack --
@ .
@ .
@ CAT(vmcore.1/11X)>  sdump 0xfffff6009d684040 mt_config_handle
@ struct mt_config_handle {
@    kmutex_t mtc_lock = {
@       void *[1] _opaque = [ NULL ]
@    }
@    kcondvar_t mtc_cv = {
@       ushort_t _opaque = 1
@    }
@    int mtc_thr_count = 1 ----------------------> waiting for one thread to finish
@    dev_info_t *mtc_pdip = 0xfffff6000014ec28 (*genunix(bss):top_devinfo)
@ rootnex#-1 /i86pc
@    dev_info_t **mtc_fdip = NULL
@    major_t mtc_parmajor = 0xffffffff
@    major_t mtc_major = 0xffffffff
@    int mtc_flags = 0x4004048
@    int mtc_op = 0
@    int mtc_error = 0
@    struct brevq_node **mtc_brevqp = NULL
@ }
@ CAT(vmcore.1/11X)>
@ .
@ .
@ Looking at information about the device
@ .
@ CAT(vmcore.1/11X)> sdump -ot 0xfffff6000014ec28 dev_info
@ devi_flags,devi_busy_thread,devi_cv,devi_lock,devi_node_name
@  0x68      kmutex_t devi_lock = {
@  0x68         void *[1] _opaque = [ NULL ]
@            }
@  0xc8      char *devi_node_name = 0xfffff600004e92d0 "i86pc"
@ 0x154      kcondvar_t devi_cv = {
@ 0x154         ushort_t _opaque = 2
@            }
@ 0x1a0      uint_t devi_flags = 3
@ 0x1a8      void *devi_busy_thread = 0xfffff600433f3880
@ CAT(vmcore.1/11X)>
@ .
@ ## busy thread 0xfffff600433f3880
@ .
@ unix:_resume_from_idle+0xf5 resume_return()
@ unix:swtch - frame recycled
@ void genunix:cv_wait+0x60((kcondvar_t *)0xfffff6002c246a4c, (kmutex_t
@ *)0xfffff6002c246960)
@ void genunix:ndi_devi_enter+0x6f((dev_info_t *)0xfffff6002c2468f8, (int
@ *)0xfffff60072fdb410)
@ di_off_t devinfo:di_copynode+0x7e7((struct dev_info *)0xfffff6002c246bc8,
@ (struct di_stack *)0xfffff60072fdb000, (struct di_state *)0xfffff60073cdc128)
@ di_off_t devinfo:di_copytree+0xc8((struct dev_info *)0xfffff6000014ec28,
@ (di_off_t *)0xfffff6002c6d5028, (struct di_state *)0xfffff60073cdc128)
@ di_off_t devinfo:di_snapshot+0x1e6((struct di_state *)0xfffff60073cdc128)
@ di_off_t devinfo:di_snapshot_and_clean+0x23((struct di_state
@ *)0xfffff60073cdc128)
@ int devinfo:di_ioctl+0x491((dev_t)0x5800000006, (int)0xdf07,
@ (intptr_t)0xfb94abc0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int
@ *)0xffffff00b9c30de4)
@ int genunix:cdev_ioctl+0x6e((dev_t)0x5800000006, (int)0xdf07,
@ (intptr_t)0xfb94abc0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int
@ *)0xffffff00b9c30de4)
@ int specfs:spec_ioctl+0x5d((struct vnode *)0xfffff60075140500, (int)0xdf07,
@ (intptr_t)0xfb94abc0, (int)0x100001, (struct cred *)0xfffff600504eaa70, (int
@ *)0xffffff00b9c30de4, (caller_context_t *)0)
@ int genunix:fop_ioctl+0xd6((vnode_t *)0xfffff60075140500, (int)0xdf07,
@ (intptr_t)0xfb94abc0, (int)0x100001, (cred_t *)0xfffff600504eaa70, (int
@ *)0xffffff00b9c30de4, (caller_context_t *)0)
@ int genunix:ioctl+0x188((int)0x18a, (int)0xdf07, (intptr_t)0xfb94abc0)
@ unix:_sys_sysenter_post_swapgs+0x149()
@ -- switch to user thread's user stack --
@ .
@ CAT(vmcore.1/11X)> sdump 0xfffff6002c2468f8 dev_info
@ devi_flags,devi_busy_thread,devi_cv,devi_lock,devi_node_name
@    kmutex_t devi_lock = {
@       void *[1] _opaque = [ NULL ]
@    }
@    char *devi_node_name = 0xfffff6002be410d0 "storage"
@    kcondvar_t devi_cv = {
@       ushort_t _opaque = 1
@    }
@    uint_t devi_flags = 3
@    void *devi_busy_thread = 0xffffff00bae18c20
@ CAT(vmcore.1/11X)>
@ .
@ .
@ Looking at this thread
@ CAT(vmcore.1/11X)> thread 0xffffff00bae18c20
@ ==== kernel thread: 0xffffff00bae18c20  PID: 0 ====
@ cmd: sched(genunix:mt_config_thread)
@ t_wchan: 0xfffff6004157ebc0  sobj: semaphore (from genunix:biowait+0x7a)
@ t_procp: 0xfffffffffbc36270 (proc_sched)
@    p_as: 0xfffffffffbc38240 (kas)
@    p_zone: 0xfffffffffbd0b240 (global)
@ t_stk: 0xffffff00bae18c20  sp: 0xffffff00bae182c0  t_stkbase:
@ 0xffffff00bae14000
@ t_pri: 60 (SYS)  pctcpu: 0.000000
@ t_transience: 10 (TRANSIENT)  t_wkld_flags: 0
@ t_cpupart: 0xfffffffffbc8bb90(0)  last CPU: 9
@ idle: 87287850152729 hrticks (1d14m47.850152729s)
@ start: Mon Mar  9 11:15:26 2015
@ age: 87264 seconds (1 days 14 minutes 24 seconds)
@ t_state:     TS_SLEEP
@ t_flag:      0x10008 (T_TALLOCSTK|T_PUSHPAGE)
@ t_proc_flag: 0 (none set)
@ t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
@ t_acflag:    0 (none set)
@ p_flag:      1 (SSYS)
@ .
@ pc:      unix:_resume_from_idle+0xf5 resume_return:  addq   $0x8,%rsp
@ .
@ unix:_resume_from_idle+0xf5 resume_return()
@ unix:swtch - frame recycled
@ void genunix:sema_p+0x1d6((ksema_t *)0xfffff6004157ebc0)
@ int genunix:biowait+0x7a((struct buf *)0xfffff6004157eb00)
@ int genunix:default_physio+0x33b((int (*)())0xfffffffff7b74a30, (struct buf
@ *)0xfffff6004157eb00, (dev_t)0x6d00000700, (int)0x40, (int
@ (*)())0xfffffffff79af484, (struct uio *)0xffffff00bae18490)
@ int genunix:physio+0x25((int (*)())0xfffffffff7b74a30, (struct buf
@ *)0xfffff6004157eb00, (dev_t)0x6d00000700, (int)0x40, (int
@ (*)())0xfffffffff79af484, (struct uio *)0xffffff00bae18490)
@ int scsi:scsi_uscsi_handle_cmd+0x2b4((dev_t)0x6d00000700, (enum uio_seg)1,
@ (struct uscsi_cmd *)0xfffff60050094c00, (int (*)())0xfffffffff7b74a30,
@ (struct buf *)0, (void *)0xfffff6007971f880, (int)0)
@ int sd:sd_ssc_send+0x2a1((sd_ssc_t *)0xfffff6007971f800, (struct uscsi_cmd
@ *)0xffffff00bae185f0, (int)0x80000000, (enum uio_seg)1, (int)1)
@ int sd:sd_send_scsi_MODE_SENSE+0x1c1((sd_ssc_t *)0xfffff6007971f800,
@ (int)0xa, (uchar_t *)0xfffff60078c9c450, (size_t)0x2a, (uchar_t)0,
@ (uchar_t)0x2a, (int)1)
@ void sd:sd_set_mmc_caps+0x85((sd_ssc_t *)0xfffff6007971f800)
@ int sd:sd_unit_attach+0xceb((dev_info_t *)0xfffff6002c246358)
@ int sd:sdattach+0x19((dev_info_t *)0xfffff6002c246358, (ddi_attach_cmd_t)0)
@ int genunix:devi_attach+0xa1((dev_info_t *)0xfffff6002c246358,
@ (ddi_attach_cmd_t)0)
@ int genunix:attach_node+0xaa((dev_info_t *)0xfffff6002c246358)
@ int genunix:i_ndi_config_node+0xcf((dev_info_t *)0xfffff6002c246358,
@ (ddi_node_state_t)6, (uint_t)0)
@ int genunix:i_ddi_attachchild+0x3e((dev_info_t *)0xfffff6002c246358)
@ int genunix:devi_attach_node+0xda((dev_info_t *)0xfffff6002c246358,
@ (uint_t)0x4004048, (int *)0xffffff00bae189cc)
@ int genunix:config_immediate_children+0xda((dev_info_t *)0xfffff6002c2468f8,
@ (uint_t)0x4004048, (major_t)0xffffffffffffffff)
@ int genunix:ndi_busop_bus_config+0x126((dev_info_t *)0xfffff6002c2468f8,
@ (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t
@ **)0, (clock_t)0)
@ int scsa2usb:scsa2usb_scsi_bus_config+0xb3((dev_info_t *)0xfffff6002c2468f8,
@ (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t
@ **)0)
@ int scsi:scsi_hba_bus_config+0xbc((dev_info_t *)0xfffff6002c2468f8,
@ (uint_t)0x4004048, (ddi_bus_config_op_t)2, (void *)0xffffffff, (dev_info_t
@ **)0)
@ int genunix:devi_config_common+0x99((dev_info_t *)0xfffff6002c2468f8,
@ (int)0x4004048, (major_t)0xffffffff)
@ void genunix:mt_config_thread+0x53((void *)0xfffff60046e9ea78)
@ unix:thread_start+8()
@ -- end of kernel thread's stack --
@ .
@ CAT(vmcore.1/11X)>
@ .
@ .
@ ## biowait
@ .
@ CAT(vmcore.1/11X)> buf 0xfffff6004157eb00
@ buf @ 0xfffff6004157eb00
@    b_edev:   109,1792(sd28,0)   b_blkno:   0x0
@    b_flags:  0x200063 (BUSY|DONE|PHYS|READ|SHADOW)
@    b_addr:   0xfffff60078c9c450
@    b_bcount: 42                 b_bufsize: 0
@    b_dip:    0xfffff6002c246358 sd#28
@ /i86pc/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0
@    b_shadow: 0xfffff60050041308 (struct page **)
@ CAT(vmcore.1/11X)>
@ .
@ .
@ CAT(vmcore.1/11X)> dev busy
@ .
@ Scanning for busy devices:
@ No busy/hanging devices found
@ Scanning for threads in biowait:
@ .
@   2 matching threads found
@     in biowait()
@ .
@ threads in biowait() by device:
@ count   device (thread: max idle time)
@     1   109,1792(sd28,0) (0xffffff00bae18c20: 1 days 5 minutes 29.792139485 seconds)
@     1   109,1728(sd27,0) (0xffffff00bae1ec20: 1 days 5 minutes 29.792104307 seconds)
@ .
@ Scanning for procs with aio:
@ CAT(vmcore.1/11X)>
@ .
@ .
@ ## so it seems that akd was hung waiting on a SCSI command sent to a USB device
@ .
@ CAT(vmcore.1/11X)> dev state sd 28
@ sd_state: 0xfffff60029c4ece8
@ n_items: 32 array: 0xfffff60012a02580 item size: 1520, next: 0x0
@ adaptive mutex:  owner: MUTEX_NO_OWNER  waiters: false
@ sd28 @ 0xfffff60045e56080(sd_lun)
@     /i86pc/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0
@     0xfffff6002c246358   name: sd@0,0   instance #: 28
@       scsi device @ 0xfffff6002c03ca80  hba_tran: 0xfffff6002c377c80
@       target: 0  lun: 0  sub_lun: 0
@       scsi inquiry data @ 0xfffff6002b673e70 :
@         dtype: 5  qualifier: 0  removable: 1
@         ANSI version: 0  ECMA version: 0  ISO version: 0  length: 31
@         response format: 2  TERM I/O Proc msg: 0  async event notification: 0
@         scsi support:
@           soft rst: 0  cmdque: 0  linked cmds: 0  sync xfer: 0
@           16 bit xfers: 0  32 bit xfers: 0  relative addr: 0
@         vendor id: KVM       product id: vmDisk-CD
@         revision: 0.01  serial #:
@       sd_lun @ 0xfffff60045e56080, un_sd: 0xfffff6002c03ca80
@         throttle: 3, saved_throttle: 3, busy_throttle: 0
@ .
@         un_rqs_bp: 0x0, un_rqs_pktp: 0x0 un_sense_isbusy: 0
@         Last Request Sense Packet (using un_rqs_pktp):
@         un_ncmds_in_driver: 0, un_ncmds_in_transport: 0
@         open counts:
@           layered (none)
@           regular (none)
@         Geometry is NOT valid
@         Packet Flags for Tagged Queueing:
@ .
@         Last pkt reason:
@         CMD_TRAN_ERR - unspecified transport error
@ .
@         State:
@         SD_STATE_NORMAL
@         Last state:
@         SD_STATE_NORMAL
@         SCSI State Change Translation:
@             No state change
@ .
@         Reservation status:
@         SD_RELEASE
@ CAT(vmcore.1/11X)>
@ .
@ .
@  vendor id: KVM       product id: vmDisk-CD
@ .
@ .
@ ## problem with the USB connection (KVM device)
@ .
@ "/pci@0,0/pci108e,484c@1d,2/device@1/storage@2" 1 "scsa2usb"
@ "/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0" 28 "sd"
@ "/pci@0,0/pci108e,484c@1d,2/device@1/storage@3" 2 "scsa2usb"
@ "/pci@0,0/pci108e,484c@1d,2/device@1/storage@3/disk@0,0" 27 "sd"
@ .
@ .
@ .
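
The numbers in the analysis above can be cross-checked by hand. The Python sketch below decodes the dev_t passed to physio() (0x6d00000700) into the major,minor pair reported by the buf command, maps the minor to an sd instance, and converts the idle time from hrticks. It rests on assumptions not stated in this document: that a 64-bit dev_t holds the major number in the upper 32 bits and the minor in the lower 32 bits, that the sd minor number is (instance << 6) | partition, and that hrticks are nanoseconds. The function names are illustrative only.

```python
NBITSMINOR64 = 32  # assumed: minor number occupies the low 32 bits of dev_t
SDUNIT_SHIFT = 6   # assumed: sd minor = (instance << 6) | partition

def decode_dev(dev):
    """Split a 64-bit dev_t into (major, minor)."""
    return dev >> NBITSMINOR64, dev & ((1 << NBITSMINOR64) - 1)

def decode_sd_minor(minor):
    """Split an sd minor number into (instance, partition)."""
    return minor >> SDUNIT_SHIFT, minor & ((1 << SDUNIT_SHIFT) - 1)

def split_idle(ns):
    """Break a nanosecond count into (days, hours, minutes, seconds, nanos)."""
    seconds, nanos = divmod(ns, 1_000_000_000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return days, hours, minutes, seconds, nanos

# dev_t taken from the default_physio() frame of the mt_config_thread stack.
major, minor = decode_dev(0x6d00000700)
instance, partition = decode_sd_minor(minor)
print(f"major={major} minor={minor} -> sd{instance}, partition {partition}")

# Idle time of the oldest akd thread, as printed in hrticks.
print(split_idle(87288000370338))
```

Under those assumptions the dev_t decodes to 109,1792 and sd instance 28, which agrees with the b_edev value 109,1792(sd28,0) in the buf output, and the idle time comes out as 1 day, 14 minutes and 48 seconds, which agrees with the figure printed beside the hrticks value.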

 

Changes



Cause

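The console is redirected to a KVM switch, which is not supported. The KVM presents virtual USB storage to the appliance (vendor id KVM, product id vmDisk-CD, seen as sd28 in the analysis above), and a SCSI command sent to that device during attach never completed, leaving akd blocked in mt_config_fini().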
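
To see which disks sit behind the USB mass-storage driver, the instance listing at the end of the analysis can be scanned for sd entries whose parent node is bound to scsa2usb. The following Python sketch does this for the four lines quoted there. It assumes the listing follows the "path" instance "driver" layout shown, and the helper names are hypothetical.

```python
import re

# The four instance entries quoted at the end of the analysis.
INSTANCE_LISTING = '''\
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@2" 1 "scsa2usb"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@2/disk@0,0" 28 "sd"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@3" 2 "scsa2usb"
"/pci@0,0/pci108e,484c@1d,2/device@1/storage@3/disk@0,0" 27 "sd"
'''

ENTRY = re.compile(r'\s*"([^"]+)"\s+(\d+)\s+"([^"]+)"\s*')

def parse_entries(text):
    """Return a list of (device_path, instance, driver) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.fullmatch(line)
        if match:
            entries.append((match.group(1), int(match.group(2)), match.group(3)))
    return entries

def usb_backed_disks(entries):
    """Return names of sd instances whose parent node is driven by scsa2usb."""
    usb_nodes = {path for path, _, driver in entries if driver == "scsa2usb"}
    return sorted(
        f"sd{instance}"
        for path, instance, driver in entries
        if driver == "sd" and path.rsplit("/", 1)[0] in usb_nodes
    )

disks = usb_backed_disks(parse_entries(INSTANCE_LISTING))
print(disks)
```

For the listing above this reports sd27 and sd28, the same two devices that the "dev busy" scan shows with threads in biowait().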

Solution

Remote administration should be done via serial connection to the SP initially (to configure SP IP address) and then via ssh to the ILOM.

Oracle does not support attaching a KVM to administer the ZFSSA, nor is the JavaRconsole connection via the ILOM BUI supported.

 

Checked for Currency 09-OCT-2017

 

References

<BUG:20836598> - AKD HUNG AFTER REBOOT ON MT_CONFIG_FINI

  Copyright © 2018 Oracle, Inc.  All rights reserved.