Asset ID: 1-72-2286032.1
Update Date: 2018-01-17
Solution Type
Problem Resolution Sure
Solution 2286032.1: Panic Due To mutex_destroy: not owner When Attempting To Offline, Hotplug, Or Unconfigure An Emulex Fibre Channel HBA Shutdown Due To Adapter Hardware Error
Related Items
- SPARC M6-32
- Solaris Operating System
- SPARC T5-4
Related Categories
- PLA-Support>Sun Systems>DISK>Storage Drivers>SN-DK: Storage Drivers
Created from <SR 3-15089395409>
Applies to:
Solaris Operating System - Version 11 and later
SPARC M6-32 - Version All Versions and later
SPARC T5-4 - Version All Versions and later
Information in this document applies to any platform.
Symptoms
A panic due to mutex_destroy: not owner occurs when executing luxadm -e offline, hotplug disable, or cfgadm -c unconfigure against an Emulex HBA that has been shut down due to an adapter hardware error.
Panic example during luxadm -e offline:
release: 5.11 (64-bit)
version: 11.3
usr/src: 37843:73160471f302:0.175.3.5.0.4.0:S11.3SRU5.4+0
usr/closed: 2902:9c0e467f5fc6:0.175.3.5.0.4.0:S11.3SRU5.4+0
machine: sun4v
node name: server1
hw_provider: Oracle Corporation
system type: ORCL,SPARC-M6-32 (SPARC-M6)
dump_conflags: 0x10100 (DUMP_KERNEL|DUMP_ZFS) on /dev/zvol/dsk/rpool/dump(64G)
dump_uuid: 266b2fc6-16f9-4e69-9181-c7c03303c1bb
time of crash: Thu Jun 8 08:29:39 EDT 2017
age of system: 299 days 1 hours 18 minutes 41 seconds
panic CPU: 782 (24 CPUs, 64G memory, 4 nodes)
panic string: mutex_destroy: not owner, lp=c4008fc01928 owner=c400a2ddc380 thread=c400eeeb1780
==== panic thread: 0xc400eeeb1780 ==== CPU: 782 ====
==== panic user (LWP_SYS) thread: 0xc400eeeb1780 PID: 2110 on CPU: 782 ====
cmd: luxadm -e offline /devices/pci@ac0/pci@1/pci@0/pci@8/SUNW,emlxs@0
fmri: svc:/network/ssh:default
t_procp: 0xc40209337360
p_as: 0xc4009c035060 size: 3039232 RSS: 1867776
a_hat: 0xc4019e1ec480
cnum: CPU768:1512/3058
cpusran: 1,768,781,782
p_zone: 0x2089dfe8 (global)
t_stk: 0x2a110739ad0 sp: 0x205105f1 t_stkbase: 0x2a110732000
t_pri: 0 (TS) pctcpu: 3.724039
t_transience: 0 t_wkld_flags: 0
t_lwp: 0xc4009500e938 t_tid: 1
machpcb: 0x2a110739ad0
lwp_ap: 0x2a110739bc0
t_mstate: LMS_SYSTEM ms_prev: LMS_SYSTEM
ms_state_start: 0.000154365 seconds earlier
ms_start: 2.852405107 seconds earlier
t_cpupart: 0x20511430(0) last CPU: 782
idle: 172912 nsec (0.000172912s)
start: Thu Jun 8 08:29:37 2017
age: 2 seconds (2 seconds)
t_state: TS_ONPROC
t_flag: 0x1800 (T_PANIC|T_LWPREUSE)
t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
t_schedflag: 0x8013 (TS_LOAD|TS_DONT_SWAP|TS_SIGNALLED|TS_WKLD_PERM)
t_acflag: 3 (TA_NO_PROCESS_LOCK|TA_BATCH_TICKS)
p_flag: 0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)
pc: unix:panicsys+0x40: call unix:setjmp
void unix:panicsys+0x40((const char *)0x10119ea8, (va_list)0x2a110738ec8, (struct regs *)0x20510fa0, (int)1, 0x9980001600, , , , , , , , 0x10119ea8, 0x2a110738ec8)
unix:vpanic_common+0x78(0x10119ea8, 0x2a110738ec8, 0, 1, 0xf, 5)
void unix:panic+0x1c((const char *)0x10119ea8, (void *)0x10119f58, 0xc4008fc01928, 0xc400a2ddc380, 0xc400eeeb1780, 0, ...)
unix:mutex_panic((char *)0x10119f58, (mutex_impl_t *)0xc4008fc01928) - frame recycled
void unix:mutex_destroy+0x120((kmutex_t *)0xc4008fc01928)
void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc4008fc00000)
void emlxs:emlxs_driver_remove+0x138((dev_info_t *)0x4001242c090, (uint32_t)0xffffffff, (uint32_t)0)
int emlxs:emlxs_hba_detach+0x234((dev_info_t *)0x4001242c090)
int emlxs:emlxs_detach+0x19c((dev_info_t *)0x4001242c090, (ddi_detach_cmd_t)0)
int genunix:devi_detach+0x9c((dev_info_t *), (ddi_detach_cmd_t)0)
int genunix:detach_node+0x68((dev_info_t *)0x4001242c090, (uint_t)0x41000)
int genunix:i_ndi_unconfig_node+0x140((dev_info_t *)0x4001242c090, (ddi_node_state_t)4, (uint_t)0x41000)
int genunix:i_ddi_detachchild+0x18((dev_info_t *)0x4001242c090, (uint_t)0x41000)
int genunix:devi_detach_node+0xe4((dev_info_t *)0x4001242c090, (uint_t), (int *)0)
int genunix:ndi_devi_unconfig_one+0x16c((dev_info_t *)0x4001242cf40, (char *)0xc40209e23600, (dev_info_t **)0, (int)0x41000)
int genunix:ndi_devctl_device_offline+0xe0((dev_info_t *)0x4001242cf40, (struct devctl_iocdata *)0xc4022de860f0, (uint_t)0)
int genunix:ndi_devctl_ioctl+0xd0((dev_info_t *), (int), (intptr_t), (int), (uint_t))
pcie:pcie_ioctl((dev_info_t *)0x4001242cf40, (dev_t)0xa300005dff, (int)0xdc0007, (intptr_t)0xf797b8bc, (int)0x100403, (cred_t *)0xc400a7788a10, (int *)0x2a110739acc) - frame recycled
int pcieb:pcieb_ioctl+0x5c((dev_t), (int), (intptr_t)0xf797b8bc, (int)0x100403, (cred_t *)0xc400a7788a10, (int *)0x2a110739acc)
specfs:spec_ioctl((struct vnode *)0xc400b302d780, (int)0xdc0007, (intptr_t)0xf797b8bc, (int)0x100403, (struct cred *)0xc400a7788a10, (int *)0x2a110739acc, (caller_context_t *)0) - frame recycled
int genunix:fop_ioctl+0xd0((vnode_t *)0xc400b302d780, (int)0xdc0007, (intptr_t)0xf797b8bc, (int)0x100403, (cred_t *), (int *)0x2a110739acc, (caller_context_t *)0)
int genunix:ioctl+0x16c((int), (int), (intptr_t))
unix:_syscall_no_proc_exit32+0x78()
-- switch to user thread's user stack --
Another panic example during luxadm -e offline:
release: 5.11 (64-bit)
version: 11.3
usr/src: 23960:e53f774d67dc:2352.8+53
usr/closed: 2071:626a763a22f9:0.175.3.6.0.5.0:S11.3SRU6.5+2
machine: sun4v
node name: server2
hw_provider: Oracle Corporation
system type: ORCL,SPARC-T5-4 (SPARC-T5)
dump_conflags: 0x10100 (DUMP_KERNEL|DUMP_ZFS) on /dev/zvol/dsk/rpool/dump(256G)
dump_uuid:
time of crash: Fri Jul 7 06:40:11 EDT 2017
age of system: 117 days 15 hours 57 minutes 30 seconds
panic CPU: 111 (512 CPUs, 1023G memory, 4 nodes)
panic string: mutex_destroy: not owner, lp=c40655801928 owner=c40655e85880 thread=c40e37aac200
==== panic thread: 0xc40e37aac200 ==== CPU: 111 ====
==== panic user (LWP_SYS) thread: 0xc40e37aac200 PID: 39249 on CPU: 111 ====
cmd: luxadm -e offline /devices/pci@300/pci@1/pci@0/pci@6/SUNW,emlxs@0,1
fmri: svc:/site/openssh:default
t_procp: 0xc40697a36440
p_as: 0xc406868ecf78 size: 3096576 RSS: 1941504
a_hat: 0xc40686483280
cnum: CPU0:7883/30649
cpusran: 11,111
p_zone: 0x208a1de8 (global)
t_stk: 0x2a100dc3ad0 sp: 0x20514671 t_stkbase: 0x2a100dba000
t_pri: 0 (TS) pctcpu: 2.943396
t_transience: 0 t_wkld_flags: 0
t_lwp: 0xc40987d70670 t_tid: 1
machpcb: 0x2a100dc3ad0
lwp_ap: 0x2a100dc3bc0
t_mstate: LMS_SYSTEM ms_prev: LMS_SYSTEM
ms_state_start: 0.000635018 seconds earlier
ms_start: 2.642434864 seconds earlier
t_cpupart: 0x205154b0(0) last CPU: 111
idle: 9957870 nsec (0.009957870s)
start: Fri Jul 7 06:40:08 2017
age: 3 seconds (3 seconds)
t_state: TS_ONPROC
t_flag: 0x1800 (T_PANIC|T_LWPREUSE)
t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
t_schedflag: 0x8013 (TS_LOAD|TS_DONT_SWAP|TS_SIGNALLED|TS_WKLD_PERM)
t_acflag: 3 (TA_NO_PROCESS_LOCK|TA_BATCH_TICKS)
p_flag: 0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)
pc: unix:panicsys+0x40: call unix:setjmp
void unix:panicsys+0x40((const char *)0x1011abe8, (va_list)0x2a100dc2ec8, (struct regs *)0x20515020, (int)1, 0x9080001601, , , , , , , , 0x1011abe8, 0x2a100dc2ec8)
unix:vpanic_common+0x78(0x1011abe8, 0x2a100dc2ec8, 0, 0x208531d0, 0xf, 0xd)
void unix:panic+0x1c((const char *)0x1011abe8, (void *)0x1011ac98, 0xc40655801928, 0xc40655e85880, 0xc40e37aac200, 0, ...)
unix:mutex_panic((char *)0x1011ac98, (mutex_impl_t *)0xc40655801928) - frame recycled
void unix:mutex_destroy+0x120((kmutex_t *)0xc40655801928)
void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc40655800000)
void emlxs:emlxs_driver_remove+0x138((dev_info_t *)0x4003d5104b8, (uint32_t)0xffffffff, (uint32_t)0)
int emlxs:emlxs_hba_detach+0x234((dev_info_t *)0x4003d5104b8)
int emlxs:emlxs_detach+0x19c((dev_info_t *)0x4003d5104b8, (ddi_detach_cmd_t)0)
int genunix:devi_detach+0x9c((dev_info_t *), (ddi_detach_cmd_t)0)
int genunix:detach_node+0x68((dev_info_t *)0x4003d5104b8, (uint_t)0x41000)
int genunix:i_ndi_unconfig_node+0x140((dev_info_t *)0x4003d5104b8, (ddi_node_state_t)4, (uint_t)0x41000)
int genunix:i_ddi_detachchild+0x18((dev_info_t *)0x4003d5104b8, (uint_t)0x41000)
int genunix:devi_detach_node+0xe4((dev_info_t *)0x4003d5104b8, (uint_t), (int *)0)
int genunix:ndi_devi_unconfig_one+0x16c((dev_info_t *)0x4003d515650, (char *)0xc40f00c85880, (dev_info_t **)0, (int)0x41000)
int genunix:ndi_devctl_device_offline+0xe0((dev_info_t *)0x4003d515650, (struct devctl_iocdata *)0xc408f80620c8, (uint_t)0)
int genunix:ndi_devctl_ioctl+0xd0((dev_info_t *), (int), (intptr_t), (int), (uint_t))
pcie:pcie_ioctl((dev_info_t *)0x4003d515650, (dev_t)0xa5000003ff, (int)0xdc0007, (intptr_t)0xf336b75c, (int)0x100403, (cred_t *)0xc413b3139648, (int *)0x2a100dc3acc) - frame recycled
int pcieb:pcieb_ioctl+0x5c((dev_t), (int), (intptr_t)0xf336b75c, (int)0x100403, (cred_t *)0xc413b3139648, (int *)0x2a100dc3acc)
specfs:spec_ioctl((struct vnode *)0xc406bd81dd40, (int)0xdc0007, (intptr_t)0xf336b75c, (int)0x100403, (struct cred *)0xc413b3139648, (int *)0x2a100dc3acc, (caller_context_t *)0) - frame recycled
int genunix:fop_ioctl+0xd0((vnode_t *)0xc406bd81dd40, (int)0xdc0007, (intptr_t)0xf336b75c, (int)0x100403, (cred_t *), (int *)0x2a100dc3acc, (caller_context_t *)0)
int genunix:ioctl+0x16c((int), (int), (intptr_t))
unix:_syscall_no_proc_exit32+0x78()
-- switch to user thread's user stack --
Panic example during hotplug disable:
release: 5.11 (64-bit)
version: 11.3
usr/src: 39744:88d1bc1a9331:0.175.3.8.0.6.0:S11.3SRU8.6+0
usr/closed: 2989:c08b3135365f:0.175.3.8.0.6.0:S11.3SRU8.6+0
machine: sun4v
node name: server3
hw_provider: Oracle Corporation
system type: ORCL,SPARC-M6-32 (SPARC-M6)
dump_conflags: 0x10100 (DUMP_KERNEL|DUMP_ZFS) on /dev/zvol/dsk/dump-pool/dump(946G)
cluster_bflgs: 0x3 (CLUSTER_CONFIGURED|CLUSTER_BOOTED)
pxfs_software_mount_level: v1 (consolidated version)
current node: 4
dump_uuid: b28be786-42d5-456d-9a08-b9d3172cad3c
time of crash: Thu Jun 1 11:00:46 PDT 2017
age of system: 11 days 13 hours 3 minutes 2 seconds
panic CPU: 216 (320 CPUs, 2T memory, 4 nodes)
panic string: mutex_destroy: not owner, lp=c40babc01928 owner=c40bab6d0a80 thread=c40bc9c85540
==== panic thread: 0xc40bc9c85540 ==== CPU: 216 ====
==== panic user (LWP_SYS) thread: 0xc40bc9c85540 PID: 1103 on CPU: 216 ====
cmd: /usr/lib/hotplugd
uname: door_create_func
fmri: svc:/system/hotplug:default
t_procp: 0xc40bc92d6268
p_as: 0xc40bba584b00 size: 6152192 RSS: 3334144
a_hat: 0xc40bba6e6200
cnum: CPU0:365/4907 CPU80:389/32633 CPU160:175/2281 CPU240:183/17509
cpusran: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319
p_zone: 0x2089e1f8 (global)
t_stk: 0x2a117545ad0 sp: 0x20510671 t_stkbase: 0x2a11753e000
t_pri: 0 (TS) pctcpu: 4.975992
t_transience: 0 t_wkld_flags: 0
t_lwp: 0xc40baf7c6120 t_tid: 2
machpcb: 0x2a117545ad0
lwp_ap: 0x2a117545bc0
t_mstate: LMS_SYSTEM ms_prev: LMS_SYSTEM
ms_state_start: 0.000460680 seconds earlier
ms_start: 11 days 13 hours 1.014267038 seconds earlier
t_cpupart: 0x205114b0(0) last CPU: 216
idle: 9057023 nsec (0.009057023s)
start: Sat May 20 22:00:07 2017
age: 997239 seconds (11 days 13 hours 39 seconds)
t_state: TS_ONPROC
t_flag: 0x1800 (T_PANIC|T_LWPREUSE)
t_proc_flag: 0x100 (TP_MSACCT)
t_schedflag: 0x8013 (TS_LOAD|TS_DONT_SWAP|TS_SIGNALLED|TS_WKLD_PERM)
t_acflag: 3 (TA_NO_PROCESS_LOCK|TA_BATCH_TICKS)
p_flag: 0x42000000 (SMSACCT|SMSFORK)
pc: unix:panicsys+0x40: call unix:setjmp
void unix:panicsys+0x40((const char *)0x1011ac18, (va_list)0x2a117544ee8, (struct regs *)0x20511020, (int)1, 0x9980001601, , , , , , , , 0x1011ac18, 0x2a117544ee8)
unix:vpanic_common+0x78(0x1011ac18, 0x2a117544ee8, 0, 0x2084f010, 0x1f, 0x11)
void unix:panic+0x1c((const char *)0x1011ac18, (void *)0x1011acc8, 0xc40babc01928, 0xc40bab6d0a80, 0xc40bc9c85540, 0, ...)
unix:mutex_panic((char *)0x1011acc8, (mutex_impl_t *)0xc40babc01928) - frame recycled
void unix:mutex_destroy+0x120((kmutex_t *)0xc40babc01928)
void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc40babc00000)
void emlxs:emlxs_driver_remove+0x138((dev_info_t *)0x4007dd0c418, (uint32_t)0xffffffff, (uint32_t)0)
int emlxs:emlxs_hba_detach+0x234((dev_info_t *)0x4007dd0c418)
int emlxs:emlxs_detach+0x19c((dev_info_t *)0x4007dd0c418, (ddi_detach_cmd_t)0)
int genunix:devi_detach+0x9c((dev_info_t *), (ddi_detach_cmd_t)0)
int genunix:detach_node+0x68((dev_info_t *)0x4007dd0c418, (uint_t)0x40010)
int genunix:i_ndi_unconfig_node+0x140((dev_info_t *)0x4007dd0c418, (ddi_node_state_t)4, (uint_t)0x40010)
int genunix:i_ddi_detachchild+0x18((dev_info_t *)0x4007dd0c418, (uint_t)0x40010)
int genunix:devi_detach_node+0xe4((dev_info_t *)0x4007dd0c418, (uint_t)0x40010, (int *)0)
int genunix:ndi_devi_offline+0x17c((dev_info_t *)0x4007dd0c418, (uint_t)0x10)
int genunix:ddihp_cn_change_children_state+0x178((ddi_hp_cn_handle_t *)0xc40ba0a40fb0, (boolean_t)0)
int genunix:ddihp_cn_pre_change_state+0x38((ddi_hp_cn_handle_t *)0xc40ba0a40fb0, (ddi_hp_cn_change_state_arg_t *)0x2a117545938)
int genunix:ddihp_connector_ops+0x24((ddi_hp_cn_handle_t *)0xc40ba0a40fb0, (ddi_hp_op_t), (void *)0x2a117545938, (void *))
int genunix:ddihp_modctl+0x240((int)2, (char *)0xc41352766080, (char *)0xc40f9d230a00, (uintptr_t)0xf870bcec, (uintptr_t)0)
int genunix:modctl_hp+0xf4((int)0, (const char *)0x9b5d28, (char *)0x9b5d80, (uintptr_t)0xf870bcec, (uintptr_t)0)
int genunix:modctl+0x520((int), (uintptr_t), (uintptr_t)0x9b5d28, (uintptr_t)0x9b5d80, (uintptr_t)0xf870bcec, (uintptr_t))
unix:_syscall_no_proc_exit32+0x78()
-- switch to user thread's user stack --
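The three dumps above share one signature: the same panic string and an emlxs_lock_destroy frame directly below mutex_destroy. As a cross-check when triaging a new dump, the following Python sketch extracts the lock address from the panic string and the emlxs_hba_t address from the stack, then reports the offset between them. The helper name lock_offset and the regular expressions are hypothetical and modeled only on the text quoted in this note; the resulting offset may differ on other SRU builds.

```python
import re

# Hypothetical patterns, derived only from the dump text quoted in this note.
PANIC_RE = re.compile(
    r"mutex_destroy: not owner, lp=([0-9a-f]+) owner=([0-9a-f]+) thread=([0-9a-f]+)"
)
HBA_RE = re.compile(
    r"emlxs_lock_destroy\+0x[0-9a-f]+\(\(emlxs_hba_t \*\)0x([0-9a-f]+)\)"
)


def lock_offset(panic_string, stack_text):
    """Return (lock address - emlxs_hba_t address), or None when the text
    does not match the signature described in this note."""
    panic = PANIC_RE.search(panic_string)
    hba = HBA_RE.search(stack_text)
    if panic is None or hba is None:
        return None
    return int(panic.group(1), 16) - int(hba.group(1), 16)


# Panic strings and emlxs_lock_destroy frames copied from the three dumps.
samples = [
    ("mutex_destroy: not owner, lp=c4008fc01928 owner=c400a2ddc380 thread=c400eeeb1780",
     "void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc4008fc00000)"),
    ("mutex_destroy: not owner, lp=c40655801928 owner=c40655e85880 thread=c40e37aac200",
     "void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc40655800000)"),
    ("mutex_destroy: not owner, lp=c40babc01928 owner=c40bab6d0a80 thread=c40bc9c85540",
     "void emlxs:emlxs_lock_destroy+0xb4((emlxs_hba_t *)0xc40babc00000)"),
]

for panic_string, frame in samples:
    print(hex(lock_offset(panic_string, frame)))
```

For the three dumps in this note the sketch prints 0x1928 each time, that is, the mutex being destroyed sits at the same offset inside the emlxs_hba_t structure in every case.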
Changes
Before the attempt to offline the Emulex HBA, an adapter hardware error occurred on the HBA, and the status indicates a parity error. On the first occurrence of an adapter hardware error with a parity-error status, the recommended action is to clear the FMA event and reboot the system to recover the Emulex HBA. If the error recurs, the Emulex HBA should be replaced.
Jun 2 14:07:41 server1 emlxs: [ID 349649 kern.info] [13.1223]emlxs20: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
Jun 2 14:07:41 server1 emlxs: [ID 349649 kern.info] [13.123D]emlxs20: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e78 status2=0x168200)
Jun 2 14:07:41 server1 emlxs: [ID 349649 kern.info] [ 5.0401]emlxs20: NOTICE: 710: Link down.
Jun 2 14:07:44 server1 emlxs: [ID 349649 kern.info] [ 6.0994]emlxs20:WARNING: 231: Adapter shutdown. (Reboot required.)
Jun 2 14:08:14 server1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
Jun 2 14:08:14 server1 EVENT-TIME: Fri Jun 2 14:08:14 EDT 2017
Jun 2 14:08:14 server1 PLATFORM: SPARC M6-32, CSN: AK00123456, HOSTNAME: server1
Jun 2 14:08:14 server1 SOURCE: eft, REV: 1.16
Jun 2 14:08:14 server1 EVENT-ID: d0187abd-f2ac-45ae-affe-ead56f002cfc
Jun 2 14:08:14 server1 DESC: A problem was detected for a PCIEX device.
Jun 2 14:08:14 server1 AUTO-RESPONSE: One or more device instances may be disabled
Jun 2 14:08:14 server1 IMPACT: Loss of services provided by the device instances associated with this fault
Jun 2 14:08:14 server1 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
Jun 2 14:08:17 server1 genunix: [ID 846333 kern.warning] WARNING: constraints forbid retire: /pci@ac0/pci@1/pci@0/pci@8/SUNW,emlxs@0
Jul 6 03:59:34 server2 emlxs: [ID 349649 kern.info] [13.1223]emlxs1: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
Jul 6 03:59:34 server2 emlxs: [ID 349649 kern.info] [13.123D]emlxs1: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e78 status2=0x167200)
Jul 6 03:59:37 server2 emlxs: [ID 349649 kern.info] [ 6.0994]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
Jul 6 03:59:40 server2 emlxs: [ID 349649 kern.info] [13.0315]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
Jul 6 04:00:07 server2 genunix: [ID 408114 kern.info] /pci@300/pci@1/pci@0/pci@6/SUNW,emlxs@0,1 (emlxs1) down
Jul 6 04:00:07 server2 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
Jul 6 04:00:07 server2 EVENT-TIME: Thu Jul 6 04:00:07 EDT 2017
Jul 6 04:00:07 server2 PLATFORM: SPARC T5-4, CSN: AK00123456, HOSTNAME: server2
Jul 6 04:00:07 server2 SOURCE: eft, REV: 1.16
Jul 6 04:00:07 server2 EVENT-ID: 08a975e3-6839-4d1f-a600-acc5706b8c34
Jul 6 04:00:07 server2 DESC: A problem was detected for a PCIEX device.
Jul 6 04:00:07 server2 AUTO-RESPONSE: One or more device instances may be disabled
Jul 6 04:00:07 server2 IMPACT: Loss of services provided by the device instances associated with this fault
Jul 6 04:00:07 server2 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
Jul 6 04:00:08 server2 SC Alert: [ID 526677 daemon.alert] Fault | critical: Fault detected at time = Thu Jul 6 04:00:07 2017. The suspect component: /SYS/RCSA/PCIE1/CAR has fault.io.pciex.device-interr with probability=100. Refer to http://support.oracle.com/msg/PCIEX-8000-0A for details.
Jul 6 04:00:11 server2 genunix: [ID 408114 kern.info] /pci@300/pci@1/pci@0/pci@6/SUNW,emlxs@0,1/fp@0,0 (fp21) offline
Jul 6 04:00:11 server2 emlxs: [ID 349649 kern.info] [ B.0503]emlxs1: ERROR: 111: Driver detach failed. (detach: Driver busy. Driver dump active.)
Jul 6 04:00:11 server2 genunix: [ID 912663 kern.notice] NOTICE: Device: offline failed (ignored): /pci@300/pci@1/pci@0/pci@6/SUNW,emlxs@0,1
May 31 02:24:14 server3 emlxs: [ID 349649 kern.info] [13.1228]emlxs1: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
May 31 02:24:14 server3 emlxs: [ID 349649 kern.info] [13.1242]emlxs1: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e78 status2=0x168200)
May 31 02:24:14 server3 emlxs: [ID 349649 kern.info] [ 5.0401]emlxs1: NOTICE: 710: Link down.
May 31 02:24:17 server3 emlxs: [ID 349649 kern.info] [ 6.0999]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
May 31 02:24:20 server3 emlxs: [ID 349649 kern.info] [13.0318]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
May 31 02:24:47 server3 genunix: [ID 408114 kern.info] /pci@380/pci@1/pci@0/pci@8/SUNW,emlxs@0,1 (emlxs1) down
May 31 02:24:47 server3 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
May 31 02:24:47 server3 EVENT-TIME: Wed May 31 02:24:47 PDT 2017
May 31 02:24:47 server3 PLATFORM: unknown, CSN: unknown, HOSTNAME: server3
May 31 02:24:47 server3 SOURCE: eft, REV: 1.16
May 31 02:24:47 server3 EVENT-ID: e7b7843e-f11b-449a-8506-ae8f7c3f6bd3
May 31 02:24:47 server3 DESC: A problem was detected for a PCIEX device.
May 31 02:24:47 server3 AUTO-RESPONSE: One or more device instances may be disabled
May 31 02:24:47 server3 IMPACT: Loss of services provided by the device instances associated with this fault
May 31 02:24:47 server3 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
May 31 02:24:58 server3 genunix: [ID 846333 kern.warning] WARNING: constraints forbid retire: /pci@380/pci@1/pci@0/pci@8/SUNW,emlxs@0,1
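Because the panic only follows an offline attempt against an adapter that has already been shut down, it can help to scan the system log for that state before running luxadm, hotplug, or cfgadm. The Python sketch below looks for emlxs message 420 (adapter hardware error) followed by message 231 (adapter shutdown) on the same instance. The function name adapters_shut_down and the regular expression are hypothetical and based only on the message layout quoted in this note; other driver versions may log differently.

```python
import re

# Hypothetical pattern, modeled on lines such as:
#   "... [ 6.0994]emlxs20:WARNING: 231: Adapter shutdown. (Reboot required.)"
EMLXS_RE = re.compile(r"\](emlxs\d+):\s*(ERROR|WARNING|NOTICE):\s*(\d+):\s*(.*)$")


def adapters_shut_down(lines):
    """Return the set of emlxs instances that logged message 420
    (adapter hardware error) and later message 231 (adapter shutdown)."""
    saw_hw_error = set()
    shut_down = set()
    for line in lines:
        match = EMLXS_RE.search(line)
        if match is None:
            continue  # not an emlxs driver message
        instance, _level, msg_id, _text = match.groups()
        if msg_id == "420":
            saw_hw_error.add(instance)
        elif msg_id == "231" and instance in saw_hw_error:
            shut_down.add(instance)
    return shut_down


# Lines copied from the server1 excerpt in this note.
sample_log = [
    "Jun 2 14:07:41 server1 emlxs: [ID 349649 kern.info] [13.123D]emlxs20: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e78 status2=0x168200)",
    "Jun 2 14:07:41 server1 emlxs: [ID 349649 kern.info] [ 5.0401]emlxs20: NOTICE: 710: Link down.",
    "Jun 2 14:07:44 server1 emlxs: [ID 349649 kern.info] [ 6.0994]emlxs20:WARNING: 231: Adapter shutdown. (Reboot required.)",
]

print(sorted(adapters_shut_down(sample_log)))
```

Any instance reported by such a check is a candidate for the reboot-based recovery described under Solution rather than an offline or unconfigure operation.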
Cause
When luxadm -e offline, hotplug disable, or cfgadm -c unconfigure is executed against an SLI3 HBA, the emlxs driver attempts to destroy all of its locks as part of detach. When emlxs calls mutex_destroy on the SLI4 EMLXS_QUE_LOCK for an SLI3 HBA, the system panics because that queue lock is specific to SLI4 HBAs.
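The failure mode can be illustrated with a small user-space model in Python. This is not emlxs driver code: the note does not describe the internal state of the lock at panic time, so the model simply assumes that destroying a lock belonging to another SLI mode is the fatal condition and raises an exception where the kernel would panic. All names except EMLXS_QUE_LOCK are hypothetical placeholders, and the model treats SLI3 and SLI4 symmetrically, which the note does not confirm for the panic itself.

```python
class ModelPanic(Exception):
    """Stands in for the kernel panic; the model raises instead of panicking."""


# Lock names other than EMLXS_QUE_LOCK are hypothetical placeholders.
COMMON_LOCKS = ("COMMON_LOCK",)
MODE_LOCKS = {
    3: ("SLI3_ONLY_LOCK",),
    4: ("EMLXS_QUE_LOCK",),
}


def valid_locks(sli_mode):
    """Locks that belong to an HBA running in the given SLI mode."""
    return COMMON_LOCKS + MODE_LOCKS[sli_mode]


def destroy_lock(sli_mode, name):
    # Assumption: destroying a lock that belongs to another SLI mode is fatal.
    if name not in valid_locks(sli_mode):
        raise ModelPanic(f"mutex_destroy: not owner ({name} on SLI{sli_mode} HBA)")
    return name


def detach_destroy_all(sli_mode):
    """Behavior described under Cause: every lock is destroyed on detach."""
    every_lock = COMMON_LOCKS + MODE_LOCKS[3] + MODE_LOCKS[4]
    return [destroy_lock(sli_mode, name) for name in every_lock]


def detach_confined(sli_mode):
    """Teardown restricted to the locks that belong to this HBA's mode."""
    return [destroy_lock(sli_mode, name) for name in valid_locks(sli_mode)]


try:
    detach_destroy_all(3)
except ModelPanic as exc:
    print("model panic:", exc)

print(detach_confined(3))
```

In the model, the unconditional teardown of an SLI3 HBA fails on EMLXS_QUE_LOCK, while the mode-confined teardown completes and only touches the common and SLI3 locks.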
Solution
Workaround: An outage should be scheduled to clear the FMA event and reboot the system to recover the Emulex HBA.
NOTE: Broadcom ECD requires that OneCommand Manager version 11.1.218.18-1 or later be installed to capture valid firmware dumps from Emulex HBAs. When an event such as a parity error triggers a firmware dump, the OneCommand Manager daemon /opt/ELXocm/elxhbamgrd automatically stores the firmware dump files in /opt/ELXocm/Dump.
A fix for the emlxs driver has been integrated into Solaris 11.3 SRU 11.3.28.4.0 and later. The fix confines mutex_init and mutex_destroy of SLI-specific locks to HBAs of the matching SLI type.
SLI3-specific locks are confined to SLI3 HBAs only. SLI4-specific locks are confined to SLI4 HBAs only.
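To decide whether a system already carries the fix, the installed SRU level can be compared against 11.3.28.4.0. The Python sketch below assumes the installed level is already available as a dotted string in the same form the note uses; how that string is obtained is outside the scope of this sketch, and the function names and sample inputs are hypothetical.

```python
FIXED_IN = "11.3.28.4.0"  # SRU level named in this note


def sru_tuple(version):
    """Turn a dotted SRU string such as '11.3.28.4.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def has_emlxs_fix(installed):
    """True if the installed SRU string is at or above the fixed level."""
    return sru_tuple(installed) >= sru_tuple(FIXED_IN)


# Hypothetical inputs written in the dotted form used by this note.
for installed in ("11.3.5.4.0", "11.3.8.6.0", "11.3.28.4.0", "11.3.30.4.0"):
    print(installed, has_emlxs_fix(installed))
```

The comparison is done on integer tuples on purpose: a plain string comparison would rank "11.3.5.4.0" above "11.3.28.4.0" because the character "5" sorts after "2".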
References
<BUG:25354689> - HOST PANICKED WITH "MUTEX_DESTROY: NOT OWNER" WHEN RUNNING "HOTPLUG DISABLE"
<NOTE:1602837.1> - FC HBA Emlxs ERROR: 420: Adapter Hardware Error. (Host Error Attention: Status=0x40000000
<NOTE:2025952.1> - reboot is required to clear emulex hba issue