Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2271511.1
Update Date:2017-06-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  2271511.1 :   Solaris 10 or 11 May Panic with BAD TRAP: type=34 When Running luxadm probe  


Related Items
  • Solaris x64/x86 Operating System
  • Solaris Operating System
  • Sun SPARC Enterprise M5000 Server
Related Categories
  • PLA-Support>Sun Systems>DISK>Storage Drivers>SN-DK: Storage Drivers




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-12279128491>

Applies to:

Sun SPARC Enterprise M5000 Server - Version All Versions and later
Solaris Operating System - Version 10 3/05 and later
Solaris x64/x86 Operating System - Version 10 3/05 and later
Information in this document applies to any platform.

Symptoms

The affected Solaris 10 M5000 server has one dual-port Oracle 4Gb QLogic FC HBA (SG-XPCIE2FC-QF4),
with both ports connected to the SAN to access EMC storage arrays:

c1 qlc0 2100001b3213XXXX SG-XPCIE2FC-QF4 1.24 CONNECTED /pci@0,600000/pci@0/pci@9/SUNW,qlc@0
c2 qlc1 2101001b3233XXXX SG-XPCIE2FC-QF4 1.24 CONNECTED /pci@0,600000/pci@0/pci@9/SUNW,qlc@0,1


The EMC LUNs are under MPxIO control, for example:

10. c5t600601605CA13600C4D014649FB9E311d0 <DGC-VRAID-0533 cyl 34814 alt 2 hd 128 sec 16>
/scsi_vhci/ssd@g600601605ca13600c4d014649fb9e311
11. c5t600601605CA13600D80FE8E797B9E311d0 <DGC-VRAID-0533 cyl 49150 alt 2 hd 64 sec 12>
/scsi_vhci/ssd@g600601605ca13600d80fe8e797b9e311

....


While the customer was running "luxadm probe", the system panicked and rebooted:

Mar 1 16:37:05 server01 qlc: [ID 274063 kern.info] 2487=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 5h 0h 0h 0h 0h ah 0h 0h 0h 0h 25h 0h 0h 0h 0h 0h
Mar 1 16:37:05 server01 qlc: [ID 723013 kern.info] 2488=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 2h 0h 0h 0h 0h ah 0h 0h 0h 0h 4h 3h 0h 0h 0h 0h
Mar 1 16:37:05 server01 qlc: [ID 871608 kern.info] 2489=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 2h 0h 0h 0h 0h ah 0h 0h 0h 0h 4h 3h 0h 0h 0h 0h
Mar 1 16:37:05 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0/ssd@w50060168086016f8,0 (ssd78):
Mar 1 16:37:05 server01 drive offline
Mar 1 16:37:05 server01 qlc: [ID 467002 kern.info] 2490=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 2h 0h 0h 0h 0h ah 0h 0h 0h 0h 4h 3h 0h 0h 0h 0h
Mar 1 16:37:05 server01 qlc: [ID 615597 kern.info] 2491=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 2h 0h 0h 0h 0h ah 0h 0h 0h 0h 4h 3h 0h 0h 0h 0h
Mar 1 16:37:05 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@0,600000/pci@0/pci@9/SUNW,qlc@0,1/fp@0,0/ssd@w50060168086016f8,0 (ssd78):
Mar 1 16:37:05 server01 drive offline
Mar 1 16:37:05 server01 qlc: [ID 463837 kern.info] 2492=>QEL qlc(1):: qlc_24xx_status_error, check condition sense data, d_id=80300h, lun=0h
Mar 1 16:37:05 server01 70h 0h 5h 0h 0h 0h 0h ah 0h 0h 0h 0h 25h 0h 0h 0h 0h 0h
Mar 1 16:37:05 server01 genunix: [ID 153160 kern.warning] WARNING: Page83 data not standards compliant DGC LUNZ 0533
Mar 1 16:37:06 server01 qlc: [ID 645980 kern.info] 146=>QEL qlc(0):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:06 server01 qlc: [ID 499708 kern.info] 147=>QEL qlc(0):: qlc_transport, failed, rval = eh
Mar 1 16:37:06 server01 qlc: [ID 477413 kern.info] 2493=>QEL qlc(1):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:06 server01 qlc: [ID 847193 kern.info] 2494=>QEL qlc(1):: qlc_transport, failed, rval = eh
Mar 1 16:37:12 server01 qlc: [ID 646236 kern.info] 148=>QEL qlc(0):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:12 server01 qlc: [ID 194107 kern.info] 149=>QEL qlc(0):: qlc_transport, failed, rval = eh
Mar 1 16:37:12 server01 qlc: [ID 477669 kern.info] 2495=>QEL qlc(1):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:12 server01 qlc: [ID 541592 kern.info] 2496=>QEL qlc(1):: qlc_transport, failed, rval = eh
Mar 1 16:37:17 server01 qlc: [ID 645216 kern.info] 150=>QEL qlc(0):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:17 server01 qlc: [ID 582066 kern.info] 151=>QEL qlc(0):: qlc_transport, failed, rval = eh
Mar 1 16:37:17 server01 qlc: [ID 477925 kern.info] 2497=>QEL qlc(1):: qlc_fc_services, FC_ELS_MALFORMED, cnt=420h, size=104h
Mar 1 16:37:17 server01 qlc: [ID 235991 kern.info] 2498=>QEL qlc(1):: qlc_transport, failed, rval = eh
Mar 1 16:37:21 server01 unix: [ID 836849 kern.notice]
Mar 1 16:37:21 server01 ^Mpanic[cpu0]/thread=3000a7946a0:
Mar 1 16:37:21 server01 unix: [ID 799565 kern.notice] BAD TRAP: type=34 rp=2a10607b4e0 addr=e mmu_fsr=0
Mar 1 16:37:21 server01 unix: [ID 100000 kern.notice]
Mar 1 16:37:21 server01 unix: [ID 839527 kern.notice] luxadm:
Mar 1 16:37:21 server01 unix: [ID 123557 kern.notice] alignment error:
Mar 1 16:37:21 server01 unix: [ID 381800 kern.notice] addr=0xe
Mar 1 16:37:21 server01 unix: [ID 101969 kern.notice] pid=18293, pc=0x11a2988, sp=0x2a10607ad81, tstate=0x9980001607, context=0xb6f
Mar 1 16:37:21 server01 unix: [ID 743441 kern.notice] g1-g7: e, 0, 0, 0, 0, 0, 3000a7946a0
Mar 1 16:37:21 server01 unix: [ID 100000 kern.notice]
Mar 1 16:37:21 server01 genunix: [ID 723222 kern.notice] 000002a10607b200 unix:die+9c (34, 2a10607b4e0, e, 0, 2a10607b2c0, c1e00000)
Mar 1 16:37:21 server01 genunix: [ID 179002 kern.notice] %l0-3: 00000000c0800000 0000000000000034 0000000000000000 0000000000000030
Mar 1 16:37:21 server01 %l4-7: 0000000000000020 0000000000000000 0000000000000007 000000000109d400
Mar 1 16:37:21 server01 genunix: [ID 723222 kern.notice] 000002a10607b2e0 unix:trap+69c (2a10607b4e0, 10000, 0, 4280804b, 0, 3000a7946a0)
Mar 1 16:37:21 server01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000060032554cd8 0000000000000034 0000060032a0a450
Mar 1 16:37:21 server01 %l4-7: 0000000000010011 0000000000010009 0000000000010000 0000000000010200
Mar 1 16:37:21 server01 genunix: [ID 723222 kern.notice] 000002a10607b430 unix:ktl0+48 (e, 10000, 0, 12698f8, 12698f4, 1953c00)
Mar 1 16:37:21 server01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000001400 0000009980001607 000000000101cb90
Mar 1 16:37:21 server01 %l4-7: 0000000000010011 0000000000010009 0000000000000000 000002a10607b4e0
Mar 1 16:37:22 server01 genunix: [ID 723222 kern.notice] 000002a10607b580 genunix:pathname_work+24 (e, 30037f30a40, 2a10607b6d8, 0, 0, 30037f30a40)
Mar 1 16:37:22 server01 genunix: [ID 179002 kern.notice] %l0-3: 0000009980001606 0000000000000016 0000009980001606 000000000101cb90
Mar 1 16:37:22 server01 %l4-7: 0000000000000000 000003001b97ce60 0000000000000000 000002a10607b590
Mar 1 16:37:22 server01 genunix: [ID 723222 kern.notice] 000002a10607b630 genunix:pathname_work+2c (3b54, 30037f30a40, 1000, 3002386fc50, 1000, 30037f30a40)
Mar 1 16:37:22 server01 genunix: [ID 179002 kern.notice] %l0-3: 00000600113b7ba0 0000000000000000 0000000000000001 0000000000000020
Mar 1 16:37:22 server01 %l4-7: 00000000018baa70 00000000018ba800 00000008c9bee0f6 0000000000000200
Mar 1 16:37:22 server01 genunix: [ID 723222 kern.notice] 000002a10607b6e0 scsi_vhci:vhci_get_client_path_list+60 (600113b7b38, 6003578a2c0, 5, 30037f30a40, 5, 0)
Mar 1 16:37:22 server01 genunix: [ID 179002 kern.notice] %l0-3: 000002a10607b790 0000000000000002 0000000000000001 fffffffffffffffd
Mar 1 16:37:22 server01 %l4-7: 0000000000000002 0000000000000005 0000000000000001 fffffffffffffffb
Mar 1 16:37:22 server01 genunix: [ID 723222 kern.notice] 000002a10607b7a0 scsi_vhci:vhci_ctl+3b0 (bd00000000, 19653f0, 600113b7b38, 100003, 12bcc00, 400)
Mar 1 16:37:23 server01 genunix: [ID 179002 kern.notice] %l0-3: 000003002e7a9140 0000030019ae1b40 0000060033228500 0000000000000000
Mar 1 16:37:23 server01 %l4-7: 0000000000000005 00000000000f9550 0000000000000000 0000000000000005
Mar 1 16:37:23 server01 genunix: [ID 723222 kern.notice] 000002a10607b8e0 genunix:fop_ioctl+20 (3003d51a880, 7801, ffbfe39c, 100003, 300398c6d38, 1270a58)
Mar 1 16:37:23 server01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000003 0000060032554cd8 00000000000f8e98 0000000000007c88
Mar 1 16:37:23 server01 %l4-7: 00000000ff2392a0 0000000000000000 0000000001947400 0000000000000001
Mar 1 16:37:23 server01 genunix: [ID 723222 kern.notice] 000002a10607b990 genunix:ioctl+184 (3, 60033312508, ffbfe39c, 0, fa9a0, 7801)
Mar 1 16:37:23 server01 genunix: [ID 179002 kern.notice] %l0-3: 0000000000000000 0000000000000000 0000000000000004 000000000000c48c
Mar 1 16:37:23 server01 %l4-7: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
Mar 1 16:37:23 server01 unix: [ID 100000 kern.notice]
Mar 1 16:37:23 server01 genunix: [ID 672855 kern.notice] syncing file systems...
Mar 1 16:37:24 server01 genunix: [ID 733762 kern.notice] 3
Mar 1 16:37:25 server01 genunix: [ID 733762 kern.notice] 2
Mar 1 16:37:26 server01 genunix: [ID 733762 kern.notice] 1
Mar 1 16:37:48 server01 last message repeated 20 times
Mar 1 16:37:49 server01 genunix: [ID 622722 kern.notice] done (not all i/o completed)
Mar 1 16:37:50 server01 genunix: [ID 111219 kern.notice] dumping to /dev/md/dsk/d1, offset 4195221504, content: kernel
Mar 1 16:39:08 server01 genunix: [ID 409368 kern.notice] ^M100% done: 234651 pages dumped, compression ratio 3.49,
Mar 1 16:41:48 server01 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_141414-01 64-bit


Note that the installed patch levels are old: kernel patch 141414-01, qlc driver patch 149175-04, and the fp/fcp patches below:

! 119974 -09 -11 5.10: fp plug-in for cfgadm OBSOLETED BY 142088
! 141874 N/F! -10 5.10: fp patch OBSOLETED BY 143647
! 143647 N/F! -10 5.10: fp patch OBSOLETED BY 145957
! 145957 N/F! -11 5.10: fcp/fcip patch OBSOLETED BY 146232
! 146232 N/F! -23 5.10: iSCSI patch OBSOLETED BY 147143

! 142088 N/F! -03 5.10: fp.so
! 147143 N/F! -17 5.10: iSCSI patch fp/fcp/fcsm/fctl fixes
! 151615 N/F! -04 5.10: fcp patch


More detail from the crash dump analysis:

CAT(vmcore.0/10U)> panic
panic on CPU 0
panic string: BAD TRAP: type=34 rp=2a10607b4e0 addr=e mmu_fsr=0
==== panic user (LWP_SYS) thread: 0x3000a7946a0 PID: 18293 on CPU: 0 ====
cmd: luxadm probe
t_procp: 0x60032554cd8
p_as: 0x30022a6c030 size: 3981312 RSS: 2539520
a_hat: 0x30008ce9c00
cnum: CPU0:4233/2927 CPU2:3971/4559
cpusran: 0,2
p_zone: 0x19466f8 (global)
t_stk: 0x2a10607bae0 sp: 0x18c0091 t_stkbase: 0x2a106076000
t_pri: 59 (FSS) pctcpu: 0.095785
t_lwp: 0x60032a0a450 t_tid: 1
machpcb: 0x2a10607bae0
lwp_ap: 0x2a10607bbd0
t_mstate: LMS_SYSTEM ms_prev: LMS_USER
ms_state_start: 0.000131000 seconds earlier
ms_start: 0.036984500 seconds earlier
t_cpupart: 0x18c0f98(0) last CPU: 0
idle: 0 ticks (0s)
start: Tue Mar 1 15:37:18 2016
age: 0 seconds (0 seconds)
t_state: TS_ONPROC
t_flag: 0x1800 (T_PANIC|T_LWPREUSE)
t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
t_schedflag: 0x4003 (TS_LOAD|TS_DONT_SWAP|TS_RUNQMATCH)
p_flag: 0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)

pc: unix:panicsys+0x48: call unix:setjmp

void unix:panicsys+0x48((const char *)0x109d520, (va_list)0x2a10607b288, (struct regs *)0x18c0a60, (int)1, 0x1604, , , , , , , , 0x109d520, 0x2a10607b288)
unix:vpanic_common+0x78(0x109d520, 0x2a10607b288, 0x1132000, 8, 8, 0xa8)
void unix:panic+0x1c((const char *)0x109d520, (void *)0x34, 0x2a10607b4e0, 0xe, 0, 0x11a2988, ...)
int unix:die+0x9c((unsigned)0x34, (struct regs *)0x2a10607b4e0, (caddr_t)0xe, (uint_t)0)
void unix:trap+0x69c((struct regs *)0x2a10607b4e0, (caddr_t)0xe, (uint32_t), (uint32_t))
unix:ktl0+0x48()
-- trap data type: 0x34 (memory address not aligned) rp: 0x2a10607b4e0 LEAF --
addr: 0xe
pc: 0x11a2988 genunix:ddi_get_parent+4: ldx [%o0], %o0
npc: 0x11a42f0 genunix:pathname_work+0x2c: call genunix:pathname_work
global: %g1 0xe
%g2 0 %g3 0
%g4 0 %g5 0
%g6 0 %g7 0x3000a7946a0
out: %o0 0xe %o1 0x10000
%o2 0 %o3 0x12698f8
%o4 0x12698f4 %o5 0x1953c00
%sp 0x2a10607ad81 %o7 0x11a42e8
loc: %l0 0x9980001606 %l1 0x16
%l2 0x9980001606 %l3 0x101cb90
%l4 0 %l5 0x3001b97ce60
%l6 0 %l7 0x2a10607b590
in: %i0 0xe %i1 0x30037f30a40
%i2 0x2a10607b6d8 %i3 0
%i4 0 %i5 0x30037f30a40
%fp 0x2a10607ae31 %i7 0x11a42f0
<leaf trap>dev_info_t *genunix:ddi_get_parent+4((dev_info_t *)0xe)
char *genunix:pathname_work+0x24((dev_info_t *)0xe, (char *)0x30037f30a40)
char *genunix:pathname_work+0x2c((dev_info_t *)0x3b54, (char *)0x30037f30a40)
genunix:ddi_pathname((dev_info_t *), (char *)0x30037f30a40) - frame recycled
int scsi_vhci:vhci_get_client_path_list+0x60((dev_info_t *)0x600113b7b38, (sv_path_info_t *)0x30037f30a40?, (uint_t)5)
int scsi_vhci:vhci_ctl+0x3b0((dev_t), (int), (intptr_t), (int), (cred_t *), (int *))
specfs:spec_ioctl((struct vnode *)0x3003d51a880, (int)0x7801, (intptr_t)0xffbfe39c, (int)0x100003, (struct cred *)0x300398c6d38, (int *)) - frame recycled
int genunix:fop_ioctl+0x20((vnode_t *)0x3003d51a880, (int)0x7801, (intptr_t)0xffbfe39c, (int)0x100003, (cred_t *), (int *)0x2a10607badc)
int genunix:ioctl+0x184((int), (int), (intptr_t))
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --


Cause

The customer is hitting this bug:
Bug 22321305 - panic due to vhci_get_client_path_list passing wrong path info

The panic stack is very similar to the one in Bug 22321305, and it is the same stack reported in an older bug that was closed as not reproducible:
Bug 15753793 : SUNBT7109871 panic due to vhci_get_client_path_list passing wrong path info. to ddi_get_parent

The root cause analysis (RCA) showed a problem in the fcp driver. This issue is very unlikely to be hit: it appears to occur only if an FC HBA is either hot-unplugged (for example, one FC HBA taken offline with 'hotplug offline') or removed from the system with Dynamic Reconfiguration (DR).

Solution

The fix for Bug 22321305 will be integrated in the upcoming Solaris 11.3 SRU 21.

The fix will also be backported to Solaris 10; this document will be updated with the Solaris 10 patch that resolves the issue.

Note: If you encounter the same problem and cannot wait for the official Solaris 10 patch or Solaris 11 SRU that includes the fix, please contact Oracle Support by opening a new SR.


References

<BUG:22321305> - PANIC DUE TO VHCI_GET_CLIENT_PATH_LIST PASSING WRONG PATH INFO

  Copyright © 2018 Oracle, Inc.  All rights reserved.