Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2235315.1
Update Date:2018-03-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  2235315.1 :   Solaris Panic BAD TRAP - occurred in module "qlc" due to a NULL pointer dereference  


Related Items
  • Sun SPARC Enterprise M3000 Server
  • Qlogic FC HBA
Related Categories
  • PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-14315005841>

Applies to:

Sun SPARC Enterprise M3000 Server - Version All Versions and later
Qlogic FC HBA - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The affected system is a Solaris 11.1 GA server with two Oracle Qlogic 8Gb FC HBAs.

The server panics. Immediately before the panic, the system log shows repeated Loop OFFLINE / ONLINE transitions on one FC HBA port:

....
Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE
Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop ONLINE
Feb 16 15:28:27 server01 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to ef failed state=Packet Transport error, reason=No Connection
Feb 16 15:28:27 server01 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to ef failed. state=e reason=5.
Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE
Feb 16 15:28:37 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop ONLINE
Feb 16 15:28:37 server01 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to ef failed state=Packet Transport error, reason=No Connection
Feb 16 15:28:37 server01 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to ef failed. state=e reason=5.
Feb 16 15:28:37 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE
Feb 16 15:28:42 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop ONLINE
Feb 16 15:28:42 server01 fp: [ID 517869 kern.info] NOTICE: fp(0): PLOGI to ef failed state=Packet Transport error, reason=No Connection
Feb 16 15:28:42 server01 fctl: [ID 517869 kern.warning] WARNING: fp(0)::PLOGI to ef failed. state=e reason=5.
Feb 16 15:28:42 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE
Feb 16 15:28:42 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop ONLINE
Feb 16 15:28:42 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE
Feb 16 15:28:44 server01 unix: [ID 836849 kern.notice]
Feb 16 15:28:44 server01 ^Mpanic[cpu2]/thread=2a100577c60:
Feb 16 15:28:44 server01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a1005776a0 addr=10 mmu_fsr=0 occurred in module "qlc" due to a NULL pointer dereference
Feb 16 15:28:44 server01 unix: [ID 100000 kern.notice]
Feb 16 15:28:44 server01 unix: [ID 839527 kern.notice] sched:
Feb 16 15:28:44 server01 unix: [ID 520581 kern.notice] trap type = 0x31
Feb 16 15:28:44 server01 unix: [ID 381800 kern.notice] addr=0x10
Feb 16 15:28:44 server01 unix: [ID 101969 kern.notice] pid=0, pc=0x7b6649cc, sp=0x2a100576f41, tstate=0x9980001601, context=0x0
Feb 16 15:28:44 server01 unix: [ID 743441 kern.notice] g1-g7: 0, 64003afde000, 2427, ffffffffffefffff, 406, 0, 2a100577c60
Feb 16 15:28:44 server01 unix: [ID 100000 kern.notice]
Feb 16 15:28:44 server01 genunix: [ID 723222 kern.notice] 000002a1005773f0 unix:die+7c (31, 2a1005776a0, 10, 0, 0, 10ac000)
Feb 16 15:28:44 server01 genunix: [ID 702911 kern.notice] %l0-3: 0000000000000031 0000000001000000 0000000000002000 00000000010ac3d8
Feb 16 15:28:44 server01 %l4-7: 00000000010ac000 0000000000000000 0000000000000005 000002a1005774b0
Feb 16 15:28:45 server01 genunix: [ID 723222 kern.notice] 000002a1005774d0 unix:trap+a40 (2a1005776a0, f25a6010, 1fff, 0, 1c00, 4420f8)
Feb 16 15:28:45 server01 genunix: [ID 702911 kern.notice] %l0-3: 0000000000000010 0000000000000031 00000000c1780000 0000000000000001
Feb 16 15:28:45 server01 %l4-7: 000000007b6960e2 0000000000000005 0000000000000001 0000000000000000
Feb 16 15:28:45 server01 genunix: [ID 723222 kern.notice] 000002a1005775f0 unix:ktl0+48 (64003afde528, 0, ffffffffffffffff, 0, 64003b02cd65, 0)
Feb 16 15:28:45 server01 genunix: [ID 702911 kern.notice] %l0-3: 0000000000000002 0000000000001400 0000009980001601 000000000101acb0
Feb 16 15:28:45 server01 %l4-7: 000000007b69743c 0000000001076c00 0000000000000000 000002a1005776a0
Feb 16 15:28:45 server01 genunix: [ID 562518 kern.notice] 000002a100577740 800 (64003b134a40, 2a100577c60, 40002332000, 640062f1eb00, 22e, fffffffffeffffff)
Feb 16 15:28:45 server01 genunix: [ID 702911 kern.notice] %l0-3: 000064003afdea60 000000000000022e 0000000000000000 000000000000022f
Feb 16 15:28:45 server01 %l4-7: 0000000000000002 0000000000001170 000064003afde000 00000000ffff7c00
Feb 16 15:28:45 server01 genunix: [ID 723222 kern.notice] 000002a1005777f0 qlc:qlc_task_thread+318 (64003b134a40, 406, 100000, 2000, 64003afde000, 1000)
Feb 16 15:28:46 server01 genunix: [ID 702911 kern.notice] %l0-3: 000064003afde572 000064003afde550 000064003afde000 0000000000000406
Feb 16 15:28:46 server01 %l4-7: 0000000000000406 0000000000002427 000064003afde000 0000000000000001
Feb 16 15:28:46 server01 genunix: [ID 723222 kern.notice] 000002a1005778b0 qlc:qlc_driver_thread+2c (64003b134a40, 6400392390d8, 7b6637c4, 0, 64003afde000, 64003afde568)
Feb 16 15:28:46 server01 genunix: [ID 702911 kern.notice] %l0-3: 000064003afde000 000064003afde570 0000000000000406 0000000000000006
Feb 16 15:28:46 server01 %l4-7: 0000000000000006 0000000000000006 0000000000000000 0000000000000001
Feb 16 15:28:46 server01 genunix: [ID 723222 kern.notice] 000002a100577960 genunix:taskq_thread+3a8 (fff7fc00, 6400391e0c08, 22a93172b8, 6400391e0c3a, 6400391e0c3c, 6400392390d8)
Feb 16 15:28:46 server01 genunix: [ID 702911 kern.notice] %l0-3: 0000000000080000 0000000000010000 00006400391e0c38 0000000000000001
Feb 16 15:28:46 server01 %l4-7: 00006400391e0c28 00006400391e0c78 00006400391e0c30 00000000fffeffff
Feb 16 15:28:46 server01 unix: [ID 100000 kern.notice]
Feb 16 15:28:46 server01 genunix: [ID 672855 kern.notice] syncing file systems...
Feb 16 15:28:46 server01 genunix: [ID 904073 kern.notice] done
Feb 16 15:28:47 server01 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 16 15:29:01 server01 genunix: [ID 100000 kern.notice]
Feb 16 15:29:01 server01 genunix: [ID 665016 kern.notice] ^M100% done: 257334 pages dumped,
Feb 16 15:29:01 server01 genunix: [ID 851671 kern.notice] dump succeeded
Feb 16 15:31:37 server01 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version 11.1 64-bit
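
The pattern in the log above (repeated Loop OFFLINE / ONLINE messages on one port, followed by a BAD TRAP in module "qlc") can be checked for mechanically. The following is a minimal sketch, not an Oracle tool: the function name summarize and the regular expressions are assumptions written for this illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... NOTICE: Qlogic qlc(2,0): Loop OFFLINE
LOOP_RE = re.compile(r"Qlogic (qlc\(\d+,\d+\)): Loop (OFFLINE|ONLINE)")
# Matches the panic string and captures the module name.
PANIC_RE = re.compile(r'BAD TRAP: .* occurred in module "(\w+)"')


def summarize(lines):
    """Count Loop OFFLINE/ONLINE events per port and note the panicking module."""
    events = Counter()
    panic_module = None
    for line in lines:
        m = LOOP_RE.search(line)
        if m:
            events[(m.group(1), m.group(2))] += 1
            continue
        m = PANIC_RE.search(line)
        if m and panic_module is None:
            panic_module = m.group(1)
    return events, panic_module


# A few lines taken from the log excerpt above.
SAMPLE = [
    "Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE",
    "Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop ONLINE",
    "Feb 16 15:28:27 server01 qlc: [ID 439991 kern.info] NOTICE: Qlogic qlc(2,0): Loop OFFLINE",
    'Feb 16 15:28:44 server01 unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a1005776a0 '
    'addr=10 mmu_fsr=0 occurred in module "qlc" due to a NULL pointer dereference',
]

if __name__ == "__main__":
    events, module = summarize(SAMPLE)
    for (port, state), count in sorted(events.items()):
        print(f"{port} Loop {state}: {count}")
    print(f"panic module: {module}")
```

In practice the same function could be fed the lines of a saved copy of the system log. Link flapping on a single port together with a panic in "qlc" is the combination this document describes.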

 

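Crash dump analysis shows the panic stack in more detail. The trap occurs in the qlc driver at qlc:qlc_abort_queues+0x88, where the faulting instruction loads from [%o1 + 0x10] while %o1 is 0, giving the fault address 0x10: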

CAT(vmcore.1139/11U)> panic
panic on CPU 7
panic string: BAD TRAP: type=31 rp=2a1007f76a0 addr=10 mmu_fsr=0 occurred in module "qlc" due to a NULL pointer dereference
==== panic kernel thread: 0x2a1007f7c60 PID: 0 on CPU: 7 ====
cmd: sched(qlc_2_driver_thread)
t_procp: 0x18a9a40 (proc_sched)
p_as: 0x18aba18 (kas)
p_zone: 0x19ac658 (global)
t_stk: 0x2a1007f7a50 sp: 0x191f3e1 t_stkbase: 0x2a1007f2000
t_pri: 60 (SYS) pctcpu: 0.000355
t_transience: 10 (TRANSIENT) t_wkld_flags: 0
t_cpupart: 0x19200c0(0) last CPU: 7
idle: 60044900 hrticks (0.060044900s)
start: Thu Feb 16 07:31:58 2017
age: 93574 seconds (1 days 1 hours 59 minutes 34 seconds)
t_state: TS_ONPROC
t_flag: 0x10808 (T_TALLOCSTK|T_PANIC|T_PUSHPAGE)
t_proc_flag: 0 (none set)
t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
t_acflag: 0 (none set)
p_flag: 1 (SSYS)

pc: unix:panicsys+0x48: call unix:setjmp

void unix:panicsys+0x48((const char *)0x10ac388, (va_list)0x2a1007f7478, (struct regs *)0x191fda0, (int)1, 0x9900001602, , , , , , , , 0x10ac388, 0x2a1007f7478)
unix:vpanic_common+0x78(0x10ac388, 0x2a1007f7478, 0x48, 9, 0x80000000, 0x7fffe2b1)
void unix:panic+0x1c((const char *)0x10ac388, (void *)0x31, 0x2a1007f76a0, 0x10, 0, 0x64003a45df90, 0x10ac3d8, ...)
int unix:die+0x7c((unsigned)0x31, (struct regs *)0x2a1007f76a0, (caddr_t)0x10, (uint_t)0)
void unix:trap+0xa40((struct regs *)0x2a1007f76a0, (caddr_t)0x10, (uint32_t), (uint32_t))
unix:ktl0+0x48()
-- trap data type: 0x31 (data access MMU miss) rp: 0x2a1007f76a0 --
addr: 0x10
pc: 0x7b6649cc qlc:qlc_abort_queues+0x88: ldx [%o1 + 0x10], %i1
npc: 0x7b6649d0 qlc:qlc_abort_queues+0x8c: brz,pn %i1, qlc:qlc_abort_queues+0xf8
global: %g1 0
%g2 0x64003a3ec000 %g3 0x2427
%g4 0xffffffffffefffff %g5 0x406
%g6 0 %g7 0x2a1007f7c60
out: %o0 0x64003a3ec528 %o1 0
%o2 0xffffffffffffffff %o3 0
%o4 0x64003a436b15 %o5 0
%sp 0x2a1007f6f41 %o7 0x800
loc: %l0 0x64003a3eca60 %l1 0x69
%l2 0 %l3 0x6a
%l4 2 %l5 0x348
%l6 0x64003a3ec000 %l7 0xffff7c00
in: %i0 0x64003a546f40 %i1 0x2a1007f7c60
%i2 0x40002332000 %i3 0x6400614e5c00
%i4 0x69 %i5 0xfffffffffeffffff
%fp 0x2a1007f6ff1 %i7 0x7b663c54
<trap>void qlc:qlc_abort_queues+0x88((qlc_p_info_t *)0x64003a546f40)
void qlc:qlc_task_thread+0x318((qlc_p_info_t *)0x64003a546f40)
void qlc:qlc_driver_thread+0x2c((void *)0x64003a546f40)
void genunix:taskq_thread+0x3a8((void *)0x6400391e0c08)
unix:thread_start+4()
-- end of kernel thread's stack --

CAT(vmcore.1139/11U)>
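
The trap data above explains the fault address 0x10: the faulting instruction is "ldx [%o1 + 0x10], %i1" and the register dump shows %o1 as 0. The arithmetic can be reproduced with a short sketch. The helper effective_address is a hypothetical name used only for this illustration and is not part of any crash analysis tool. It handles only the "ldx [reg + imm], reg" form shown in this dump.

```python
import re

# SPARC load syntax as printed in the trap data: "ldx [%o1 + 0x10], %i1"
LOAD_RE = re.compile(r"ldx\s+\[(%\w+)\s*\+\s*(0x[0-9a-fA-F]+|\d+)\],\s*(%\w+)")


def effective_address(instruction, registers):
    """Return (base_register, base_value, offset, address) for an ldx instruction."""
    m = LOAD_RE.search(instruction)
    if m is None:
        raise ValueError("not an 'ldx [reg + imm], reg' instruction")
    base_reg = m.group(1)
    offset = int(m.group(2), 0)   # accepts hexadecimal (0x10) or decimal
    base = registers[base_reg]    # value from the register dump
    return base_reg, base, offset, base + offset


# Register values copied from the "out:" line of the trap data above.
regs = {"%o0": 0x64003A3EC528, "%o1": 0}

if __name__ == "__main__":
    base_reg, base, offset, addr = effective_address("ldx [%o1 + 0x10], %i1", regs)
    print(f"{base_reg}={base:#x} + {offset:#x} -> {addr:#x}")
    if base == 0:
        print("base register is NULL: the load reads at a small offset from address 0")
```

A base register of 0 with a small constant offset is the usual signature of reading a structure member through a NULL pointer, which matches the panic string.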

 

 

Cause

There are two different issues here:

1. There is a hardware link problem between the FC HBA port and the FC switch port it is connected to, which causes the Loop OFFLINE / ONLINE messages.

2. The system is hitting the following bugs (all share the same root cause), in which the qlc driver panics when a link-down event occurs:
Bug 16174012 SUNBT7199879 qlc driver causes the panic when the link down occurs (sub-bug of 15817320)
Bug 15973480 QLC DRIVER CAUSES THE PANIC WHEN THE LINK DOWN OCCURS S10U11
Bug 15817320 BACKPORT 16174012 TO 11.2 QLC DRIVER PANIC
Bug 16868908 bad trap panic in module qlc

 

Solution

The bugs above are fixed in:

Solaris 10 SPARC : qlc patch <SunPatch:149175-05> (or greater)
Solaris 10 x86 : qlc patch <SunPatch:149176-05> (or greater)
Solaris 11 : Oracle Solaris 11.1 SRU 19.6 (11.1.19.6.0) (or greater)
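
Whether a system is at or above the fix levels listed above comes down to two simple comparisons: the patch revision for Solaris 10 and the dotted version string for Solaris 11. The following is a minimal sketch, assuming the installed patch ID and version string have already been collected by the reader. The function names and the example "installed" values are hypothetical. Only 149175-05, 149176-05 and 11.1.19.6.0 come from this document.

```python
def patch_at_least(installed, required):
    """Compare Solaris 10 patch IDs of the form 'NNNNNN-RR' by revision."""
    inst_base, inst_rev = installed.split("-")
    req_base, req_rev = required.split("-")
    if inst_base != req_base:
        raise ValueError("different patch base numbers")
    return int(inst_rev) >= int(req_rev)


def version_at_least(installed, required):
    """Compare dotted version strings such as '11.1.19.6.0' field by field."""
    def to_tuple(version):
        return tuple(int(part) for part in version.split("."))
    return to_tuple(installed) >= to_tuple(required)


if __name__ == "__main__":
    # The installed values below are illustrative only.
    print(patch_at_least("149175-04", "149175-05"))
    print(version_at_least("11.1.21.4.1", "11.1.19.6.0"))
```

The fields are compared numerically because a plain string comparison would rank "11.1.9.x" above "11.1.19.x".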

References

<BUG:16174012> - SUNBT7199879 QLC DRIVER CAUSES THE PANIC WHEN THE LINK DOWN OCCURS.
<BUG:15817320> - BACKPORT 16174012 TO 11.2 QLC DRIVER PANIC
<BUG:15973480> - QLC DRIVER CAUSES THE PANIC WHEN THE LINK DOWN OCCURS S10U11
<BUG:16868908> - BAD TRAP PANIC IN MODULE QLC

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.