Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2029629.1
Update Date:2018-03-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  2029629.1 :   Emlxadm Reset May Cause Solaris System to Panic with 'occurred in module "emlxs" due to a NULL pointer dereference'


Related Items
  • Solaris Operating System
  • SPARC T4-4
Related Categories
  • PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-10826710481>

Applies to:

SPARC T4-4 - Version All Versions and later
Solaris Operating System - Version 10 3/05 and later
Information in this document applies to any platform.

Symptoms

The system is a Solaris 10 logical domain (LDom) on a SPARC T4-4 with two dual-port LPem12002E-S FC HBAs assigned in Direct I/O mode, running emlxs patch 149173-03 (emlxs driver version v20120917-2.80.8.0):

C#  INST#  PORT WWN          MODEL             FCODE    STATUS         DEVICE PATH
--  -----  --------          -----             -----    ------         -----------
c1  emlxs0 10000090faxxxxx8  LPem12002E-S      3.10a3   NOT CONNECTED  /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0
c2  emlxs1 10000090faxxxxx9  LPem12002E-S      3.10a3   NOT CONNECTED  /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0,1
c3  emlxs2 10000000c9xxxxx8  LPem12002E-S      3.10a3   CONNECTED      /pci@600/pci@2/pci@0/pci@0/pci@0/pci@3/SUNW,emlxs@0
c4  emlxs3 10000000c9xxxxx9  LPem12002E-S      3.10a3   NOT CONNECTED  /pci@600/pci@2/pci@0/pci@0/pci@0/pci@3/SUNW,emlxs@0,1


The server lost access through both ports of one FC HBA (emlxs0 and emlxs1) with these errors:

May 30 03:33:16 server01 emlxs: [ID 349649 kern.info] [13.11EE]emlxs0:  ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
May 30 03:33:16 server01 emlxs: [ID 349649 kern.info] [13.1208]emlxs0:  ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e40 status2=0x161200)
May 30 03:33:16 server01 emlxs: [ID 349649 kern.info] [ 5.03DD]emlxs0: NOTICE: 710: Link down.
May 30 03:33:19 server01 emlxs: [ID 349649 kern.info] [ 6.0901]emlxs0:WARNING: 231: Adapter shutdown. (Reboot required.)
May 30 03:33:47 server01 emlxs: [ID 349649 kern.info] [13.17AA]emlxs1:  ERROR: 530: Mailbox timeout. (HEARTBEAT: Nowait.)
May 30 03:33:49 server01 genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0 (emlxs0) down
May 30 03:33:50 server01 emlxs: [ID 349649 kern.info] [ 6.0901]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
May 30 03:33:53 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0/fp@0,0/ssd@w5006016047200f5c,3 (ssd13):
May 30 03:33:54 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0/fp@0,0/ssd@w5006016047200f5c,2 (ssd14):
May 30 03:34:17 server01 genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@3/pci@0/pci@3/SUNW,emlxs@0,1 (emlxs1) down



Instead of rebooting the server, the administrator decided to reset the port with the emlxadm utility.

While the port was being reset with emlxadm, the system panicked.
The crash dump shows that the panic was triggered by the emlxadm command:

CAT(vmcore.0/10V)> panic
panic on CPU 0
panic string:   BAD TRAP: type=31 rp=2a10177cff0 addr=8 mmu_fsr=0 occurred in module "emlxs" due to a NULL pointer dereference
==== panic user (LWP_SYS) thread: 0x30022aed8c0  PID: 20013  on CPU: 0 ====
cmd: ./emlxadm  <<<-----
t_procp: 0x30043d70c98
   p_as: 0x3002d4f57b8  size: 4513792  RSS: 3129344
      a_hat: 0x3000c79e040
      cnum: CPU0:282/166
      cpusran: 0,5,6,7
   p_zone: 0x198d6e0 (global)
t_stk: 0x2a10177dae0  sp: 0x1912b81  t_stkbase: 0x2a101778000
t_pri: 0 (TS)  pctcpu: 76.563042
t_transience: 0  t_wkld_flags: 2 WLKD_CPU_INTENSIVE
t_lwp: 0x300465247d0  t_tid: 1
   machpcb: 0x2a10177dae0
   lwp_ap:   0x30046524888
   t_mstate: LMS_SYSTEM  ms_prev: LMS_KFAULT
   ms_state_start: 0.000116550 seconds earlier
   ms_start: 2 minutes 9.088143380 seconds earlier
t_cpupart: 0x1913a78(0)  last CPU: 0
idle: 3580 ticks (35.80s)
start: Mon Jun  1 06:02:31 2015
age: 129 seconds (2 minutes 9 seconds)
t_state:     TS_ONPROC
t_flag:      0x1800 (T_PANIC|T_LWPREUSE)
t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
t_schedflag: 3 (TS_LOAD|TS_DONT_SWAP)
t_acflag:    1 (TA_NO_PROCESS_LOCK)
p_flag:      0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)

pc:      unix:panicsys+0x48:   call     unix:setjmp

void unix:panicsys+0x48((const char *)0x10bb508, (va_list)0x2a10177cd98, (struct regs *)0x1913550, (int)1, 0x9900001601, , , , , , , , 0x10bb508, 0x2a10177cd98)
unix:vpanic_common+0x78(0x10bb508, 0x2a10177cd98, 0xc, 0x80000000, 0x1dffd2e5, 0x80000000)
void unix:panic+0x1c((const char *)0x10bb508, (void *)0x31, 0x2a10177cff0, 8, 0, 0x60021468a10, 0x18468e0, ...)
int unix:die+0x78((unsigned)0x31, (struct regs *)0x2a10177cff0, (caddr_t)8, (uint_t)0)
void unix:trap+0x9e4((struct regs *)0x2a10177cff0, (caddr_t)8, (uint32_t), (uint32_t))
unix:ktl0+0x64()
-- trap data  type: 0x31 (data access MMU miss)  rp: 0x2a10177cff0  --
  addr: 0x8
pc:  0x7b669680 emlxs:emlxs_chipq_node_flush+0xe8:   ldx        [%o3 + %o4], %o1
npc: 0x7b669684 emlxs:emlxs_chipq_node_flush+0xec:   subcc        %o1, 0x0, %g0   ( cmp   %o1, 0x0 )
  global:                       %g1            0x1968c
        %g2            0x1968c  %g3                  4
        %g4                  0  %g5                  0
        %g6                  0  %g7      0x30022aed8c0
  out:  %o0            0x19cac  %o1                  0
        %o2            0x19cac  %o3                  0
        %o4                  8  %o5                  1
        %sp      0x2a10177c891  %o7            0x19ca0
  loc:  %l0            0x19c00  %l1                  0
        %l2            0x19ca0  %l3         0xffffffff
        %l4      0x60023c19cb0  %l5            0x19cac
        %l6            0x80000  %l7                  1
  in:   %i0      0x60023c19690  %i1                  0
        %i2      0x60023c1c9e0  %i3      0x2a10177d170
        %i4      0x60023c00000  %i5      0x2a10177d178
        %fp      0x2a10177c991  %i7         0x7b665268
<trap>uint32_t emlxs:emlxs_chipq_node_flush+0xe8((emlxs_port_t *)0x60023c1c4d8, (CHANNEL *)0, (NODELIST *)0x60023c1c9e0, (emlxs_buf_t *)0)
int emlxs:emlxs_port_offline+0x324((emlxs_port_t *)0x60023c1c4d8, (uint32_t)0xffffffff)
void emlxs:emlxs_linkdown+0x160((emlxs_hba_t *)0x60023c00000)
int emlxs:emlxs_offline+0x134((emlxs_hba_t *)0x60023c00000, (uint32_t)0)
int32_t emlxs:emlxs_reset+0x490((emlxs_port_t *), (uint32_t))
emlxs:emlxs_fca_reset((opaque_t)0x60023c1c4d8, (uint32_t)4) - frame recycled
int fp:fp_fciocmd+0x2838((fc_port_t *)0x60021780000, (intptr_t)0xffbff0e0, (int)0x100401, (fcio_t *)0x2a10177d868)
int fp:fp_ioctl+0x1b8((dev_t), (int), (intptr_t)0xffbff0e0, (int), (cred_t *)0x3004f8c5d88, (int *)0x2a10177dadc)
specfs:spec_ioctl((struct vnode *)0x30043db5cc0, (int)0x47ce, (intptr_t)0xffbff0e0, (int)0x100401, (struct cred *), (int *)0x2a10177dadc) - frame recycled
int genunix:fop_ioctl+0x2c((vnode_t *)0x30043db5cc0, (int)0x47ce, (intptr_t)0xffbff0e0, (int)0x100401, (cred_t *), (int *)0x2a10177dadc)
int genunix:ioctl+0x184((int), (int), (intptr_t))
unix:syscall_trap32+0xcc()
-- switch to user thread's user stack --

CAT(vmcore.0/10V)>

 

The Solaris 10 server panicked with these messages:

Mon Jun  1 06:04:40 2015|
                        | panic[cpu0]/thread=30022aed8c0:
Mon Jun  1 06:04:40 2015| BAD TRAP: type=31 rp=2a10177cff0 addr=8 mmu_fsr=0 occurred in module "emlxs" due to a NULL pointer dereference
Mon Jun  1 06:04:40 2015|
                        |
Mon Jun  1 06:04:40 2015| emlxadm:
Mon Jun  1 06:04:40 2015| trap type = 0x31
Mon Jun  1 06:04:40 2015| addr=0x8
Mon Jun  1 06:04:40 2015| pid=20013, pc=0x7b669680, sp=0x2a10177c891, tstate=0x80001604, context=0xa6
Mon Jun  1 06:04:40 2015| g1-g7: 1968c, 1968c, 4, 0, 0, 0, 30022aed8c0
Mon Jun  1 06:04:40 2015|
Mon Jun  1 06:04:40 2015| 000002a10177cd10 unix:die+78 (31, 2a10177cff0, 8, 0, 2a10177cdd0, 10bb400)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000000000001 0000000000000031 0000000001000000 0000000000002000
                        |   %l4-7: 00000000018468e0 0000000001846800 0000000000000000 0000000080001604
Mon Jun  1 06:04:40 2015| 000002a10177cdf0 unix:trap+9e4 (2a10177cff0, 10000, 1fff, 5, 0, 1)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000000000000 0000030043d70c98 0000000000000031 0000000000000000
                        |   %l4-7: 0000000000001c00 0000000000000001 0000000000000005 0000000000000001
Mon Jun  1 06:04:40 2015| 000002a10177cf40 unix:ktl0+64 (19cac, 0, 19cac, 0, 8, 1)
Mon Jun  1 06:04:40 2015|   %l0-3: 000000000180c000 0000000000000000 0000000080001604 00000000010202e8
                        |   %l4-7: 0000000000000020 0000000000000000 0000000000000000 000002a10177cff0
Mon Jun  1 06:04:40 2015| 000002a10177d090 19ca0 (60023c19690, 0, 60023c1c9e0, 2a10177d170, 60023c00000, 2a10177d178)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000000019c00 0000000000000000 0000000000019ca0 00000000ffffffff
                        |   %l4-7: 0000060023c19cb0 0000000000019cac 0000000000080000 0000000000000001
Mon Jun  1 06:04:40 2015| 000002a10177d190 emlxs:emlxs_port_offline+324 (60023c1c4d8, 331, 60023c00000, 0, 1, 0)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000060023c1c9e0 0000000000000001 0000000000000000 0000000000000000
                        |   %l4-7: 0000060023c1c4d0 0000000000000000 0000000000000000 00000000708d7ae8
Mon Jun  1 06:04:40 2015| 000002a10177d280 emlxs:emlxs_linkdown+160 (60023c1c4d8, 60023c1c4d8, 0, 1c00, ffffffffffffffff, 60023c1c4d0)
Mon Jun  1 06:04:40 2015|   %l0-3: 00000000000010ac 0000000000001000 00000000fff30c1f 00000000fff30c00
                        |   %l4-7: 0000000000000331 0000000000001ef0 0000060023c010a8 0000000008e10001
Mon Jun  1 06:04:40 2015| 000002a10177d340 emlxs:emlxs_offline+134 (60023c00000, 0, 0, 60023c1c4d0, 60023c1c4d8, 1000)
Mon Jun  1 06:04:40 2015|   %l0-3: 00000600214e8dc0 0000000000000003 0000060023c010e0 00000000000010e0
                        |   %l4-7: 000000007b6d0800 000000007b6d0800 0000000000000000 0000000000000000
Mon Jun  1 06:04:40 2015| 000002a10177d3f0 emlxs:emlxs_reset+490 (60023c1c4d8, 3, 708d8400, 60023c1c4d0, 3, 60023c00000)
Mon Jun  1 06:04:40 2015|   %l0-3: 000000000020d614 000000000020d614 00000000708d8400 0000000000000000
                        |   %l4-7: 0000000000000000 000000007b6ba400 000000007b6ba400 000000000020d400
Mon Jun  1 06:04:40 2015| 000002a10177d4a0 fp:fp_fciocmd+2838 (60021780000, ffbff0e0, 0, 2a10177d868, 7bfd6434, 5a01)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000000000048 0000000000100401 0000000000000001 0000000000005a13
                        |   %l4-7: 0000030022aed8c0 000000007b6a33a4 0000000000005800 0000000000000000
Mon Jun  1 06:04:40 2015| 000002a10177d7b0 fp:fp_ioctl+1b8 (1e00000004, 60021780000, ffbff0e0, 0, 3004f8c5d88, 2a10177dadc)
Mon Jun  1 06:04:40 2015|   %l0-3: 00000000000047ce 0000000000000000 0000000000100401 0000000000000000
                        |   %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Mon Jun  1 06:04:40 2015| 000002a10177d8d0 genunix:fop_ioctl+2c (30043db5cc0, 47ce, ffbff0e0, 100401, 600214e8dc0, 2a10177dadc)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000000000003 0000030043d70c98 00000000012b4b28 0000000000000001
                        |   %l4-7: 0000000000000000 0000030043d716a8 00000000000000c0 0000000000000001
Mon Jun  1 06:04:40 2015| 000002a10177d990 genunix:ioctl+184 (3, 30051ed3458, ffbff0e0, 5a13, 3, 47ce)
Mon Jun  1 06:04:40 2015|   %l0-3: 0000000001941c00 000002a10177db90 0000000000000004 0000000000000000
                        |   %l4-7: 0000000000000000 0000000000000000 0000000000000001 0000000000000001
Mon Jun  1 06:04:40 2015|
Mon Jun  1 06:04:40 2015| syncing file systems...



Cause

The panic is caused by a bug in the emlxs driver:
Bug 21217352 - Panic occurred in module "emlxs" due to a NULL pointer dereference emlxadm reset
 

Solution

Open a Service Request with Oracle Support and request the fix for your Solaris version.

Emulex provided two fixed driver kits for Solaris 10, which can be obtained through Oracle Support channels:

emlxs_kit-2.80.8.5-s10-sparc.tar
emlxs_kit-2.90.16.2-s10-sparc.tar

 

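As a workaround, do not use the emlxadm utility to reset a failed FC HBA port; reboot the system instead, as the driver message "Adapter shutdown. (Reboot required.)" instructs.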

References

<NOTE:1602837.1> - FC HBA Emlxs ERROR: 420: Adapter Hardware Error. (Host Error Attention: Status=0x40000000
<BUG:21217352> - PANIC OCCURRED IN MODULE "EMLXS" DUE TO A NULL POINTER DEREFERENCE EMLXADM RESET

  Copyright © 2018 Oracle, Inc.  All rights reserved.