Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1952091.1
Update Date:2016-01-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  1952091.1 :   Fabric Interconnect 2x8 FC IO Module Repeatedly Crashing  


Related Items
  • Oracle Fabric Interconnect F1-15
  • Oracle Fabric Interconnect F1-4
  • Oracle Virtual Compute Appliance X4-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Oracle Virtual Networking


In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-9956195031>

Applies to:

Oracle Fabric Interconnect F1-4 - Version All Versions to All Versions [Release All Releases]
Oracle Fabric Interconnect F1-15 - Version All Versions to All Versions [Release All Releases]
Oracle Virtual Compute Appliance X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

The 2x8 FC IO Module crashes repeatedly.  After running the 'get-log-files -cores' diagnostic log script on the Fabric Interconnect to collect all core files along with the normal diagnostic files, the collected output contains files similar to those listed below. Note the diag and dmsg files for I/O card 3 (diag_fccard_3_ts_* and dmsg_iocard-3_ts40_0):

Example:

diag_fccard_3_ts_1412259529  dmesg.1.gz                   ib.log                       opensm.log                   syslog.log.1.gz              tech-support                 wtmp.1.gz
cli.log                      diag_fccard_3_ts_1413805123  dmesg.2.gz                   ib.log.1.gz                  ph.log                       syslog.log.2.gz              upgrade.log                  xc_xsm.3267.core
createdb.log                 diag_fccard_3_ts_1414126764  dmesg.3.gz                   install.log                  phonehome.log                syslog.log.3.gz              upgrade_sw.log               xdsd.log
daemon.log                   diag_fccard_3_ts_1414597536  dmesg.4.gz                   kern.log                     postgresql.log               syslog.log.4.gz              user-debug.log               xms.log
dhcp3.log                    diag_fccard_3_ts_1417220734  dmsg_iocard-3_ts40_0         lilo.log                     stopdb.log                   syslog.log.5.gz              user-debug.log.1.gz          xvnd.log
diag_fccard_3_ts_1410892807  diag_fccard_3_ts_1417319666  dpkg.log                     logerrors.log                syslog.log                   systemcontrolle.2880.core    user.log
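
The numeric suffix on each diag_fccard_3_ts_* file appears to be a Unix epoch timestamp: 1417319666 is 2014-11-30 03:54:26 UTC, which matches the "Saving diagnostics information" entry in user.log (Nov 29 22:54:27) if the chassis clock runs five hours behind UTC. Under that assumption, the following Python sketch lists when each diagnostic dump was written, which makes the crash cadence easier to see. The function name diag_dump_times is hypothetical and is not part of the product.

```python
import re
from datetime import datetime, timezone

# Assumption: the trailing number in diag_fccard_<slot>_ts_<n> is seconds
# since the Unix epoch. This is inferred from the log excerpts, not documented.
DIAG_RE = re.compile(r"^diag_fccard_(\d+)_ts_(\d+)$")


def diag_dump_times(filenames):
    """Return sorted (slot, UTC datetime) pairs for each diag_fccard file name."""
    dumps = []
    for name in filenames:
        match = DIAG_RE.match(name)
        if match:
            slot = int(match.group(1))
            when = datetime.fromtimestamp(int(match.group(2)), tz=timezone.utc)
            dumps.append((slot, when))
    return sorted(dumps)


if __name__ == "__main__":
    # File names taken from the directory listing above.
    names = [
        "diag_fccard_3_ts_1410892807",
        "diag_fccard_3_ts_1412259529",
        "diag_fccard_3_ts_1413805123",
        "diag_fccard_3_ts_1414126764",
        "diag_fccard_3_ts_1414597536",
        "diag_fccard_3_ts_1417220734",
        "diag_fccard_3_ts_1417319666",
        "cli.log",  # non-matching names are ignored
    ]
    for slot, when in diag_dump_times(names):
        print(f"IOCARD={slot} dump written {when.isoformat()}")
```

Several dumps for the same slot spread over weeks are consistent with the repeated crashes described in this document.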

 

 The contents of dmsg_iocard-3_ts40_0 show something similar to:

<4>[42949388.000000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4020000) mask((0x377ffff)
<4>[42949388.010000]  0000:00:02.0: Unrecognized ISP -- !
<1>[42949388.020000] CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 80201c98, ra == 80236bb0
<4>[42949388.030000] Oops[#1]:
<4>[42949388.030000] Cpu 0
<4>[42949388.030000] $ 0   : 00000000 80350000 00000009 80327849
<4>[42949388.030000] $ 4   : 00000000 8023b294 00000003 00000000
<4>[42949388.030000] $ 8   : ffffffff 807fbe69 00000000 00000000
<4>[42949388.030000] $12   : fffffffb ffffffff 0000000a 00000000
<4>[42949388.030000] $16   : 00000009 850001f0 80356730 850001f0
<4>[42949388.030000] $20   : 00000000 00000000 8035679c 00000000
<4>[42949388.030000] $24   : 00000002 807fbd5c
<4>[42949388.030000] $28   : 807fa000 807fbe18 00000000 80236bb0
<4>[42949388.030000] Hi    : 000000c7
<4>[42949388.030000] Lo    : 79aef000
<4>[42949388.030000] epc   : 80201c98 strlen+0x0/0x24     Not tainted
<4>[42949388.030000] ra    : 80236bb0 make_class_name+0x2c/0xa0
<4>[42949388.030000] Status: 1000e803    KERNEL EXL IE
<4>[42949388.030000] Cause : 00800008
<4>[42949388.030000] BadVA : 00000000
<4>[42949388.030000] PrId  : 0001800a
<4>[42949388.030000] Modules linked in: mod_qla2500 mod_qla25xx mod_vh4 mod_xt3
<4>[42949388.030000] Process insmod (pid: 529, threadinfo=807fa000, task=878cf5d0)
<4>[42949388.030000] Stack : 85000000 85000028 85000000 85000000 850001f8 80356794 80237294 8023aa08
<4>[42949388.030000]         80202ebc 85000028 807fbe68 801b0020 850001f0 85000028 85000000 85000000
<4>[42949388.030000]         0000dead c008f0a8 10010504 802373b0 85000030 00000001 deadbeef 85000000
<4>[42949388.030000]         850000e8 8023fee4 801259ec c00fe800 00000000 c01194ac 85000280 00000001
<4>[42949388.030000]         deadbeef c009ac6c c0100000 c00fddc8 807e18e4 00000004 00000000 c01194ac
<4>[42949388.030000]         ...
<4>[42949388.030000] Call Trace:
<4>[42949388.030000]  [<80237294>] class_device_del+0x1a0/0x2a8
<4>[42949388.030000]  [<8023aa08>] attribute_container_device_trigger+0x8c/0x260
<4>[42949388.030000]  [<80202ebc>] sprintf+0x28/0x34
<4>[42949388.030000]  [<801b0020>] sys_ioprio_set+0x238/0x26c
<4>[42949388.030000]  [<c008f0a8>] RD_REG_CONFIG_DWORD_I+0x0/0xb0 [mod_qla25xx]
<4>[42949388.030000]  [<802373b0>] class_device_unregister+0x14/0x2c
<4>[42949388.030000]  [<8023fee4>] scsi_remove_host+0x118/0x194
<4>[42949388.030000]  [<801259ec>] printk+0x20/0x2c
<4>[42949388.030000]  [<c009ac6c>] qla2x00_xg_probe_one+0x298/0x6b8 [mod_qla25xx]
<4>[42949388.030000]  [<c009a9d4>] qla2x00_xg_probe_one+0x0/0x6b8 [mod_qla25xx]
<4>[42949388.030000]  [<801259cc>] printk+0x0/0x2c
<4>[42949388.030000]  [<c009a9d4>] qla2x00_xg_probe_one+0x0/0x6b8 [mod_qla25xx]
<4>[42949388.030000]  [<c00400a0>] qla24xx_init+0xa0/0xf8 [mod_qla2500]
<4>[42949388.030000]  [<8014b054>] sys_init_module+0x71c/0x8e4
<4>[42949388.030000]  [<8010b5e0>] stack_done+0x20/0x3c
<4>[42949388.030000]  [<8010b5e0>] stack_done+0x20/0x3c
<4>[42949388.030000]  [<8026006e>] cfi_staa_lock+0x142/0xb2c
<4>[42949388.030000]
<4>[42949388.030000]
<4>[42949388.030000] Code: 00001821  03e00008  00601021 <80820000> 10400005  00801821  24630001  80620000  1440fffd
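
When comparing several dmsg files, it can help to pull out only the key fields of the Oops: the faulting function (epc), its caller (ra), and the bad virtual address (BadVA). The Python sketch below does this for the register-dump format shown above. The function name summarize_oops and the returned keys are hypothetical, and the regular expressions are only assumed to fit this particular dump layout.

```python
import re

# Patterns match the "epc   : <addr> <symbol>+0x..." style lines of the dump.
# They deliberately do not match the "epc == <addr>" form on the first line.
EPC_RE = re.compile(r"\bepc\s+:\s+([0-9a-f]{8})\s+([\w.]+)\+0x")
RA_RE = re.compile(r"\bra\s+:\s+([0-9a-f]{8})\s+([\w.]+)\+0x")
BADVA_RE = re.compile(r"\bBadVA\s*:\s*([0-9a-f]{8})")


def summarize_oops(lines):
    """Extract faulting function, caller, and bad address from Oops lines."""
    summary = {"faulting_function": None, "caller": None, "bad_address": None}
    for line in lines:
        epc = EPC_RE.search(line)
        if epc:
            summary["faulting_function"] = epc.group(2)
        ra = RA_RE.search(line)
        if ra:
            summary["caller"] = ra.group(2)
        bad = BADVA_RE.search(line)
        if bad:
            summary["bad_address"] = int(bad.group(1), 16)
    # A bad address of zero means the kernel dereferenced a NULL pointer.
    summary["null_dereference"] = summary["bad_address"] == 0
    return summary


if __name__ == "__main__":
    sample = [
        "<4>[42949388.030000] epc   : 80201c98 strlen+0x0/0x24     Not tainted",
        "<4>[42949388.030000] ra    : 80236bb0 make_class_name+0x2c/0xa0",
        "<4>[42949388.030000] BadVA : 00000000",
    ]
    print(summarize_oops(sample))
```

For the excerpt above, this reports a NULL pointer dereference in strlen called from make_class_name, which the call trace places inside the qla2x00_xg_probe_one path entered while insmod loads the driver.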

 

 -bash-3.2$ grep -i "IOCARD=3" user.log

 

Nov 28 19:27:45 iop-3 chassisAgt[547]: [NOTICE] chassisagt vhba2x8g-3 [IOCARD=3] (proc_equipmentStateMsg) component=fiberChannelChip, state=operStateUp stateQual=default
Nov 28 19:27:46 fpp chassisCtr[469]: [NOTICE] chassisctr fpp-1 [chassis::cardstatechange] (reportState) [IOCARD=3] Operational state changed. OldState=operStateInitializing, NewState=operStateUp, Qualifier=default
Nov 29 22:54:25 iop-3 chassisAgt[547]: [NOTICE] chassisagt vhba2x8g-3 [IOCARD=3] (proc_equipmentStateMsg) component=vhChip, state=operStateCriticalFailure stateQual=vhChip
Nov 29 22:54:27 iop-3 chassisAgt[547]: [NOTICE] chassisagt vhba2x8g-3 [IOCARD=3] (collectVHCardInfo) Saving diagnostics information. File = /var/log/coredumps/diag_fccard_3_ts_1417319666
Nov 29 22:54:31 fpp chassisCtr[469]: [ERR] chassisctr fpp-1 [chassis::cardfailure] [IOCARD=3] One or more HW components on IO boards are failed. Unable to Recover. Card turned off. OldState=operStateInitializing, NewState=operStateFailed, Qualifier=vhChip Reset counter=3
Nov 29 22:54:31 fpp chassisCtr[469]: [NOTICE] chassisctr fpp-1 [IOCARD=3] Power Down
Nov 29 22:54:31 fpp chassisCtr[469]: [NOTICE] chassisctr fpp-1 [chassis::cardstatechange] (reportState) [IOCARD=3] Operational state changed. OldState=operStateUp, NewState=operStateCriticalFailure, Qualifier=vhChip
Nov 29 22:54:32 fpp chassisCtr[469]: [NOTICE] chassisctr fpp-1 [chassis::cardstatechange] (reportState) [IOCARD=3] Operational state changed. OldState=operStateCriticalFailure, NewState=operStateFailed, Qualifier=vhChip
Nov 29 22:58:23 fpp chassisCtr[470]: [NOTICE] chassisctr fpp-1 [chassis::disconnect_iocard] [IOCARD=3] Chassis controller process received disconnect event from chassis agent.
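
The user.log entries above trace the card through operStateUp, operStateCriticalFailure, and operStateFailed. As an alternative to reading the grep output by eye, the Python sketch below extracts the state transitions for one I/O card slot. The function name card_state_changes is hypothetical, and the pattern is only assumed to cover the "OldState=..., NewState=..., Qualifier=..." message layout shown in this excerpt.

```python
import re

# Matches chassisCtr messages that report a state change for an I/O card.
STATE_RE = re.compile(
    r"\[IOCARD=(\d+)\].*?OldState=(\w+), NewState=(\w+), Qualifier=(\w+)"
)


def card_state_changes(lines, slot):
    """Return (timestamp, old_state, new_state, qualifier) tuples for one slot."""
    changes = []
    for line in lines:
        match = STATE_RE.search(line)
        if match and int(match.group(1)) == slot:
            # The first three whitespace-separated fields are the syslog timestamp.
            timestamp = " ".join(line.split()[:3])
            changes.append(
                (timestamp, match.group(2), match.group(3), match.group(4))
            )
    return changes


if __name__ == "__main__":
    # Assumes user.log from the collected diagnostics is in the current directory.
    with open("user.log", encoding="utf-8", errors="replace") as handle:
        for change in card_state_changes(handle, slot=3):
            print(" ".join(change))
```

A transition to operStateFailed with Qualifier=vhChip, as in the excerpt, is the pattern this document associates with the failing module.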

 

-bash-3.2$ grep -i iop-3 syslog.log

  

Nov 29 22:22:13 iop-3 klogd: [96515.730000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:22:20 iop-3 klogd: [96522.600000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:22:45 iop-3 klogd: [96547.760000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:25:01 iop-3 klogd: [96683.010000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:25:32 iop-3 klogd: [96714.210000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:25:40 iop-3 klogd: [96722.020000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:27:56 iop-3 klogd: [96858.020000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:28:42 iop-3 -- MARK --
Nov 29 22:29:31 iop-3 klogd: [96952.840000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:31:14 iop-3 klogd: [97055.830000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:34:13 iop-3 klogd: [97233.970000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:35:16 iop-3 klogd: [97297.200000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:38:47 iop-3 klogd: [97508.500000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:39:45 iop-3 klogd: [97565.360000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:39:57 iop-3 klogd: [97577.200000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:42:24 iop-3 klogd: [97724.300000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:43:27 iop-3 klogd: [97786.980000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:46:12 iop-3 klogd: [97951.900000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:48:44 iop-3 -- MARK --
Nov 29 22:49:45 iop-3 klogd: [98164.840000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:49:54 iop-3 klogd: [98173.690000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:51:24 iop-3 klogd: [98263.420000] vh4_pcieif_block_intr_handler():Spurious VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4000000) mask((0x377ffff)
Nov 29 22:54:24 iop-3 klogd: [98443.460000] vh4_pcieif_block_intr_handler():FATAL: VH4_PCIEIF_INT_STATUS_LINK_DOWN link_status(0x7245) misc_int_status(0x100)
Nov 29 22:54:24 iop-3 klogd: [98443.470000] vh4_post_event(): vh_msg->event: 0xa5a5  (VH4_PCIEIF_INT_STATUS_LINK_DOWN)
Nov 29 22:54:24 iop-3 klogd: [98443.700000] vh4_pcieif_block_ql_handler():read_reg_config_dword,write_pcie_reg_config_dword,pcie_reg_config_dword not installed vh4_pcieif_block_nonfatal_handler(): Unsupported interrupt status(0x4000000)

Changes

No manual changes occurred in the environment. 

Cause

The following line indicates that the PCIe link between the VH4 (FPGA) and the QLogic (ASIC) is flapping:

VH4_PCIEIF_INT_STATUS_LINK_DOWN link status(0x7a44) status(0x4020000)

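Repeated occurrences of this message, ending in the FATAL link-down event seen in syslog.log, indicate a hardware (HW) issue on the I/O module.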
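
To gauge how often the link flapped before the card failed, the spurious and fatal link-down messages in syslog.log can be counted per I/O processor. The Python sketch below is one way to do that. The function name summarize_link_down is hypothetical, the pattern is only assumed to match the message text shown in this document, and the count is not a documented threshold for replacement.

```python
import re

# Distinguishes the "Spurious" warnings from the final "FATAL:" event.
LINK_DOWN_RE = re.compile(
    r"vh4_pcieif_block_intr_handler\(\):\s*(Spurious|FATAL:)\s+"
    r"VH4_PCIEIF_INT_STATUS_LINK_DOWN"
)


def summarize_link_down(lines, host):
    """Return (spurious_count, first_fatal_timestamp) for one host, e.g. 'iop-3'."""
    spurious = 0
    first_fatal = None
    for line in lines:
        fields = line.split()
        # Syslog layout: month, day, time, host, then the message.
        if len(fields) < 4 or fields[3] != host:
            continue
        match = LINK_DOWN_RE.search(line)
        if not match:
            continue
        if match.group(1) == "Spurious":
            spurious += 1
        elif first_fatal is None:
            first_fatal = " ".join(fields[:3])
    return spurious, first_fatal


if __name__ == "__main__":
    # Assumes syslog.log from the collected diagnostics is in the current directory.
    with open("syslog.log", encoding="utf-8", errors="replace") as handle:
        count, fatal_at = summarize_link_down(handle, "iop-3")
    print(f"spurious link-down events: {count}, first FATAL at: {fatal_at}")
```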

Solution

 RMA (replace) the 2x8 FC IO Module by following the CAP (Canned Action Plan) document listed under References.

References

<NOTE:1518778.1> - How to Replace a Defective I/O Module on a Oracle Fabric Interconnect Chassis?

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.