Asset ID: 1-72-2377615.1
Update Date: 2018-05-14
Solution Type: Problem Resolution Sure
Solution 2377615.1: FMA fault - NIC-8000-0Q reported on 'ixgbe' interfaces
Related Items
- Oracle SuperCluster T5-8 Full Rack
- SPARC T5-8
Related Categories
- PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5
Created from <SR 3-17103536117>
Applies to:
SPARC T5-8 - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster T5-8 Full Rack - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
FMA reports NIC-8000-0Q faults on 'ixgbe' interfaces. The fault may be reported on a single interface, or on multiple interfaces and NIC cards. One of 2 possible fault reports can be logged:
1) This fault reports an 'invalid_state' error detected by the driver:
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 19 14:08:08 f3c169c3-b164-49fb-8330-bf1c40f576fe NIC-8000-0Q Critical
Problem Status : open
Diag Engine : eft / 1.16
System
Manufacturer : Oracle Corporation
Name : SuperCluster T5-8
Part_Number : SuperCluster T5-8
Serial_Number : AK00XXXXX
System Component
Manufacturer : Oracle Corporation
Name : SPARC T5-8
Part_Number : 7308731
Serial_Number : AK00XXXXXX
Host_ID : XXXXXXXX
----------------------------------------
Suspect 1 of 2 :
Problem class : defect.io.nic.correctable
Certainty : 95%
Affects : mod:///mod-name=ixgbe/
Status : faulted but still in service
FRU
Status : faulty
FMRI : "pkg://solaris/driver/network/ethernet/ixgbe@0.5.11,5.11-0.175.3.22.0.3.0:20170629T155906Z
"
----------------------------------------
Suspect 2 of 2 :
Problem class : fault.io.nic.correctable
Certainty : 5%
Affects : dev:////pci@380/pci@1/pci@0/pci@a/network@0,1
Status : faulted but still in service
FRU
Status : faulty
Location : "/SYS/PCIE9"
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Revision : unknown
Serial_Number : unknown
Chassis
Manufacturer : Oracle Corporation
Name : SPARC T5-8
Part_Number : 7308731
Serial_Number : AK00XXXXXX
Description : The number of correctable errors detected in the driver has
crossed the allowed threshold. A(n) invalid_state error has been
detected during driver's runtime context causing a(n) correctable
service impact.
Firmware: pba number: E70856-012; short1: 00.03 61ab0001 0.0.0
Response : One or more device instances may be disabled.
Impact : Loss of services provided by the device instances associated with
this fault.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://support.oracle.com/msg/NIC-8000-0Q for the latest service
procedures and policies regarding this diagnosis.
--
The stack for this fault is recorded in the associated ereport in 'fmdump-eV.out':
Mar 19 2018 14:07:43.608305389 ereport.io.nic.correctable
nvlist version: 0
class = ereport.io.nic.correctable
ena = 0xea2799eb1360401
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@400/pci@1/pci@0/pci@c/network@0,1
(end detector)
core = (embedded nvlist)
nvlist version: 0
reason = A(n) invalid_state error has been detected during driver's runtime context causing a(n) correctable service impact
driver_error = 0x7
driver_context = 0x1
service_impact = 0x1
device_subsystem = 0x0
driver_error_label = invalid_state
driver_context_label = runtime
service_impact_label = correctable
device_subsystem_label = unknown
driver_health = 0x0
reaped = 1
overridden = 0
throttle_threshold = 0x0
throttle_interval = 0x0
stack = [ mac`mac_fm_error_log+20 () | ixgbe`ixgbe_fm_shared_code_error+44 () | ixgbe`ixgbe_fc_autoneg+68 () | ixgbe`ixgbe_fc_enable_generic+94 () | ixgbe`ixgbe_driver_link_check+1d8 () | ixgbe`ixgbe_link_check_task+34 () | genunix`taskq_thread+3e0 () ]
(end core)
framework = (embedded nvlist)
nvlist version: 0
nvm_version = pba number: E70856-012; short1: 00.03 61ab0001 0.0.0
dcb_flags = 0x0
(end framework)
driver = (embedded nvlist)
nvlist version: 0
driver_error_message = The link is down
(end driver)
__ttl = 0x1
__tod = 0x5ab00a7f 0x244200ed
============================
2) This fault reports a 'software' error detected during the driver attach:
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 19 14:19:55 f4e0d8ba-62d1-4f8c-a5eb-83e276af6fe3 NIC-8000-0Q Critical
Problem Status : open
Diag Engine : eft / 1.16
System
Manufacturer : Oracle Corporation
Name : SPARC S7-2L
Part_Number : 35442060+1+1
Serial_Number : 1803XXXXXX
Host_ID : XXXXXXXX
----------------------------------------
Suspect 1 of 2 :
Problem class : defect.io.nic.correctable
Certainty : 95%
Affects : mod:///mod-name=ixgbe/
Status : faulted but still in service
FRU
Status : faulty
FMRI : "pkg://solaris/driver/network/ethernet/ixgbe@0.5.11,5.11-0.175.3.29.0.4.0:20180206T160622Z
"
----------------------------------------
Suspect 2 of 2 :
Problem class : fault.io.nic.correctable
Certainty : 5%
Affects : dev:////pci@302/pci@2/pci@0/pci@16/network@0
Status : faulted but still in service
FRU
Status : faulty
Location : "/SYS/MB/PCIE6"
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Revision : unknown
Serial_Number : unknown
Chassis
Manufacturer : Oracle Corporation
Name : SPARC S7-2L
Part_Number : 35442060+1+1
Serial_Number : 1803XXXXXX
Description : The number of correctable errors detected in the driver has
crossed the allowed threshold. A(n) software error has been
detected during driver's attach context causing a(n) correctable
service impact.
Firmware: pba number: G45101-010; short2: 04.04.0 4.03.0 80000600
0.0.0
Response : One or more device instances may be disabled.
Impact : Loss of services provided by the device instances associated with
this fault.
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
Please refer to the associated reference document at
http://support.oracle.com/msg/NIC-8000-0Q for the latest service
procedures and policies regarding this diagnosis.
--
The stack for this fault is recorded in the associated ereport in 'fmdump-eV.out':
Mar 19 2018 14:19:25.640053998 ereport.io.nic.correctable
nvlist version: 0
class = ereport.io.nic.correctable
ena = 0x3fe7b0bb8a1b801
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = dev
device-path = /pci@302/pci@2/pci@0/pci@16/network@0,1
(end detector)
core = (embedded nvlist)
nvlist version: 0
reason = A(n) software error has been detected during driver's attach context causing a(n) correctable service impact
driver_error = 0x8
driver_context = 0x2
service_impact = 0x1
device_subsystem = 0x0
driver_error_label = software
driver_context_label = attach
service_impact_label = correctable
device_subsystem_label = unknown
driver_health = 0x0
reaped = 1
overridden = 0
throttle_threshold = 0x0
throttle_interval = 0x0
psargs = /usr/lib/ldoms/ldmad
stack = [ mac`mac_fm_error_log+20 () | ixgbe`ixgbe_fm_shared_code_error+44 () | ixgbe`ixgbe_set_rar_generic+30 () | ixgbe`ixgbe_reset_hw_X540+1f4 () | ixgbe`ixgbe_solaris_reset_hw+18 () | ixgbe`ixgbe_init+8 () | ixgbe`ixgbe_attach+7d8 () | genunix`devi_attach+ec () | genunix`attach_node+b8 () | genunix`i_ndi_config_node+178 () | genunix`i_ddi_attachchild+34 () | genunix`devi_attach_node+110 () | genunix`devi_config_one+318 () | genunix`ndi_devi_config_one+d0 () | devfs`dv_find+1e4 () | devfs`devfs_lookup+1c () | genunix`fop_lookup+15c () | genunix`lookuppnvp+430 () | genunix`lookuppnatcred+118 () | genunix`lookupnameatcred+4c () | genunix`lookupnameat+20 () | genunix`vn_openat+338 () | genunix`copen+43c () ]
(end core)
framework = (embedded nvlist)
nvlist version: 0
nvm_version = pba number: G45101-010; short2: 04.04.0 4.03.0 80000600 0.0.0
(end framework)
driver = (embedded nvlist)
nvlist version: 0
driver_error_message = RAR index 128 is out of range.
(end driver)
__ttl = 0x1
__tod = 0x5aafc6ed 0x262672ee
=================================================
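The two reports differ in the 'driver_error_label' and 'driver_context_label' fields of the ereport: 'invalid_state' / 'runtime' for report 1, and 'software' / 'attach' for report 2. As a rough illustration only, the following Python sketch reads ereport text in the format shown above and reports which of the two it matches. The function names and the mapping table are hypothetical helpers written for this note; they are not part of any Oracle tool.

```python
# Illustrative sketch: decide which of the two NIC-8000-0Q reports described
# in this document an ereport corresponds to. All names are hypothetical.

# (driver_error_label, driver_context_label) -> report number in this document
VARIANTS = {
    ("invalid_state", "runtime"): 1,
    ("software", "attach"): 2,
}

WANTED = {"driver_error_label", "driver_context_label", "driver_error_message"}


def parse_fields(text):
    """Collect the first value of each wanted 'name = value' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        name = name.strip()
        if sep and name in WANTED:
            fields.setdefault(name, value.strip())
    return fields


def classify_ereport(text):
    """Return (report_number or None, driver_error_message or None)."""
    fields = parse_fields(text)
    key = (fields.get("driver_error_label"), fields.get("driver_context_label"))
    return VARIANTS.get(key), fields.get("driver_error_message")


if __name__ == "__main__":
    sample = """\
    driver_error = 0x7
    driver_error_label = invalid_state
    driver_context_label = runtime
    driver_error_message = The link is down
    """
    print(classify_ereport(sample))  # (1, 'The link is down')
```

The sketch handles one ereport at a time; a file containing several ereports would need to be split per event first.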
Changes
The system was updated to S11.3 SRU 15 or greater; this includes systems updated from S10 to S11.3 SRU 15 or greater.
Cause
Modifications to the 'ixgbe' driver.
Solution
Each of the 2 issues has a different fix:
1) Update to S11.3 SRU 28.4 or greater.
2) The fix is not yet released; it is targeted for S11.3 SRU 30.5.
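The 'ixgbe' package FMRI in the fault report can give a quick indication of whether a system is already at the required level. The Python sketch below assumes that the S11.3 branch version has the form 0.175.3.<SRU>.0.<build>.0, so that 0.175.3.22.0.3.0 corresponds to SRU 22.3. That reading of the version string, and the helper names, are assumptions made for illustration. The level used for issue 2 is only the target stated above, not a released fix.

```python
# Illustrative sketch: compare the SRU level embedded in an ixgbe package FMRI
# with the fix levels listed in this document. Helper names are hypothetical.
import re

# Assumed S11.3 branch layout: -0.175.3.<SRU>.0.<build>.0
BRANCH_RE = re.compile(r"-0\.175\.3\.(\d+)\.\d+\.(\d+)\.\d+")

# Issue number -> (SRU, build). Issue 2 is a *targeted* level, not yet released.
FIX_LEVEL = {1: (28, 4), 2: (30, 5)}


def sru_from_fmri(fmri):
    """Return (sru, build) parsed from the FMRI, or None if not recognised."""
    match = BRANCH_RE.search(fmri)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def at_or_above_fix(fmri, issue):
    """True/False if the level can be compared, None if the FMRI is not S11.3."""
    level = sru_from_fmri(fmri)
    if level is None:
        return None
    return level >= FIX_LEVEL[issue]


if __name__ == "__main__":
    fmri = ("pkg://solaris/driver/network/ethernet/ixgbe"
            "@0.5.11,5.11-0.175.3.22.0.3.0:20170629T155906Z")
    print(sru_from_fmri(fmri))       # (22, 3)
    print(at_or_above_fix(fmri, 1))  # False
```

A result of False for issue 1 suggests the update in item 1 still applies; the installed SRU should be confirmed on the system itself before acting on it.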
References
<BUG:27675054> - NAHSUA: HIT "DEFECT.IO.NIC.CORRECTABLE" FOR TWINVILLE DURING S11U3 WARM REBOOT
<BUG:26382153> - NIC-8000-0Q FAULT SEEN FOR IXGBE CARD