Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1642066.1 : ixgbe interface faulted because of message "Problem: Network adapter has been stopped because it has overheated"
FMA has faulted an ixgbe interface with fault ID PCIEX-8000-0A. This is a software bug, not a hardware fault.
Created from <SR 3-8243904081>

Applies to:

SPARC M5-32 - Version All Versions and later
SPARC M6-32 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC T5-4 - Version All Versions and later
SPARC T5-8 - Version All Versions and later
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on SPARC (64-bit)

Affected are all network interface PCIe cards or onboard interfaces that use the Intel Twinville 10G X540 dual Ethernet controller:

Dual 10-Gigabit Ethernet Base-T PCIe Gen2 PN 7014776 / 7070006
Dual 10-Gigabit Base-T PCIe 2.0 ExpressModule PN 7014780 / 7069995

Symptoms

The system reports a fault for the motherboard (if the network interface is on the motherboard) or for the PCIe card (if the network interface is a PCIe card).

# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Dec 12 02:32:52 c644edfb-b986-c3c1-9b9b-e4d3b0514a46 PCIEX-8000-0A  Critical

Problem Status    : solved
Diag Engine       : eft / 1.16
System
    Manufacturer  : Oracle-Corporation
    Name          : SPARC-T5-2
    Part_Number   : 31731774+1+1
    Serial_Number : 1234567890
    Host_ID       : 12345678

----------------------------------------
Suspect 1 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 40%
   Affects     : dev:////pci@300/pci@1/pci@0/pci@1/network@0,1
   Status      : out of service, but associated components no longer faulty

   FRU
     Location         : "/SYS/MB"
     Manufacturer     : unknown
     Name             : unknown
     Part_Number      : 7063306
     Revision         : 04
     Serial_Number    : 465769T+1317UL03NT
     Chassis
        Manufacturer  : Oracle Corporation
        Name          : SPARC T5-2
        Part_Number   : 31731774+1+1
        Serial_Number : 1234567890
     Status           : faulty

----------------------------------------
Suspect 2 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 40%
   Affects     : dev:////pci@300/pci@1/pci@0
   Status      : faulted but still in service

   FRU
     Location         : "/SYS/MB"
     Manufacturer     : unknown
     Name             : unknown
     Part_Number      : 7063306
     Revision         : 04
     Serial_Number    : 465769T+1317UL03NT
     Chassis
        Manufacturer  : Oracle Corporation
        Name          : SPARC T5-2
        Part_Number   : 31731774+1+1
        Serial_Number : 1234567890
     Status           : faulty

----------------------------------------
Suspect 3 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 20%
   Affects     : dev:////pci@300/pci@1
   Status      : faulted but still in service

   FRU
     Location         : "/SYS/MB"
     Manufacturer     : unknown
     Name             : unknown
     Part_Number      : 7063306
     Revision         : 04
     Serial_Number    : 465769T+1234567
     Chassis
        Manufacturer  : Oracle Corporation
        Name          : SPARC T5-2
        Part_Number   : 31731774+1+1
        Serial_Number : 1234567890
     Status           : faulty

Description : A problem was detected for a PCIEX device.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with this fault

Action      : Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
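All three suspects in the fault report lie on the same branch of the device tree, ending at the ixgbe network node. The following is a minimal Python sketch for pulling the suspects out of saved `fmadm faulty` text and checking that. It assumes the "Fault class / Certainty / Affects" field order shown in the output above; the names `suspects`, `on_one_branch`, and `SAMPLE` are hypothetical and not part of any Oracle tool.

```python
import re

# Field order assumed from the fmadm output shown in this document.
SUSPECT = re.compile(
    r"Fault class\s*:\s*(\S+)\s+Certainty\s*:\s*(\d+)%\s+Affects\s*:\s*(\S+)"
)

def suspects(fmadm_text):
    """Return (fault_class, certainty_percent, device_path) for each suspect."""
    return [(cls, int(pct), path) for cls, pct, path in SUSPECT.findall(fmadm_text)]

def on_one_branch(paths):
    """True if, ordered by length, each path is a parent of the next one."""
    ordered = sorted(paths, key=len)
    return all(b.startswith(a + "/") for a, b in zip(ordered, ordered[1:]))

# Shortened copy of the three suspects from the report above.
SAMPLE = """
Suspect 1 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 40%
   Affects     : dev:////pci@300/pci@1/pci@0/pci@1/network@0,1
Suspect 2 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 40%
   Affects     : dev:////pci@300/pci@1/pci@0
Suspect 3 of 3 :
   Fault class : fault.io.pciex.device-interr
   Certainty   : 20%
   Affects     : dev:////pci@300/pci@1
"""

found = suspects(SAMPLE)
for fault_class, certainty, path in found:
    print(certainty, fault_class, path)
print("single branch:", on_one_branch([p for _, _, p in found]))
```

Because the pattern only relies on whitespace between fields, it should work on both the multi-line and a flattened copy of the output, but that is an assumption about the text rather than a guarantee of the `fmadm` format.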
Link up messages of the following kind appear every hour at exactly the same time, followed eventually by a message that the network adapter has "overheated":

$ grep ixgbe messages
...
Dec 11 21:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 11 21:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 11 22:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 11 22:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 11 23:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 11 23:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 12 00:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 12 00:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 12 01:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Dec 12 01:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 12 02:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Problem: Network adapter has been stopped because it has overheated <----
Dec 12 02:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Action: Restart the computer. If the problem persists, power off the system and replace the adapter <----
Dec 12 02:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 12 02:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 12 02:32:52 abcde1 genunix: [ID 408114 kern.info] /pci@300/pci@1/pci@0/pci@1/network@0,1 (ixgbe1) down
Dec 12 03:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 12 03:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 12 04:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 12 04:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 12 05:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Dec 12 05:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
Dec 12 06:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Dec 12 06:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
Dec 12 07:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex
Dec 12 07:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
Dec 12 08:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 12 08:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
...
Dec 14 21:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
Dec 14 22:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 14 22:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 14 23:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Problem: Network adapter has been stopped because it has overheated <----
Dec 14 23:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Action: Restart the computer. If the problem persists, power off the system and replace the adapter <----
Dec 14 23:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 14 23:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
Dec 14 23:32:52 abcde1 genunix: [ID 408114 kern.info] /pci@300/pci@1/pci@0/pci@1/network@0,1 (ixgbe1) down
Dec 15 00:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 15 00:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 15 01:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 15 01:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 15 02:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 15 02:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 15 03:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 15 03:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, full duplex
Dec 15 04:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 15 04:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 15 05:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 15 05:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 15 06:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 15 06:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 15 07:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex
Dec 15 07:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 15 08:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 15 08:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 1000 Mbps, full duplex
...
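The signature described above (link up notices for the same interface exactly one hour apart, plus an "overheated" warning) can be checked mechanically against a saved messages file. The following Python sketch is one way to do that. It assumes the syslog line layout shown in the excerpt; the function names are hypothetical, and the year is an assumption because syslog timestamps do not carry one.

```python
import re
from collections import defaultdict
from datetime import datetime

# Patterns assumed from the two message kinds shown in the excerpt.
LINK_UP = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ mac: .*NOTICE: (ixgbe\d+) link up")
OVERHEAT = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ ixgbe: .*WARNING: (ixgbe\d+): Problem: .*overheated")

def parse_ts(stamp, year=2013):
    # The year is not in the log line; 2013 is an assumption for this sample.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def summarize(lines):
    """Return ({iface: [link-up times]}, [(time, iface) overheat warnings])."""
    link_ups, overheats = defaultdict(list), []
    for line in lines:
        m = LINK_UP.match(line)
        if m:
            link_ups[m.group(2)].append(parse_ts(m.group(1)))
            continue
        m = OVERHEAT.match(line)
        if m:
            overheats.append((parse_ts(m.group(1)), m.group(2)))
    return link_ups, overheats

def hourly(times):
    """True if there are at least two events and every gap is exactly one hour."""
    gaps = {(b - a).total_seconds() for a, b in zip(times, times[1:])}
    return gaps == {3600.0}

# Four lines copied from the excerpt above.
SAMPLE = [
    "Dec 12 00:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, full duplex",
    "Dec 12 01:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 1000 Mbps, full duplex",
    "Dec 12 02:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Problem: Network adapter has been stopped because it has overheated",
    "Dec 12 02:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex",
]

link_ups, overheats = summarize(SAMPLE)
for iface, times in link_ups.items():
    print(iface, "hourly link up pattern:", hourly(times))
for when, iface in overheats:
    print(iface, "overheat warning at", when)
```

A log with elided stretches (the "..." in the excerpt) will produce larger gaps, so `hourly` is only meaningful on a contiguous section of the file.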
Changes

A network cable is installed, but the interface is not yet configured or in use.

Cause

At first glance this looks like a hardware issue with the network interface, but it is actually a software bug.
messages file
...
Dec 12 02:32:16 abcde1 SC Alert: [ID 438350 daemon.notice] Audit | minor: root : Open Session : object = "/SP/session/type" : value = "shell" : success
Dec 12 02:32:18 abcde1 SC Alert: [ID 665947 daemon.notice] Audit | minor: root : Close Session : object = "/SP/session/type" : value = "shell" : success
Dec 12 02:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Problem: Network adapter has been stopped because it has overheated <----
Dec 12 02:32:22 abcde1 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe1: Action: Restart the computer. If the problem persists, power off the system and replace the adapter <----
Dec 12 02:32:22 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe1 link up, 0 Mbps, unknown duplex
Dec 12 02:32:23 abcde1 SC Alert: [ID 438350 daemon.notice] Audit | minor: root : Open Session : object = "/SP/session/type" : value = "shell" : success
Dec 12 02:32:25 abcde1 mac: [ID 435574 kern.info] NOTICE: ixgbe2 link up, 0 Mbps, unknown duplex
Dec 12 02:32:28 abcde1 SC Alert: [ID 438350 daemon.notice] Audit | minor: root : Open Session : object = "/SP/session/type" : value = "shell" : success
Dec 12 02:32:30 abcde1 SC Alert: [ID 665947 daemon.notice] Audit | minor: root : Close Session : object = "/SP/session/type" : value = "shell" : success
Dec 12 02:32:30 abcde1 last message repeated 1 time
Dec 12 02:32:52 abcde1 genunix: [ID 408114 kern.info] /pci@300/pci@1/pci@0/pci@1/network@0,1 (ixgbe1) down <----
Dec 12 02:32:52 abcde1 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical <----
Dec 12 02:32:52 abcde1 EVENT-TIME: Thu Dec 12 02:32:52 CET 2013
Dec 12 02:32:52 abcde1 PLATFORM: SPARC-T5-2, CSN: 1234567890, HOSTNAME: abcde1
Dec 12 02:32:52 abcde1 SOURCE: eft, REV: 1.16
Dec 12 02:32:52 abcde1 EVENT-ID: c644edfb-b986-c3c1-9b9b-e4d3b0514a46
Dec 12 02:32:52 abcde1 DESC: A problem was detected for a PCIEX device.
Dec 12 02:32:52 abcde1 AUTO-RESPONSE: One or more device instances may be disabled
Dec 12 02:32:52 abcde1 IMPACT: Loss of services provided by the device instances associated with this fault
Dec 12 02:32:52 abcde1 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
Dec 12 02:32:52 abcde1 SC Alert: [ID 482699 daemon.alert] Fault | critical: Fault detected at time = Thu Dec 12 02:32:52 2013. The suspect components: /SYS/MB has fault.io.pciex.device-interr with probability=40, /SYS/MB has fault.io.pciex.device-interr with probability=40, /SYS/MB has fault.io.pciex.device-inte
Dec 12 02:32:58 abcde1 SC Alert: [ID 645471 daemon.error] Email | major: Alert rule 1: SMTP session failed with error: Code 0
Dec 12 02:32:59 abcde1 hwmgmtd[3244]: [ID 702911 daemon.notice] State change: service indicator: /SYS/SERVICE (ID: 232) changed state from "Off" (3) to "On" (4).
Dec 12 02:33:06 abcde1 SC Alert: [ID 438350 daemon.notice] Audit | minor: root : Open Session : object = "/SP/session/type" : value = "shell" : success
...

$ more fmdump-eVu_c644edfb-b986-c3c1-9b9b-e4d3b0514a46.out
TIME                           CLASS
Dec 12 2013 02:32:22.247415010 ereport.io.service.lost
nvlist version: 0
        class = ereport.io.service.lost
        ena = 0x5ea87ea277d01001
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = dev
                device-path = /pci@300/pci@1/pci@0/pci@1/network@0,1
        (end detector)

        __ttl = 0x1
        __tod = 0x52a91226 0xebf40e2
...
   1 ereport.io.pciex.dl.btlp
   5 ereport.io.pciex.pl.re
   5 ereport.io.pciex.rc.ce-msg
   1 ereport.io.pciex.rc.mce-msg
  29 ereport.io.pci.fabric
   2 ereport.io.service.lost

$ grep /pci@300/pci@1/pci@0/pci@1/network@0,1 etc/path_to_inst
"/pci@300/pci@1/pci@0/pci@1/network@0,1" 1 "ixgbe"

# dladm show-phys -L
LINK     DEVICE    LOC
net0     ixgbe0    /SYS/MB
net1     ixgbe1    /SYS/MB   <----
net2     ixgbe2    /SYS/MB
net3     ixgbe3    /SYS/MB
net4     igb0      PCIE1
net5     igb1      PCIE1
net6     igb2      PCIE1
net7     igb3      PCIE1
net8     igb4      PCIE2
net9     igb5      PCIE2
net10    igb6      PCIE2
net11    igb7      PCIE2
net12    ixgbe4    PCIE3
net13    ixgbe5    PCIE3
net16    vsw0      --
net14    usbecm2   --

# dladm show-phys -Z
LINK     ZONE     MEDIA      STATE     SPEED   DUPLEX    DEVICE
net11    global   Ethernet   unknown   0       unknown   igb7
net13    global   Ethernet   unknown   0       unknown   ixgbe5
net0     global   Ethernet   up        1000    full      ixgbe0
net3     global   Ethernet   unknown   0       unknown   ixgbe3
net9     global   Ethernet   unknown   0       unknown   igb5
net4     global   Ethernet   unknown   0       unknown   igb0
net10    global   Ethernet   unknown   0       unknown   igb6
net5     global   Ethernet   unknown   0       unknown   igb1
net1     global   Ethernet   unknown   0       unknown   ixgbe1   <---- No link
net7     global   Ethernet   unknown   0       unknown   igb3
net6     global   Ethernet   unknown   0       unknown   igb2
net12    global   Ethernet   unknown   0       unknown   ixgbe4
net8     global   Ethernet   unknown   0       unknown   igb4
net2     global   Ethernet   unknown   0       unknown   ixgbe2
net14    global   Ethernet   up        10      full      usbecm2
net16    global   Ethernet   up        1000    full      vsw0

# ipadm show-addr
ADDROBJ     TYPE     STATE   ADDR
lo0/v4      static   ok      127.0.0.1/8
net0/v4     static   ok      10.xx.xx.xx/24
net14/v4    static   ok      169.254.182.77/24
lo0/v6      static   ok      ::1/128
Solution

As a workaround, plumb the affected network interface ("ipadm create-ip <interface>").
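To find which links might need the workaround, the `dladm show-phys -L` and `ipadm show-addr` outputs can be compared: an ixgbe link with no address object is a candidate. The Python sketch below does this on saved command output and only prints the `ipadm create-ip` commands; it does not run them. The function names and sample strings are hypothetical, and the column layouts are assumed from the outputs shown in this document. Note that an IP interface created without any address would not appear in `ipadm show-addr`, so `ipadm show-if` output may be the more direct source; the sketch does not cover that case.

```python
def ixgbe_links(show_phys_L):
    """Map link name -> device for ixgbe devices in `dladm show-phys -L` text."""
    links = {}
    for row in show_phys_L.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if len(fields) >= 2 and fields[1].startswith("ixgbe"):
            links[fields[0]] = fields[1]
    return links

def plumbed_links(show_addr):
    """Link names that own at least one address object in `ipadm show-addr` text."""
    names = set()
    for row in show_addr.strip().splitlines()[1:]:  # skip the header row
        fields = row.split()
        if fields:
            names.add(fields[0].split("/")[0])  # "net0/v4" -> "net0"
    return names

def workaround_commands(show_phys_L, show_addr):
    """Suggested `ipadm create-ip` commands for ixgbe links without an address."""
    plumbed = plumbed_links(show_addr)
    return [f"ipadm create-ip {link}"
            for link in sorted(ixgbe_links(show_phys_L))
            if link not in plumbed]

# Shortened samples in the layout shown above.
PHYS_SAMPLE = """LINK DEVICE LOC
net0 ixgbe0 /SYS/MB
net1 ixgbe1 /SYS/MB
net4 igb0 PCIE1
"""
ADDR_SAMPLE = """ADDROBJ TYPE STATE ADDR
lo0/v4 static ok 127.0.0.1/8
net0/v4 static ok 10.xx.xx.xx/24
"""

for cmd in workaround_commands(PHYS_SAMPLE, ADDR_SAMPLE):
    print(cmd)
```

The list is a set of candidates only. Per the Changes section, the fault concerns an interface that has a cable installed but is not configured, so each suggested link should be checked against the faulted device before the command is run.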
References

<NOTE:1005907.1> - SPARC Platforms: Matrix of Recognized Device Paths
<NOTE:1467458.1> - Twinville(Intel) 10 GbE NIC's(copper ports) - Info
<BUG:18131062> - IXGBE1: PROBLEM: NETWORK ADAPTER HAS BEEN STOPPED BECAUSE IT HAS OVERHEATED
<BUG:16743960> - DLADM SHOW-LINKPROP RESPONDS SLOWER WHEN MACHINE HAS UNSET NETWORK INTERFACES
<BUG:17502286> - OVERTEMP CHECK FOR X540 NIC USES RESERVED BIT

Attachments

This solution has no attachment