Asset ID: 1-72-1566851.1
Update Date: 2017-08-16
Solution Type: Problem Resolution Sure
Solution 1566851.1: USB faults seen on T3 or T4 platforms during boot of Operating System
Related Items
- SPARC T3-1
- SPARC T3-1B
- Netra SPARC T3-1B
- SPARC T3-4
- SPARC T4-2
- Netra SPARC T4-2 Server
- Netra SPARC T4-1B
- SPARC T4-1B
- SPARC T4-1
- Netra SPARC T4-1 Server
- SPARC T3-2
- Solaris Operating System
- SPARC T4-4
Related Categories
- PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4
Applies to:
SPARC T4-1 - Version All Versions and later
SPARC T3-1B - Version All Versions and later
SPARC T4-4 - Version All Versions and later
Netra SPARC T3-1B - Version All Versions and later
Netra SPARC T4-1 Server - Version Not Applicable and later
Information in this document applies to any platform.
Symptoms
To determine whether your system has hit this issue, verify the following:
1. Check the /var/adm/messages file and the ILOM event log:
- /var/adm/messages file:
...
Aug 21 13:32:58 xxxx mac: [ID 736570 kern.info] NOTICE: usbecm0 unregistered
Aug 21 13:32:58 xxxx genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) removed <<--!!
...
- ILOM event log:
...
1445 Wed Aug 21 13:33:25 2013 Audit Log minor host_root : Close Session : object = "/SP/session/type" : value = "shell" : success
1444 Wed Aug 21 13:33:23 2013 Audit Log minor root : Open Session : object = "/SP/session/type" : value = "shell" : success
1443 Wed Aug 21 13:33:02 2013 Fault Fault critical Fault detected at time = Wed Aug 21 13:33:01 2013. The suspect component:
/SYS/MB has fault.io.pciex.device-invreq with probability=10. Refer to http://www.sun.com/msg/FMD-8000-11 for details.
1442 Wed Aug 21 13:33:02 2013 Audit Log minor host_root : Close Session : object = "/SP/session/type" : value = "shell" : success
1441 Wed Aug 21 13:33:00 2013 Audit Log minor host_root : Set : object = "/SP/network/interconnect/state" : value = "disabled" : success
1440 Wed Aug 21 13:32:59 2013 Audit Log minor host_root : Set : object = "/SP/network/interconnect/state" : value = "disabled" : success <<--!!
...
- Often the interconnect later comes online again on its own (as in this example):
- messages file:
...
Aug 21 13:33:49 xxxx mac: [ID 469746 kern.info] NOTICE: usbecm0 registered
Aug 21 13:33:49 xxxx usba: [ID 912658 kern.info] USB 2.0 device (usb430,a4a2) operating at hi speed (USB 2.x) on USB 2.0 external hub: communications@3, usbecm0 at bus address 7
Aug 21 13:33:49 xxxx usba: [ID 349649 kern.info] SunMicro Virtual Eth Device
Aug 21 13:33:49 xxxx genunix: [ID 936769 kern.info] usbecm0 is /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3
Aug 21 13:33:49 xxxx genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) online <<--!!
...
- ILOM event log:
...
1455 Wed Aug 21 13:33:40 2013 Audit Log minor host_root : Set : object = "/SP/network/interconnect/state" : value = "enabled" : success
...
- Check its status:
# /usr/sbin/ilomconfig list interconnect
Interconnect
============
State: enabled <----- enabled again
Type: USB Ethernet
SP Interconnect IP Address: 169.254.182.76
Host Interconnect IP Address: 169.254.182.77
Interconnect Netmask: 255.255.255.0
SP Interconnect MAC Address: 02:21:28:xx:xx:xx
Host Interconnect MAC Address: 02:21:28:xx:xx:xx
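The pattern in symptom 1 (usbecm0 removed, then online again shortly after) can be searched for mechanically. The following is a minimal Python sketch, not part of the original note. It assumes the genunix message format shown in the excerpts above, and the function names and the year argument are hypothetical (syslog timestamps carry no year, so the caller has to supply one).

```python
import re
from datetime import datetime

# Matches the genunix lines that report a usbecm0 state change, e.g.
# "Aug 21 13:32:58 xxxx genunix: [ID ...] /pci@.../communications@3 (usbecm0) removed"
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ genunix: "
    r".*\(usbecm0\) (?P<state>removed|online)\b"
)

def usbecm_events(lines, year):
    """Return (timestamp, state) tuples for usbecm0 removed/online messages."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m:
            # The year is an assumption supplied by the caller.
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            events.append((ts, m["state"]))
    return events

def outages(events):
    """Pair each 'removed' with the next 'online'; return durations in seconds."""
    result, down_since = [], None
    for ts, state in events:
        if state == "removed" and down_since is None:
            down_since = ts
        elif state == "online" and down_since is not None:
            result.append((ts - down_since).total_seconds())
            down_since = None
    return result

# Sample lines taken from the excerpts in this document.
PATH = "/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3"
sample = [
    "Aug 21 13:32:58 xxxx mac: [ID 736570 kern.info] NOTICE: usbecm0 unregistered",
    f"Aug 21 13:32:58 xxxx genunix: [ID 408114 kern.info] {PATH} (usbecm0) removed",
    "Aug 21 13:33:49 xxxx mac: [ID 469746 kern.info] NOTICE: usbecm0 registered",
    f"Aug 21 13:33:49 xxxx genunix: [ID 408114 kern.info] {PATH} (usbecm0) online",
]

events = usbecm_events(sample, 2013)
print(outages(events))
```

To run this against a live system, the lines would be read from /var/adm/messages instead of the sample list. A removal without a later "online" message produces no entry, so an empty result does not by itself mean the interconnect is healthy.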
2.a. In Solaris, "# fmadm faulty" shows a PCIEX-8000-5Y event:
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 14 13:59:25 522a9f0f-2604-6185-ec77-f9d56ad78e5d PCIEX-8000-5Y Major
Problem Status : solved
Diag Engine : eft / 1.16
System
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : unknown
Serial_Number : 1307AAA51A
Host_ID : 1237f298
----------------------------------------
Suspect 1 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,2
Status : faulted but still in service
FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 2 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@7/pci@0/display@0
Status : faulted but still in service
FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 3 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0
Status : faulted and taken out of service
FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 4 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,1
Status : faulted and taken out of service
FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
2.b. In some cases, an FMD-8000-11 event has been seen in the output of "# fmadm faulty":
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 14 13:59:25 83fa34c2-4810-eaf4-8681-d615795f66e9 FMD-8000-11 Minor
Host : <hostname>
Platform : ORCL,SPARC-T4-4 Chassis_id :
Product_sn :
Fault class : fault.io.pciex.device-invreq max 10%
fault.io.pci.device-invreq max 10%
Affects : dev:////pci@400/pci@2/pci@0/pci@7/network@0,1
dev:////pci@400/pci@2/pci@0/pci@0/pci@0/display@0
dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0
dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,1
dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2
dev:////pci@400/pci@1/pci@0/pci@4/scsi@0
dev:////pci@400/pci@2/pci@0/pci@4/scsi@0
dev:////pci@400/pci@2/pci@0/pci@6/network@0
dev:////pci@400/pci@2/pci@0/pci@6/network@0,1
dev:////pci@400/pci@2/pci@0/pci@7/network@0
faulted but still in service
FRU : "/SYS/MB" (hc://:product-id=ORCL,SPARC-T4-4:product-sn=9999ABC99A:server-id=www01:chassis-id=9999ABC99A:serial=465769T+1242BW0LAD:part=7041481:revision=02/chassis=0/motherboard=0)
faulty
Description : A Solaris Fault Manager component generated a diagnosis for which
no message summary exists. Refer to
http://sun.com/msg/FMD-8000-11 for more information.
Response : The diagnosis has been saved in the fault log for examination by
Sun.
Impact : The fault log will need to be manually examined using fmdump(1M)
in order to determine if any human response is required.
Action : Use fmdump -v -u <EVENT-ID> to view the diagnosis result. Run
pkgchk -n SUNWfmd to ensure that fault management software is
installed properly.
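When "fmadm faulty" lists many events, the ones relevant to this note can be picked out by MSG-ID. The following is a minimal Python sketch under the assumption that the summary rows look like those in 2.a and 2.b above (time, EVENT-ID, MSG-ID, severity on one line). The function name and the RELEVANT set are hypothetical helpers, not part of fmadm.

```python
import re

# MSG-IDs named in symptoms 2.a and 2.b of this document.
RELEVANT = {"PCIEX-8000-5Y", "FMD-8000-11"}

# One summary row: "May 14 13:59:25 <uuid> <MSG-ID> <SEVERITY>"
ROW_RE = re.compile(
    r"^\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+(?P<severity>\S+)\s*$"
)

def relevant_events(fmadm_output):
    """Return (EVENT-ID, MSG-ID) pairs whose MSG-ID is listed in RELEVANT."""
    found = []
    for line in fmadm_output.splitlines():
        m = ROW_RE.match(line)
        if m and m["msgid"] in RELEVANT:
            found.append((m["uuid"], m["msgid"]))
    return found

# Sample rows taken from the output shown in this document.
sample = """\
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 14 13:59:25 522a9f0f-2604-6185-ec77-f9d56ad78e5d PCIEX-8000-5Y Major
Problem Status : solved
May 14 13:59:25 83fa34c2-4810-eaf4-8681-d615795f66e9 FMD-8000-11 Minor
Host : <hostname>
"""

for uuid, msgid in relevant_events(sample):
    print(uuid, msgid)
```

A matching MSG-ID alone does not prove this issue. The fault class (device-invreq) and the affected usb@ device paths shown above should still be checked by hand.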
Changes
These errors were first observed after upgrading the System Firmware to 8.2.x or later. They can also occur on each boot of the Operating System, including the first boot after installation of the Operating System, and they have been observed during normal operation of the OS.
Cause
These errors can occur during normal communication between the ILOM and the USB devices, when the USB driver fails to terminate the communication correctly.
Solution
The solution has been implemented in Solaris 11.1 SRU 10.5.
If using Solaris 10, please install:
SPARC: Patch 150631-02 or later: SunOS 5.10: ehci patch
X86/X64: Patch 148695-03 or later: SunOS 5.10_x86: ehci Patch
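The "or later" condition on the Solaris 10 patches is a comparison of the revision suffix. The following is a minimal Python sketch of that comparison. It assumes patch listing text in which each installed patch appears on a line starting with "Patch: <id>-<rev>" (the layout commonly printed by "showrev -p"). The sample lines are made up for illustration, the function names are hypothetical, and the sketch does not handle a fix delivered by a different patch ID that obsoletes these.

```python
import re

# Minimum patch levels named in this document.
REQUIRED = {"SPARC": "150631-02", "X86/X64": "148695-03"}

def parse_patch(patch_id):
    """Split '150631-02' into ('150631', 2)."""
    base, rev = patch_id.split("-")
    return base, int(rev)

def installed_revisions(listing, base):
    """Collect installed revisions of patch `base` from 'Patch: <id>-<rev>' lines."""
    pattern = re.compile(rf"^Patch:\s+{re.escape(base)}-(\d+)\b", re.MULTILINE)
    return [int(rev) for rev in pattern.findall(listing)]

def has_fix(listing, required):
    """True if the listing shows the required patch at the given revision or later."""
    base, min_rev = parse_patch(required)
    revs = installed_revisions(listing, base)
    return bool(revs) and max(revs) >= min_rev

# Hypothetical listings, for illustration only.
old_listing = "Patch: 150631-01 Obsoletes: Requires: Incompatibles: Packages: <pkgs>\n"
new_listing = "Patch: 150631-03 Obsoletes: Requires: Incompatibles: Packages: <pkgs>\n"

print(has_fix(old_listing, REQUIRED["SPARC"]))
print(has_fix(new_listing, REQUIRED["SPARC"]))
```

Revisions are compared as integers rather than strings so that, for example, revision 10 is treated as later than revision 02.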
Then clear the corresponding FMA event at the OS level. Example:
# fmadm repair 522a9f0f-2604-6185-ec77-f9d56ad78e5d
In the unlikely case that this does not clear the same event on the Service Processor, run the following on the Service Processor:
-> set /SYS/MB clear_fault_action=true
-> reset /SP
If you still observe these errors after implementing the recommended solution, please open a service request with Oracle.
For legacy reasons, the workaround information is kept in this document (originally as an "internal only" note).
It was the workaround for the described problem prior to the release of the patches mentioned above.
Workaround
1. Clear the fault using fmadm repair <uuid> in Solaris. This should also automatically clear the fault on the SP.
2. If the interconnect is not yet online, try to revive it with "svcadm restart ilomconfig-interconnect" (please allow a couple of minutes).
Until the fix for Solaris 10 is available, you can also disable this interface so that Solaris does not configure it at boot:
1) On the SP, make sure the hostmanaged property of "/SP/network/interconnect" is set to "true" ("true" is the default value).
If it is set to "false", change it to "true" with the following command:
-> set /SP/network/interconnect hostmanaged=true
2) Disable the following service:
# svcadm disable svc:/network/ilomconfig-interconnect:default
It can take a couple of minutes until "ilomconfig list interconnect" shows it to be disabled.
Please note: If "hostmanaged" was already set to "false" because of
Doc ID 1583230.1 : SPARC T3/T4: usba:(hubd2): "Connecting device on port 3 failed" WARNING message reported at system boot
then you can either set it to "true" (and leave it) in order to perform the above procedure
or you can use the following alternate procedure on the SP:
-> set /SP/network/interconnect/ state=disabled (while hostmanaged=false)
Both procedures shown here are workarounds for the issues described in this document and in Doc ID 1583230.1.
Note: Oracle Hardware Management Pack uses this interconnect but, if not available, it will use a slower, alternate path.
** Further documentation **
Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection:
Starting, Stopping, and Logging Fault Management Shell Sessions
http://docs.oracle.com/cd/E19860-01/E21549/z400015e1396491.html#scrolltoc
fmadm – Fault Management Administration Tool
http://docs.oracle.com/cd/E19860-01/E21549/z400015e1396982.html#scrolltoc
References
<BUG:16081077> - FMD FAULT DURING S11.1 AI AUTO-INSTALL T-SERIES - USBECM0: OBJECT NOT FOUND
<BUG:17452623> - BACKPORT 16081077 TO S10 - FMD FAULT DURING S11.1 AI AUTO-INSTALL T-SERIES