Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1566851.1
Update Date:2017-08-16

Solution Type  Problem Resolution Sure

Solution  1566851.1 :   USB faults seen on T3 or T4 platforms during boot of Operating System  


Related Items
  • SPARC T3-1
  • SPARC T3-1B
  • Netra SPARC T3-1B
  • SPARC T3-4
  • SPARC T4-2
  • Netra SPARC T4-2 Server
  • Netra SPARC T4-1B
  • SPARC T4-1B
  • SPARC T4-1
  • Netra SPARC T4-1 Server
  • SPARC T3-2
  • Solaris Operating System
  • SPARC T4-4
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

SPARC T4-1 - Version All Versions and later
SPARC T3-1B - Version All Versions and later
SPARC T4-4 - Version All Versions and later
Netra SPARC T3-1B - Version All Versions and later
Netra SPARC T4-1 Server - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

To determine whether your system has hit this issue, verify the following:

 

1. Check the /var/adm/messages file and the ILOM event log:


- /var/adm/messages file:

...
Aug 21 13:32:58 xxxx mac: [ID 736570 kern.info] NOTICE: usbecm0 unregistered
Aug 21 13:32:58 xxxx genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) removed   <<--!!
...


- ILOM event log:

...
1445   Wed Aug 21 13:33:25 2013  Audit     Log       minor   host_root : Close Session : object = "/SP/session/type" : value = "shell" : success
1444   Wed Aug 21 13:33:23 2013  Audit     Log       minor   root : Open Session : object = "/SP/session/type" : value = "shell" : success
1443   Wed Aug 21 13:33:02 2013  Fault     Fault     critical   Fault detected at time = Wed Aug 21 13:33:01 2013. The suspect component:
                                                                          /SYS/MB has fault.io.pciex.device-invreq with probability=10. Refer to http://www.sun.com/msg/FMD-8000-11 for details.
1442   Wed Aug 21 13:33:02 2013  Audit     Log       minor   host_root : Close Session : object = "/SP/session/type" : value = "shell" : success
1441   Wed Aug 21 13:33:00 2013  Audit     Log       minor   host_root : Set : object = "/SP/network/interconnect/state" : value = "disabled" : success
1440   Wed Aug 21 13:32:59 2013  Audit     Log       minor   host_root : Set : object = "/SP/network/interconnect/state" : value = "disabled" : success     <<--!!
...


- Often the interconnect later comes online again on its own (as in this example):

- messages file:

...
Aug 21 13:33:49 xxxx mac: [ID 469746 kern.info] NOTICE: usbecm0 registered
Aug 21 13:33:49 xxxx usba: [ID 912658 kern.info] USB 2.0 device (usb430,a4a2) operating at hi speed (USB 2.x) on USB 2.0 external hub: communications@3, usbecm0 at bus address 7
Aug 21 13:33:49 xxxx usba: [ID 349649 kern.info] SunMicro Virtual Eth Device
Aug 21 13:33:49 xxxx genunix: [ID 936769 kern.info] usbecm0 is /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3
Aug 21 13:33:49 xxxx genunix: [ID 408114 kern.info] /pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) online       <<--!!
...


- ILOM event log:

...
1455   Wed Aug 21 13:33:40 2013  Audit     Log       minor   host_root : Set : object = "/SP/network/interconnect/state" : value = "enabled" : success
...


- Check its status:


# /usr/sbin/ilomconfig list interconnect     
Interconnect
============
State: enabled                                                <----- enabled again
Type: USB Ethernet
SP Interconnect IP Address: 169.254.182.76
Host Interconnect IP Address: 169.254.182.77
Interconnect Netmask: 255.255.255.0
SP Interconnect MAC Address: 02:21:28:xx:xx:xx
Host Interconnect MAC Address: 02:21:28:xx:xx:xx
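
The removed/online pattern shown in the messages excerpts above can also be searched for with a script. The following is a minimal sketch, assuming the genunix lines have the same layout as in the excerpts. The function name interconnect_events and the regular expression are illustrative and are not part of any Oracle tool.

```python
import re

# Matches genunix lines of the form shown in the excerpts above, e.g.
# "... genunix: [ID ...] /pci@.../communications@3 (usbecm0) removed".
# Assumption: the syslog line layout is the same as in this document.
USBECM_EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+genunix:.*"
    r"\((?P<dev>usbecm\d+)\)\s+(?P<state>removed|online)\b"
)

def interconnect_events(lines):
    """Return (timestamp, device, state) tuples for usbecm state changes."""
    events = []
    for line in lines:
        match = USBECM_EVENT.match(line)
        if match:
            events.append((match["stamp"], match["dev"], match["state"]))
    return events

# Lines copied from the messages excerpts in this document.
sample = [
    "Aug 21 13:32:58 xxxx mac: [ID 736570 kern.info] NOTICE: usbecm0 unregistered",
    "Aug 21 13:32:58 xxxx genunix: [ID 408114 kern.info] "
    "/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) removed",
    "Aug 21 13:33:49 xxxx genunix: [ID 408114 kern.info] "
    "/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3 (usbecm0) online",
]

for stamp, dev, state in interconnect_events(sample):
    print(stamp, dev, state)
```

A "removed" event followed shortly by an "online" event for the same device matches the behaviour described above. To scan a real system, pass the lines of /var/adm/messages instead of the sample list.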

 


2.a. In Solaris, "# fmadm faulty" shows a PCIEX-8000-5Y event:

--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 14 13:59:25 522a9f0f-2604-6185-ec77-f9d56ad78e5d PCIEX-8000-5Y Major

Problem Status : solved
Diag Engine : eft / 1.16
System
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : unknown
Serial_Number : 1307AAA51A
Host_ID : 1237f298

----------------------------------------
Suspect 1 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,2
Status : faulted but still in service

FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 2 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@7/pci@0/display@0
Status : faulted but still in service

FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 3 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0
Status : faulted and taken out of service

FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty
----------------------------------------
Suspect 4 of 4 :
Fault class : fault.io.pci.device-invreq
Certainty : 25%
Affects : dev:////pci@400/pci@1/pci@0/pci@8/pci@0/usb@0,1
Status : faulted and taken out of service

FRU
Location : "/SYS/MB"
Manufacturer : unknown
Name : unknown
Part_Number : 7048938
Revision : 02
Serial_Number : 465769T+1246H70A88
Chassis
Manufacturer : unknown
Name : ORCL,SPARC-T4-4
Part_Number : 31664288+1+1
Serial_Number : 9999ABC99A
Status : faulty


2.b. In some cases, an FMD-8000-11 event has been seen in the output of "# fmadm faulty":

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
May 14 13:59:25 83fa34c2-4810-eaf4-8681-d615795f66e9  FMD-8000-11    Minor    

Host        : <hostname>
Platform    : ORCL,SPARC-T4-4 Chassis_id  :
Product_sn  :

Fault class : fault.io.pciex.device-invreq max 10%
             fault.io.pci.device-invreq max 10%

Affects     : dev:////pci@400/pci@2/pci@0/pci@7/network@0,1
             dev:////pci@400/pci@2/pci@0/pci@0/pci@0/display@0
             dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0
             dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,1
             dev:////pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2
             dev:////pci@400/pci@1/pci@0/pci@4/scsi@0
             dev:////pci@400/pci@2/pci@0/pci@4/scsi@0
             dev:////pci@400/pci@2/pci@0/pci@6/network@0
             dev:////pci@400/pci@2/pci@0/pci@6/network@0,1
             dev:////pci@400/pci@2/pci@0/pci@7/network@0
                 faulted but still in service

FRU         : "/SYS/MB" (hc://:product-id=ORCL,SPARC-T4-4:product-sn=9999ABC99A:server-id=www01:chassis-id=9999ABC99A:serial=465769T+1242BW0LAD:part=7041481:revision=02/chassis=0/motherboard=0)
                 faulty

Description : A Solaris Fault Manager component generated a diagnosis for which
             no message summary exists.  Refer to
             http://sun.com/msg/FMD-8000-11 for more information.

Response    : The diagnosis has been saved in the fault log for examination by
             Sun.

Impact      : The fault log will need to be manually examined using fmdump(1M)
             in order to determine if any human response is required.

Action      : Use fmdump -v -u <EVENT-ID> to view the diagnosis result.  Run
             pkgchk -n SUNWfmd to ensure that fault management software is
             installed properly.

Changes

These errors were first observed after upgrading the System Firmware to 8.2.x or later. They can also occur on every boot of the Operating System,
including the first boot after installation of the Operating System, and they have been observed during normal operation of the OS.

Cause

These errors can occur during normal communication between the ILOM and the USB devices, when the USB driver fails to terminate the communication correctly.

Solution

The solution has been implemented in Solaris 11.1 SRU 10.5.

If using Solaris 10, please install:

  SPARC:   Patch 150631-02 or later: SunOS 5.10: ehci patch

  X86/X64: Patch 148695-03 or later: SunOS 5.10_x86: ehci Patch
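
Whether an installed patch revision meets the "-02 or later" (or "-03 or later") requirement can be checked with a script. The following is a minimal sketch, assuming the installed patches are listed one per line as "Patch: <id>-<rev> ...", which is the layout commonly printed by "showrev -p" on Solaris 10. The function name patch_satisfied and the sample text are illustrative only.

```python
import re

# Assumption: each installed patch appears on a line that starts with
# "Patch: NNNNNN-RR", as in typical "showrev -p" output on Solaris 10.
PATCH_LINE = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b", re.MULTILINE)

def patch_satisfied(showrev_output, base_id, min_rev):
    """Return True if patch base_id is installed at revision min_rev or later."""
    return any(
        patch_id == base_id and int(rev) >= min_rev
        for patch_id, rev in PATCH_LINE.findall(showrev_output)
    )

# Illustrative input, not taken from a real system.
sample = (
    "Patch: 150631-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWckr\n"
    "Patch: 999999-05 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWcsu\n"
)

print(patch_satisfied(sample, "150631", 2))
```

For the SPARC patch, call patch_satisfied(output, "150631", 2). For the x86/x64 patch, call patch_satisfied(output, "148695", 3).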

After installing the fix, clear the corresponding FMA event at the OS level. Example: # fmadm repair 522a9f0f-2604-6185-ec77-f9d56ad78e5d

In the unlikely case that the repair does not also clear the event on the Service Processor, run the following on the Service Processor:
-> set /SYS/MB clear_fault_action=true
-> reset /SP
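
The event ID passed to "fmadm repair" is the EVENT-ID column of the "fmadm faulty" output shown under Symptoms. The following is a minimal sketch that extracts those IDs for the two message IDs named in this document and prints the matching repair commands for review. It assumes the summary row layout shown above; the function name repair_commands is illustrative. FMD-8000-11 is a generic message ID, so confirm that the fault class is device-invreq before repairing such an event.

```python
import re

# Summary row layout as shown under Symptoms:
# "May 14 13:59:25 <uuid> <MSG-ID> <SEVERITY>"
ROW = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+(?P<severity>\S+)"
)

# Message IDs that this document associates with the issue.
RELEVANT = {"PCIEX-8000-5Y", "FMD-8000-11"}

def repair_commands(fmadm_output):
    """Return 'fmadm repair <uuid>' strings for the relevant events."""
    commands = []
    for line in fmadm_output.splitlines():
        match = ROW.match(line)
        if match and match["msgid"] in RELEVANT:
            commands.append("fmadm repair " + match["uuid"])
    return commands

# Header and summary row copied from the example in this document.
sample = """\
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
May 14 13:59:25 522a9f0f-2604-6185-ec77-f9d56ad78e5d PCIEX-8000-5Y Major
"""

for command in repair_commands(sample):
    print(command)
```

The sketch only prints commands and does not run them, so each event can be checked before it is cleared.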


If you still observe these errors after implementing the recommended solution, please open a service request with Oracle.




For legacy reasons, the workaround information is retained in this document as an "internal only" note.
It was the workaround for the described problem before the patches mentioned above were released.

Workaround

1. Clear the fault using fmadm repair <uuid> in Solaris. This should automatically also clear the fault on the SP.

2. If the interconnect is not yet online, try to revive it with "svcadm restart ilomconfig-interconnect" (allow a couple of minutes for it to come up).


Until the fix for Solaris 10 is available you can also disable this interface so that Solaris does not configure it upon boot:

1) On the SP, make sure the hostmanaged property of "/SP/network/interconnect" is set to "true" ("true" is the default value).

   If it is set to "false", change it to "true" with the following command:
   -> set /SP/network/interconnect hostmanaged=true


2) Disable the following service:

# svcadm disable svc:/network/ilomconfig-interconnect:default  

It can take a couple of minutes until "ilomconfig list interconnect" shows it to be disabled.

Please note: If "hostmanaged" was already set to "false" because of

    Doc ID 1583230.1 : SPARC T3/T4: usba:(hubd2): "Connecting device on port 3 failed" WARNING message reported at system boot

then you can either set it to "true" (and leave it) in order to perform the above procedure
or you can use the following alternate procedure on the SP:

-> set /SP/network/interconnect/ state=disabled       (while hostmanaged=false)

Both procedures shown here are a workaround for the issues described in the document you're reading now and Doc 1583230.1.

Note: Oracle Hardware Management Pack uses this interconnect but, if not available, it will use a slower, alternate path.



** Further documentation **

Oracle Integrated Lights Out Manager (ILOM) 3.0 HTML Documentation Collection:

Starting, Stopping, and Logging Fault Management Shell Sessions
http://docs.oracle.com/cd/E19860-01/E21549/z400015e1396491.html#scrolltoc 

fmadm – Fault Management Administration Tool
http://docs.oracle.com/cd/E19860-01/E21549/z400015e1396982.html#scrolltoc

References

<BUG:16081077> - FMD FAULT DURING S11.1 AI AUTO-INSTALL T-SERIES - USBECM0: OBJECT NOT FOUND
<BUG:17452623> - BACKPORT 16081077 TO S10 - FMD FAULT DURING S11.1 AI AUTO-INSTALL T-SERIES

  Copyright © 2018 Oracle, Inc.  All rights reserved.