
Asset ID: 1-79-1683087.1
Update Date: 2017-10-11
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution 1683087.1: SPARC M5-32 and M6-32 Servers: Interconnect, FMA Fault Proxying and LDOM configuration


Related Items
  • SPARC M5-32
  • SPARC M6-32
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx-32




In this Document
Purpose
Details
 Role of interconnect
 Identify the Interconnect path
 Configuration rules
 Empty DCU (no CMU installed)
 The root complexes used for the usbecm interface
 Additional Information
 Special case : Golden SPP failover
References


Applies to:

SPARC M5-32 - Version All Versions and later
SPARC M6-32 - Version All Versions and later
Information in this document applies to any platform.

Purpose

This document provides details about the Interconnect, FMA Fault Proxying and LDOM on the SPARC M5-32 and M6-32 Servers.

 

Details

Role of interconnect

The Interconnect provides an internal communication interface between the Host/Pdom and the service processor.
On M5-32/M6-32, it's actually an interface between the Host (control/primary domain) and the Pdomain-SPP (aka Golden SPP) of the Host.

For M5-32/M6-32, this IP address connects to the Pdomain-SPP, not to the SP as on the T-series servers.

The Interconnect uses an Ethernet-over-USB interface (usbecm).

It can be controlled/managed:
- from the SP:
    /Servers/PDomains/PDomain_x/SP/network/interconnect
- from the Solaris Host:
    using the 'ilomconfig' command from Solaris or OHMP.

Note : with Solaris 11.1, ilomconfig is bundled with Solaris. The Oracle Hardware Management Pack is not required to configure the Interconnect.

See :

  • Oracle ILOM Administrator’s Guide for Configuration and Maintenance, Firmware Release 3.2.1
  • SPARC M5-32 and SPARC M6-32 Servers Administration Guide
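
As a quick illustration (not normally required, since the interconnect is enabled by default), the ilomconfig subcommands referenced later in this document can be used from the control domain:

# ilomconfig list interconnect
(displays the state, IP addresses and MAC addresses of the interconnect)
# ilomconfig disable interconnect
# ilomconfig enable interconnect
(disables/re-enables the interconnect from the Solaris side)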


The Fault Management Architecture uses the Interconnect to proxy faults diagnosed on the SP/SPP to the Host, and faults diagnosed on the Solaris Host to the SP/SPP. This is known as FMA Fault Proxying: it keeps the FMA faults in sync between the host and the Active SP.

When the interconnect is not available, FMA Fault Proxying cannot work properly.

Note : when the FMA Fault Proxying mechanism does not work, FMA on the SP and on the Solaris Host still works, but the diagnosed faults are no longer proxied to the other side.

The interconnect is enabled and configured by default and should always be enabled from both the SP and the Host.

The Interconnect uses the usbecm interface, on top of which an IP address is configured.
The usbecm interface uses the root complex/bus from CMUx/CMP0, where x is the lowest numbered CMU in the DCU (CMU0 in DCU#0, CMU4 in DCU#1, etc.). So the usbecm interface uses one of the following root complexes/buses, depending on the PDom configuration and the Pdomain-SPP role selection:
- DCU#0 - SPP0 : pci_1,
- DCU#1 - SPP1 : pci_17,
- DCU#2 - SPP2 : pci_33,
- DCU#3 - SPP3 : pci_49
See SPARC M5-32 and M6-32 Servers: Device Paths (Doc ID 1540545.1).

When LDOMs are configured, if the usbecm connection is not available to the control domain, FMA Fault Proxying does not work (see the Configuration rules section below).
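
A quick way to check this is to confirm, from the control domain, that the root complex backing the usbecm interface is still owned by the primary domain. A minimal sketch, assuming the Oracle VM Server for SPARC 'ldm' command is available and using pci_33 (DCU#2/SPP2) as the example bus:

# ldm list-io | grep pci_33

The DOMAIN column of the matching line should report the control domain (typically 'primary').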


Identify the Interconnect path 

  • From the SP
-> show /Servers/PDomains/PDomain_1/SP/network/interconnect

 /Servers/PDomains/PDomain_1/SP/network/interconnect
    Targets:

    Properties:
        hostmanaged = true
        type = USB Ethernet
        ipaddress = 169.254.182.76
        ipnetmask = 255.255.255.0
        spmacaddress = 02:21:28:57:47:16
        hostmacaddress = 02:21:28:57:47:17


It's a communication path to the Pdomain-SPP.

Example:

-> show /Servers/PDomains/PDomain_1/HOST/ sp_name

  /Servers/PDomains/PDomain_1/HOST
    Properties:
        sp_name = /SYS/SPP2

 

  • From Solaris

The available physical paths

root@m5-32-sca11-a-pdom01:~# prtdiag -v | grep -i usb

/SYS/SPP1/USB     PCIE  usb-pciexclass,0c0330                        5.0GTx1
                        /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0
/SYS/SPP2/USB     PCIE  usb-pciexclass,0c0330                        5.0GTx1
                        /pci@b40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0

root@m5-32-sca11-a-pdom01:/var/tmp# hotplug list -cv
Connection           State           Description
________________________________________________________________________________
...
pcie0                ENABLED         PCIe-Native
     Device                     Usage
     ___________________________________________________________________________
     pci@0                      -
     pci@0                      -
     usb@0                      -
     communications@6           Network interface net24
                                net24: hosts IP addresses: 169.254.182.77
     pci@5                      -
     display@0                  framebuffer device

 

The IP configuration can be checked as follows:

root@m5-32-sca11-a-pdom01:~# ilomconfig list interconnect
Interconnect
============
State: enabled
Type: USB Ethernet
SP Interconnect IP Address: 169.254.182.76
Host Interconnect IP Address: 169.254.182.77
Interconnect Netmask: 255.255.255.0
SP Interconnect MAC Address: 02:21:28:57:47:16
Host Interconnect MAC Address: 02:21:28:57:47:17


So, by default, the Interconnect uses the host IP address 169.254.182.77 to communicate with the Pdomain-SPP at 169.254.182.76.

On a multi-DCU PDom, any of the SPPs can be selected as Pdomain-SPP, and this role may change after a stop/start operation.

On a single-DCU PDom, there is only one SPP available, and it is selected as Pdomain-SPP.

It's possible to identify the path used by the usbecm interface.

On this dual DCU (DCU1 + DCU2) Pdom, SPP2 is the Pdomain-SPP.

-> show /Servers/PDomains/PDomain_1/HOST/ sp_name

  /Servers/PDomains/PDomain_1/HOST
    Properties:
        sp_name = /SYS/SPP2


And the usbecm interface uses the path to SPP2:

root@m5-32-sca11-a-pdom01:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
...
net46             ip         ok           --         --
   net46/v4       static     ok           --         169.254.182.77/24

root@m5-32-sca11-a-pdom01:~# dladm show-phys -L | grep usb
net46             usbecm0      --

root@m5-32-sca11-a-pdom01:~# grep usb /etc/path_to_inst
"/pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0" 0 "xhci"
"/pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6" 2 "usbecm"
"/pci@b40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0" 1 "xhci"
"/pci@b40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6" 0 "usbecm"


As described in SPARC M5-32 and M6-32 Servers: Device Paths (Doc ID 1540545.1), /pci@b40/pci@1/pci@0/pci@1/pci@0/pci@0 is the path to SPP2.

If at some point the Golden SPP role switches to SPP1, the interconnect will use usbecm2 (/pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0).


Configuration rules

Empty DCU (no CMU installed)

  • should not be assigned to a Host (/Servers/PDomains/PDomain_1/HOST dcus_assigned),
  • the SPP of an empty DCU might become the Pdomain-SPP at some point,
  • because there is no CMU in the empty DCU, the usbecm interface cannot use any root complex,
  • this may result in a non-functional interconnect and therefore no FMA Fault Proxying.

The root complexes used for the usbecm interface

  • must be owned by the control/primary domain (pci_1, pci_17, pci_33, pci_49),
  • when the root complex used for the usbecm interface is assigned to a non-primary domain, FMA Fault Proxying does not work,
  • this also applies to multi-DCU PDoms: because the Pdomain-SPP may change at some point, the root complexes need to remain available to the control domain even after a Pdomain-SPP role change,
  • as long as the root complex is owned by the primary domain, it is possible to assign the PCIe endpoint to a non-primary domain, i.e. PCIE2 (full configuration) or PCIE1/PCIE2 (half configuration). The card installed in the slot must support DIO (see the sketch after this list).
    • References :
      • Oracle VM Server for SPARC 3.1 Administration Guide / Creating an I/O Domain by Assigning PCIe Endpoint Devices / How to Create an I/O Domain by Assigning a PCIe Endpoint Device
      • Oracle VM Server for SPARC PCIe Direct I/O and SR-IOV Features (Doc ID 1325454.1)
  • unrelated to FMA Fault Proxying, but listed here in the context of required root complex ownership: when the root complex used for the usbecm interface is assigned to a non-primary domain, storage redirection used for rcdrom boot does not work
    • the rKVMS device path used for storage redirection is also used for the interconnect path described in this document
    • no rcdrom device will be listed or available in OBP if the required root complexes are not assigned to the control domain
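
The sketch below illustrates that split, assuming hypothetical endpoint and domain names (the actual /SYS/... endpoint names depend on the platform configuration and are listed by 'ldm list-io'); the root complex itself stays with the primary domain, only the PCIe endpoint is handed over to the I/O domain:

# ldm list-io
(identify the PCIe endpoint to reassign; the endpoint name below is hypothetical)
# ldm remove-io /SYS/IOU2/PCIE2 primary
(removing a device from the primary domain typically triggers a delayed reconfiguration and requires a reboot of the primary domain)
# ldm add-io /SYS/IOU2/PCIE2 ldg1
(the target I/O domain typically must not be running when the endpoint is added)

See the Oracle VM Server for SPARC references above for the complete procedure.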

 

Additional Information

  • When a SPARC M5-32 or M6-32 Server is running SysFW 9.1.1.a or 9.1.1.b, FMA Fault Proxying may not be working properly because of bug 17768292, which prevents the ip-transport module from being loaded (fmstat -T). The SysFW should be upgraded to the latest version. When the module is properly loaded, you should see:
# fmstat -T | grep ip-transport
  3   RUN ip-transport        server-name=169.254.182.76:24 

 

  • During the boot sequence, the following warning may be reported:
WARNING: /pci@340/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0 (xhci0): Connecting device on port 6 failed

which means that "the driver failed to enumerate the device connected on port <number> of hub. If enumeration fails, disconnect and re-connect."; see the hubd(7D) man page.

You can use the commands above to confirm which path is used for the Ethernet-over-USB interface and that the interconnect and FMA Fault Proxying are properly configured.

You can also check the /var/adm/messages file and/or the host console logs to confirm that, despite the warning, the re-connect was successful:

Jul  1 15:49:57  usba: [ID 691482 kern.warning] WARNING: /pci@340/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0 (xhci0): Connecting device on port 6 failed
Jul  1 15:50:02  last message repeated 1 time
Jul  1 15:50:06  mac: [ID 469746 kern.info] NOTICE: usbecm2 registered
Jul  1 15:50:06  usba: [ID 912658 kern.info] USB 1.10 device (usb430,a4a2) operating at full speed (USB 1.x) on USB 2.0 root hub: communications@6, usbecm2 at bus address 1
Jul  1 15:50:06  usba: [ID 349649 kern.info]    SunMicro Virtual Eth Device
Jul  1 15:50:06  genunix: [ID 936769 kern.info] usbecm2 is /pci@340/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6
Jul  1 15:50:06  genunix: [ID 408114 kern.info] /pci@340/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm2) online
Jul  1 15:50:06  mac: [ID 435574 kern.info] NOTICE: usbecm2 link up, 10 Mbps, full duplex
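
A simple way to extract these entries afterwards, assuming the default /var/adm/messages location:

# grep -i usbecm /var/adm/messages
# grep -i 'Connecting device' /var/adm/messages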

  

The impact is only on the FMA Fault Proxying.

FMA ereport forwarding is not impacted; i.e. any error detected on the Solaris side that requires an SP diagnosis to produce a fault (GM/FERG) is still forwarded to the SP via the SPP, as it uses a different channel than the Interconnect.

 

Special case : Golden SPP failover


When a Host is composed of multiple DCUs, one SPP is selected as Pdomain-SPP (Golden SPP). The Golden SPP then hosts the interconnect communication with the Host.
At some point, the Golden SPP role may switch from one SPP to another, for several reasons.
A Golden SPP switch is then reported in the event log of the Active SP as follows:

45220  Thu Sep 18 00:05:34 2014  Reset     Log       minor   
       /Servers/PDomains/PDomain_3 is now managed by PDomain SPP /SYS/SPP3.
...
45140  Wed Sep 17 13:34:39 2014  Reset     Log       minor   
       /Servers/PDomains/PDomain_3 is now managed by PDomain SPP /SYS/SPP1.
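
The same events can be browsed directly from the Active SP; a minimal sketch, assuming the standard ILOM event log target:

-> show /SP/logs/event/list

Look for the 'is now managed by PDomain SPP' entries shown above.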


If the host is up when the Golden SPP failover occurs, the following is reported in the Solaris /var/adm/messages file on the control domain:

 

Sep 18 00:05:29 pdom03 mac: [ID 486395 kern.info] NOTICE: usbecm2 link down
Sep 18 00:06:22 pdom03 mac: [ID 469746 kern.info] NOTICE: usbecm0 registered
Sep 18 00:06:22 pdom03 usba: [ID 912658 kern.info] USB 1.10 device (usb430,a4a2) operating at full speed (USB 1.x) on USB 2.0 root hub: communications@6, usbecm0 at bus address 1
Sep 18 00:06:22 pdom03 usba: [ID 349649 kern.info]        SunMicro Virtual Eth Device
Sep 18 00:06:22 pdom03 genunix: [ID 936769 kern.info] usbecm0 is /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6
Sep 18 00:06:22 pdom03 genunix: [ID 408114 kern.info] /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm0) online
Sep 18 00:06:22 pdom03 genunix: [ID 408114 kern.info] /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm2) removed

where
# grep usb /etc/path_to_inst
"/pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0" 0 "xhci"
"/pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6" 0 "usbecm"
"/pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0" 2 "xhci"
"/pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6" 2 "usbecm"

# prtdiag -v | grep -i usb
/SYS/SPP1/USB     PCIE  usb-pciexclass,0c0330                        5.0GT/x1   5.0GT/x1   
                        /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0
/SYS/SPP3/USB     PCIE  usb-pciexclass,0c0330                        5.0GT/x1   5.0GT/x1   
                        /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0



A Golden SPP role switch can occur:

  1. when restarting (stop/start) the host,
  2. at any time while the host is up & running.



1- when restarting (stop/start) the host

If a Golden SPP role switch occurs while restarting the host (i.e. the Golden SPP changed between the stop and the start operations), Solaris and the svc:/network/ilomconfig-interconnect:default service will properly initialize the usbecm instance and the interconnect communication path.
No extra step should be required in this case.
The commands described above can be used to confirm that it works properly.
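
For instance, a quick post-start check could look like this, using the service FMRI and the ilomconfig output already shown above:

# svcs svc:/network/ilomconfig-interconnect:default
(the service should be in the 'online' state)
# ilomconfig list interconnect
(State should be 'enabled' and the host/SP interconnect IP addresses should be populated)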



2- at any time while the host is up & running

If a Golden SPP switch occurs while the host is running Solaris, the interconnect communication may no longer work.
The former usbecm interface will go down and be removed:

Sep 18 01:42:48 pdom03 mac: [ID 486395 kern.info] NOTICE: usbecm2 link down
Sep 18 01:44:02 pdom03 genunix: [ID 408114 kern.info] /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm2) removed


The new usbecm interface will then be brought up:

Sep 18 01:44:08 pdom03 mac: [ID 469746 kern.info] NOTICE: usbecm0 registered
Sep 18 01:44:08 pdom03 usba: [ID 912658 kern.info] USB 1.10 device (usb430,a4a2) operating at full speed (USB 1.x) on USB 2.0 root hub: communications@6, usbecm0 at bus address 1
Sep 18 01:44:08 pdom03 usba: [ID 349649 kern.info]        SunMicro Virtual Eth Device
Sep 18 01:44:08 pdom03 genunix: [ID 936769 kern.info] usbecm0 is /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6
Sep 18 01:44:08 pdom03 genunix: [ID 408114 kern.info] /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm0) online
Sep 18 01:44:09 pdom03 mac: [ID 435574 kern.info] NOTICE: usbecm0 link up, 10 Mbps, full duplex

 

And the new interface is reported via dladm:

# dladm show-phys -P | grep usb
net79             usbecm2      Ethernet             r----
net71             usbecm0      Ethernet             -----

 

But the interface is not yet fully functional and must be configured manually:

root@pdom03:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        static     ok           --         10.133.111.158/21
   net0/v6        addrconf   ok           --         fe80::210:e0ff:fe24:6acc/10
net71             ip         down         --         --
net79             ip         failed       --         --
   net79/v4       static     inaccessible --         169.254.182.77/24
root@pdom03:~# ipadm delete-addr net79/v4
root@pdom03:~# ipadm create-addr -T static -a 169.254.182.77/24 net71/v4
root@pdom03:~# ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
   lo0/v6         static     ok           --         ::1/128
net0              ip         ok           --         --
   net0/v4        static     ok           --         10.133.111.158/21
   net0/v6        addrconf   ok           --         fe80::210:e0ff:fe24:6acc/10
net71             ip         ok           --         --
   net71/v4       static     ok           --         169.254.182.77/24
net79             ip         failed       --         --


The commands described above can be used to confirm that it works properly.

At this point, you can restart ('svcadm restart' or 'svcadm disable/enable') the svc:/network/ilomconfig-interconnect:default service.
You can also disable/enable the interconnect ('ilomconfig disable|enable interconnect').
This should not be required.
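
Spelled out, those optional steps would be (using the FMRI and the ilomconfig subcommands already mentioned above):

# svcadm restart svc:/network/ilomconfig-interconnect:default
or
# svcadm disable svc:/network/ilomconfig-interconnect:default
# svcadm enable svc:/network/ilomconfig-interconnect:default
or
# ilomconfig disable interconnect
# ilomconfig enable interconnect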


Corner case :

If, for some reason, the usbecm interface fails to be registered:

Sep 18 05:52:08 pdom03 mac: [ID 486395 kern.info] NOTICE: usbecm0 link down
Sep 18 05:52:11 pdom03 genunix: [ID 408114 kern.info] /pci@f40/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm0) removed
Sep 18 05:52:58 pdom03 usba: [ID 723738 kern.info] /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm2): usbecm_restore_device_state: Device has been reconnected but data may have been lost
Sep 18 05:52:58 pdom03 mac: [ID 435574 kern.info] NOTICE: usbecm2 link up, 10 Mbps, full duplex
Sep 18 05:52:58 pdom03 genunix: [ID 408114 kern.info] /pci@740/pci@1/pci@0/pci@1/pci@0/pci@0/usb@0/communications@6 (usbecm2) online


All the usbecm instances are then reported as removed:

root@pdom03:~# dladm show-phys -P | grep usb
net79             usbecm2      Ethernet             r----
net71             usbecm0      Ethernet             r----

Then restarting the svc:/network/ilomconfig-interconnect:default service will most likely report:

# tail -f /var/svc/log/network-ilomconfig-interconnect:default.log
[ Sep 18 06:01:43 Executing start method ("/lib/svc/method/svc-ilomconfig-interconnect start"). ]
ERROR: Internal error
ERROR: Internal error
ERROR: Internal error
ERROR: Internal error
ERROR: Internal error
[ Sep 18 06:04:22 Method "start" exited with status 0. ]


As will disabling/enabling the interconnect:

# ilomconfig disable interconnect
ERROR: Internal error
# ilomconfig enable interconnect
ERROR: Internal error



At this point, it should still be possible to manually configure the network interface (ipadm delete-addr/create-addr) as described above, and FMA Fault Proxying should then be able to communicate properly with the SPP:


# fmstat -T | grep ip-transport
  9   RUN ip-transport        server-name=169.254.182.76:24


However, rebooting the primary domain may be recommended so that the usbecm interface is properly registered again.


Note that when the interconnect communication is re-established, you may see a replay of some faults from the SP, i.e. the SP re-proxying some faults to the host. A fault replay (vs. a new fault) can be confirmed by checking the UUID (fmdump) and the diag-time (fmdump -Vu UUID).
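
For example (the UUID below is a placeholder, not a value taken from a real system):

# fmdump
(lists the fault events with their UUIDs and times)
# fmdump -Vu <UUID>
(full details for one fault; compare the diag-time with the time the fault was reported to tell a replayed fault from a new one)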





Note : at some point, it will be possible to manually switch the Golden SPP role via the Host initiate_sp_failover property.

set /Hostx initiate_sp_failover=true

After performing a manual Golden SPP failover, the same procedure as described in "2- at any time while the host is up & running" should be followed.




Attachments
This solution has no attachment