Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-2296998.1
Update Date: 2018-05-30
Keywords:

Solution Type  Troubleshooting Sure

Solution  2296998.1 :   How to Troubleshoot ILOM Interconnect Problems  


Related Items
  • SPARC S7-2
  • SPARC T7-1
  • Netra SPARC T4-2 Server
  • SPARC T4-2
  • SPARC T5-8
  • SPARC T8-2
  • Netra SPARC S7-2
  • Netra SPARC T4-1 Server
  • SPARC T8-1
  • SPARC T7-4
  • SPARC T7-2
  • SPARC T4-1
  • SPARC T5-2
  • SPARC T5-4
  • SPARC T8-4
  • SPARC T4-4
  • SPARC S7-2L
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T7




Applies to:

Netra SPARC T4-1 Server - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC T5-4 - Version All Versions and later
SPARC S7-2 - Version All Versions and later
SPARC T8-1 - Version All Versions and later
Information in this document applies to any platform.

Purpose

The Integrated Lights Out Manager (ILOM) Interconnect is an internal 10 Mb/s Ethernet-over-USB interface between the ILOM and the server's Solaris host / primary LDom. It carries Fault Management Architecture (FMA) and other data. FMA uses the Interconnect to proxy faults diagnosed on the Service Processor (SP) to the host, and faults diagnosed on the Solaris host to the SP. This mechanism, known as FMA Fault Proxying, keeps the FMA faults in sync between the host and the SP on T5-x or newer servers. When FMA Fault Proxying does not work, FMA still runs on both the SP and the Solaris host, but the faults diagnosed are no longer proxied to the other side.

The following faults can occur for many reasons, and some are related to the ILOM interconnect:

FMD-8000-D6 - alert.oracle.solaris.fmd.ip-transport.interconnect-down
ILOM-8000-EM - alert.ilom.fm.ip-transport.interconnect-down
FMD-8000-ET - alert.oracle.solaris.fmd.ip-transport.link-down
ILOM-8000-F7 - alert.ilom.fm.ip-transport.link-down
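
When triaging a fault report, it can help to map a message ID from the list above to the side that raised it and to the kind of event. The following is a minimal sketch; the table is copied from the list above, while the function name and the rule of reading the reporting side from the fault class string are illustrative assumptions, not part of any Oracle tool.

```python
# Message IDs and fault classes copied from the list in this document.
INTERCONNECT_FAULTS = {
    "FMD-8000-D6": "alert.oracle.solaris.fmd.ip-transport.interconnect-down",
    "ILOM-8000-EM": "alert.ilom.fm.ip-transport.interconnect-down",
    "FMD-8000-ET": "alert.oracle.solaris.fmd.ip-transport.link-down",
    "ILOM-8000-F7": "alert.ilom.fm.ip-transport.link-down",
}


def describe_fault(msg_id):
    """Return (reporting side, event) for a known message ID, else None.

    Assumption: the reporting side is inferred from the class name
    (".solaris." means the Solaris host, otherwise the ILOM SP).
    """
    fault_class = INTERCONNECT_FAULTS.get(msg_id.strip().upper())
    if fault_class is None:
        return None
    side = "Solaris host" if ".solaris." in fault_class else "ILOM SP"
    event = fault_class.rsplit(".", 1)[-1]
    return side, event
```

For example, describe_fault("FMD-8000-ET") identifies a link-down event reported by the Solaris host.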

This document provides information on troubleshooting ILOM interconnect problems.

 

Troubleshooting Steps

Also see doc: 2281470.1

This interface was either Auto Configured via the OS or Oracle Hardware Management Pack (OHMP) at initial boot, or was Manually Configured by the system admin.

Please note that Solaris will detect this link going down if the ILOM reboots or becomes unresponsive.  Before proceeding with this troubleshooting doc, determine whether the ILOM was down at the time of the fault (typically an FMD-8000-ET).  Also determine whether the ILOM was hung at the time of the interconnect outage, since the interconnect is affected by an ILOM hang.

OHMP's ilomconfig enables the interconnect by default if ILOM 3.0.12 and Solaris 10 U11 are both loaded. OHMP was added to the Solaris 11.2 distribution and is enabled by default. We recommend using OHMP version 2.3 or newer and keeping it current through Solaris upgrades. When Solaris 11.1 or earlier is in use, OHMP can be obtained at the following URL:

http://www.oracle.com/technetwork/documentation/sys-mgmt-networking-190072.html#hwmgmt

 

Fault FMD-8000-ET [failed getpeername()] was detected after an upgrade to Solaris 11.3 SRU 22.3 (or newer) even though the ILOM interconnect was properly configured. If the IP transport is missing, then this is resolved by upgrading the system firmware to 9.7.4 (or newer) on T7-x systems and to 9.6.7.a (or newer) on T5-x systems. 

##### fma/fmstat-T.out #####
10 RUN ip-transport server-name=169.254.182.76:24
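
To check an Explorer's fma/fmstat-T.out for the IP transport without reading it by eye, the line format shown above can be matched with a regular expression. This is only a sketch: it assumes the line layout of the excerpt above (an ID, a state, the module name ip-transport, then server-name=address:port), and the function name is hypothetical.

```python
import re

# Matches lines shaped like the excerpt above:
#   10 RUN ip-transport server-name=169.254.182.76:24
_IP_TRANSPORT = re.compile(
    r"^\s*\d+\s+(\S+)\s+ip-transport\s+server-name=([\d.]+):(\d+)", re.M
)


def ip_transport_peers(fmstat_text):
    """Return a list of (state, address, port) tuples, one per ip-transport line.

    An empty list means no IP transport was found in the text.
    """
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in _IP_TRANSPORT.finditer(fmstat_text)
    ]
```

An empty result corresponds to the "IP transport is missing" condition described above.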

 

 

The user should ensure that the Solaris host / primary LDom is configured properly, as follows:

1.  Interconnect Service online:

#####  # svcs -av | grep :default  /  Explorer: sysconfig/svcs-av.out #####
STATE NSTATE STIME CTID FMRI
online  -    Apr_01  -  svc:/network/ilomconfig-interconnect:default

If the Solaris service is offline or not present, then use svcadm to enable the service:

svcadm enable svc:/network/ilomconfig-interconnect:default
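
When working from an Explorer rather than a live system, the service state can be pulled out of sysconfig/svcs-av.out. A minimal sketch, assuming the column layout shown above (state first, FMRI last); the function name is hypothetical.

```python
INTERCONNECT_FMRI = "svc:/network/ilomconfig-interconnect:default"


def interconnect_service_state(svcs_text):
    """Return the STATE column for the interconnect service, or None if absent."""
    for line in svcs_text.splitlines():
        fields = line.split()
        # The FMRI is the last column of "svcs -av" output.
        if fields and fields[-1] == INTERCONNECT_FMRI:
            return fields[0]
    return None
```

A return value of None corresponds to the "not present" case; anything other than "online" suggests the svcadm step above.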

The history of this service is found in explorer file: /var/svc/log/network-ilomconfig-interconnect:default.log

[ Feb 1 13:51:31 Executing start method ("/lib/svc/method/svc-ilomconfig-interconnect start"). ]
[ Feb 1 13:51:31 Method "start" exited with status 0. ] Host-to-ILOM interconnect successfully configured.
[ Feb 1 13:56:54 Stopping because service disabled. ]
[ Feb 1 13:56:55 Executing stop method ("/lib/svc/method/svc-ilomconfig-interconnect stop"). ]
ERROR: Cannot modify interconnect when disabled (use enable command)
ERROR: Cannot modify interconnect when disabled (use enable command)
ERROR: Cannot modify interconnect when disabled (use enable command)
[ Feb 1 13:57:09 Method "stop" exited with status 0. ]
[ Apr 16 13:48:10 Disabled. ]
[ Apr 17 22:08:52 Disabled. ]
[ Apr 17 22:54:40 Disabled. ]
[ Apr 19 11:43:32 Enabled. ]
[ Apr 19 11:43:32 Executing start method ("/lib/svc/method/svc-ilomconfig-interconnect start"). ]
[ Apr 19 11:43:32 Method "start" exited with status 0. ]
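
A long service log is easier to read once it is reduced to its enable/disable transitions and error lines. The sketch below assumes the bracketed timestamp format shown in the excerpt above; the function name and the returned dictionary keys are illustrative choices.

```python
import re

# Matches transition lines such as "[ Apr 19 11:43:32 Enabled. ]".
_TRANSITION = re.compile(r"\[ (\w{3} +\d+ \d\d:\d\d:\d\d) (Enabled|Disabled)\. \]")


def summarize_service_log(log_text):
    """Collect Enabled/Disabled transitions and ERROR lines from the SMF log."""
    events = []
    errors = []
    for raw in log_text.splitlines():
        line = raw.strip()
        match = _TRANSITION.match(line)
        if match:
            events.append((match.group(1), match.group(2)))
        elif line.startswith("ERROR:"):
            errors.append(line)
    last_state = events[-1][1] if events else None
    return {"events": events, "errors": errors, "last_state": last_state}
```

In the excerpt above, the last transition is "Enabled", and the three ERROR lines come from the earlier stop method.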

 

 

2. Interconnect enabled:

##### # /usr/sbin/ilomconfig list interconnect #####
Interconnect
============
State: enabled
Type: USB Ethernet
SP Interconnect IP Address: 169.254.182.76
Host Interconnect IP Address: 169.254.182.77
Interconnect Netmask: 255.255.255.0
SP Interconnect MAC Address: 03:23:23:57:47:16
Host Interconnect MAC Address: 03:23:23:57:47:17

If not enabled, then enable it via command:

# /usr/sbin/ilomconfig enable interconnect
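
The "ilomconfig list interconnect" output can also be checked mechanically. The following sketch parses the "Label: value" lines shown above and flags two conditions covered in this step: a State other than enabled, and a Host Interconnect IP Address of (none) or 0. The function names are hypothetical, and the parser assumes the output layout of the example above.

```python
def parse_ilomconfig_list(text):
    """Turn 'Label: value' lines into a dict; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def interconnect_problems(fields):
    """Return a list of human-readable problems found in the parsed output."""
    problems = []
    if fields.get("State") != "enabled":
        problems.append("interconnect is not enabled")
    host_ip = fields.get("Host Interconnect IP Address", "(none)")
    if host_ip in ("(none)", "0"):
        problems.append("host interconnect address is not configured")
    return problems
```

An empty list means neither condition was found; it does not prove that the link is passing traffic, so the ping tests in this step still apply.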

If online, this Solaris interface can be tested via commands:

# ping 169.254.182.77
169.254.182.77 is alive

# ping 169.254.182.76
169.254.182.76 is alive

# ipmitool sunoem version
Version: 3.2.9.1.b r116246

If the Solaris services are online and "ilomconfig list interconnect" appears normal, but the interconnect isn't operational, then attempt to disable & enable it via OHMP's ilomconfig:

# /usr/sbin/ilomconfig disable interconnect
# /usr/sbin/ilomconfig enable interconnect

If the Host Interconnect IP Address is (none) or 0, then there is a possibility that ipmitool is hung.  ilomconfig uses ipmitool to initialize the ILOM Interconnect after system boot, so this address isn't configured if ipmitool is hung.  The ILOM must be reset to get ipmitool working again.  Any applications that use ipmitool should include the "-I lanplus" option, which selects the IPMI v2.0 interface:

# ipmitool -I lanplus -H "SP ipaddress" -U root fru

ILOM firmware 3.2.4 (System firmware 9.3.0.b or 8.6.0.b) disables IPMI v1.5 sessions by default to increase system security.  They can be re-enabled on the ILOM, as follows:

-> set /SP/services/ipmi v1_5_sessions=enabled

 

 

3.  usbecm2's net is online & is addressed as 169.254.182.77 (unless customer modified server's configuration):

##### ifconfig -a / Explorer: sysconfig/ifconfig-a.out #####
net8: flags=100001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,PHYSRUNNING> mtu 1500 index 7
inet 169.254.182.77 netmask ffffff00 broadcast 169.254.182.255 ether 2:21:28:57:47:17

##### # dladm show-phys -Z | grep usb / Explorer: netinfo/dladm/dladm_show-phys_-Z.out #####
LINK  ZONE   MEDIA  STATE SPEED DUPLEX DEVICE
net8 global Ethernet  up    10   full  usbecm2

##### # netstat -in / Explorer: netinfo/netstat-in.out #####
Name Mtu           Net/Dest Address       Ipkts Ierrs  Opkts Oerrs Collis Queue
net8 1500  169.254.182.0 169.254.182.77  424960   0   419901   0      0     0
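
To locate the interconnect link in an Explorer's dladm output, the rows whose DEVICE column starts with usbecm can be extracted. A minimal sketch, assuming the seven-column layout of the "dladm show-phys -Z" example above; the function name is hypothetical.

```python
def usbecm_links(dladm_text):
    """Return one dict per usbecm device found in 'dladm show-phys -Z' output."""
    links = []
    for line in dladm_text.splitlines():
        fields = line.split()
        # Columns: LINK ZONE MEDIA STATE SPEED DUPLEX DEVICE
        if len(fields) == 7 and fields[6].startswith("usbecm"):
            links.append(
                {
                    "link": fields[0],
                    "state": fields[3],
                    "speed": fields[4],
                    "device": fields[6],
                }
            )
    return links
```

The "link" value (net8 in the example) is the interface name to look for in the ifconfig and netstat output.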

 

The following may only be found on M-series servers!

##### # hotplug list -v | grep usb / Explorer: sysconfig/hotplug_list_-v.out #####
usb@0 <pci.0,0> ONLINE
usb@0,1 <pci.0,1> ONLINE
usb@0,2 <pci.0,2> ONLINE

##### hotplug list -l | grep usb / Explorer: sysconfig/hotplug_list_-l.out #####
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,1
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/device@4
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/device@4/keyboard@0
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/device@4/mouse@1
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/storage@2
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/storage@2/disk@0,0
/pci@400/pci@2/pci@0/pci@f/pci@0/usb@0,2/hub@2/hub@3/communications@3

##### hotplug list -l | grep "<pci.0" / Explorer: sysconfig/hotplug_list_-l.out #####
/pci@400/pci@1 <pci.0,0> ONLINE
/pci@400/pci@1/pci@0 <pci.0,0> ONLINE
/pci@400/pci@1/pci@0/pci@4 <pci.0,0> ONLINE
/pci@400/pci@1/pci@0/pci@6 <pci.0,0> ONLINE
/pci@400/pci@1/pci@0/pci@6 <pci.0,1> ONLINE
/pci@400/pci@1/pci@0/pci@6 <pci.0,2> ONLINE
/pci@400/pci@1/pci@0/pci@6 <pci.0,3> ONLINE
/pci@400/pci@2 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@0 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@0/pci@0 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@4 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@6 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@6 <pci.0,1> ONLINE
/pci@400/pci@2/pci@0/pci@7 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@7 <pci.0,1> ONLINE
/pci@400/pci@2/pci@0/pci@f <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@f/pci@0 <pci.0,0> ONLINE
/pci@400/pci@2/pci@0/pci@f/pci@0 <pci.0,1> ONLINE
/pci@400/pci@2/pci@0/pci@f/pci@0 <pci.0,2> ONLINE

The net# will vary depending on the server's network configuration, and is net8 in the example above. Notice that the interface is up at a speed of 10 Mb/s.

The IP address for net8 is 169.254.182.77 which is the default configured by OHMP for usbecm2.  If the IP address is modified by the customer during server configuration, it must be located in address range 169.254.x.x.
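
The address-range rule above is easy to verify with Python's standard ipaddress module. The sketch below checks that both ends fall in 169.254.x.x; the additional same-subnet check is an assumption based on the 255.255.255.0 netmask shown in the examples, not a rule stated in this document. The function name is hypothetical.

```python
import ipaddress

# The 169.254.x.x range required for the interconnect.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def check_interconnect_addresses(host_ip, sp_ip, netmask="255.255.255.0"):
    """Check both interconnect addresses against the 169.254.x.x range."""
    host = ipaddress.ip_address(host_ip)
    sp = ipaddress.ip_address(sp_ip)
    # strict=False lets a host address stand in for its network.
    subnet = ipaddress.ip_network(f"{host_ip}/{netmask}", strict=False)
    return {
        "host_in_range": host in LINK_LOCAL,
        "sp_in_range": sp in LINK_LOCAL,
        "same_subnet": sp in subnet,  # assumption, see lead-in
    }
```

With the default addresses, check_interconnect_addresses("169.254.182.77", "169.254.182.76") reports all three checks as true.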

Oracle RAC HAIP initially used an IP address range which overlapped this range, but was most likely modified years ago to allow usage of a different range.  See SR 3-5732004651.

 If the usbecm2 interface isn't configured or operational, then attempt to disable & enable the interconnect via OHMP's ilomconfig:

# /usr/sbin/ilomconfig disable interconnect
# /usr/sbin/ilomconfig enable interconnect

 If the interface fails to configure (especially with "ERROR: Internal error"), then manually configure the network interface:

root@pdom03:~# ipadm
NAME CLASS/TYPE STATE UNDER ADDR
net8 ip failed -- --
net8/v4 static inaccessible -- 169.254.182.77/24
root@pdom03:~# ipadm delete-addr net8/v4
root@pdom03:~# ipadm create-addr -T static -a 169.254.182.77/24 net8/v4
root@pdom03:~# ipadm
NAME CLASS/TYPE STATE UNDER ADDR
net8 ip ok -- --
net8/v4 static ok -- 169.254.182.77/24
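
The before and after ipadm listings above differ only in the STATE column, so a quick filter for anything not in the "ok" state can confirm the repair. This sketch assumes the five-column layout shown above and should be fed only the ipadm output, not the shell prompt lines; the function name is hypothetical.

```python
def ipadm_not_ok(ipadm_text):
    """Return (name, state) for every ipadm row whose STATE is not 'ok'."""
    bad = []
    for line in ipadm_text.splitlines():
        fields = line.split()
        # Columns: NAME CLASS/TYPE STATE UNDER ADDR
        if len(fields) == 5 and fields[0] != "NAME" and fields[2] != "ok":
            bad.append((fields[0], fields[2]))
    return bad
```

An empty list after the create-addr step indicates that both the interface and its address object report ok.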

 Finally, ensure that a failed USB device (like a memory stick) is not inserted in any of the USB connectors.  A failed device could disable the entire bus.

 

4.  If LDoms are in use, the primary LDom must own the Interconnect's PCI path.

#####  # prtdiag -v  /  Explorer: sysconfig/prtdiag-v.out  #####

================================= IO Devices =================================
Slot            Bus   Name                      Speed       Path
/SYS/MB/USB     PCIE  usb-pciexclass,0c0330    5.0GTx1      /pci@340/pci@1/pci@0/pci@3/usb@0


##### # grep usbecm /etc/path_to_inst / Explorer: grep usbecm etc/path_to_inst #####

"/pci@340/pci@1/pci@0/pci@3/usb@0/hub@2/communications@3" 2 "usbecm"

#####  # ldm list -l  /  Explorer: sysconfig/ldm_list_-l.out  #####

NAME       STATE    FLAGS   CONS     VCPU   MEMORY    UTIL   NORM   UPTIME
primary   active   -n-cv-   UART      32      32G     2.4%   2.3%   66d 14h 50m
  pci@340 pci_1
bldgx0    active   -n----   5000      48      40G     0.4%   0.3%   31d 22h 26m
bldgx1    active   -n----   5001      64      88G     0.7%   0.6%   24d 6h 7m

Notice that the USB interconnect is under path /pci@340 in this case, and that the primary LDom owns pci@340.  If the primary LDom does not own this PCI path, then the path must be moved so that the primary LDom controls it; both domains should then be reset, followed by an ILOM reset.
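
The ownership check above can be sketched as two small parsers: one takes the PCI root of the usbecm device from etc/path_to_inst, the other finds which domain lists that root in the "ldm list -l" output. This assumes the layouts of the excerpts above (domain rows carry a state of active, bound, or inactive in the second column, and bus rows start with the PCI root name); the function names are hypothetical.

```python
import re

_DOMAIN_STATES = {"active", "bound", "inactive"}


def usbecm_root_complex(path_to_inst_text):
    """Return the PCI root (e.g. 'pci@340') of the usbecm device, or None."""
    for line in path_to_inst_text.splitlines():
        match = re.match(r'\s*"/(pci@[0-9a-f]+)/\S*"\s+\d+\s+"usbecm"', line)
        if match:
            return match.group(1)
    return None


def domain_owning(ldm_text, root):
    """Return the name of the domain whose listing contains the given root."""
    current = None
    for line in ldm_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) > 2 and fields[1] in _DOMAIN_STATES:
            current = fields[0]  # a domain row such as "primary active ..."
        elif fields[0] == root:
            return current
    return None
```

If domain_owning() returns anything other than "primary" for the root reported by usbecm_root_complex(), the path needs to be moved as described above.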

 

5. Solaris's IP filter can also disable communication on this link.

Files /etc/ipf/ipf.conf or /etc/ipf/ippool.conf should be configured to allow communication to the interconnect's ILOM IP address which is 169.254.182.76 by default. For example, ipf.conf may contain the following:

# ssh:
pass in quick proto tcp from 100.63.22.0/23 to any port = 22
pass in quick proto tcp from 100.81.22.0/23 to any port = 22

# block everything else
block in quick all

In this case, communication is blocked by Solaris' IP filter since the interconnect's address is not included. This file must also include the following for the internal interconnect to work:

pass in quick proto tcp from 169.254.182.76 to any port = 24
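
Because every rule in the example uses "quick", the first matching rule decides the outcome, so the pass rule for the interconnect has to appear before "block in quick all". The sketch below is a deliberately simplified evaluator that understands only the two rule shapes used in the example above; it is not a general ipf.conf parser, and the function name is hypothetical.

```python
import ipaddress
import re

# Supports only:  pass|block in quick all
#                 pass|block in quick proto tcp from SRC to any port = N
_RULE = re.compile(
    r"^(pass|block) in quick (?:all|proto tcp from (\S+) to any port = (\d+))$"
)


def first_quick_match(ipf_conf, src_ip, dst_port):
    """Return 'pass' or 'block' from the first matching quick rule, else None."""
    src = ipaddress.ip_address(src_ip)
    for raw in ipf_conf.splitlines():
        match = _RULE.match(raw.strip())
        if not match:
            continue  # comments, blank lines, and unsupported rule forms
        action, net, port = match.groups()
        if net is None:
            return action  # "in quick all" matches every inbound packet
        if src in ipaddress.ip_network(net, strict=False) and int(port) == dst_port:
            return action
    return None
```

Evaluating the example file for source 169.254.182.76 and port 24 yields "block"; with the interconnect rule placed ahead of the final block rule, it yields "pass".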

  

In one case where IP filter misconfiguration prevented the interconnect's operation, the Solaris IP address for usbecm2 was configured as 0:0:0:0 & a different network contained 169.254.182.77. This system was repaired by:

properly configuring ipf.conf,
removing the misprogrammed network,
disabling & enabling the interconnect with ilomconfig,
power cycling the system,
clearing the related faults.

  

If Solaris IP filtering isn't the problem, then use OHMP to disable & enable the interface:

# /usr/sbin/ilomconfig disable interconnect
# /usr/sbin/ilomconfig enable interconnect

  

 

The ILOM should have the following configuration for proper operation:

1. Ensure that the ILOM is online

See ILOM issue T5 related to ILOM hangs.

2. Ensure the /SP/network & /SP/network/interconnect are properly configured.

/SP/network state = enabled                                  <--- or ipv4-only if ipv6 is disabled
/SP/network/interconnect state = enabled                     <--- May be missing if host managed with older firmware
/SP/network/interconnect allowed_services = fault-transport, ipmi, snmp   <--- fault-transport only needed for T5-x through T8-x FMA Fault Proxying
/SP/network/interconnect hostmanaged = true                  <--- Set when AutoConfig'd
/SP/network/interconnect ipaddress = 169.254.182.76

Please note that /SP/network/interconnect "state" will be missing on older system FW when the interface is host managed.  If ipv6 is disabled, then the state must be ipv4-only.
To bring the network online:

-> set /SP/network state=enabled

The interconnect allowable services are: fault-transport, https, ipmi, ssh, snmp.
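
The expected /SP/network/interconnect settings above can be expressed as a small checklist. This sketch takes the property values as a plain dictionary that the reader fills in by hand from the ILOM; it does not talk to the ILOM, and the function and parameter names are hypothetical. A missing "state" is tolerated because, as noted above, it may be absent on older firmware when the interface is host managed.

```python
# Allowable services as listed in this document.
ALLOWABLE_SERVICES = {"fault-transport", "https", "ipmi", "ssh", "snmp"}


def check_sp_interconnect(props, need_fault_proxy=True):
    """Return a list of problems found in /SP/network/interconnect properties."""
    problems = []
    state = props.get("state")
    if state is not None and state != "enabled":
        problems.append("state is " + state)
    services = {
        s.strip() for s in props.get("allowed_services", "").split(",") if s.strip()
    }
    unknown = services - ALLOWABLE_SERVICES
    if unknown:
        problems.append("unknown services: " + ", ".join(sorted(unknown)))
    # fault-transport is only needed for FMA Fault Proxying (T5-x through T8-x).
    if need_fault_proxy and "fault-transport" not in services:
        problems.append("fault-transport is not in allowed_services")
    return problems
```

Pass need_fault_proxy=False for platforms that do not use FMA Fault Proxying.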

Also note that IP address: 169.254.182.76 is the default configured by the ILOM for the interface, but may be changed by the customer.
This interface can be tested on the ILOM via commands:

-> set /SP/network/test ping=169.254.182.76
Ping of 169.254.182.76 succeeded

-> set /SP/network/test ping=169.254.182.77
Ping of 169.254.182.77 succeeded

 

 

Related M5/M6 doc: 1683087.1


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.