Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2209786.1
Update Date:2016-12-12

Solution Type  Problem Resolution Sure

Solution  2209786.1 :   OVN (Xsigo), Two F1-15 Fabric Interconnects Show Duplicate HCA GUID, OpenSM Won't Come Out of Discover  


Related Items
  • Oracle Fabric Interconnect F1-15
  • Oracle Fabric Interconnect F1-4
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Oracle Virtual Networking




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-13550566031>

Applies to:

Oracle Fabric Interconnect F1-15 - Version All Versions to All Versions [Release All Releases]
Oracle Fabric Interconnect F1-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

In a meshed IB fabric, OpenSM never leaves the Discover state to become Master or Standby on either of the two Fabric Interconnects:

Example:

admin@xsigo1[xsigo] show diagnostics sm-info
- SM is running on xsigo1
- SM Lid 2
- SM Guid 0x13970201001960
- SM key 0x0
- SM priority 0
- SM State DISCOVER
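
As a rough aid when checking several chassis, the state line can be pulled out of saved command output. The sketch below assumes the output of "show diagnostics sm-info" has been captured as plain text in the layout shown above; the function name sm_state is hypothetical and is not part of the Fabric Interconnect CLI.

```shell
#!/bin/sh
# sm_state: read saved "show diagnostics sm-info" output on stdin and
# print the last word of the "SM State" line (e.g. DISCOVER).
# Hypothetical helper; it only parses text and does not query the chassis.
sm_state() {
    awk '/SM State/ { print $NF }'
}

# Demo using the example output from this document.
state=$(sm_state <<'EOF'
- SM is running on xsigo1
- SM Lid 2
- SM Guid 0x13970201001960
- SM key 0x0
- SM priority 0
- SM State DISCOVER
EOF
)
echo "SM state: $state"
```

A chassis affected by this problem keeps reporting DISCOVER on repeated checks instead of settling on Master or Standby.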

opensm.log shows:

Nov 07 15:12:36 873584 [B5688B70] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,1,36
Nov 07 15:12:36 873599 [B5688B70] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,1,36
Nov 07 15:12:36 919884 [B5688B70] 0x81 -> report_duplicated_guid: ERR 0D01: Found duplicated node GUID.
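
The duplicated-GUID error can also be searched for directly in a copy of the log. This is a minimal sketch: the helper name count_dup_guid_errors is hypothetical, and the location of opensm.log on the chassis is not stated in this document, so the path is passed in as an argument.

```shell
#!/bin/sh
# count_dup_guid_errors: print how many lines of the given OpenSM log
# contain "ERR 0D01" (the duplicated node GUID error shown above).
# Hypothetical helper; the log path is supplied by the caller.
count_dup_guid_errors() {
    # grep -c exits non-zero when nothing matches, so mask that status.
    grep -c 'ERR 0D01' "$1" || true
}

# Demo on a temporary copy of the excerpt from this document.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Nov 07 15:12:36 873584 [B5688B70] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,1,36
Nov 07 15:12:36 873599 [B5688B70] 0x01 -> Directed Path Dump of 3 hop path: Path = 0,1,1,36
Nov 07 15:12:36 919884 [B5688B70] 0x81 -> report_duplicated_guid: ERR 0D01: Found duplicated node GUID.
EOF
echo "duplicate GUID errors: $(count_dup_guid_errors "$sample")"
rm -f "$sample"
```

A non-zero count, together with an SM that stays in Discover, points toward the cause described in this document.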

Changes

The problem was found when moving from two separate IB fabrics (both Fabric Interconnects configured as OpenSM master, not meshed) to a single meshed IB fabric (one OpenSM Master and one Standby). After following the instructions to set the subnet manager to false (#set system is-subnet-manager false) and then running "guid2lid" to zero out the LID table, one or both Fabric Interconnects stayed in the "Discover" state.

Cause

The HCAs in both Fabric Interconnects had the same port GUID. This is a manufacturing defect. At this time it is not known whether this was a one-time defect or whether other Front Panels in inventory also have duplicated HCA port GUIDs.

How to check whether the HCAs in both Fabric Interconnects have the same HCA port GUID:

Log in to the Fabric Interconnect Command Line Interface (CLI) as user 'root' on both Fabric Interconnects and run the command below. If both chassis return the same value, the port GUID is duplicated.

Example output:

root@xsigo1:~# cat /sys/class/infiniband/mlx4_0/ports/1/gids/0
fe80:0000:0000:0000:0013:9702:0100:1960

root@xsigo2:/# cat /sys/class/infiniband/mlx4_0/ports/1/gids/0
fe80:0000:0000:0000:0013:9702:0100:1960
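
The comparison can be scripted once both values have been copied off the chassis. The sketch below relies on the fact that the lower 64 bits of a GID (the last four groups) are the port GUID; the function names gid_to_guid and same_port_guid are hypothetical, and the two GID strings are pasted in by hand rather than read from the chassis.

```shell
#!/bin/sh
# gid_to_guid: print the 64-bit port GUID (last four 16-bit groups) of a
# fully expanded GID as shown by /sys/class/infiniband/.../gids/0.
# Prints nothing if the input does not have exactly eight groups.
gid_to_guid() {
    printf '%s\n' "$1" | awk -F: 'NF == 8 { print $5 $6 $7 $8 }'
}

# same_port_guid: exit 0 when both GIDs carry the same, non-empty port GUID.
same_port_guid() {
    a=$(gid_to_guid "$1")
    b=$(gid_to_guid "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Values copied by hand from each chassis (here, the example output above).
gid_xsigo1="fe80:0000:0000:0000:0013:9702:0100:1960"
gid_xsigo2="fe80:0000:0000:0000:0013:9702:0100:1960"

if same_port_guid "$gid_xsigo1" "$gid_xsigo2"; then
    echo "DUPLICATE port GUID: $(gid_to_guid "$gid_xsigo1")"
else
    echo "port GUIDs differ"
fi
```

With the example values, the extracted GUID is 0013970201001960, which matches the SM Guid 0x13970201001960 reported by "show diagnostics sm-info" (that output omits the leading zeros).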

Solution

Replace the Front Panel in one of the Fabric Interconnects using this KB:

How to replace a Gen2 Front Panel on Oracle Fabric Interconnects (Xsigo) (Doc ID 1663431.1)

NOTE: The customer reported that, after replacing one of the Front Panels, all server-profiles had to be disconnected and reconnected ("Disconnect and Reconnect") before all server-profiles and v-star devices came fully up/up.

 

References

<NOTE:1663431.1> - How to replace a Gen2 Front Panel on Oracle Fabric Interconnects (Xsigo)

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.