Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1963309.1
Update Date:2018-03-19

Solution Type  Problem Resolution Sure

Solution  1963309.1 :   Master Subnet Manager is not same across all switches  


Related Items
  • Sun Datacenter InfiniBand Switch 36
  • Sun Network QDR InfiniBand Gateway Switch
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-10131423344>

Applies to:

Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Information in this document applies to any platform.

Symptoms

 Two kinds of symptoms can occur:

1) Running patchmgr with ibswitch_precheck returns errors such as:
[ERROR    ]  No master InfiniBand switch found for <switch name>

[ERROR    ]  Master Subnet Manager is not same across all switches

2) Running the sminfo command on a spine switch returns errors such as:
ibwarn: [2080] mad_rpc: _do_madrpc failed; dport (Lid 30)

sminfo: iberror: failed: query
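
To check saved command output for these symptoms, a small filter can help. The following is a minimal sketch, not part of any Oracle tooling: the function name match_sm_symptoms is hypothetical, and the patterns are copied from the error messages quoted above.

```shell
# Read saved command output on stdin and print only the lines that match
# the symptom messages quoted in this document.
# grep exits with status 0 if at least one symptom line was found.
match_sm_symptoms() {
    grep -E 'No master InfiniBand switch found for|Master Subnet Manager is not same across all switches|mad_rpc: _do_madrpc failed|sminfo: iberror: failed: query'
}

# Hypothetical usage, assuming the output was saved to a file of your choosing:
#   match_sm_symptoms < saved_output.txt
```

Matching on fixed message fragments keeps the filter independent of the switch name and LID, which differ from one site to another.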

 

Changes

 The environment must be multi-rack for this issue to occur.

Cause

These errors appear because the InfiniBand switches use fat-tree routing. Fat-tree routing does not set up a route between spine switches, because such a route would introduce a potential for deadlock.
As a result, running sminfo from a spine switch while the master SM is on another spine switch fails unless you specify a direct route (DR path).
Determining the correct DR path requires knowledge of the exact cabling, because the path is built from the port numbers along the route.
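
To illustrate how a DR path relates to port numbers, the sketch below prints one line per hop. It assumes the usual InfiniBand convention that the leading 0 stands for the local node and that each later element is the port used to leave the node reached so far. The function name describe_dr_path is hypothetical.

```shell
# Print a readable description of a comma-separated DR path such as 0,11,20.
# Assumption: element 0 is the local node; each later element is an exit port.
describe_dr_path() {
    hop=0
    old_ifs=$IFS
    IFS=,
    # Unquoted on purpose so the shell splits the path on commas.
    for port in $1; do
        if [ "$hop" -eq 0 ]; then
            echo "start: local node"
        else
            echo "hop $hop: leave via port $port"
        fi
        hop=$((hop + 1))
    done
    IFS=$old_ifs
}

# Usage:
#   describe_dr_path 0,11,20
```

Because every element after the first names a physical port, a recabled link changes the path, which is why the path has to be rediscovered on the actual fabric rather than copied from another site.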

Solution

 These errors can be safely ignored if both of the following are true:

  • The environment is multi-rack.
  • The errors concern spine switch(es) only.


Example of checking with a direct route:

How to determine the direct route:
# ibnetdiscover -s
DR path slid 0; dlid 0; 0,11,20 -> known remote switch {002128557e82c0a0} portnum 0 lid 0-0"SUN IB QDR GW switch infiniband36GW 10.10.10.23

Query the SM over that direct route:
# sminfo -D 0,11,20
sminfo: sm lid 0 sm guid 0x2128557e82c0a0, activity count 8570151 priority 5 state 2 SMINFO_STANDBY
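
The two manual steps above can be sketched as a small helper that pulls the DR path out of each "DR path" line. This is an illustration only: the function name extract_dr_paths is hypothetical, and the pattern assumes the line layout shown in the example output above.

```shell
# Read ibnetdiscover -s output on stdin and print the DR path from each
# line of the form:
#   DR path slid 0; dlid 0; 0,11,20 -> known remote switch {...} ...
extract_dr_paths() {
    sed -n 's/^DR path [^;]*; [^;]*; \([0-9][0-9,]*\) -> .*$/\1/p'
}

# Hypothetical usage on a spine switch (not run here). 2>&1 is included
# because it is an assumption which stream carries the "DR path" lines:
#   ibnetdiscover -s 2>&1 | extract_dr_paths | sort -u | while read -r path; do
#       sminfo -D "$path"
#   done
```

Lines that do not start with "DR path" are ignored, so the helper can be fed the full discovery output without filtering it first.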


  Copyright © 2018 Oracle, Inc.  All rights reserved.