Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2344681.1
Update Date:2018-01-18

Solution Type  Problem Resolution Sure

Solution  2344681.1 :   Infiniband Switch ib02 Subnet Manager Stuck In DISCOVER State  


Related Items
  • Exalogic Elastic Cloud X3-2 Hardware
  • Sun Datacenter InfiniBand Switch 36
  • Sun Network QDR InfiniBand Gateway Switch
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-16522026954>

Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The Subnet Manager on InfiniBand switch ib02 is stuck in the DISCOVER state.

[root@ib02 ~]# showfruinfo

Sun_Man1R:
  UNIX_Timestamp32 : Thu Jul 27 09:15:03 2017
  Sun_Fru_Description : ASSY,NM2-GW
  Vendor_ID_Code : 13 A6
  Vendor_ID_Code_Source : 01
  Vendor_Name_And_Site_Location : 5030 CELESTICA CORP. SRIRACHA CHONBURI TH
  Sun_Part_Number : 7057249
  Sun_Serial_Number : 465769T+1326RT02NG
  Serial_Number_Format : 4V3F1-2Y2W2X4S
  Initial_HW_Dash_Level : 99
  Initial_HW_Rev_Level : 01
  Sun_Fru_Shortname : NM2 gateway
  Sun_Hazard_Class_Code : Y
  Sun_SpecPartNo : 7054735

Sun_FRU_LabelR:
  Sun_Serial_Number : AK0000001
  FRU_Part_Dash_Number : 7054724


[root@ib02 ~]# getmaster
Local SM enabled and running, state DISCOVER <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Last change in Master SubnetManager status detected at: Thu Dec 28 13:53:55 GMT 2017
Master SubnetManager on sm lid 1 sm guid 0x10e02edaaaaaa0 : SUN IB QDR GW switch ib03 10.10.10.23
Master SubnetManager Activity Count: 31857 Priority: 5
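
As a rough illustration, the getmaster output above can be checked by a script rather than by eye. The sketch below is not an Oracle tool: the function name parse_getmaster is hypothetical, the regular expressions are written only against the sample output shown in this document, and treating MASTER and STANDBY as the only healthy states is an assumption.

```python
import re

def parse_getmaster(output: str) -> dict:
    """Extract the local SM state and the master's identity from getmaster text."""
    info = {"local_state": None, "master_lid": None,
            "master_guid": None, "priority": None}
    m = re.search(r"Local SM enabled and running, state (\w+)", output)
    if m:
        info["local_state"] = m.group(1)
    m = re.search(r"sm lid (\d+) sm guid (0x[0-9a-fA-F]+)", output)
    if m:
        info["master_lid"] = int(m.group(1))
        info["master_guid"] = m.group(2)
    m = re.search(r"Priority: (\d+)", output)
    if m:
        info["priority"] = int(m.group(1))
    return info

# Sample text copied from the getmaster output in this document.
SAMPLE = """\
Local SM enabled and running, state DISCOVER
Last change in Master SubnetManager status detected at: Thu Dec 28 13:53:55 GMT 2017
Master SubnetManager on sm lid 1 sm guid 0x10e02edaaaaaa0 : SUN IB QDR GW switch ib03 10.10.10.23
Master SubnetManager Activity Count: 31857 Priority: 5
"""

info = parse_getmaster(SAMPLE)
# Assumption: a healthy local SM reports MASTER or STANDBY; anything else is flagged.
is_suspect = info["local_state"] not in ("MASTER", "STANDBY")
print(info, "suspect" if is_suspect else "ok")
```

A state that stays at DISCOVER across repeated runs, as in this case, is the condition worth investigating.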

 


Action Taken
The active secret M-Key configuration was compared on ib02 and on the master switch ib03:
[root@ib02 ~]# smsubnetprotection list active
No active secret mkeys configured on the system

[root@ib03 ~]# smsubnetprotection list active
# File_format_version_number 1
# Sun DCS IB mkey config file
# This file is generated, do not edit
# secretmkey=enabled
# nodeid=ib02.us.com
# time= 4 Sep 23:38:30
# checksum=76e970f569a14507faab158ed4e9a40d
#! commit_number : 3
Mkey Untrusted Mkey Smkey Attribute
------------------ ------------------ ------------------ ---------
0xa000000000000001 0xafecb1b0cad65d59 0x35cc89c81432d02e C
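
The comparison above (no secret M-Key on ib02, one active on ib03) could be sketched in code as follows. This is a minimal sketch that assumes the output of smsubnetprotection list active has already been captured as text from each switch; the function name active_secret_mkeys is hypothetical, and the rule that data rows begin with a hexadecimal value is inferred only from the sample output in this document.

```python
def active_secret_mkeys(output: str) -> list:
    """Return the table rows (as tuples of fields) found in `list active` output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Assumption: data rows start with a hexadecimal M-Key value;
        # comment lines, the header and the separator do not.
        if fields and fields[0].startswith("0x"):
            rows.append(tuple(fields))
    return rows

# Captured output, copied from this document.
IB02_OUTPUT = "No active secret mkeys configured on the system\n"
IB03_OUTPUT = """\
# File_format_version_number 1
# Sun DCS IB mkey config file
# This file is generated, do not edit
# secretmkey=enabled
#! commit_number : 3
Mkey Untrusted Mkey Smkey Attribute
------------------ ------------------ ------------------ ---------
0xa000000000000001 0xafecb1b0cad65d59 0x35cc89c81432d02e C
"""

local_rows = active_secret_mkeys(IB02_OUTPUT)
master_rows = active_secret_mkeys(IB03_OUTPUT)
# The master enforces a secret M-Key that the local switch does not have.
mismatch = bool(master_rows) and not local_rows
print("secret M-Key missing on local switch" if mismatch else "no mismatch detected")
```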

 

The following is logged in /var/log/messages on ib02:
Dec 28 09:38:42 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:38:52 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:02 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:12 ib02 OpenSM[3532]: SM port is down#012

The following is logged in /var/log/opensm.log:

Dec 28 15:00:07 419471 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:07 419471 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:07 420470 [B5D07B70] 0x80 -> SM port is down
SM port is down

Dec 28 15:00:17 426422 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:17 426422 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:17 427422 [B5D07B70] 0x80 -> SM port is down
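
The two log signatures above (an unknown M_Key on a port, followed by "SM is fenced out") can be counted with a short script. The sketch below is only an illustration: the patterns are derived from the log lines quoted in this document, the function name summarize_opensm_log is hypothetical, and other OpenSM versions may word these messages differently.

```python
import re

# Patterns taken from the opensm.log lines quoted in this document.
UNKNOWN_MKEY = re.compile(
    r"Port (0x[0-9a-fA-F]+) has unknown M_Key, protection level (\d+)")
FENCED_OUT = re.compile(r"state_mgr_is_sm_port_down: SM is fenced out")

def summarize_opensm_log(lines) -> dict:
    """Collect ports reported with an unknown M_Key and count fenced-out events."""
    ports = set()
    fenced = 0
    for line in lines:
        m = UNKNOWN_MKEY.search(line)
        if m:
            ports.add(m.group(1))
        if FENCED_OUT.search(line):
            fenced += 1
    return {"unknown_mkey_ports": sorted(ports), "fenced_out_events": fenced}

SAMPLE_LOG = """\
Dec 28 15:00:07 419471 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:07 419471 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:07 420470 [B5D07B70] 0x80 -> SM port is down
Dec 28 15:00:17 426422 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:17 426422 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:17 427422 [B5D07B70] 0x80 -> SM port is down
""".splitlines()

summary = summarize_opensm_log(SAMPLE_LOG)
print(summary)
```

To run it against a real file, the list of sample lines could be replaced with the lines read from a copy of /var/log/opensm.log.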

Cause

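Switch ib02 was recently replaced, and the follow-up procedure was not properly followed, so the secret M-Key was not configured and propagated to it. Consequently, ib02 has no active secret M-Key while the master switch ib03 does, and the Subnet Manager on ib02 logs an unknown M_Key and is fenced out.

The procedure is described in: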

Infiniband Switch Replacement - Follow-up Actions (Doc ID 2125203.1)


Solution

 

 

First, ensure that the output of the following command lists the IP addresses of all switches running OpenSM. The output should be identical on every switch running OpenSM:

# smnodes list
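
Checking that the list is the same everywhere can be sketched as a set comparison. The example below assumes the addresses reported by smnodes list have already been collected from each switch into Python lists; the function name smnodes_mismatches is hypothetical, and the address 10.10.10.22 is an invented placeholder (only 10.10.10.23 appears in this document).

```python
def smnodes_mismatches(lists_by_switch: dict) -> dict:
    """Map each switch to the addresses it lacks relative to the union of all lists."""
    all_nodes = set().union(*lists_by_switch.values())
    missing = {}
    for switch, nodes in lists_by_switch.items():
        gap = all_nodes - set(nodes)
        if gap:
            missing[switch] = sorted(gap)
    return missing

# Hypothetical collected data: ib02 does not list 10.10.10.22 (placeholder address).
collected = {
    "ib02": ["10.10.10.23"],
    "ib03": ["10.10.10.22", "10.10.10.23"],
}

missing = smnodes_mismatches(collected)
print(missing if missing else "smnodes lists are identical")
```

An empty result means every switch reports the same set of addresses.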

 

Then check and propagate the secret M-Key as follows:

6. Check/propagate secret M-Key policy from the running SM master.
On the switch currently running as Master, check whether a secret M-Key policy is in use by running the following command:

# smsubnetprotection list active

Only if the output above shows secret M-keys, run the following commands on this Master switch:

# smsubnetprotection start
# smsubnetprotection commit

This ensures that the secret M-Key policy (if used) is propagated to all switches in the smnodes list.

Before the commit, ensure that all IB switches participating in this secret M-Key replication have the identical replication password in /conf/mkey_password.
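
One way to confirm that the replication password is identical everywhere, without printing it, is to compare digests of the file contents. The sketch below assumes the contents of /conf/mkey_password have already been read from each switch by some other means; the function names and the sample passwords are hypothetical, and the comparison is byte-exact, so a stray trailing newline would count as a difference.

```python
import hashlib

def password_digest(content: bytes) -> str:
    """Hash the file content so passwords can be compared without being displayed."""
    return hashlib.sha256(content).hexdigest()

def group_by_password(contents_by_switch: dict) -> list:
    """Group switch names whose password files have identical content."""
    groups = {}
    for switch, content in contents_by_switch.items():
        groups.setdefault(password_digest(content), []).append(switch)
    return sorted(groups.values())

# Hypothetical file contents collected from three switches.
collected = {
    "ib01": b"example-secret\n",
    "ib02": b"different-secret\n",
    "ib03": b"example-secret\n",
}

groups = group_by_password(collected)
consistent = len(groups) == 1
print("passwords identical" if consistent else f"passwords differ: {groups}")
```

More than one group means at least one switch holds a different password and should be corrected before the commit.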

The issue was resolved once this procedure was followed.

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.