Asset ID: 1-72-2344681.1
Update Date: 2018-01-18
Solution Type: Problem Resolution Sure Solution
2344681.1: Infiniband Switch ib02 Subnet Manager Stuck In DISCOVER State
Related Items
- Exalogic Elastic Cloud X3-2 Hardware
- Sun Datacenter InfiniBand Switch 36
- Sun Network QDR InfiniBand Gateway Switch
Related Categories
- PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband
Created from <SR 3-16522026954>
Applies to:
Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Information in this document applies to any platform.
Symptoms
Infiniband switch ib02 is stuck in DISCOVER state
[root@ib02 ~]# showfruinfo
Sun_Man1R:
UNIX_Timestamp32 : Thu Jul 27 09:15:03 2017
Sun_Fru_Description : ASSY,NM2-GW
Vendor_ID_Code : 13 A6
Vendor_ID_Code_Source : 01
Vendor_Name_And_Site_Location : 5030 CELESTICA CORP. SRIRACHA CHONBURI TH
Sun_Part_Number : 7057249
Sun_Serial_Number : 465769T+1326RT02NG
Serial_Number_Format : 4V3F1-2Y2W2X4S
Initial_HW_Dash_Level : 99
Initial_HW_Rev_Level : 01
Sun_Fru_Shortname : NM2 gateway
Sun_Hazard_Class_Code : Y
Sun_SpecPartNo : 7054735
Sun_FRU_LabelR:
Sun_Serial_Number : AK0000001
FRU_Part_Dash_Number : 7054724
[root@ib02 ~]# getmaster
Local SM enabled and running, state DISCOVER <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Last change in Master SubnetManager status detected at: Thu Dec 28 13:53:55 GMT 2017
Master SubnetManager on sm lid 1 sm guid 0x10e02edaaaaaa0 : SUN IB QDR GW switch ib03 10.10.10.23
Master SubnetManager Activity Count: 31857 Priority: 5
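When several switches have to be checked, the local SM state can be pulled out of saved getmaster output with a small filter. The following is a minimal sketch, assuming the first line of the output has the form shown above; the function name sm_state is hypothetical and is not part of the switch firmware.

```shell
# sm_state: read `getmaster` output on stdin (or from files given as
# arguments) and print the local Subnet Manager state, e.g. DISCOVER.
# Assumes the "Local SM enabled and running, state <STATE>" line format
# shown in the transcript above.
sm_state() {
    sed -n 's/^Local SM enabled and running, state \([A-Z]*\).*/\1/p' "$@"
}
```

For example, `getmaster | sm_state` on ib02 would print DISCOVER for the output shown above.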
Action Taken
[root@ib02 ~]# smsubnetprotection list active
No active secret mkeys configured on the system
[root@ib03 ~]# smsubnetprotection list active
# File_format_version_number 1
# Sun DCS IB mkey config file
# This file is generated, do not edit
# secretmkey=enabled
# nodeid=ib02.us.com
# time= 4 Sep 23:38:30
# checksum=76e970f569a14507faab158ed4e9a40d
#! commit_number : 3
Mkey Untrusted Mkey Smkey Attribute
------------------ ------------------ ------------------ ---------
0xa000000000000001 0xafecb1b0cad65d59 0x35cc89c81432d02e C
The following is logged in /var/log/messages:
Dec 28 09:38:42 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:38:52 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:02 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:12 ib02 OpenSM[3532]: SM port is down#012
And the following is logged in /var/log/opensm.log:
Dec 28 15:00:07 419471 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:07 419471 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:07 420470 [B5D07B70] 0x80 -> SM port is down
SM port is down
Dec 28 15:00:17 426422 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10e02edbbbbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:17 426422 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:17 427422 [B5D07B70] 0x80 -> SM port is down
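The two log signatures above (an unknown M_Key on a port, followed by the SM being fenced out) can be searched for in a saved copy of opensm.log. This is a rough sketch, assuming the message format shown in the excerpt; the function names are hypothetical.

```shell
# unknown_mkey_ports: print the GUID of every port that OpenSM reported
# with an unknown M_Key, one per line, de-duplicated.
# Reads the files given as arguments, or stdin if none are given.
unknown_mkey_ports() {
    sed -n 's/.*osm_pi_rcv_process_probe: Port \(0x[0-9a-fA-F]*\) has unknown M_Key.*/\1/p' "$@" | sort -u
}

# sm_fenced_out: succeed (exit 0) if the log contains the
# "SM is fenced out" message from state_mgr_is_sm_port_down.
sm_fenced_out() {
    grep 'state_mgr_is_sm_port_down: SM is fenced out' "$@" >/dev/null
}
```

For example, `unknown_mkey_ports /var/log/opensm.log` lists the affected port GUIDs, and `sm_fenced_out /var/log/opensm.log && echo fenced` reports whether the fencing message is present.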
Cause
This switch was recently replaced.
The following document was not properly followed to ensure that the secret M-Key was configured and propagated correctly:
Infiniband Switch Replacement - Follow-up Actions (Doc ID 2125203.1)
Solution
First, ensure that the following command lists the IP addresses of all switches running opensm. It should produce the same output on every switch running opensm:
# smnodes list
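One way to confirm that the lists agree is to save the smnodes list output from each switch to a file on an administration host and compare the files. The sketch below assumes that approach; the function name and the file names in the usage note are hypothetical.

```shell
# smnodes_consistent: compare saved `smnodes list` outputs byte for byte.
# The first file is the reference; every other file must match it.
# Prints the first mismatch and returns 1, or prints OK and returns 0.
smnodes_consistent() {
    _ref=$1
    shift
    for _f in "$@"; do
        if ! cmp -s "$_ref" "$_f"; then
            echo "MISMATCH: $_f differs from $_ref"
            return 1
        fi
    done
    echo "OK: all saved lists match"
}
```

For example, after saving the output from each switch as ib02.txt and ib03.txt, run `smnodes_consistent ib02.txt ib03.txt`.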
Then follow these steps to check and propagate the secret M-Key:
6. Check/propagate secret M-Key policy from the running SM master.
On the switch running as the current Master, check whether a secret M-Key policy is in use by running the following command:
# smsubnetprotection list active
Only if the output above shows secret M-keys, run the following commands on this Master switch:
# smsubnetprotection start
# smsubnetprotection commit
This ensures that the secret M-Key policy (if used) is propagated to all switches listed by smnodes list.
Before running the commit, ensure that all IB switches participating in the secret M-Key replication have the identical replication password in /conf/mkey_password.
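The conditional in the procedure (run start and commit only if secret M-Keys are listed) can be sketched as a small wrapper. This is an illustration only, assuming that key rows in the smsubnetprotection list active output begin with a hexadecimal M-Key as in the ib03 output above; both function names are hypothetical, and the wrapper is intended for the current Master switch only.

```shell
# has_secret_mkeys: read `smsubnetprotection list active` output on stdin.
# Succeeds (exit 0) only if at least one key row is present, i.e. a line
# starting with a hexadecimal M-Key such as 0xa000000000000001.
# Header lines starting with '#' and the "No active secret mkeys" message
# do not match.
has_secret_mkeys() {
    grep -E '^0x[0-9a-fA-F]+[[:space:]]' >/dev/null
}

# propagate_secret_mkeys: run start and commit only when secret M-Keys
# are in use; commit is attempted only if start succeeds.
propagate_secret_mkeys() {
    if smsubnetprotection list active | has_secret_mkeys; then
        smsubnetprotection start && smsubnetprotection commit
    else
        echo "No active secret M-Keys; nothing to propagate."
    fi
}
```

Calling `propagate_secret_mkeys` on a switch that reports "No active secret mkeys configured on the system" leaves the configuration untouched.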
The issue was resolved once this procedure was followed.