Asset ID: 1-72-2362194.1
Update Date: 2018-02-26
Solution Type: Problem Resolution Sure Solution
2362194.1: getmaster returns "Local SM enabled and running, state MASTER NOT UP"
Related Items:
- Sun Datacenter InfiniBand Switch 36
- Exalogic Elastic Cloud X5-2 Eighth Rack
- Sun Network QDR InfiniBand Gateway Switch
Related Categories:
- PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband
Created from <SR 3-16868015291>
Applies to:
Exalogic Elastic Cloud X5-2 Eighth Rack - Version X5 and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
el01gw02 ~]# getmaster
Local SM enabled and running, state MASTER NOT UP
20180215 13:05:47 No Master SubnetManager seen in the system
Switch 36 "S-0010e06087a8c0a0" # "SUN IB QDR GW switch el01gw01 192.169.55.231" enhanced port 0 lid 0 lmc 0
Switch 36 "S-0010e061c188c0a0" # "SUN IB QDR GW switch el01gw02 192.169.55.232" enhanced port 0 lid 0 lmc 0
NOTE: The LID is 0 for both switches and for some or all of the nodes. LIDs are assigned by the master Subnet Manager, so a LID of 0 confirms that no master is active.
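The symptoms above can be screened for with a small shell function run against saved getmaster output. This is a minimal sketch, assuming bash and the message wording shown in this note; check_getmaster_output is a hypothetical helper, not a switch command, and other firmware releases may phrase the messages differently.

```shell
#!/bin/bash
# Hypothetical helper: scan saved getmaster output for the "no master" symptoms
# described in this note. The matched strings are copied from the output above;
# other firmware releases may word them differently (assumption).

# check_getmaster_output FILE
#   Prints one line per symptom found; returns 1 if any were found, else 0.
check_getmaster_output() {
    local file=$1
    local found=0
    local zero_lid

    if grep -q 'MASTER NOT UP' "$file"; then
        echo "SYMPTOM: local SM state is MASTER NOT UP"
        found=1
    fi

    if grep -q 'No Master SubnetManager seen' "$file"; then
        echo "SYMPTOM: no Master SubnetManager seen in the fabric"
        found=1
    fi

    # Matches lines such as: ... enhanced port 0 lid 0 lmc 0
    # A LID of 0 means no Subnet Manager has assigned an address yet.
    zero_lid=$(grep -cE ' lid 0( |$)' "$file")
    if [ "$zero_lid" -gt 0 ]; then
        echo "SYMPTOM: $zero_lid entries report lid 0"
        found=1
    fi

    return "$found"
}
```

For example, assuming getmaster writes to standard output: `getmaster > /tmp/getmaster.out; check_getmaster_output /tmp/getmaster.out`. A return code of 1 means at least one of the symptoms in this note is present.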
Changes
The partition configuration files under /conf were edited manually.
Cause
The opensm log in the switch snapshot (@var@log@opensm.log) shows that opensm cannot open a partition file for version 051:
Feb 15 17:56:24 588136 [B75546C0] 0x02 -> osm_prtn_config_parse_version: Cannot open config file '/conf/partitions.conf.051': No such file or directory
The snapshot file @tmp@json@smpartition reports partition configuration version 0051:
smpartition({"smstatus":"Master","configuration":"Valid","version":"0051"});
The snapshot file @conf@partitions.current also declares version 51:
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 51
Default=0x7fff, ipoib :
ALL_CAS=both,
ALL_SWITCHES=full,
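On the switch, however, there is no partitions.conf.051, and partitions.current points to partitions.conf.052:
# ls -l /conf/part*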
-rw-r--r-- 1 root root 206 2015-05-27 10:31 partitions.conf.000
-rw-r--r-- 1 root root 2779 2018-02-14 16:19 partitions.conf.052
-rw-r--r-- 1 root root 2779 2018-02-14 16:13 partitions.conf.052_bkup_20180214
lrwxrwxrwx 1 root root 25 2018-02-12 09:05 partitions.current -> /conf/partitions.conf.052
The version headers inside these files do not match their file names:
# grep version /conf/part*
/conf/partitions.conf.000:#! version_number : 0
/conf/partitions.conf.052:#! version_number : 51
/conf/partitions.conf.052_bkup_20180214:#! version_number : 52
/conf/partitions.current:#! version_number : 51
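The /conf/partitions.conf.* files had been manipulated manually, which broke opensm. partitions.current points to /conf/partitions.conf.052, but that file's header declares version_number 51, and opensm then fails trying to open /conf/partitions.conf.051, which does not exist (see the opensm.log entry above). The backup copy, partitions.conf.052_bkup_20180214, carries the correct header, version_number 52.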
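This kind of mismatch can be spotted before it breaks opensm by comparing each file's numeric suffix with its version_number header. The sketch below assumes bash and infers the rule "partitions.conf.NNN must declare version NNN" from this note (partitions.conf.000 declares 0, and the fix makes partitions.conf.052 declare 52); check_partition_versions is a hypothetical helper, not a switch command.

```shell
#!/bin/bash
# Hypothetical helper. Assumed rule (inferred from this note): the file
# partitions.conf.NNN must carry the header "#! version_number : NNN".

# check_partition_versions [DIR]
#   Checks every DIR/partitions.conf.NNN (digits-only suffix) and shows where
#   DIR/partitions.current points. Returns 1 if any mismatch is found, else 0.
check_partition_versions() {
    local dir=${1:-/conf}
    local rc=0 f suffix header

    for f in "$dir"/partitions.conf.*; do
        [ -f "$f" ] || continue
        suffix=${f##*.}
        # Skip names whose suffix is not purely numeric,
        # e.g. backups such as partitions.conf.052_bkup_20180214
        case $suffix in
            ''|*[!0-9]*) continue ;;
        esac
        # Extract N from the first "#! version_number : N" line
        header=$(sed -n 's/^#! version_number : *\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1)
        # 10# forces base 10, so the suffix 052 compares equal to the header 52
        if [ -z "$header" ] || [ $((10#$suffix)) -ne $((10#$header)) ]; then
            echo "MISMATCH: $f has version_number '${header:-none}', expected $((10#$suffix))"
            rc=1
        fi
    done

    if [ -L "$dir/partitions.current" ]; then
        echo "partitions.current -> $(readlink "$dir/partitions.current")"
    fi
    return "$rc"
}
```

Run against the files listed above, the sketch would flag partitions.conf.052 (header 51, expected 52) and ignore the backup copy. It only reads the files and changes nothing.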
Solution
Make /conf/partitions.conf.052 carry the correct version number, 52. Here the backup copy already has the correct header, so restore it over the damaged file and restart the Subnet Manager:
# rm /conf/partitions.conf.052
# mv /conf/partitions.conf.052_bkup_20180214 /conf/partitions.conf.052
# disablesm
# enablesm
Then verify that a Master Subnet Manager is now present on the fabric:
# getmaster
Local SM enabled and running, state STANDBY
Last change in Master SubnetManager status detected at: Fri Feb 16 07:31:25 GMT 2018
Master SubnetManager on sm lid 13 sm guid 0x10e0ceeaa0c0a0 : SUN IB QDR GW switch el01gw02 192.169.55.232
Master SubnetManager Activity Count: 359586 Priority: 5
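Next, on the switch hosting the master Subnet Manager, start and commit a partition session. Committing pushes the partition configuration, under a new version number (53 in this example), to the other switches running opensm:
el01gw02# smpartition start
el01gw02# smpartition commit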
Log in to each switch running opensm and verify that the version numbers are in sync:
# smpartition list active
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 53
Default=0x7fff, ipoib :
ALL_CAS=both,
ALL_SWITCHES=full,
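With several switches, the comparison can be scripted once the output of smpartition list active from each switch has been saved to a file. This is a minimal sketch, assuming bash; check_versions_in_sync is a hypothetical helper, and it only compares the version_number header lines.

```shell
#!/bin/bash
# Hypothetical helper: compare the "#! version_number : N" header across
# saved "smpartition list active" outputs, one file per switch.

# check_versions_in_sync FILE...
#   Prints the version found in each file; returns 0 only if every file
#   has a version header and all versions are identical.
check_versions_in_sync() {
    local f v first="" rc=0
    for f in "$@"; do
        v=$(sed -n 's/^#! version_number : *\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1)
        echo "$f: ${v:-none}"
        if [ -z "$v" ]; then
            rc=1            # no version header found
        elif [ -z "$first" ]; then
            first=$v        # reference version from the first file
        elif [ "$v" != "$first" ]; then
            rc=1            # version differs from the first file
        fi
    done
    return "$rc"
}
```

The files could be collected with something like `ssh root@el01gw01 smpartition list active > el01gw01.txt` for each switch, assuming root SSH access and that smpartition is on the remote PATH; then run `check_versions_in_sync el01gw01.txt el01gw02.txt`.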
Finally, verify that the Subnet Manager is running on every switch that is supposed to run it:
# getmaster