Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2362194.1
Update Date: 2018-02-26
Keywords:

Solution Type  Problem Resolution Sure

Solution  2362194.1 :   getmaster returns "Local SM enabled and running, state MASTER NOT UP"  


Related Items
  • Sun Datacenter InfiniBand Switch 36
  • Exalogic Elastic Cloud X5-2 Eighth Rack
  • Sun Network QDR InfiniBand Gateway Switch
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-16868015291>

Applies to:

Exalogic Elastic Cloud X5-2 Eighth Rack - Version X5 and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

el01gw02 ~]# getmaster
Local SM enabled and running, state MASTER NOT UP
20180215 13:05:47 No Master SubnetManager seen in the system

Switch 36 "S-0010e06087a8c0a0" # "SUN IB QDR GW switch el01gw01 192.169.55.231" enhanced port 0 lid 0 lmc 0
Switch 36 "S-0010e061c188c0a0" # "SUN IB QDR GW switch el01gw02 192.169.55.232" enhanced port 0 lid 0 lmc 0

NOTE: The LID is 0 for both switches, and also for some or all of the nodes.

Changes

The partition configuration files under /conf were manipulated manually.

Cause

In the snapshot file @var@log@opensm.log (/var/log/opensm.log on the switch):
Feb 15 17:56:24 588136 [B75546C0] 0x02 -> osm_prtn_config_parse_version: Cannot open config file '/conf/partitions.conf.051': No such file or directory


In the snapshot file @tmp@json@smpartition:
smpartition({"smstatus":"Master","configuration":"Valid","version":"0051"});

In the snapshot file @conf@partitions.current:
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 51
Default=0x7fff, ipoib :
ALL_CAS=both,
ALL_SWITCHES=full,



# ls -l /conf/part*
-rw-r--r-- 1 root root 206 2015-05-27 10:31 partitions.conf.000
-rw-r--r-- 1 root root 2779 2018-02-14 16:19 partitions.conf.052
-rw-r--r-- 1 root root 2779 2018-02-14 16:13 partitions.conf.052_bkup_20180214
lrwxrwxrwx 1 root root 25 2018-02-12 09:05 partitions.current -> /conf/partitions.conf.052

# grep version /conf/part*
/conf/partitions.conf.000:#! version_number : 0
/conf/partitions.conf.052:#! version_number : 51
/conf/partitions.conf.052_bkup_20180214:#! version_number : 52
/conf/partitions.current:#! version_number : 51


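The /conf/partitions.conf files were edited manually, and this broke opensm. The file /conf/partitions.conf.052 contains version_number 51 instead of 52 (the backup copy partitions.conf.052_bkup_20180214 still contains 52), and opensm fails while trying to open /conf/partitions.conf.051, which does not exist.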
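
The mismatch shown by the grep output above can be checked mechanically. The following is a minimal sketch, assuming the file naming (partitions.conf.NNN) and header line (#! version_number : N) seen in this case; check_partition_versions is a hypothetical helper, not a command provided by the switch firmware.

```shell
#!/bin/sh
# Hypothetical helper: report partitions.conf.NNN files whose internal
# version_number differs from the three-digit suffix of the file name.
# Backup copies such as partitions.conf.052_bkup_20180214 are skipped
# because the glob only matches names that end in exactly three digits.
check_partition_versions() {
    dir=${1:-/conf}
    status=0
    for f in "$dir"/partitions.conf.[0-9][0-9][0-9]; do
        [ -f "$f" ] || continue
        suffix=${f##*.}
        # Extract N from a header line such as "#! version_number : 51".
        version=$(sed -n 's/^#! *version_number *: *\([0-9][0-9]*\).*/\1/p' "$f" | head -n 1)
        if [ -z "$version" ]; then
            echo "NOVERSION $f"
            status=1
        elif [ "$(expr "$suffix" + 0)" -ne "$version" ]; then
            # expr turns the zero-padded suffix (052) into a plain number (52).
            echo "MISMATCH $f: suffix $suffix, version_number $version"
            status=1
        fi
    done
    return $status
}
```

Calling check_partition_versions /conf prints one line per inconsistent file and returns a non-zero status if any were found.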
 

 

Solution

Restore /conf/partitions.conf.052 from the backup copy so that it contains the correct version_number of 52, then restart the Subnet Manager:

# rm /conf/partitions.conf.052
# mv /conf/partitions.conf.052_bkup_20180214 /conf/partitions.conf.052
# disablesm
# enablesm

Then verify you are connected to the Subnet Manager Master.

# getmaster

Local SM enabled and running, state STANDBY

Last change in Master SubnetManager status detected at: Fri Feb 16 07:31:25 GMT 2018
Master SubnetManager on sm lid 13 sm guid 0x10e0ceeaa0c0a0 : SUN IB QDR GW switch el01gw02 192.169.55.232
Master SubnetManager Activity Count: 359586 Priority: 5
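
When several switches have to be checked, the getmaster output can be classified with a small filter. This is only a sketch based on the two output forms shown in this document; check_getmaster is a hypothetical helper, and other firmware versions may word the output differently.

```shell
#!/bin/sh
# Hypothetical helper: read getmaster output on stdin and report whether
# a master Subnet Manager is listed. Only the two output forms quoted in
# this document are assumed.
check_getmaster() {
    out=$(cat)
    # Local state is the text after "state " on the "Local SM ..." line.
    state=$(printf '%s\n' "$out" |
        sed -n -e 's/[[:space:]]*$//' \
               -e 's/^Local SM .*, state \(.*\)$/\1/p' | head -n 1)
    if printf '%s\n' "$out" | grep -q '^Master SubnetManager on sm lid'; then
        echo "OK: local state ${state:-unknown}, master present"
        return 0
    fi
    echo "FAIL: local state ${state:-unknown}, no master reported"
    return 1
}
```

On a switch it would be used as: getmaster | check_getmaster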

el01gw02# smpartition start

el01gw02# smpartition commit

 

 

Log in to each switch that runs opensm and verify that the version numbers are all in sync:

# smpartition list active
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 53
Default=0x7fff, ipoib :
ALL_CAS=both,
ALL_SWITCHES=full,
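
Comparing the version numbers by eye is error-prone when there are several switches. The sketch below extracts the version_number from the listing and compares values collected from each switch; get_partition_version and versions_in_sync are hypothetical helper names, and the header format is assumed to match the listing above.

```shell
#!/bin/sh
# Hypothetical helper: read a partition listing on stdin and print the
# number from its "#! version_number : N" header line.
get_partition_version() {
    sed -n 's/^#! *version_number *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if all given version values are equal.
# Returns 2 when called without arguments.
versions_in_sync() {
    [ "$#" -gt 0 ] || return 2
    first=$1
    for v in "$@"; do
        [ "$v" = "$first" ] || return 1
    done
    return 0
}

# Example use on one switch (repeat on every switch running opensm):
#   smpartition list active | get_partition_version
# Then compare the collected values, for example:
#   versions_in_sync 53 53 && echo "in sync"
```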

 

Verify that the Subnet Manager is running on every switch that is supposed to run it:

# getmaster

 


 

References

<NOTE:1606569.1> - All 3 IB Switches on the single rack fabric are LID 0 And SM Master Is Not Running

  Copyright © 2018 Oracle, Inc.  All rights reserved.