Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2084598.1 : smpartition start - Fails With Unable To Get Rpc Version On Some Nodes In The Fabric
Created from <SR 3-11769456148>

Applies to:

Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

smpartition start (i.e. /usr/local/sbin/smpartition start) can fail if the peer checks performed by partitiond (i.e. /usr/local/util/partitiond) fail. Here are some example scenarios.

1) - /var/log/messages -
2) - /var/log/messages -
3) - /var/log/messages -
Changes

IB environments can use IB partitions, and smpartition start starts a session to edit IB partitions.
Cause

The IB switch on which smpartition start is invoked has to be the master subnet manager (i.e. SMINFO_MASTER), because IB partitions can only be modified on SMINFO_MASTER. partitiond on SMINFO_MASTER talks to the partitiond on each of the other IB switches specified in the smnodes list. In short, the smnodes list includes only the IP addresses of IB switches running the subnet manager (i.e. OpenSM). Every IB switch maintains its own smnodes list.

Scenario 1 means partitiond on SMINFO_MASTER is not able to acquire the RPC version from the peer partitiond on the peer IB switch with IP address W.X.Y.Z. This can happen if the peer IB switch is not running portmap (i.e. /sbin/portmap), or if portmap is running but partitiond is not. partitiond is an RPC program, so when it starts, it has to register itself with portmap.

Scenario 2 means partitiond on SMINFO_MASTER is not able to communicate with the peer partitiond on the peer IB switch with IP address W.X.Y.Z. This can happen if the peer IB switch is not running partitiond, hence no communication at all.

Scenario 3 means the smnodes list on SMINFO_MASTER is not the same as that on the peer IB switch with IP address W.X.Y.Z.
Solution

For scenario 1, check portmap and start it if it is not running, then check partitiond and start it if it is not running.

On SMINFO_MASTER, just run:

# rpcinfo -p W.X.Y.Z

On the peer IB switch with IP address W.X.Y.Z, just run:

# rpcinfo -p
# service portmap status
# service portmap start <<-- if not already running
# ps -ef | grep 'portmap'

On the peer IB switch with IP address W.X.Y.Z, just run:

# service partconfigd status
# enablesm <<-- if not already running
# ps -ef | grep 'part'
# rpcinfo -p
# netstat -lnp

For scenario 2, check partitiond and start it if it is not running.

On the peer IB switch with IP address W.X.Y.Z, just run:

# service partconfigd status
# enablesm <<-- if not already running
# ps -ef | grep 'part'

For scenario 3, compare the smnodes list on SMINFO_MASTER with that on the peer IB switch with IP address W.X.Y.Z, and correct the peer's list so that the two match.

On the peer IB switch with IP address W.X.Y.Z, just run:

# smnodes list
# smnodes delete <...>
# smnodes add <...>
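For scenario 3, the comparison can be done by eye, but with more than a few switches a small helper may be less error-prone. The following is a minimal sketch only, not part of the switch firmware. It assumes the output of smnodes list from each switch has been saved by hand to a text file containing one IP address per line (the exact output format of smnodes list is not shown in this note, so that layout is an assumption), and the function name compare_smnodes and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: compare two saved smnodes lists, ignoring order, duplicates,
# whitespace and blank lines.
# ASSUMPTION: each file holds one IP address per line, captured manually
# from "smnodes list" on the respective switch.
compare_smnodes() {
    master_list=$1
    peer_list=$2
    a=$(mktemp) || return 2
    b=$(mktemp) || { rm -f "$a"; return 2; }

    # Normalise both lists so that comm(1) can compare them.
    tr -d ' \t\r' < "$master_list" | grep -v '^$' | sort -u > "$a"
    tr -d ' \t\r' < "$peer_list"   | grep -v '^$' | sort -u > "$b"

    # Entries present on one side only.
    comm -23 "$a" "$b" | sed 's/^/only on master: /'
    comm -13 "$a" "$b" | sed 's/^/only on peer: /'

    # Return 0 when the lists match, 1 when they differ.
    if cmp -s "$a" "$b"; then rc=0; else rc=1; fi
    rm -f "$a" "$b"
    return $rc
}
```

Run on an administration host, for example as compare_smnodes master.txt peer.txt. Any line reported as "only on master" is a candidate for smnodes add on the peer, and any line reported as "only on peer" is a candidate for smnodes delete on the peer.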
enablesm (i.e. /usr/local/sbin/enablesm) starts opensm (i.e. via service opensmd start) followed by partitiond (i.e. via service partconfigd start). disablesm (i.e. /usr/local/sbin/disablesm) stops the processes started by enablesm in reverse order. opensm is the running process name as shown in the "ps -ef" output, whereas /etc/init.d/opensmd is the corresponding service script. Likewise, partitiond is the running process name as shown in the "ps -ef" output, whereas /etc/init.d/partconfigd is the corresponding service script.
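Because the process names differ from the service script names, it is easy to grep for the wrong string. As an illustration only, the sketch below reads "ps -ef" output on standard input and reports whether the two process names given above (opensm and partitiond) appear in it. The function name check_sm_processes is hypothetical, and the match is a plain substring search, so it is a convenience check rather than a definitive health test.

```shell
#!/bin/sh
# Sketch: report whether opensm and partitiond appear in "ps -ef" output
# supplied on stdin. Process names are taken from the note above; the
# function name is hypothetical.
check_sm_processes() {
    ps_out=$(cat)
    rc=0
    for name in opensm partitiond; do
        # Drop lines belonging to grep itself before matching.
        if printf '%s\n' "$ps_out" | grep -v grep | grep -q "$name"; then
            echo "$name: running"
        else
            echo "$name: NOT running"
            rc=1
        fi
    done
    # 0 if both were found, 1 otherwise.
    return $rc
}
```

On the switch itself this could be used as ps -ef | check_sm_processes. If either process is reported as not running, enablesm is the documented way to start both.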
Attachments

This solution has no attachment