Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1573312.1
Update Date:2016-10-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1573312.1 :   "sminfo" On The Infiniband Switch Reports: ibwarn: [11010] mad_rpc: _do_madrpc failed; dport  


Related Items
  • Exadata X3-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband


sminfo on the IB switch reports "mad_rpc:_do_madrpc failed"

In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-7564255315>

Applies to:

Exadata X3-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms

Running the sminfo command on the switch returns ibwarn and sminfo errors:

ibwarn: [11010] mad_rpc: _do_madrpc failed; dport (Lid 2)   
sminfo: iberror: failed: query

  

The following messages are logged in /var/log/messages when the switch or opensmd is restarted:

  

OpenSM[1639]: Entering DISCOVERING state  
OpenSM[1639]: Entering MASTER state  
partitiond: No valid partition file
whereismaster[1800]: No Master SubnetManager seen in the system
whereismaster[1800]: No Master SubnetManager seen in the system
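
The log symptom above can be confirmed by searching for the two quoted messages. The following is a minimal sketch, assuming a POSIX shell with grep is available; the function name check_sm_log is hypothetical and not part of the switch firmware.

```shell
# Hypothetical helper: report whether a log file contains the two
# messages quoted in this document. $1 defaults to /var/log/messages.
check_sm_log() {
    log_file="${1:-/var/log/messages}"
    if grep -q "partitiond: No valid partition file" "$log_file" &&
       grep -q "No Master SubnetManager seen in the system" "$log_file"; then
        echo "symptom present"
    else
        echo "symptom not found"
    fi
}
```

Calling check_sm_log with no argument reads /var/log/messages; a saved copy of the log can be passed as the first argument instead.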

  

 

In addition, most of the switch ports and host HCA ports are in the INIT state. This can be confirmed from the output of the following command, run on the switch:

#/usr/bin/ibdiagnet -skip dup_guids -pm

 

Changes

The partition file may have been inadvertently deleted, and/or the partition valid flag in /conf/configvalid may have recently changed to false (0).

 

Cause

If the switch is running a firmware version lower than 2.0, the partition valid flag in /conf/configvalid may be set to false (0) and remain in that state even after a reboot.

This happens when partitiond is unable to signal the SM at the point the SM becomes master. In that case the SM is not fully operational.

The following error in /var/log/opensm.log shows that partitiond cannot find a valid partition file:

partitiond: No valid partition file

 

Solution

Check whether the switch is running a firmware version lower than 2.0. If so, check whether the partition file exists:

#ls -l /conf/partition.conf
 

If the partition file exists, check whether the configvalid file contains the value '1':

  

#cat /conf/configvalid

0

 

 

 

If the configvalid file contains '0', change the value to '1':

  

#disablesm
#echo 1 > /conf/configvalid
#enablesm
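
The checks for firmware older than 2.0 can be gathered into one read-only helper. This is a minimal sketch, not an Oracle-supplied tool. The function names are hypothetical, the firmware version string is passed in by the caller because this document does not say how to obtain it, and the configuration directory is a parameter (assumed default /conf) so the logic can be tried against a copy. The helper only reports; the fix itself remains the disablesm, echo and enablesm sequence above.

```shell
# Hypothetical helper: succeed if a firmware version string such as
# "1.3.3-2" (example value) has a major version lower than 2.
fw_lower_than_2() {
    major="${1%%.*}"                 # text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;     # not numeric: cannot decide
    esac
    [ "$major" -lt 2 ]
}

# Hypothetical helper: report the partition configuration state.
# $1 is the configuration directory (assumed default: /conf).
partition_config_state() {
    conf_dir="${1:-/conf}"
    valid=""
    if [ ! -f "$conf_dir/partition.conf" ]; then
        echo "partition file missing"
        return 0
    fi
    if [ -r "$conf_dir/configvalid" ]; then
        # Read the first line of the flag file; tolerate an empty file.
        read -r valid < "$conf_dir/configvalid" || true
    fi
    if [ "$valid" = "1" ]; then
        echo "partition file present, configvalid is 1"
    else
        echo "partition file present, configvalid is not 1"
    fi
}
```

A possible use is to call fw_lower_than_2 with the version string first and run partition_config_state only when it succeeds, since the configvalid procedure applies to firmware lower than 2.0.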

  

If the switch is running firmware version 2.0 or newer, check whether the switch is running opensm:

#service opensmd status

If it is running opensm, check the output of the following command:

#smnodes list

The output must be identical to that on the IB switch currently running as Master, and it must contain the management IP addresses of all IB switches running opensm in this IB fabric.
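
The comparison described above can be done on saved copies of the output. The sketch below assumes the output of "smnodes list" has been redirected to a file on each switch and the files copied to one place; the function name is hypothetical, and sorting the lines before comparing (so that ordering differences are ignored) is an assumption rather than something this document specifies.

```shell
# Hypothetical helper: compare two saved copies of "smnodes list" output,
# for example one from this switch and one from the Master switch.
# Succeeds (exit 0) when both files contain the same set of lines.
smnodes_lists_match() {
    a_sorted=$(sort "$1") || return 2    # unreadable file: cannot decide
    b_sorted=$(sort "$2") || return 2
    [ "$a_sorted" = "$b_sorted" ]
}
```

A non-zero result means the two lists differ and should be corrected before the partitions are propagated.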

Once that is verified, and corrected where necessary, propagate the IB partitions to all IB switches running opensm by running the following two commands on the IB switch running as the Master:

#smpartition start
#smpartition commit

 

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.