Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1985159.1
Update Date:2018-02-14
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1985159.1 :   Updating IB partitions after replacing an Infiniband HCA in any nodes within IB network - steps to do after replacing HCA  


Related Items
  • Sun Infiniband HCA
  • Oracle Exadata Hardware
  • Exadata X4-2 Full Rack
  • Oracle SuperCluster Specific Software
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Purpose
Details


Applies to:

Sun Infiniband HCA - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster Specific Software
Oracle Exadata Hardware
Exadata X4-2 Full Rack
Information in this document applies to any platform.

Purpose

 This document shows the steps to update InfiniBand partitions after replacing an InfiniBand HCA in any of the nodes within the InfiniBand network.  It applies to Exadata, Exalogic, and SuperCluster, and to any other InfiniBand network where IB partitions have been created.

Details

If an HCA is replaced in any of the nodes within an InfiniBand network where IB partitions exist, and the port GUIDs of the replaced HCA are members of any IB partition other than the default partition, it is important to make sure that the port GUIDs of the HCA are updated in all of those partitions.  The steps to do this are as follows.

 

1.  On the IB switch running as MASTER, run the following command to find out details of IB partitioning:

    (Note: Running the sminfo command on any node shows which subnet manager is the MASTER.)

     #smpartition list active

   If there are no IB partitions other than default partitions,  then there is nothing more to do.  All port GUIDs of all HCAs will automatically become members of the default partition.

   If there are no user-created IB partitions, the output of the above command may look like the following:

                # smpartition list active
                # Sun DCS IB partition config file
                # This file is generated, do not edit
                #! version_number : 29
                Default=0x7fff,ipoib,defmember=full:    <<<<<< default
                ALL_CAS=full,
                ALL_SWITCHES=full,
                SELF=full;
                SUN_DCS=0x0001, ipoib :    <<<<<<<<  This may be seen in all.  And, all GUIDS are members of this by default.
                ALL_SWITCHES=full;

   The rest of the steps within this document need to be done only if there are user-created IB partitions other than what is shown above.
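
The check in step 1 can also be done mechanically on a saved copy of the "smpartition list active" output. The following is a minimal sketch, assuming the output has been copied to a host with a POSIX shell and awk, and assuming every partition begins with a "name=0x<pkey>" line as in the sample above. The function name user_partitions is illustrative and is not part of any Oracle tool.

```shell
# Hypothetical helper: read saved "smpartition list active" output on stdin and
# print the names of partitions other than Default and SUN_DCS.
# Assumes each partition starts with a "name=0x<pkey>" line, as in the sample.
user_partitions() {
    awk -F'[=,]' '
        /^[ \t]*[A-Za-z0-9_-]+[ \t]*=[ \t]*0x[0-9a-fA-F]+/ {
            name = $1
            gsub(/[ \t]/, "", name)          # drop indentation around the name
            if (name != "Default" && name != "SUN_DCS") print name
        }
    '
}
```

If the function prints nothing, only the default partitions are present and the remaining steps do not apply. Any name it prints should still be confirmed against the full output by eye.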

2.  If there are user-created IB partitions, and if the port GUIDs of the replaced HCA belonged to any of those partitions,  the port GUIDs of the old HCA have to be replaced with the port GUIDs of the new HCA.

So, before removing the old HCA, run the following command on that node.

    #ibstat

Here is an example:

# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.8130
        Hardware version: b0
        Node GUID: 0x0021280001cf8e4e
        System image GUID: 0x0021280001cf8e51
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 8
                LMC: 0
                SM lid: 7
                Capability mask: 0x02510868
                Port GUID: 0x0021280001cf8e4f     <<<<<<<< 
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 19
                LMC: 0
                SM lid: 1
                Capability mask: 0x02510868
                Port GUID: 0x0021280001cf8e50     <<<<<<<<<<
                Link layer: InfiniBand
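
To keep a record of the old GUIDs before the card is removed, the ibstat output can be reduced to just the port numbers and port GUIDs. This is a minimal sketch, assuming a POSIX shell and awk are available on the node and that the output has the layout shown in the example above. The function name port_guids is illustrative only.

```shell
# Hypothetical helper: read ibstat output on stdin and print one
# "<port number> <port GUID>" line per port.
port_guids() {
    awk '
        # A "Port N:" header line: remember the port number without the colon.
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        # A "Port GUID: 0x..." line: the GUID is the third field.
        $1 == "Port" && $2 == "GUID:"    { print port, $3 }
    '
}
```

On the node, the function would be fed directly from the command (ibstat | port_guids) and the result saved somewhere that survives the hardware replacement. With more than one HCA in the node, the port numbers repeat, so the full ibstat output should be kept as well.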

Also, if SR-IOV is in use (SuperCluster) and virtual functions have been created on the IB HCA, the virtual port GUIDs must be collected as well, by running the above command in every guest domain that has them.

Note: If the card is faulty and this command does not give correct output, check whether the information is available in any previously collected data.

         If the old HCA is available, it may be possible to derive its port GUIDs from the label placed on it.  Normally, the node GUID and the port GUIDs are adjacent numbers, as in the sample ibstat output shown above.

  If there is no way to determine the port GUIDs of the old HCA, go to step 5.
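
Where only the node GUID is known (for example, from the card label), the adjacency seen in the sample ibstat output can be turned into candidate port GUIDs. This is a sketch under the assumption that the card follows the same pattern as the sample (port 1 = node GUID + 1, port 2 = node GUID + 2); that pattern is described above as normal, not guaranteed. The function name candidate_port_guids is illustrative only.

```shell
# Hypothetical helper: given a node GUID (with 0x prefix), print the two
# adjacent values the port GUIDs would have IF the card follows the pattern
# in the sample ibstat output. These are candidates to verify, not facts.
candidate_port_guids() {
    printf '0x%016x\n' "$(( $1 + 1 ))" "$(( $1 + 2 ))"
}
```

For the sample card, candidate_port_guids 0x0021280001cf8e4e prints 0x0021280001cf8e4f and 0x0021280001cf8e50, which match the port GUIDs in the ibstat output. The candidates should be checked against the member GUIDs listed by "smpartition list active" before they are relied on.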

3. Collect the output of ibstat after replacing the HCA.  This gives the port GUIDs of the new HCA.  If virtual functions (VFs) are in use, also collect the output of ibstat in the domains that have VFs.

    You now have the port GUIDs and virtual port GUIDs of both the old HCA and the new HCA.

4.  Log in to the IB switch that is running as the MASTER and do the following:

            #smpartition start
                 This will create a file /conf/partitions.conf.tmp
                 Edit this file and replace each port GUID of the old HCA with the corresponding port GUID of the new HCA.  (Step 2 gave the old port GUIDs and step 3 gave the new GUIDs.)
                  Then,
             #smpartition commit

 

       Skip step 5, as it is not needed in this case.
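
The edit in step 4 can be done in any text editor. Where a scripted substitution is preferred, the following is a minimal sketch, assuming that cp and sed are available where the file is edited (this has not been confirmed for the switch itself) and that GUIDs appear in the file as plain hexadecimal strings. The function name swap_guid is illustrative only.

```shell
# Hypothetical helper: replace every occurrence of an old port GUID with a new
# one in the staged partition file. Usage: swap_guid FILE OLD_GUID NEW_GUID
# An optional 0x prefix is stripped so "0x0021..." and "0021..." both match.
# The match is case-sensitive.
swap_guid() {
    file=$1
    old=${2#0x}
    new=${3#0x}
    cp "$file" "$file.bak" || return 1          # keep a backup of the staged file
    sed "s/$old/$new/g" "$file.bak" > "$file"   # rewrite the file from the backup
}
```

The function would be called once per port GUID (and once per virtual port GUID, if any) against /conf/partitions.conf.tmp, between "smpartition start" and "smpartition commit". Comparing the file with its .bak copy before committing confirms that only the intended lines changed.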

5. This step is to be done only if the port GUIDs of the old HCA are not known.

If the port GUIDs of the old HCA are not available, one needs to find out which partitions (pkeys) each port of the HCA belonged to.  This information has to be obtained from the user (customer), or it may be derived from available data such as explorer or sosreport output.  The exact steps to find this out are hard to list here, as they depend on the customer environment.

The following command, run on the IB switch running as MASTER, shows the IB partitions and the member GUIDs of each partition:

 

         #smpartition list active

 

  Now, from the available data and the information given by the user (customer), determine which partitions each port GUID of this HCA must belong to.  Once that is known, add the port GUIDs of the new HCA to each of those partitions as follows:

 

       On the IB switch running as MASTER,

         #smpartition start

         #smpartition add -n <name of partition> -port <port guid> -m <membership type>

               where the name of the partition is the name (as shown in the output of "smpartition list active") of the partition to which the port GUID must belong

            Repeat the smpartition add command for every required partition and for both port GUIDs.

               Note:  Add the new port GUIDs only to the partitions that require them.

            After doing all that,

        #smpartition commit
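
When several partitions and both ports are involved, it can help to write the required memberships down first and generate the command lines from that list. This is a minimal sketch, assuming a POSIX shell on an administration host. It uses only the "smpartition add" syntax shown in step 5 and prints the commands without running them. The function name print_add_commands and the three-column input format are illustrative assumptions.

```shell
# Hypothetical helper: read "partition port_guid membership" triples on stdin
# and print the matching smpartition add command for each, using the syntax
# shown in step 5. It only prints; nothing is executed on the switch.
print_add_commands() {
    while read -r name guid member; do
        [ -n "$name" ] || continue            # skip blank lines
        case $name in \#*) continue ;; esac   # skip comment lines
        printf 'smpartition add -n %s -port %s -m %s\n' "$name" "$guid" "$member"
    done
}
```

Each input line holds a partition name as shown by "smpartition list active", a new port GUID from step 3, and a membership type such as full. The printed commands can then be reviewed and entered on the MASTER switch between "smpartition start" and "smpartition commit".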

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.