Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2044499.1
Update Date:2018-04-18

Solution Type  Technical Instruction Sure

Solution  2044499.1 :   How to Replace an Infiniband HCA Card in Oracle SuperCluster Compute / DB nodes


Related Items
  • Oracle SuperCluster T5-8 Full Rack
  • SPARC SuperCluster T4-4 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Oracle SuperCluster T5-8 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster T5-8 Hardware
  • Oracle SuperCluster M6-32 Hardware
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SuperCluster-Mx


Replacement of HCA IB network port cards in SuperCluster requires additional steps to reconfigure the HCA port GUID's in the Infiniband Fabric.

In this Document
Goal
Solution
 Process
 1. Identify failing card from fmadm or explorer analysis
 2. Collect firmware and port GUID information
 3. Determine which IB switch is master.
 4. Collect IB partition data
 5. Collect ibstat(1M)
 6. Ordering parts.
 7. Preparing the server
 8. Record new component GUID's and replace the card.
 9. Update the IB fabric master partition by changing the port GUIDs on the master IB switch.
 10. Power on and reboot the primary LDOM
 11a. Check HCA Firmware.
 11b. Flashing HCA Firmware
 11c. Checking the new HCA card port GUID's and other sanity checks
 12. Restore operation.
 13. Verify Infiniband topology.
 14. DB Node Startup Verification


Applies to:

Oracle SuperCluster T5-8 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Oracle SuperCluster T5-8 Half Rack - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
SPARC

Goal

 

Replacement of HCA IB network port cards in SuperCluster requires additional steps to ensure the new component has the correct firmware installed and to reconfigure the HCA port GUID's in the Infiniband Fabric. Failure to do so leaves the system in a dysfunctional state and causes further downtime for the customer. This document describes how to replace a faulty Infiniband card in SuperCluster Compute and Database Nodes and ensure that the FW is updated and the port GUID's are correct.

Solution

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Hot swap of PCI adapters is not currently supported on any SuperCluster model, primarily because the system is configured as a High Availability cluster and it should be possible to fail over all applications to another server node. The server that contains the faulty Infiniband HCA card should have its services offline and the server itself powered off.

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

The instructions below assume the customer DBA is available and working with the field engineer (FE) onsite to manage the host OS and DB/ASM services. All steps are provided here so that the FE has them available when onsite and can perform any of them if the customer DBA requests or permits it.

Furthermore, it is up to the customer's system administrators to make these configuration changes. Customers with Platinum Service may request remote assistance from Oracle Support.

NOTE: This process may require an IB card firmware update. The SUNWfwflash package is required and is installed by default on SuperCluster systems.
           Restoring this package, should it have been removed, is beyond the scope of this document.

Process

Steps involved to replace a card are as follows and assume the customer has opened a service request for the service action:

1. Identify failing card from fmadm or explorer analysis

o. The first step is to identify the correct component from available explorer data and / or Solaris Fault Management (FMA). Reference the following document for further assistance:

-> How to identify Infiniband cards on Oracle SuperCluster (Doc ID 2021618.1)

2. Collect firmware and port GUID information

o. On the LDOM containing the hardware, run the 'fwflash -c IB -l' command to gather *all* IB card and port GUID details, and post these details in the SR:

root@orlt4db01:~# fwflash -c IB -l
List of available devices:
Device[0] /devices/pci@400/pci@2/pci@0/pci@2/pciex15b3,673c@0:devctl
Driver mcxnex
Class [IB]
GUID: System Image - 0021280001cee60d
Node Image - 0021280001cee60a
Port 1 - 0021280001cee60b    <<<<<<<< PORT GUID's
Port 2 - 0021280001cee60c    <<<<<<<< PORT GUID's
Mac 1 - 0000002128cee60a
Mac 2 - 0000002128cee60b
Firmware revision : 2.11.2010 <<<<<<< FW version
Product : 375-3697-01 B0
PSID : SUN0160000002
Description : Sun QMirage
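
To make the recorded GUIDs easier to compare after the replacement, the port lines can be pulled out of the listing. The following is a minimal sketch, assuming the output layout shown above; the function name port_guids and its output format are illustrative and not part of any SuperCluster tool. It uses POSIX awk features, so on Solaris nawk or /usr/xpg4/bin/awk may be needed.

```shell
# port_guids: reads `fwflash -c IB -l` output on stdin and prints one line
# per HCA port: "<device index> <port number> <port GUID>".
# The function name and output layout are illustrative assumptions.
port_guids() {
    awk '
        # "Device[0] /devices/..." starts a new HCA; keep only the index
        $1 ~ /^Device\[[0-9]+\]$/ { dev = $1; gsub(/[^0-9]/, "", dev) }
        # "Port 1 - 0021280001cee60b" carries the port GUID in field 4
        $1 == "Port" && $3 == "-" { print dev, $2, "0x" $4 }
    '
}
```

Piping the fwflash listing into port_guids and saving the result before the card is removed gives a short list to attach to the SR and to compare against after the replacement.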

3. Determine which IB switch is master.

o. The following demonstrates how to build a file with a list of switches and run 'getmaster' from one node on all switches using 'dcli': 

root@orlt4db02:~# grep sw /etc/hosts | grep ib
10.141.177.146 orlt4sw-ib1.us.oracle.com orlt4sw-ib1
10.141.177.147 orlt4sw-ib2.us.oracle.com orlt4sw-ib2
10.141.177.148 orlt4sw-ib3.us.oracle.com orlt4sw-ib3
root@orlt4db02:~# grep sw /etc/hosts | grep ib | awk '{print $3}' >> sw
root@orlt4db02:~# export PATH=$PATH:/opt/oracle.supercluster/bin
root@orlt4db02:~# dcli -g sw -l root getmaster
Unable to connect to cells: ['orlt4sw-ib1']
orlt4sw-ib2: Local SM enabled and running, state MASTER
orlt4sw-ib2: 20170526 13:47:47 Master SubnetManager on sm lid 5 sm guid 0x21284694a9a0a0 : SUN DCS 36P QDR orlt4sw-ib2 10.141.177.147
orlt4sw-ib3: Local SM enabled and running, state STAND BY
orlt4sw-ib3: 20170526 13:47:11 Master SubnetManager on sm lid 5 sm guid 0x21284694a9a0a0 : SUN DCS 36P QDR orlt4sw-ib2 10.141.177.147
root@orlt4db02:~#
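
The master can also be picked out of the dcli output mechanically. This is a rough sketch that assumes the 'state MASTER' wording shown in the sample above; master_switch is a hypothetical helper name. A switch that dcli could not reach (orlt4sw-ib1 in the sample) produces no status line and still has to be checked separately.

```shell
# master_switch: reads `dcli -g sw -l root getmaster` output on stdin and
# prints the name of every switch whose local Subnet Manager reports
# "state MASTER". Assumes the line layout of the sample session.
master_switch() {
    awk '/state MASTER[ \t]*$/ { sub(/:$/, "", $1); print $1 }'
}
```

Exactly one name is expected; no output or more than one name would be a reason to stop and investigate the subnet manager state before continuing.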

4. Collect IB partition data

o. On the Master switch run smpartition list active and retain the output:

[root@orlt4sw-ib2 ~]# smpartition list active
# Sun DCS IB partition config file
# This file is generated, do not edit
#! version_number : 57
Default=0x7fff, ipoib :
ALL_CAS=full,
ALL_SWITCHES=full,
SELF=full;
SUN_DCS=0x0001, ipoib :
ALL_SWITCHES=full;
ic1s10 = 0x0501,ipoib,defmember=full:
0x0021280001cee61b,
0x0021280001cf023b;
ic2s10 = 0x0502,ipoib,defmember=full:
0x0021280001cee61c,
0x0021280001cf023c;
sto = 0x0503,ipoib,defmember=full:
0x0021280001cee60c,  <<<< BAD CARD GUID
0x0021280001cee60b,  <<<< BAD CARD GUID
0x0021280001cee61c,
0x0021280001cee61b,
0x0021280001cee604,
0x0021280001cee603,
0x0021280001cee6f0,
0x0021280001cee6ef,
0x0021280001cebd44,
0x0021280001cebd43,
0x0021280001cf023c,
0x0021280001cf023b,
0x0021280001cec2d4,
0x0021280001cec2d3,
0x0021280001cee51c,
0x0021280001cee51b,
0x0021280001ced843,
0x0021280001ced844,
0x0021280001cf1a23,
0x0021280001cf1a24;
ic1s11 = 0x0511,ipoib,defmember=full:
0x0021280001cee603,
0x0021280001cec2d3;
ic2s11 = 0x0512,ipoib,defmember=full:
0x0021280001cee604,
0x0021280001cec2d4;
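
A saved copy of this listing can be searched for every partition that contains a given port GUID. The sketch below assumes the file layout shown above (a header line ending in ':' followed by member lines); partitions_for_guid is an illustrative name, and the -v option requires a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# partitions_for_guid GUID: reads `smpartition list active` output on stdin
# and prints "<partition name> <pkey>" for each partition that lists GUID.
# GUID may be given with or without the 0x prefix, in either case.
partitions_for_guid() {
    awk -v guid="$1" '
        BEGIN { guid = tolower(guid); sub(/^0x/, "", guid) }
        /^#/ { next }                      # skip comment lines
        /:[ \t]*$/ {                       # partition header: name = pkey,flags:
            split($0, kv, "=")
            name = kv[1]; gsub(/[ \t]/, "", name)
            pkey = kv[2]; sub(/,.*/, "", pkey); gsub(/[ \t]/, "", pkey)
            next
        }
        {                                  # member line: 0x<guid>, or 0x<guid>;
            member = tolower($1); gsub(/[,;]/, "", member); sub(/^0x/, "", member)
            if (member == guid) print name, pkey
        }
    '
}
```

Running it once for each old port GUID shows which partitions will need the GUID swapped later in the procedure.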

5. Collect ibstat(1M)

o. Before removing the old HCA, run the following command in the LDOM to which the card being replaced belongs:

root@orlt4db01:~# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.11.2010
Hardware version: 176
Node GUID: 0x0021280001cee60a
System image GUID: 0x0021280001cee60d
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 16
LMC: 0
SM lid: 5
Capability mask: 0x02100000
Port GUID: 0x0021280001cee60b   <<<< BAD CARD GUID
Link layer: IB
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 17
LMC: 0
SM lid: 5
Capability mask: 0x02100000
Port GUID: 0x0021280001cee60c   <<<< BAD CARD GUID
Link layer: IB

NOTE: If there are any IO domains (guest LDOMs created by the SuperCluster Virtual Assistant / IO Domain creation tool) that have virtual functions created
          on the IB HCA being replaced, this process should not affect them, as the virtual functions retrieve the VF GUIDs from the node which provides them.

6. Ordering parts.

o. The Hardware TSC team must make sure they order the *correct version* of the card.

WARNING: SuperCluster Systems are built with Mellanox Infiniband HCA's. The Solaris driver is mcxnex. Currently we have a type M2 and a type M3 card.
                
                ALL IB HCA CARDS IN THE FABRIC MUST BE THE SAME REVISION AND AT THE SAME FW LEVEL.

7. Preparing the server

o. During the outage window the physical hardware must be shut down and powered off (i.e., stop /HOST has to be issued from the ILOM shell).
o. Some SSC systems have multiple physical domains (PDOMs). Only the PDOM containing the hardware, and subsequently any logical guest
    domains (LDOMs) configured in the PDOM need to be shut down.
o. Reference:

-> SuperCluster - How to cleanly shutdown and startup an Oracle SuperCluster T4-4 or T5-8 (Doc ID 1487791.1)
-> SuperCluster M6-32 - How to cleanly shutdown and startup an Oracle SuperCluster M6-32 (Doc ID 1674297.1)
-> COMING SOON - SuperCluster - How to cleanly shutdown and startup an Oracle SuperCluster M7/M8 (Doc ID TBA)

NOTE: These documents describe how to cleanly shut down and power off the entire system, then how to restore power and operation.
           In some cases powering off the entire system is not warranted or desired.
           For the PCIe HCA card replacement discussed in this document, ONLY the SPARC compute server that contains the card to be replaced
           needs to be powered off. This is typically known and understood by Field Services.
         
           DO NOT POWER OFF THE ENTIRE CHASSIS OR THE IB SWITCHES FOR THIS PROCEDURE.

8. Record new component GUID's and replace the card.

NOTE: On all SuperCluster systems with the exception of M7 and M8, the primary control LDOM boot device is on an internal disk.
           On M7/M8 SSC systems a Versaboot image is used which ultimately depends on being able to mount an iSCSI LUN over the IB network.
           Therefore, for all SPARC based SuperCluster systems one should manually record the new port GUIDs from the new component or the
           component shipping materials before insertion.

o. The new HCA component will have new GUID's recorded on the card and with the shipping materials.
o. Please make sure the field service engineer records these values before insertion.
o. In all cases one must replace the card following the canned action plan specific to the server in question.

9. Update the IB fabric master partition by changing the port GUIDs on the master IB switch.

o. Log in to the IB switch running the master subnet manager and change the GUID's of replaced HCA port nodes by following the steps outlined below.

NOTE: The primary IB partition used by SuperCluster LDOMs and zones for iSCSI LUNs is 0x8503.
           The 'smpartition add' CLI ignores the most significant bit, so the value '503' is used in the commands.

o. The main command sequence is as follows if working in partition 0x8503:

# smpartition start
# smpartition remove -pkey 503 -port <GUID>

o. For EVERY IB switch port in which the old GUIDs appear, REPLACE the old GUID with the new:

# smpartition add -pkey 503 -port <GUID> (-m full)

o. Then review the pending changes to make sure they are correct:

# smpartition list modified

o. Then commit it and check again:

# smpartition commit
# smpartition list active
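
The relationship between 0x8503 and the '503' used in the commands is a mask of the top bit of the 16-bit partition key: 0x8503 & 0x7fff = 0x0503. The sketch below shows that arithmetic and prints (without running) the remove/add pair for one old and one new GUID. The helper names pkey_arg and swap_lines are hypothetical, the GUIDs are passed through unchanged in whatever format they are given, and '-m full' is included only because it appears as an option in the sequence above.

```shell
# pkey_arg PKEY: clears the most significant bit of a 16-bit partition key
# and prints the result in the short hex form used on the command line.
pkey_arg() {
    printf '%x\n' $(( $1 & 0x7fff ))
}

# swap_lines PKEY OLD_GUID NEW_GUID: prints, but does not execute, the
# smpartition remove/add pair for one port GUID in one partition.
swap_lines() {
    p=$(pkey_arg "$1")
    printf 'smpartition remove -pkey %s -port %s\n' "$p" "$2"
    printf 'smpartition add -pkey %s -port %s -m full\n' "$p" "$3"
}
```

Printing the lines first gives the administrator something to review against the recorded GUIDs; 'smpartition start', 'list modified' and 'commit' are deliberately left as manual steps on the master switch.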

10. Power on and reboot the primary LDOM

o. Refer to the correct platform's "How to cleanly shutdown and startup an Oracle SuperCluster" document.
o. The goal here is to restore power and bring up the primary LDOM0 of PDOM0 of the compute node and then do some basic sanity checks before restoring normal operation.

11a. Check HCA Firmware.

o. Use the 'fwflash' command to check FW revisions. If there is a mismatch of FW then the new card will need to be updated. In the following example we check to see if FW needs to be updated, and we find that it does:

root@orlt4db02# fwflash -l -c IB | grep revision
Firmware revision : 2.11.2010
Firmware revision : 2.11.2010
Firmware revision : 2.7.8130  <<<<<<<< NEEDS UPDATING
Firmware revision : 2.11.2010
Firmware revision : 2.11.2010
Firmware revision : 2.11.2010
Firmware revision : 2.11.2010
Firmware revision : 2.11.2010
root@orlt4db02#
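
With many HCAs in a node, a mismatch is easier to spot as a count per revision. This is a minimal sketch assuming the 'Firmware revision : X' line format shown above; fw_revisions and fw_check are illustrative names, not part of fwflash.

```shell
# fw_revisions: reads `fwflash -l -c IB` output on stdin and prints each
# distinct firmware revision followed by the number of HCAs reporting it.
fw_revisions() {
    awk '$1 == "Firmware" && $2 == "revision" { n[$4]++ }
         END { for (r in n) print r, n[r] }' | sort
}

# fw_check: prints OK when exactly one revision is present on stdin,
# otherwise prints MISMATCH (this includes the case of no HCAs found).
fw_check() {
    if [ "$(fw_revisions | wc -l | tr -d ' ')" -eq 1 ]; then
        echo "OK"
    else
        echo "MISMATCH"
        return 1
    fi
}
```

For the sample session above, fw_revisions would list 2.11.2010 seven times and 2.7.8130 once, and fw_check would report MISMATCH. This only compares HCAs within one domain; cards in other nodes of the fabric still need to be checked the same way.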

11b. Flashing HCA Firmware

o. Download the firmware update from MOS patch 16340059 and unzip the file. There will be a file with a .bin extension. Take the device info from the 'fwflash -c IB -l' output and run the fwflash command to update the card. Here is an example:

root@orlt4db02# fwflash -d /devices/pci@440/pci@1/pci@0/pci@c/pciex15b3,673c@0:devctl -f fw-ConnectX2-rel-2_11_2010-be-375-3696-01.bin
The current HCA firmware version is : 2.7.8130
Will be updated to HCA firmware ver of : 2.11.2010
About to update firmware on /devices/pci@440/pci@1/pci@0/pci@c/pciex15b3,673c@0:devctl
with file fw-ConnectX2-rel-2_11_2010-be-375-3696-01.bin.
Do you want to continue? (Y/N): Y
. . . . . . . . . . . . . . . . . . . . +
fwflash: New firmware will be activated after you reboot
root@orlt4db02#

11c. Checking the new HCA card port GUID's and other sanity checks

o. Run 'fwflash -c IB -l' and 'ibstat' again in all LDOMs and IO domains to check that the port GUIDs are correct.

root@orlt4db01:~# fwflash -c IB -l
List of available devices:
Device[0] /devices/pci@400/pci@2/pci@0/pci@2/pciex15b3,673c@0:devctl
Driver mcxnex
Class [IB]
GUID: System Image - 0021280001cee60d
Node Image - 0021280001cee60a
Port 1 - 0010e0000159ee7d <<<<<<<< NEW PORT GUID's
Port 2 - 0010e0000159ee7e <<<<<<<< NEW PORT GUID's
Mac 1 - 0000002128cee63a
Mac 2 - 0000002128cee63b
Firmware revision : 2.7.8130 <<<<<<<< FW version
Product : 375-3697-01 B0
PSID : SUN0160000002
Description : Sun QMirage

12. Restore operation.

o. Before starting zones, applications, etc., the system administrator should verify that the system is functioning correctly. One suggested check is to confirm that the port GUID changes appear in the fwflash and ibstat output:

# ibstat
CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.1280
        Hardware version: 0
        Node GUID: 0x0010e0000159ee7c
        System image GUID: 0x0010e0000159ee7f
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 5
                LMC: 0
                SM lid: 2
                Capability mask: 0x02514868
                Port GUID: 0x0010e0000159ee7d
                Link layer: IB
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 6
                LMC: 0
                SM lid: 2
                Capability mask: 0x02514868
                Port GUID: 0x0010e0000159ee7e
                Link layer: IB
  • Ensure that for both Port 1 and Port 2:
    • State is "Active"
    • Physical state: "LinkUp"
    • Rate: "40"
    • Port GUID has correct / expected value
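
The checklist above can be applied to the ibstat output with a small filter. This is a sketch under the assumption that ibstat prints the field labels exactly as in the sample; ibstat_check is a hypothetical name. It flags state, physical state and rate only, so the printed port GUIDs must still be compared against the values recorded from the new card.

```shell
# ibstat_check: reads `ibstat` output on stdin and prints one summary line
# per port: "port N <state> <physical state> <rate> <port GUID> OK|CHECK".
# A port is marked OK only if it is Active, LinkUp and running at rate 40.
ibstat_check() {
    awk '
        $1 == "Port" && $2 ~ /^[0-9]+:$/   { port = $2; sub(/:/, "", port) }
        $1 == "State:"                     { state = $2 }
        $1 == "Physical" && $2 == "state:" { phys = $3 }
        $1 == "Rate:"                      { rate = $2 }
        $1 == "Port" && $2 == "GUID:" {
            ok = (state == "Active" && phys == "LinkUp" && rate == "40") ? "OK" : "CHECK"
            print "port " port, state, phys, rate, $3, ok
        }
    '
}
```

Any line ending in CHECK should be resolved before moving on to the topology verification.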

13. Verify Infiniband topology.

o. Perform a sanity test on the IB network. Most important at this stage are lines 2 and 3 of the output below:

# /opt/oracle.SupportTools/ibdiagtools/verify-topology

        [ DB Machine Infiniband Cabling Topology Verification Tool ]
Every node is connected to two leaf switches in a single rack.......................................................[SUCCESS]
Every inter-leaf switch link is connected correctly in a single rack................................................[SUCCESS]
Every leaf switch in an interconnected quarter rack is correctly connected to other rack in a multi-rack group......[NOT APPLICABLE]
Every leaf switch is connected to every spine switch in a multi-rack group..........................................[NOT APPLICABLE]
Every rack has balanced inter-leaf-and-spine switch links in a multi-rack group.....................................[NOT APPLICABLE]
No spine switch is connected to another spine switch in a multi-rack group..........................................[NOT APPLICABLE]
Every spine switch is connected to two external spine switches in a multi-rack group................................[NOT APPLICABLE]
No external spine switch is connected to a leaf switch in a multi-rack group........................................[NOT APPLICABLE]
No external spine switch is connected to another external spine switch in a multi-rack group........................[NOT APPLICABLE]
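
To avoid missing a problem in a long report, the result lines can be filtered for anything that is neither SUCCESS nor NOT APPLICABLE. The sketch below assumes each check line ends with dots followed by a bracketed result, as in the sample; the exact wording of a failing result is not shown in this document, so the filter works by exclusion. topology_failures is an illustrative name, and the options used assume a POSIX grep (/usr/xpg4/bin/grep on Solaris).

```shell
# topology_failures: reads verify-topology output on stdin and prints every
# check line whose bracketed result is neither SUCCESS nor NOT APPLICABLE.
# Prints nothing when all checks passed or do not apply.
topology_failures() {
    grep '\.\[[^]]*\][[:space:]]*$' |
        grep -v -e '\[SUCCESS\][[:space:]]*$' \
                -e '\[NOT APPLICABLE\][[:space:]]*$'
}
```

Empty output from the filter is the expected result for the single-rack sample shown above.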

o. Ping other nodes over the Infiniband subnet

NOTE: If anything appears other than expected, it must be addressed before restarting guest LDOMs/zones/RAC/CRS/databases and applications, etc.

14. DB Node Startup Verification

o. Starting crs, asm, db's and apps where applicable in each zone can be automated. See the "How to cleanly shutdown and startup an Oracle SuperCluster" doc relevant to the system in question.
o. If all is well, CRS services and MGMTDB/ASM/RDBMS instances should now be started in the DB nodes.
o. Check this from the global zone with ps -ef -o zone,comm:

root@orlt5zadm0101:~# zoneadm list
global
orlt5zdbadm010101

root@orlt5zadm0101:~# ps -ef -o zone,comm | grep crs
orlt5zdbadm010101 /u01/app/12.1.0.2/grid/bin/crsd.bin

root@orlt5zadm0101:~# ps -ef | grep smon
root 11832 22264 0 16:46:42 pts/10 0:00 grep smon
root 6251 1 0 Apr 26 ? 596:48 /u01/app/12.1.0.2/grid/bin/osysmond.bin
0001000 6147 1 0 Apr 26 ? 1:47 asm_smon_+ASM1
0001001 4368 1 0 Jun 13 ? 0:09 ora_smon_dbm01z11
0001000 7150 1 0 Apr 26 ? 2:08 mdb_smon_-MGMTDB
root@orlt5zadm0101:~#

o. Application nodes should also be checked to confirm operation.

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.