Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

2044499.1 : How to Replace an Infiniband HCA Card in Oracle SuperCluster Compute / DB nodes
Replacement of HCA IB network port cards in SuperCluster requires additional steps to reconfigure the HCA port GUIDs in the Infiniband Fabric.
Applies to:
Oracle SuperCluster T5-8 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Oracle SuperCluster T5-8 Half Rack - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
SPARC

Goal
Replacement of HCA IB network port cards in SuperCluster requires additional steps to ensure the new component has the correct firmware installed and to reconfigure the HCA port GUIDs in the Infiniband Fabric. Failure to do so leaves the system in a dysfunctional state and causes further downtime for the customer. This document describes how to replace a faulty Infiniband card in SuperCluster Compute and Database Nodes and ensure that the FW is updated and the port GUIDs are correct.

Solution

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
Hot swap of PCI adapters is not currently supported in any SuperCluster system model, primarily because a High Availability cluster is set up and it should be possible to fail over all apps, etc. to another server node. The server that contains the faulty Infiniband HCA card should have its services offline and the server itself powered off.

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:
The instructions below assume the customer DBA is available and working with the field engineer (FE) onsite to manage the host OS and DB/ASM services. They are provided here so that the FE has all the steps needed when onsite, and the FE can perform them if the customer DBA wants, allows, or needs help with any of the steps. Furthermore, it is up to the customer system administrators to make these configuration changes. Customers with Platinum Service may request remote assistance from Oracle Support.

NOTE: This process may require an IB card firmware update. The SUNWfwflash package is required and is installed by default on SuperCluster systems.
Restoring this package, should it have been removed, is beyond the scope of this document.

Process

The steps involved in replacing a card are as follows and assume the customer has opened a service request for the service action:

1. Identify failing card from fmadm or explorer analysis
o. The first step is to identify the correct component from available explorer data and / or Solaris Fault Management (FMA). Reference the following document for further assistance:
-> How to identify Infiniband cards on Oracle SuperCluster (Doc ID 2021618.1)

2. Collect firmware and port GUID information
o. On the ldom containing the hardware, run the fwflash -c IB -l command to gather *all* IB card and port GUID details and post these details in the SR:
    root@orlt4db01:~# fwflash -c IB -l
    List of available devices:
    Device[0] /devices/pci@400/pci@2/pci@0/pci@2/pciex15b3,673c@0:devctl
    Driver mcxnex
    Class [IB]
    GUID: System Image - 0021280001cee60d
          Node Image - 0021280001cee60a
          Port 1 - 0021280001cee60b   <<<<<<<< PORT GUID
          Port 2 - 0021280001cee60c   <<<<<<<< PORT GUID
          Mac 1 - 0000002128cee60a
          Mac 2 - 0000002128cee60b
    Firmware revision : 2.11.2010   <<<<<<< FW version
    Product : 375-3697-01 B0
    PSID : SUN0160000002
    Description : Sun QMirage

3. Determine which IB switch is master.
o. The following demonstrates how to build a file with a list of switches and run 'getmaster' from one node on all switches using 'dcli':
    root@orlt4db02:~# grep sw /etc/hosts | grep ib
    10.141.177.146 orlt4sw-ib1.us.oracle.com orlt4sw-ib1
    10.141.177.147 orlt4sw-ib2.us.oracle.com orlt4sw-ib2
    10.141.177.148 orlt4sw-ib3.us.oracle.com orlt4sw-ib3
    root@orlt4db02:~# grep sw /etc/hosts | grep ib | awk '{print $3}' >> sw
    root@orlt4db02:~# export PATH=$PATH:/opt/oracle.supercluster/bin
    root@orlt4db02:~# dcli -g sw -l root getmaster
    Unable to connect to cells: ['orlt4sw-ib1']
    orlt4sw-ib2: Local SM enabled and running, state MASTER
    orlt4sw-ib2: 20170526 13:47:47 Master SubnetManager on sm lid 5 sm guid 0x21284694a9a0a0 : SUN DCS 36P QDR orlt4sw-ib2 10.141.177.147
    orlt4sw-ib3: Local SM enabled and running, state STAND BY
    orlt4sw-ib3: 20170526 13:47:11 Master SubnetManager on sm lid 5 sm guid 0x21284694a9a0a0 : SUN DCS 36P QDR orlt4sw-ib2 10.141.177.147
    root@orlt4db02:~#

4. Collect IB partition data
o. On the Master switch run smpartition list active and retain the output:
    [root@orlt4sw-ib2 ~]# smpartition list active
    # Sun DCS IB partition config file
    # This file is generated, do not edit
    #! version_number : 57
    Default=0x7fff, ipoib : ALL_CAS=full, ALL_SWITCHES=full, SELF=full;
    SUN_DCS=0x0001, ipoib : ALL_SWITCHES=full;
    ic1s10 = 0x0501,ipoib,defmember=full:
    0x0021280001cee61b,
    0x0021280001cf023b;
    ic2s10 = 0x0502,ipoib,defmember=full:
    0x0021280001cee61c,
    0x0021280001cf023c;
    sto = 0x0503,ipoib,defmember=full:
    0x0021280001cee60c,   <<<< BAD CARD GUID
    0x0021280001cee60b,   <<<< BAD CARD GUID
    0x0021280001cee61c,
    0x0021280001cee61b,
    0x0021280001cee604,
    0x0021280001cee603,
    0x0021280001cee6f0,
    0x0021280001cee6ef,
    0x0021280001cebd44,
    0x0021280001cebd43,
    0x0021280001cf023c,
    0x0021280001cf023b,
    0x0021280001cec2d4,
    0x0021280001cec2d3,
    0x0021280001cee51c,
    0x0021280001cee51b,
    0x0021280001ced843,
    0x0021280001ced844,
    0x0021280001cf1a23,
    0x0021280001cf1a24;
    ic1s11 = 0x0511,ipoib,defmember=full:
    0x0021280001cee603,
    0x0021280001cec2d3;
    ic2s11 = 0x0512,ipoib,defmember=full:
    0x0021280001cee604,
    0x0021280001cec2d4;

5. Collect ibstat(1M)
o. Before removing the old HCA, run the following command in the LDOM to which the card being replaced belongs:
    root@orlt4db01:~# ibstat
    CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.11.2010
        Hardware version: 176
        Node GUID: 0x0021280001cee60a
        System image GUID: 0x0021280001cee60d
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 16
            LMC: 0
            SM lid: 5
            Capability mask: 0x02100000
            Port GUID: 0x0021280001cee60b   <<<< BAD CARD GUID
            Link layer: IB
        Port 2:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 17
            LMC: 0
            SM lid: 5
            Capability mask: 0x02100000
            Port GUID: 0x0021280001cee60c   <<<< BAD CARD GUID
            Link layer: IB

NOTE: If there are any IO domains (guest LDOMS created by SuperCluster Virtual Assistant / IO Domain creation tool) that have Virtual functions created
on the IB HCA being replaced, this process should not affect them, as the virtual functions retrieve the VF GUIDs from the node which provides them.

6. Ordering parts.
o. The Hardware TSC team needs to make sure they order the *correct version* of the card.
WARNING: SuperCluster systems are built with Mellanox Infiniband HCAs. The Solaris driver is mcxnex. Currently we have a type M2 and a type M3 card.
ALL IB HCA CARDS IN THE FABRIC MUST BE THE SAME REVISION AND AT THE SAME FW LEVEL.

7. Preparing the server
o. During the outage window the physical hardware must be shut down and powered off (i.e., stop /HOST has to be issued from the ILOM shell).
-> SuperCluster - How to cleanly shutdown and startup an Oracle SuperCluster T4-4 or T5-8 (Doc ID 1487791.1)
NOTE: These documents describe how to cleanly shut down and power off the entire system, then how to restore power and operation.
In some cases powering off the entire system is not warranted or desired. In the case of the PCIe HCA card replacement discussed in this document, ONLY the SPARC compute server containing the card to be replaced needs to be powered off. This is typically known and understood by Field Services. DO NOT POWER OFF THE ENTIRE CHASSIS OR THE IB SWITCHES FOR THIS PROCEDURE.

8. Record new component GUIDs and replace the card.
NOTE: On all SuperCluster systems with the exception of M7 and M8, the primary control LDOM boot device is on an internal disk.
On M7/M8 SSC systems a Versaboot image is used which ultimately depends on being able to mount an iSCSI LUN over the IB network. Therefore, for all SPARC based SuperCluster systems one should manually record the new port GUIDs from the new component or the component shipping materials before insertion.
o. The new HCA component will have new GUIDs recorded on the card and with the shipping materials.

9. Update the IB fabric master partition by changing the port GUIDs on the master IB switch.
o. Log in to the IB switch running the master subnet manager and change the GUIDs of the replaced HCA port nodes by following the steps outlined below.
NOTE: The primary IB partition used by SuperCluster LDOMS and zones for iSCSI LUNs is 0x8503.
The 'smpartition add' CLI ignores the most significant bit, so the value '503' is what is actually used in the command.
o. The main command sequence is as follows if working in partition 0x8503:
    # smpartition start
    # smpartition remove -pkey 503 -port <GUID>
o. For EVERY IB switch port in which the old GUIDs appear, REPLACE the old GUID with the new:
    # smpartition add -pkey 503 -port <GUID> (-m full)
o. Then check it to make sure:
    # smpartition list modified
o. Then commit it and check again:
    # smpartition commit
    # smpartition list active

10. Power on and reboot the primary LDOM
o. Refer to the correct platform's "How to cleanly shutdown and startup an Oracle SuperCluster" document.

11a. Check HCA Firmware.
o. Use the 'fwflash' command to check FW revisions. If there is a mismatch of FW then the new card will need to be updated. In the following example we check to see if FW needs to be updated, and we find that it does:
    root@orlt4db02# fwflash -l -c IB | grep revision
    Firmware revision : 2.11.2010
    Firmware revision : 2.11.2010
    Firmware revision : 2.7.8130   <<<<<<<< NEEDS UPDATING
    Firmware revision : 2.11.2010
    Firmware revision : 2.11.2010
    Firmware revision : 2.11.2010
    Firmware revision : 2.11.2010
    Firmware revision : 2.11.2010
    root@orlt4db02#

11b. Flashing HCA Firmware
o. Download the firmware update from MOS patch 16340059 and unzip the file. There will be a file with a .bin extension. Take the device info from the fwflash -c IB -l output and run the fwflash command to update the card. Here is an example:
    root@orlt4db02# fwflash -d /devices/pci@440/pci@1/pci@0/pci@c/pciex15b3,673c@0:devctl -f fw-ConnectX2-rel-2_11_2010-be-375-3696-01.bin
    The current HCA firmware version is : 2.7.8130
    Will be updated to HCA firmware ver of : 2.11.2010
    About to update firmware on /devices/pci@440/pci@1/pci@0/pci@c/pciex15b3,673c@0:devctl
    with file fw-ConnectX2-rel-2_11_2010-be-375-3696-01.bin.
    Do you want to continue? (Y/N): Y
    . . . . . . . . . . . . . . . . . . . . +
    fwflash: New firmware will be activated after you reboot
    root@orlt4db02#

11c. Checking the new HCA card port GUIDs and other sanity checks
o. Run 'fwflash -c IB -l' and 'ibstat' again in all LDOMs and IO domains to check that the port GUIDs are correct.
    root@orlt4db01:~# fwflash -c IB -l

12. Restore operation.
o. Before starting zones / apps, etc. the system administrator should verify the system is functioning correctly. One suggested check is to confirm that the port GUID changes appear in the fwflash and ibstat output:
    # ibstat
    CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 2
        Firmware version: 2.11.1280
        Hardware version: 0
        Node GUID: 0x0010e0000159ee7c
        System image GUID: 0x0010e0000159ee7f
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 5
            LMC: 0
            SM lid: 2
            Capability mask: 0x02514868
            Port GUID: 0x0010e0000159ee7d
            Link layer: IB
        Port 2:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 6
            LMC: 0
            SM lid: 2
            Capability mask: 0x02514868
            Port GUID: 0x0010e0000159ee7e
            Link layer: IB
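
The GUID comparison in step 12 can also be done mechanically. The following is a minimal Python sketch, not an Oracle tool: it assumes the ibstat output has been saved as text, and the helper names `normalize_guid`, `port_guids` and `check_swap` are hypothetical. It accepts GUIDs with or without the `0x` prefix, since fwflash prints them without it and ibstat prints them with it. The old/new pairing in the demo is illustrative only; the GUID values are taken from the two separate ibstat samples in this document.

```python
import re

# Matches only "Port GUID:" lines, not "Node GUID:" or "System image GUID:".
PORT_GUID_RE = re.compile(r"Port GUID:\s*((?:0x)?[0-9a-fA-F]+)")


def normalize_guid(guid):
    """Return a GUID as lowercase, 0x-prefixed, zero-padded to 16 hex digits."""
    g = guid.strip().lower()
    if g.startswith("0x"):
        g = g[2:]
    return "0x" + g.zfill(16)


def port_guids(ibstat_text):
    """Return the set of normalized port GUIDs found in saved ibstat output."""
    return {normalize_guid(m.group(1)) for m in PORT_GUID_RE.finditer(ibstat_text)}


def check_swap(ibstat_text, old_guids, new_guids):
    """Return (stale, missing).

    stale   -- old-card GUIDs that still appear in the output
    missing -- new-card GUIDs that do not appear in the output
    """
    seen = port_guids(ibstat_text)
    stale = sorted(normalize_guid(g) for g in old_guids if normalize_guid(g) in seen)
    missing = sorted(normalize_guid(g) for g in new_guids if normalize_guid(g) not in seen)
    return stale, missing


if __name__ == "__main__":
    # Trimmed ibstat text; only the Port GUID lines matter to the parser.
    sample = """
    CA 'mlx4_0'
        Node GUID: 0x0010e0000159ee7c
        Port 1:
            Port GUID: 0x0010e0000159ee7d
        Port 2:
            Port GUID: 0x0010e0000159ee7e
    """
    old = ["0021280001cee60b", "0021280001cee60c"]      # fwflash style, no 0x
    new = ["0x0010e0000159ee7d", "0x0010e0000159ee7e"]  # ibstat style
    stale, missing = check_swap(sample, old, new)
    print("stale:", stale)
    print("missing:", missing)
```

Two empty lists mean the old port GUIDs are gone and the new ones are visible in that domain. Any entry in either list should be investigated before zones and applications are started.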

13. Verify Infiniband topology.
o. Perform a sanity test on the IB network. Most important at this stage are lines 2 and 3 of the output below:
    # /opt/oracle.SupportTools/ibdiagtools/verify-topology
    [ DB Machine Infiniband Cabling Topology Verification Tool ]
o. Ping other nodes over the Infiniband subnet.
NOTE: If anything appears other than expected, it must be addressed before restarting guest LDOMs/zones/RAC/CRS/databases and applications, etc.

14. DB Node Startup Verification
o. Starting CRS, ASM, DBs and apps where applicable in each zone can be automated. See the "How to cleanly shutdown and startup an Oracle SuperCluster" doc relevant to the system in question.
    root@orlt5zadm0101:~# zoneadm list
    global
    orlt5zdbadm010101
    root@orlt5zadm0101:~# ps -ef -o zone,comm | grep crs
    orlt5zdbadm010101 /u01/app/12.1.0.2/grid/bin/crsd.bin
    root@orlt5zadm0101:~# ps -ef | grep smon
    root 11832 22264 0 16:46:42 pts/10 0:00 grep smon
    root 6251 1 0 Apr 26 ? 596:48 /u01/app/12.1.0.2/grid/bin/osysmond.bin
    0001000 6147 1 0 Apr 26 ? 1:47 asm_smon_+ASM1
    0001001 4368 1 0 Jun 13 ? 0:09 ora_smon_dbm01z11
    0001000 7150 1 0 Apr 26 ? 2:08 mdb_smon_-MGMTDB
    root@orlt5zadm0101:~#
o. Application nodes should also be checked to confirm operation.

Attachments
This solution has no attachment