Asset ID: |
1-71-1390273.1 |
Update Date: | 2018-04-16 |
Keywords: | |
Solution Type
Technical Instruction Sure
Solution
1390273.1
:
How to Replace a Failed InfiniBand (HCA) Card on an Exalogic Compute Node (Baremetal)
Related Items |
- Exalogic Elastic Cloud X4-2 Eighth Rack
- Exalogic Elastic Cloud X3-2 Quarter Rack
- Exalogic Elastic Cloud X5-2 Hardware
- Exalogic Elastic Cloud X4-2 Full Rack
- Exalogic Elastic Cloud X3-2 Full Rack
- Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
- Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack
- Exalogic Elastic Cloud X5-2 Eighth Rack
- Oracle Exalogic Elastic Cloud X2-2 Half Rack
- Exalogic Elastic Cloud X3-2 Eighth Rack
- Exalogic Elastic Cloud X5-2 Half Rack
- Exalogic Elastic Cloud X4-2 Quarter Rack
- Exalogic Elastic Cloud X4-2 Half Rack
- Oracle Exalogic Elastic Cloud X2-2 Hardware
- Exalogic Elastic Cloud X5-2 Full Rack
- Exalogic Elastic Cloud X5-2 Quarter Rack
- Oracle Exalogic Elastic Cloud X2-2 Full Rack
- Exadata Database Machine X2-2 Hardware
- Exalogic Elastic Cloud X3-2 Half Rack
- Exalogic Elastic Cloud X6-2 Hardware
|
Related Categories |
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
|
In this Document
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP
Applies to:
Exalogic Elastic Cloud X3-2 Half Rack - Version X3 to X5 [Release X3 to X5]
Exalogic Elastic Cloud X4-2 Eighth Rack - Version X4 to X5 [Release X4 to X5]
Exalogic Elastic Cloud X5-2 Hardware - Version X5 to X5 [Release X5]
Exalogic Elastic Cloud X5-2 Quarter Rack - Version X5 to X5 [Release X5]
Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack - Version X2 to X5 [Release X2 to X5]
Information in this document applies to any platform.
Goal
How to replace a failed InfiniBand (HCA) card on an Exalogic compute node.
Solution
DISPATCH INSTRUCTIONS
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
The FSE needs to be Exalogic Trained.
- TIME ESTIMATE: 60 minutes
- TASK COMPLEXITY: 3
FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
- PROBLEM OVERVIEW:
- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE
RESOLUTION ACTIVITY?:
The system administrator should prepare the system for service by performing any application-related functions required to shut down the compute node. This might include, but is not limited to, performing a system backup, failing over applications or services, and finally shutting down the system. The Field Service Engineer should work closely with the administrator to ensure that any pre- or post-work is also completed.
If this is a Virtual Deployment, see the following document :
Replacing a faulty InfiniBand HCA in Virtual Exalogic Deployment (Doc ID 1621976.1)
For a physical deployment, follow these instructions:
If the InfiniBand HCA card is still at least partially functional and online, capture the output from 'ibstat' so that it can be used later in the replacement process.
[root@elcn04 MegaCli]# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.7.8130
Hardware version: b0
Node GUID: 0x0021280001a0d70c
System image GUID: 0x0021280001a0d70f
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 43
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x0021280001a0d70d
Link layer: IB
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 44
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x0021280001a0d70e
Link layer: IB
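As a hedged sketch, the Port GUIDs can be pulled out of saved 'ibstat' output so that the pre-replacement values are easy to compare later. The helper name port_guids and the file name /tmp/ibstat.before are illustrative assumptions, not part of the Exalogic tooling; the parsing relies only on the 'Port N:' and 'Port GUID:' lines shown above.

```shell
# Sketch only: port_guids is a hypothetical helper, not an Exalogic tool.
# Reads ibstat-style text on stdin and prints "<port> <guid>" for each port.
port_guids() {
    awk '
        # Remember the port number from a "Port N:" header line.
        /^[[:space:]]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        # The GUID is the last field on the "Port GUID:" line.
        /Port GUID:/                { print port, $NF }
    '
}

# Example use on the compute node (the file name is an assumption):
#   ibstat > /tmp/ibstat.before
#   port_guids < /tmp/ibstat.before
```

Keeping the saved file off the node being serviced (for example on the administrator's workstation) means it remains available while the node is powered off.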
- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:
See the "Servicing PCIe Cards" section of the service manual that matches the server model: "Sun Fire X4170 M2 Server Service Manual", "Sun Server X3-2 Service Manual", "Sun Server X4-2 Service Manual", "Oracle Server X5-2 Service Manual", or "Oracle Server X6-2 Service Manual".
- Power-off the target node for service.
- Pull the InfiniBand Cables from the IB Card at the rear of the server.
- Transition the target node to the service position.
- Remove the top cover.
- Locate and Remove the PCIe Riser that includes the IB Card.
- Remove and Replace the defective IB Card.
- Re-Install the PCIe Riser.
- Install the top cover.
- Reconnect any cables disconnected earlier.
- Slide the node back into the rack operating position.
- Power on the system either via ILOM or via the push button on the front of the server.
OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
- Update the IB switch VNICs with the new Port GUID from the new HCA card
All VNICs associated with this HCA card must be updated by deleting the current VNICs and recreating them with the Port GUID of the new HCA card. There will normally be at least two VNICs (one per connected IB gateway switch), but there can be more.
- Determine HCA card information with 'ibstat'.
[root@elcn04]# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.7.8130
Hardware version: b0
Node GUID: 0x0021280001a0d70c
System image GUID: 0x0021280001a0d70f
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 43
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x0021280001a0d571
Link layer: IB
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 44
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x0021280001a0d572
Link layer: IB
Note: The GUID values here differ from the values collected prior to replacement, if this information was collected.
- Get IB switch connection information from 'iblinkinfo.pl'.
[root@elcn04]# iblinkinfo.pl -R | grep elcn04
10 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 44 2[ ] "elcn04 EL-C 192.168.10.4 HCA-1" ( )
1 7[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 43 1[ ] "elcn04 EL-C 192.168.10.4 HCA-1" ( )
Note: LIDs 10 and 1 belong to the switches connected to each port.
- Get IB switch information from 'ibswitches'.
[root@elcn04]# ibswitches
Switch : 0x002128568dc2c0a0 ports 36 "SUN IB QDR GW switch el-gw02 192.168.1.202" enhanced port 0 lid 10 lmc 0
Switch : 0x002128568c62c0a0 ports 36 "SUN IB QDR GW switch el-gw01 192.168.1.201" enhanced port 0 lid 1 lmc 0
Note: Now we know which two switches must be modified to fix the Port GUID information.
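The lookup from switch LID to switch name can be sketched as a small filter over 'ibswitches' output. This is a minimal illustration that assumes the line layout shown above (quoted description followed by "lid N"); the helper name switch_for_lid is hypothetical.

```shell
# Sketch only: switch_for_lid is a hypothetical helper.
# Usage: ibswitches | switch_for_lid <lid>
# Prints the quoted switch description whose "lid" value equals <lid>.
switch_for_lid() {
    awk -F'"' -v want="$1" '
        {
            # $2 is the quoted description, $3 is the text after it,
            # e.g. " enhanced port 0 lid 10 lmc 0".
            n = split($3, f, " ")
            for (i = 1; i < n; i++)
                if (f[i] == "lid" && f[i + 1] == want)
                    print $2
        }
    '
}

# Example with the LIDs reported by iblinkinfo.pl for elcn04:
#   ibswitches | switch_for_lid 10
#   ibswitches | switch_for_lid 1
```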
- Determine the VNICs associated with the HCA card using 'showvnics'. This step must be completed on both connected IB GW switches.
[root@el-gw01]# showvnics
ID STATE FLG IOA_GUID NODE IID MAC VLN PKEY GW
--- -------- --- ----------------------- ---------- ---- ----------------- --- ---- --------
.
.
13 UP N 0x0021280001a0d70d elcn04 EL-C 192.168.10.4 0000 A0:D7:0D:10:00:04 NO ffff 0A-ETH-1
.
.
Note: The output above is abbreviated and only includes the entry for compute node elcn04.
- Delete the current VNICs associated with the HCA card. This step must be completed on both connected IB GW switches.
[root@el-gw01]# deletevnic 0A-ETH-1 13
vNIC ID 13 deleted
IO Adapter for vNIC deleted
- Create new VNICs with the new Port GUID. This step must be completed on both connected IB GW switches.
[root@el-gw01]# createvnic 0A-ETH-1 -GUID 00:21:28:00:01:a0:d5:71 -MAC A0:D7:0D:10:00:04 -PKEY DEFAULT
vNIC created
Notes:
- The Port GUID here is now from the new HCA Card.
- In the above 'createvnic' command the MAC was unchanged from before 'deletevnic'. This also means that the MAC is no longer standard as compared to the EIS Checklist.
- If you want to fully adhere to the EIS Checklist for VNIC MAC addresses you must also change the MAC address in the 'createvnic' command above to "A0:D5:71:10:00:04". If this is done, you must also modify the MAC in the corresponding "/etc/sysconfig/network-scripts/ifcfg-ethX" file.
- If the server is not able to fully communicate over several of the Exalogic networks described here:
http://docs.oracle.com/cd/E18476_01/doc.220/e25258/netw.htm#BABEDEDH
you may need to run a script that replaces the old Port GUIDs in the partition table (Script to replace OLD_GUID[12] to NEW_GUID[12] in partition table. Handles HCA replacement on an OVS compute node).
This script will need to be edited to set the following values:
OLD_GUID[12]: Find 2 GUIDs in /conf/partitions.current with both membership in 0x8001-0x8005 that do not match ibstat port GUIDs on any compute node
NEW_GUID[12]: Use port GUIDs from ibstat on compute node with replaced HCA
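The 'createvnic' example above passes the Port GUID as colon-separated bytes (00:21:28:00:01:a0:d5:71), while 'ibstat' prints it as a 0x-prefixed hex string (0x0021280001a0d571). A minimal sketch of that conversion follows; the helper name guid_to_colon is hypothetical and the 16-digit length check is an assumption based on the GUIDs shown in this document.

```shell
# Sketch only: guid_to_colon is a hypothetical helper.
# Converts 0x0021280001a0d571 into 00:21:28:00:01:a0:d5:71.
guid_to_colon() {
    hex=${1#0x}                       # drop the leading 0x, if present
    case $hex in
        ''|*[!0-9a-fA-F]*)
            echo "not a hex GUID: $1" >&2
            return 1 ;;
    esac
    if [ "${#hex}" -ne 16 ]; then     # assumed: 64-bit GUID, 16 hex digits
        echo "expected 16 hex digits: $1" >&2
        return 1
    fi
    # Insert a colon after every byte, then trim the trailing colon.
    printf '%s\n' "$hex" | sed 's/../&:/g; s/:$//'
}

# Example:
#   guid_to_colon 0x0021280001a0d571
```

Converting the value mechanically avoids transcription mistakes when the GUID is retyped on each gateway switch.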
- Verify VNIC creation and test (ping the default router, which should be set up for the EoIB network). This step must be completed on both connected IB GW switches.
[root@el-gw01]# showvnics
ID STATE FLG IOA_GUID NODE IID MAC VLN PKEY GW
--- -------- --- ----------------------- ---------- ---- ----------------- --- ---- --------
.
.
13 UP N 0x0021280001a0d571 elcn04 EL-C 192.168.10.4 0000 A0:D7:0D:10:00:04 NO ffff 0A-ETH-1
.
.
[root@elcn04 tools]# ping <default-router>
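When the 'showvnics' listing is long, the rows for one Port GUID can be filtered out of saved output. This is a sketch under the assumption that the column layout matches the listing above (ID, STATE, FLG, IOA_GUID, ..., GW); the helper name vnics_for_guid and the file name showvnics.out are hypothetical.

```shell
# Sketch only: vnics_for_guid is a hypothetical helper.
# Usage: vnics_for_guid <0x-guid> < saved-showvnics-output
# Prints "<id> <state> <gateway>" for each vNIC bound to that IOA_GUID.
vnics_for_guid() {
    guid=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    # Column 4 is IOA_GUID; the last column is the gateway connector.
    awk -v g="$guid" 'tolower($4) == g { print $1, $2, $NF }'
}

# Example (showvnics.out is an assumed file holding saved output):
#   vnics_for_guid 0x0021280001a0d70d < showvnics.out   # old GUID: should list nothing
#   vnics_for_guid 0x0021280001a0d571 < showvnics.out   # new GUID: should list the vNIC
```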
If a system uses custom non-default InfiniBand partitions [e.g., Exalogic (virtual/physical/hybrid), Exadata (virtual/physical), SuperCluster, BDA], then the HCA Port GUIDs might need to be updated in the InfiniBand partition(s) after replacing an HCA. See MOS Note 1985159.1.
- Run CheckHWnFWProfile. This mainly checks IB connectivity and the HCA card firmware level, but it is also a good overall status check for the compute node.
# /opt/exalogic.tools/tools/CheckHWnFWProfile
Verifying Hardware...
System product name: SUN FIRE X4170 M2 SERVER
System product manufacturer: SUN MICROSYSTEMS
.
.
Verifying InfiniBand devices...
Has required number of Infiniband devices
Infiniband device id: 19:00.0
Infiniband device width: 8
.
.
Supported Infiniband Firmware Version: 2.7.8130
Current Infiniband Firmware Version : 2.7.8130
Infiniband Firmware is at the supported version.
.
.
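The firmware comparison in the output above can also be checked mechanically. The following sketch assumes the two "Firmware Version" lines are worded exactly as shown; the helper name fw_versions_match is hypothetical and is not part of the Exalogic tools.

```shell
# Sketch only: fw_versions_match is a hypothetical helper.
# Usage: /opt/exalogic.tools/tools/CheckHWnFWProfile | fw_versions_match
# Exit status: 0 = versions match, 1 = mismatch, 2 = lines not found.
fw_versions_match() {
    awk -F': *' '
        /Supported Infiniband Firmware Version/ { want = $2 }
        /Current Infiniband Firmware Version/   { have = $2 }
        END {
            if (want == "" || have == "") exit 2
            print "supported=" want " current=" have
            # Append "" so the values are compared as strings.
            exit ((want "") == (have "") ? 0 : 1)
        }
    '
}
```

A non-zero exit status from this check is a hint that the replacement HCA may need the firmware upgrade described later in this document.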
- Test InfiniBand connectivity
# /opt/exalogic.tools/tools/check_ib_ports
Has required ports.
Port 1 is in ACTIVE state
Port 2 is in ACTIVE state
You can also run and review the output from 'infinicheck' and/or 'infinicheck-node' (see the example below).
- Optional, for OEL Linux, you can force a failover of the slave interface for bond0 and test/verify IP connectivity.
- Verify network using 'ping' or 'infinicheck' (see infinicheck example below).
# ping <IP>
Where <IP> = default router or another node on the same network
- Determine active interface for bond0
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000
Slave Interface: ib0
MII Status: up
Link Failure Count: 2
Permanent HW addr: 80:00:00:4a:fe:80
Slave Interface: ib1
MII Status: up
Link Failure Count: 2
Permanent HW addr: 80:00:00:4b:fe:80
- Shut down the currently active interface
# ifconfig ib1 down
- Verify interface failover
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000
Slave Interface: ib0
MII Status: up
Link Failure Count: 2
Permanent HW addr: 80:00:00:4a:fe:80
Slave Interface: ib1
MII Status: down
Link Failure Count: 3
Permanent HW addr: 80:00:00:4b:fe:80
- Test IP connectivity with either 'ping' or 'infinicheck-node'
# ping <IP>
Where IP = A valid IP on the InfiniBand network, you can get this information from '/usr/sbin/ibhosts'
# /opt/exalogic.tools/tools/infinicheck-node -H <IP-list> -S <IP-list>
-H = comma-separated list of IPs for Compute Nodes
-S = comma-separated list of IPs for Storage Nodes
Note: You can use '/usr/sbin/ibhosts' to obtain a list of valid hosts and IPs for the InfiniBand network.
Example:
/usr/sbin/infinicheck-node -H 192.168.10.1,192.168.10.4,192.168.10.5,192.168.10.6 -S 192.168.10.15,192.168.10.16
[SUCCESS]........Ports verification on 192.168.8.2 succeeded
(0 seconds approx.)
Has required number of ports - 2
All ports are in ACTIVE state
[WARNING]........ibping_test failed on first attempt. Retrying...
[SUCCESS]........Connectivity from 192.168.8.2 verified
(2 seconds approx.)
can talk to all nodes
can talk to all storage nodes
- Enable the previously disabled interface and verify operation
# ifconfig ib1 up
# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008)
Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
Primary Slave: None
Currently Active Slave: ib0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 5000
Down Delay (ms): 5000
Slave Interface: ib0
MII Status: up
Link Failure Count: 2
Permanent HW addr: 80:00:00:4a:fe:80
Slave Interface: ib1
MII Status: up
Link Failure Count: 3
Permanent HW addr: 80:00:00:4b:fe:80
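The failover checks above all read the "Currently Active Slave" line from /proc/net/bonding/bond0. A minimal sketch of extracting that value follows; the helper name active_slave is hypothetical, and the optional file argument exists only so the function can be tried against a saved copy of the file.

```shell
# Sketch only: active_slave is a hypothetical helper.
# Prints the currently active slave of a bond (default: bond0).
# An alternative file path may be given as the first argument.
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Example failover check on the compute node, following the steps above:
#   before=$(active_slave)        # e.g. ib1
#   ifconfig "$before" down
#   active_slave                  # expected to show the other slave
#   ifconfig "$before" up
```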
- Optional, for Solaris 11, you can force an IPMP interface failover and test/verify IP connectivity.
- Verify network using 'ping' or 'infinicheck' (see example above).
# ping <IP>
Where <IP> = default router or another node on the same network
- Determine Currently active interface using 'ipmpstat'.
# ipmpstat -i
INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
net1 yes ipmp0 ------- up disabled ok
net0 yes ipmp0 --mbM-- up disabled ok
- Disable the currently active interface using 'ipadm'.
# ipadm disable-if -t net0
- Verify failover and test IP connectivity with 'ping' or 'infinicheck' (see example above).
# ipmpstat -i
INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
net1 yes ipmp0 --mbM-- up disabled ok
# ping <IP>
Where <IP> = default router or another node on the same network
- Enable previously disabled interface and verify active with 'ipmpstat'.
# ipadm enable-if -t net0
# ipmpstat -i
INTERFACE ACTIVE GROUP FLAGS LINK PROBE STATE
net0 yes ipmp0 ------- up disabled ok
net1 yes ipmp0 --mbM-- up disabled ok
- Verify network using 'ping' or 'infinicheck' (see example above).
# ping <IP>
Where <IP> = default router or another node on the same network
Upgrading the HCA Card
HCA firmware on Linux can be updated with a Mellanox tool, called "flint" or "mstflint" depending on which one is bundled in the OS image.
An upgrade may be required when an HCA card ships with old firmware, because old firmware may prevent the node from connecting to the InfiniBand fabric.
Process:
mstflint -y -d "$ib_drive" -i "$ib_fw_binary" -allow_psid_change -nofs burn
The ib_drive value is obtained as follows:
lspci -d 15b3: | sed -e 's/:/\//g' | cut -d ' ' -f 1
This usually returns one value, which you append to "/proc/bus/pci/".
If more than one value is returned, take the first value:
lspci -d 15b3: | sed -e 's/:/\//g' | cut -d ' ' -f 1 | head -1
e.g.:
[root@myrackcn_30 ~]# lspci -d 15b3: | sed -e 's/:/\//g' | cut -d ' ' -f 1 | head -1
30/00.0
so ib_drive is
[root@myrack_cn30 ~]# ls /proc/bus/pci/30/00.0
/proc/bus/pci/30/00.0
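The derivation of ib_drive shown above can be wrapped in one small function. This sketch mirrors the lspci/sed/cut/head pipeline (first device only, colons replaced by slashes) and adds the "/proc/bus/pci/" prefix; the helper name ib_proc_path is hypothetical.

```shell
# Sketch only: ib_proc_path is a hypothetical helper.
# Usage: lspci -d 15b3: | ib_proc_path
# Takes the first lspci line, turns "30:00.0" into "30/00.0" and
# prints the corresponding /proc/bus/pci path.
ib_proc_path() {
    awk 'NR == 1 { gsub(/:/, "/", $1); print "/proc/bus/pci/" $1 }'
}

# Example, assuming mstflint is the bundled tool and $ib_fw_binary is set:
#   ib_drive=$(lspci -d 15b3: | ib_proc_path)
#   ls "$ib_drive"      # confirm the path exists before flashing
```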
NOTE: Bring down the ib0 and ib1 interfaces with ifconfig prior to the update, and bring them back up afterwards.
example:
ifconfig ib0 down
ifconfig ib1 down
mstflint -y -d /proc/bus/pci/30/00.0 -i /opt/exalogic/firmware/infiniband/rel-2_11_2010.bin -allow_psid_change -nofs burn
ifconfig ib0 up
ifconfig ib1 up
you can get the binary like this
===========
To upgrade the firmware on Solaris, use the 'fwflash' command. The following example demonstrates how to upgrade the firmware to 2.11.2010.
1) Run command '/opt/exalogic.tools/tools/CheckHWnFWProfile' to check what Infiniband Firmware version is supported by the existing exalogic.
If the newly replaced HCA firmware is not at a supported version, you will see some output like below:
Supported Infiniband Firmware Version: 2.11.2010
Current Infiniband Firmware Version : 2.9.1000
Infiniband Firmware is not at the supported version. It requires firmware update...
Supported Infiniband Firmware at: /opt/exalogic/firmware/infiniband/rel-2_11_2010.bin <<<
2) From the above output, you will see the location of Supported Infiniband Firmware, note down the location.
3) Run following command to upgrade the firmware
IB_DRIVE=$(/usr/sbin/fwflash -c IB -l | grep Device | tr -s "\t " " " | cut -d' ' -f2)
CX_PATH=/opt/exalogic/firmware/infiniband/rel-2_11_2010.bin    # firmware file location noted in step 2
/usr/sbin/fwflash -y -d ${IB_DRIVE} -f ${CX_PATH}
PARTS NOTE:
https://support.us.oracle.com/handbook_internal/Systems/Exalogic_X2_2/components.html
REFERENCE INFORMATION:
Exalogic Machine Owner's Guide: https://docs.oracle.com/cd/E18476_01/index.htm
Sun Fire X4170 M2 Server Service Manual: https://docs.oracle.com/cd/E19762-01/E22369-02/E22369-02.pdf
How to Remove and Replace a X4170/X4270 PCI Card (Doc ID 1347366.1)
Attachments
This solution has no attachment