Asset ID: 1-71-1670096.1
Update Date: 2017-03-21
Keywords:
Solution Type: Technical Instruction Sure Solution
1670096.1: How to Replace an InfiniBand (HCA) Card on an Exadata Storage Cell and Database Node (X3-2/X4-2)
Related Items:
- SPARC SuperCluster T4-4 Full Rack
- Oracle SuperCluster T5-8 Full Rack
- Exadata X4-2 Hardware
- Exadata X3-2 Quarter Rack
- Oracle SuperCluster T5-8 Half Rack
- Exadata X4-2 Full Rack
- Exadata X4-2 Quarter Rack
- Exadata X3-2 Eighth Rack
- Exadata X3-2 Half Rack
- Exadata X3-2 Full Rack
- Zero Data Loss Recovery Appliance X4 Hardware
- Exadata X3-8 Hardware
- Exadata X4-2 Half Rack
- Solaris Operating System
- SPARC SuperCluster T4-4 Half Rack
- Exadata X3-2 Hardware
- Exadata X3-8b Hardware
- SPARC SuperCluster T4-4
- Oracle SuperCluster T5-8 Hardware
- Oracle SuperCluster M6-32 Hardware
- Exadata X4-2 Eighth Rack
Related Categories:
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: internal support doc
Applies to:
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Zero Data Loss Recovery Appliance X4 Hardware - Version All Versions and later
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
x86_64
Goal
How to Replace an InfiniBand (HCA) Card on an Exadata Storage Cell (X3-2/X3-8/X4-2) and Database Node (X3-2/X4-2)
Solution
CAP PROBLEM OVERVIEW: InfiniBand (HCA) Card Replacement
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED:
Exadata Server Training.
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 3-FRU
FIELD ENGINEER INSTRUCTIONS
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
If the system is still up and functioning, the customer should perform an orderly and graceful shutdown of applications and OS. Then power off the server and remove the AC power cords from the system.
For an Exadata DB node refer to the steps in:
Doc ID 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration
For an Exadata Storage Cell refer to:
Doc ID 1188080.1 Steps to shut down or reboot an Exadata storage cell without affecting ASM
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
1. Prepare the server for service.
- Power off the server and disconnect the power cords from the power supplies.
- Extend the server to the maintenance position in the rack.
- Attach an anti-static wrist strap.
- Remove the top cover.
2. Perform the card replacement.
For an X3-8 Database node: use Document 1448314.1.
For an X3-2 or X4-2 Database node:
Locate and Remove the PCIe card.
- There are three external PCIe slots in the system, numbered 1, 2, and 3 from left to right when you view the server from the rear. For the Exadata X3-2 and X4-2 systems, the IB card is always installed in PCIe slot 3. Note that this card is installed in a riser that also connects to the internal SAS controller, so additional steps are needed during the card removal.
- Locate the InfiniBand HCA in PCIe slot 3 and unplug both IB cables from the PCIe card making note of their locations so that they can be re-installed in the same configuration (label if needed).
- Open the green-tabbed latch located on the rear of the server's chassis next to the PCIe slot 3 to release the PCIe card holding bracket.
- To release the riser from the motherboard connector, lift the riser's green-tabbed release lever to the open position.
- Slide the plastic PCIe card retainer, which is mounted on the side of the chassis, forward to release the cards installed in the riser.
- Grasp the riser with both hands and remove it from the server.
- Disconnect the SAS storage drive (HDD) cables from the internal HBA card installed in PCIe slot 4. Note the cable locations and their order to ensure they are reinstalled in their proper locations.
- Remove the InfiniBand PCIe card from the PCIe riser. Hold the riser in one hand and use your other hand to carefully pull the PCIe card connector out of the riser.
- Disconnect the rear bracket that is attached to the InfiniBand PCIe card from the rear of the PCIe riser.
- Place the riser and the InfiniBand PCIe card on an antistatic mat.
Replace the PCIe card.
- Remove the replacement InfiniBand PCIe card from its anti-static bag.
- Insert the rear bracket that is attached to the InfiniBand PCIe card into the PCIe riser.
- Hold the riser in one hand and use your other hand to carefully insert the PCIe card connector into the riser.
- Install the PCIe riser with the installed PCIe cards into the server.
- Reconnect the SAS cable(s) to the internal HBA card. Be sure to connect the SAS cable for storage drives 0 through 3 (HDDs 0-3) to the connector that is farther from the riser in which the HBA card is installed.
- Raise the PCIe riser green-tabbed release lever to the open (up) position and gently press the riser into the motherboard connector until it seats.
- Ensure that the rear bracket on the internal HBA card in PCIe slot 4 is connected to the slot in the server's chassis side wall. If the bracket is not connected, remove the riser and reposition it so that the rear bracket connects to the side wall, then gently press the riser into the motherboard connector.
- Slide the plastic PCIe card retainer that is mounted on the side of the chassis toward the back of the server to secure the card(s) installed in the riser.
- Press the green-tabbed release lever on the PCIe riser to the closed (down) position.
- To secure the PCIe card's rear bracket to the server, close the green-tabbed latch on the rear of the server's chassis.
- Reconnect the IB cables to the PCIe card that were unplugged during the removal procedure making sure to connect them in the same configuration as when they were disconnected.
For a Storage Cell:
Locate and Remove the InfiniBand HCA card.
- The server has six PCIe slots. They are numbered 1 through 6 from left to right when you view the server from the rear. For the Exadata X3-2 and X4-2 systems, the IB card is always installed in PCIe slot 3.
- Locate the InfiniBand HCA in PCIe slot 3.
- Unplug both IB cables from the PCIe card making note of their locations so that they can be re-installed in the same configuration (label if needed).
- Rotate the PCIe card locking mechanism, and then lift up on the PCIe card to disengage it from the motherboard connectors.
- Place the PCIe card on an antistatic mat.
Install the replacement PCIe card.
- Remove the replacement InfiniBand PCIe card from its anti-static bag.
- Make sure to re-install the card into the same location from which the previous card was removed (PCIe Slot 3).
- Insert the PCIe card into slot 3, and rotate the PCIe locking mechanism to secure the PCIe card in place.
- Reconnect the IB cables to the PCIe card that were unplugged during the removal procedure making sure to connect them in the same configuration as when they were disconnected.
3. Return the server to operation.
- Replace the top cover.
- Remove any anti-static measures that were used.
- Return the server to its normal operating position within the rack.
- Re-install the AC power cords and any additional data cables that were removed but not yet replaced.
- Log in to the ILOM and check whether any faults were logged against the PCIe slot. If so, clear the fault from the fault management shell:
-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp> fmadm faulty
...
FRU : /SYS/MB/PCIE3
...
faultmgmtsp> fmadm repaired /SYS/MB/PCIE3
faultmgmtsp> fmadm faulty -a
No faults found
faultmgmtsp> exit
- Power on the server. Verify that the Power/OK indicator LED lights steady on. Coordinate the power-on of the host with the customer.
Note: If the system uses custom, non-default InfiniBand partitions, the HCA Port GUIDs might need to be updated in the InfiniBand partition(s) after replacing an HCA. See MOS document 1985159.1 for this procedure if needed.
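Because a replacement HCA carries new Port GUIDs, it can help to record them once the host is back up. The following is a minimal sketch only, not part of the procedure in MOS document 1985159.1: it assumes the `ibstat` output layout shown in the customer verification section of this document, and the function name `list_port_guids` is hypothetical.

```bash
#!/bin/bash
# Sketch: read `ibstat` output on stdin and print one line per port with
# its Port GUID. Assumes the "Port N:" / "Port GUID: 0x..." layout shown
# in this document; list_port_guids is a hypothetical helper name.
list_port_guids() {
    awk '
        # Remember the current port number when a "Port N:" header appears.
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }
        # Print the GUID that belongs to that port.
        /^[ \t]*Port GUID:/   { print "Port " port ": " $3 }
    '
}
```

On the host it might be run as `ibstat | list_port_guids`, and the printed values compared against the GUIDs currently listed in the partition configuration.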
OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
The system administrator should verify the system is functioning correctly. Some suggested actions they can take to verify are:
1. On the host where the card was replaced, run:
# ibstat
CA 'mlx4_0'
CA type: MT26428
Number of ports: 2
Firmware version: 2.7.0
Hardware version: a0
Node GUID: 0x00212800013e6c22
System image GUID: 0x00212800013e6c25
Port 1:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 26
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x00212800013e6c23
Port 2:
State: Active
Physical state: LinkUp
Rate: 40
Base lid: 25
LMC: 0
SM lid: 10
Capability mask: 0x02510868
Port GUID: 0x00212800013e6c24
Ensure that both Port 1 and Port 2 show:
State: "Active"
Physical state: "LinkUp"
Rate: "40"
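The three per-port checks in step 1 can also be applied mechanically. The following is a minimal sketch under stated assumptions: it expects the `ibstat` output layout shown in step 1, treats a rate of 40 as the only acceptable value (as this document does), and uses a hypothetical function name, `check_ib_ports`.

```bash
#!/bin/bash
# Sketch: read `ibstat` output on stdin and check every port for
# State "Active", Physical state "LinkUp" and Rate "40".
# check_ib_ports is a hypothetical helper name, not an Oracle tool.
check_ib_ports() {
    awk '
        # Print the verdict for the port collected so far (if any).
        function report() {
            if (port == "") return
            seen++
            if (state == "Active" && phys == "LinkUp" && rate == "40") {
                print "Port " port ": OK"
            } else {
                bad++
                print "Port " port ": CHECK (State=" state ", Physical state=" phys ", Rate=" rate ")"
            }
        }
        # A "Port N:" header closes the previous port and starts a new one.
        /^[ \t]*Port [0-9]+:/ { report(); port = $2; sub(/:$/, "", port); state = phys = rate = ""; next }
        /^[ \t]*State:/          { state = $2 }
        /^[ \t]*Physical state:/ { phys = $3 }
        /^[ \t]*Rate:/           { rate = $2 }
        # Exit non-zero if any port failed or fewer than two ports were seen.
        END { report(); exit (bad > 0 || seen < 2) }
    '
}
```

On the host it might be run as `ibstat | check_ib_ports && echo "HCA ports look healthy"`. A non-zero exit status only signals that the output should be reviewed by hand.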
2. Run the InfiniBand topology verification tool (example output from a fully operational system):
[root@db01 ~]# /opt/oracle.SupportTools/ibdiagtools/verify-topology
[ DB Machine Infiniband Cabling Topology Verification Tool ]
Is every external switch connected to every internal switch......[SUCCESS]
Are any external switches connected to each other................[SUCCESS]
Are any hosts connected to spine switch..........................[SUCCESS]
Check if all hosts have 2 CAs to different switches..............[SUCCESS]
Leaf switch check: cardinality and even distribution.............[SUCCESS]
Check if each rack has an valid internal ring....................[SUCCESS]
[root@cn01 ibdiagtools]#
For a Quarter Rack or Half Rack you need to use the "-t" option to specify the topology.
Example: ./verify-topology -t quarterrack
Example: ./verify-topology -t halfrack
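To avoid scanning the verify-topology output by eye, the result lines can be filtered for anything other than [SUCCESS]. The following is a minimal sketch: it assumes each check is printed as a line of dots followed by a bracketed status word, as in the example in step 2. The exact wording of non-success statuses is not shown in this document, so the filter flags any status that is not SUCCESS. The function name `topology_ok` is hypothetical.

```bash
#!/bin/bash
# Sketch: read verify-topology output on stdin, print every check whose
# bracketed status is not SUCCESS, and return non-zero if any such line
# exists or if no check lines were found at all.
# topology_ok is a hypothetical helper name.
topology_ok() {
    awk '
        # Result lines end with dots followed by a bracketed status word.
        /\.\[[A-Za-z ]+\][ \t]*$/ {
            checks++
            if ($0 !~ /\[SUCCESS\][ \t]*$/) { print; failed++ }
        }
        END { exit (checks == 0 || failed > 0) }
    '
}
```

It might be used as `/opt/oracle.SupportTools/ibdiagtools/verify-topology | topology_ok`, or with `-t quarterrack` or `-t halfrack` added for the smaller configurations.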
3. Ping other nodes over the InfiniBand subnet.
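The ping check in step 3 can be looped over several peers. The following is a minimal sketch, assuming a Linux host with the iputils `ping` command (`-c` for packet count, `-W` for reply timeout in seconds); Solaris-based nodes use different ping options. The function name `ping_ib_peers` and the `PING_CMD` override are hypothetical, and the peer names passed to it must be the site's own InfiniBand host names or addresses.

```bash
#!/bin/bash
# Sketch: ping each InfiniBand peer given as an argument and report
# whether it answered. Returns non-zero if any peer did not answer.
# PING_CMD is a hypothetical override of the ping command line; the
# default assumes Linux iputils ping (3 packets, 2 second timeout).
ping_ib_peers() {
    local failed=0 host
    for host in "$@"; do
        # PING_CMD is deliberately unquoted so its options split into words.
        if ${PING_CMD:-ping -c 3 -W 2} "$host" >/dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: NOT reachable"
            failed=1
        fi
    done
    return "$failed"
}
```

A call might look like `ping_ib_peers peer1-priv peer2-priv`, where both names are placeholders for the private InfiniBand names of other nodes in the rack.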
REFERENCE INFORMATION:
Sun Server X3-2 Documentation
http://docs.oracle.com/cd/E22368_01/index.html
Sun Server X3-2L Documentation
http://docs.oracle.com/cd/E23393_01/index.html
Sun Server X4-2 Documentation
http://docs.oracle.com/cd/E36975_01/index.html
Sun Server X4-2L Documentation
http://docs.oracle.com/cd/E36974_01/index.html
How to Replace a Failed InfiniBand (HCA) Card on an Exadata X2-8 Compute Node (Doc ID 1448314.1)
References
<NOTE:1448314.1> - How to Replace a Failed InfiniBand (HCA) Card on an Exadata Compute Node (X2-8/X3-8)