Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1624384.1
Update Date:2018-01-22

Solution Type  Technical Instruction Sure

Solution  1624384.1 :   How to replace a PCA X5-2, X4-2, X3-2 management or compute node InfiniBand (HCA) card  


Related Items
  • Oracle Virtual Compute Appliance X4-2 Hardware
  • Private Cloud Appliance X6-2 Server Upgrade
  • Oracle Fabric Interconnect F1-15
  • Oracle Virtual Compute Appliance X3-2 Hardware
  • Private Cloud Appliance X7-2 Server Upgrade
  • Private Cloud Appliance X5-2 Hardware
  • Private Cloud Appliance
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP

Applies to:

Oracle Virtual Compute Appliance X3-2 Hardware - Version All Versions and later
Oracle Virtual Compute Appliance X4-2 Hardware - Version All Versions and later
Private Cloud Appliance X5-2 Hardware - Version All Versions and later
Private Cloud Appliance - Version 2.0.2 and later
Private Cloud Appliance X6-2 Server Upgrade - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace a Failed InfiniBand (IB-HCA) Card on a PCA compute or management node.

Solution

DISPATCH INSTRUCTIONS
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:

The FSE needs to be OVCA Trained.

- TIME ESTIMATE: 60 minutes
- TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
- PROBLEM OVERVIEW: An InfiniBand HCA has failed in an OVCA Compute or Management Node.
- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE
RESOLUTION ACTIVITY?:

The system administrator should prepare the system for service by performing any application-related functions required to shut down the compute or management node. This might include, but is not limited to, performing a system backup, failing over applications or services, and finally shutting down the system.

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

  1. If the server is a compute node, put the server in Maintenance Mode.  Make sure the customer has prepared the node for service.
    If needed, use Steps to Gracefully Shutdown and Power Off a Node in Oracle Private Cloud Appliance Prior to Maintenance (Doc ID 2256834.1)

     

  2. Pull out the stabilizing bars before pulling out any server for service.
  3. Power off the target node for service.
  4. Transition the target node to the service position.
  5. Detach the power cords from the node.
  6. Pull the InfiniBand Cables from the IB Card at the rear of the server.
  7. Remove the top cover.
  8. Locate and remove the PCIe Riser that includes the IB Card.
  9. Remove and replace the defective IB Card.
  10. Reinstall the PCIe Riser.
  11. Install the top cover.
  12. Reconnect any cables disconnected earlier.
  13. Slide the node back into the rack operating position.
  14. Retract the stabilizing bars.
  15. Power on the system either via ILOM or via the push button on the front of the server.
  16. Continue with the next section, "Fabric Interconnect Reconfiguration", to reconfigure both F1-15 switches.
  17. After both switches have been reconfigured, if the node is a compute node, take it out of maintenance mode.

           The customer will need to follow the steps under "Returning the node to operation" in the same MOS note:
Steps to Gracefully Shutdown and Power Off a Node in Oracle Private Cloud Appliance Prior to Maintenance (Doc ID 2256834.1)

Fabric Interconnect Reconfiguration

In a meshed Fabric Interconnect environment, the server-profile on one Fabric Interconnect is terminated to one HCA port GUID, and the server-profile on the other Fabric Interconnect is terminated to the HCA port GUID that the first one did not use.  A dual-ported HCA has three GUIDs: the HCA GUID itself, plus a unique GUID for each of its two ports.  Each server-profile per Fabric Interconnect needs to be terminated to a unique HCA port GUID, *not the same HCA Port GUID* on both Fabric Interconnects.
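
The termination rule can be expressed as a small check. The following is a minimal Python sketch and not part of the Oracle procedure: the class name, function name, Fabric Interconnect labels, and GUID values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DualPortHca:
    """Hypothetical model of a dual-ported HCA: one card GUID plus one GUID per port."""
    hca_guid: str
    port1_guid: str
    port2_guid: str

    def port_guids(self):
        return (self.port1_guid, self.port2_guid)


def check_terminations(hca, terminations):
    """Validate a mapping of Fabric Interconnect label -> HCA port GUID.

    Returns a list of problems; an empty list means the rule is satisfied.
    """
    problems = []
    for fabric, guid in terminations.items():
        if guid == hca.hca_guid:
            problems.append(f"{fabric}: terminated to the HCA GUID, not a port GUID")
        elif guid not in hca.port_guids():
            problems.append(f"{fabric}: {guid} is not a port GUID of this HCA")
    used = list(terminations.values())
    if len(set(used)) != len(used):
        problems.append("both Fabric Interconnects use the same HCA port GUID")
    return problems


# Placeholder GUIDs, for illustration only.
hca = DualPortHca("0010e00001aaaa00", "0010e00001aaaa01", "0010e00001aaaa02")
good = {"fabric-a": hca.port1_guid, "fabric-b": hca.port2_guid}
bad = {"fabric-a": hca.port1_guid, "fabric-b": hca.port1_guid}
print(check_terminations(hca, good))
print(check_terminations(hca, bad))
```

The first call reports no problems because each Fabric Interconnect uses a different port GUID; the second reports the duplicate.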

After the node is up, you will need to reconfigure both F1-15 switches in the rack.
The following example shows the process when the IB HCA PCI card was replaced in ovcacn26r1.

In the examples that follow, you will see two different ovcacn26r1 connection names:

  ovcacn26r1@ExtSw-2128f56755a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
  ovcacn26r1@ExtSw-2128f569d0a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)

You will use one connection name when you configure the first switch, and use the other connection name when you configure the second switch.
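
If the tab-completion list has been captured to text, the two connection names for a node can be picked out of it mechanically. The sketch below is an illustration in Python, assuming the list has the same layout as the sample lines in this document; the function name connection_names is hypothetical, and the sketch only reads text (it does not talk to the switch).

```python
import re

# Sample lines copied from the tab-completion output in this document.
COMPLETIONS = """\
  hca-10e0000128f050@ExtSw-2128f56755a0a0-Port2  Connection to host hca-10e0000128f050 (up)
  ovcacn26r1@ExtSw-2128f56755a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
  ovcacn26r1@ExtSw-2128f569d0a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
"""


def connection_names(completions, host):
    """Return the connection names offered for `host`, in listing order."""
    # A connection name is the first token on the line and starts with "<host>@".
    pattern = re.compile(r"^\s*(" + re.escape(host) + r"@\S+)\s", re.MULTILINE)
    return pattern.findall(completions)


names = connection_names(COMPLETIONS, "ovcacn26r1")
first_switch, second_switch = names  # raises ValueError unless exactly two are found
print(first_switch)
print(second_switch)
```

Unpacking into exactly two names is deliberate: if the list shows fewer or more than two entries for the node, the sketch fails rather than guessing.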

From the management node, you will ssh to both switches as shown below.

[root@ovcamn05r1 ~]# ssh 192.168.4.204 -l admin

admin@ovcann15r1[xsigo] show server-profile
name              state          descr        connection                                    def-gw        vnics        vhbas        
-------------------------------------------------------------------------------------------------------------------------------------
ovcacn07r1        up/up                       ovcacn07r1@ExtSw-2128f56755a0a0-Port6                       4            0            
ovcacn08r1        up/up                       ovcacn08r1@ExtSw-2128f56755a0a0-Port5                       4            0            
ovcacn09r1        up/up                       ovcacn09r1@ExtSw-2128f56755a0a0-Port8                       4            0            
ovcacn10r1        up/up                       ovcacn10r1@ExtSw-2128f569d0a0a0-Port7                       4            0            
ovcacn11r1        up/up                       ovcacn11r1@ExtSw-2128f56755a0a0-Port10                      4            0            
ovcacn12r1        up/up                       ovcacn12r1@ExtSw-2128f56755a0a0-Port9                       4            0            
ovcacn13r1        up/up                       ovcacn13r1@ExtSw-2128f56755a0a0-Port12                      4            0            
ovcacn14r1        up/up                       ovcacn14r1@ExtSw-2128f56755a0a0-Port11                      4            0            
ovcacn26r1        up/down                     10e0000128db69                                              4            0            
ovcacn27r1        up/up                       ovcacn27r1@ExtSw-2128f56755a0a0-Port34                      4            0            
ovcacn28r1        up/up                       ovcacn28r1@ExtSw-2128f56755a0a0-Port33                      4            0            
ovcacn29r1        up/up                       ovcacn29r1@ExtSw-2128f569d0a0a0-Port36                      4            0            
ovcacn30r1        up/up                       ovcacn30r1@ExtSw-2128f569d0a0a0-Port35                      4            0            
ovcacn31r1        up/up                       ovcacn31r1@ExtSw-2128f569d0a0a0-Port29                      4            0            
ovcacn32r1        up/up                       ovcacn32r1@ExtSw-2128f569d0a0a0-Port30                      4            0            
ovcacn33r1        up/up                       ovcacn33r1@ExtSw-2128f569d0a0a0-Port27                      4            0            
ovcacn34r1        up/up                       ovcacn34r1@ExtSw-2128f569d0a0a0-Port28                      4            0            
ovcamn05r1        up/up                       ovcamn05r1@ExtSw-2128f56755a0a0-Port4                       4            0            
ovcamn06r1        up/up                       ovcamn06r1@ExtSw-2128f56755a0a0-Port3                       4            0            
19 records displayed


admin@ovcann15r1[xsigo] set server-profile ovcacn26r1 disconnect

Disconnecting active servers will stop all virtual I/O on the physical server.  Are you sure you want to disconnect ovcacn26r1
(y/n)?y

admin@ovcann15r1[xsigo] set server-profile ovcacn26r1 connect [press tab key twice]
[make sure there is a space after the word "connect" then press the tab key twice to see the list below]
The output list is long; it has been truncated here.

Possible completions:
  hca-10e0000128f050@ExtSw-2128f56755a0a0-Port2  Connection to host hca-10e0000128f050 (up)
[snip]
  ovcacn26r1@ExtSw-2128f56755a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
  ovcacn26r1@ExtSw-2128f569d0a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
[snip]

Use the first ovcacn26r1 connection name:

admin@ovcann15r1[xsigo] set server-profile ovcacn26r1 connect ovcacn26r1@ExtSw-2128f56755a0a0-Port31

Check that the node shows up/up

admin@ovcann15r1[xsigo] show server-profile
name              state        descr        connection                                    def-gw        vnics        vhbas        
-------------------------------------------------------------------------------------------------------------------------------------
ovcacn07r1        up/up                     ovcacn07r1@ExtSw-2128f56755a0a0-Port6                       4            0            
ovcacn08r1        up/up                     ovcacn08r1@ExtSw-2128f56755a0a0-Port5                       4            0            
ovcacn09r1        up/up                     ovcacn09r1@ExtSw-2128f56755a0a0-Port8                       4            0            
ovcacn10r1        up/up                     ovcacn10r1@ExtSw-2128f569d0a0a0-Port7                       4            0            
ovcacn11r1        up/up                     ovcacn11r1@ExtSw-2128f56755a0a0-Port10                      4            0            
ovcacn12r1        up/up                     ovcacn12r1@ExtSw-2128f56755a0a0-Port9                       4            0            
ovcacn13r1        up/up                     ovcacn13r1@ExtSw-2128f56755a0a0-Port12                      4            0            
ovcacn14r1        up/up                     ovcacn14r1@ExtSw-2128f56755a0a0-Port11                      4            0            
ovcacn26r1        up/up                     ovcacn26r1@ExtSw-2128f56755a0a0-Port31                      4            0            
ovcacn27r1        up/up                     ovcacn27r1@ExtSw-2128f56755a0a0-Port34                      4            0            
ovcacn28r1        up/up                     ovcacn28r1@ExtSw-2128f56755a0a0-Port33                      4            0            
ovcacn29r1        up/up                     ovcacn29r1@ExtSw-2128f569d0a0a0-Port36                      4            0            
ovcacn30r1        up/up                     ovcacn30r1@ExtSw-2128f569d0a0a0-Port35                      4            0            
ovcacn31r1        up/up                     ovcacn31r1@ExtSw-2128f569d0a0a0-Port29                      4            0            
ovcacn32r1        up/up                     ovcacn32r1@ExtSw-2128f569d0a0a0-Port30                      4            0            
ovcacn33r1        up/up                     ovcacn33r1@ExtSw-2128f569d0a0a0-Port27                      4            0            
ovcacn34r1        up/up                     ovcacn34r1@ExtSw-2128f569d0a0a0-Port28                      4            0            
ovcamn05r1        up/up                     ovcamn05r1@ExtSw-2128f56755a0a0-Port4                       4            0            
ovcamn06r1        up/up                     ovcamn06r1@ExtSw-2128f56755a0a0-Port3                       4            0            
19 records displayed


admin@ovcann15r1[xsigo] exit

Perform the same process on the other F1-15 at 192.168.4.205, this time using the other ovcacn26r1 connection name.
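
The per-switch sequence can be written out ahead of time as a checklist. The following Python sketch only builds and prints the command lines used in this document, using the example addresses and connection names from this note; it does not connect to or run anything on the switches, and the helper names reconnect_commands and plan are hypothetical.

```python
def reconnect_commands(profile, connection):
    """Build the CLI lines shown in this document for one Fabric Interconnect."""
    return [
        f"set server-profile {profile} disconnect",
        f"set server-profile {profile} connect {connection}",
        "show server-profile",
    ]


def plan(profile, connection_by_switch):
    """Map switch address -> command list, refusing to reuse a connection name."""
    connections = list(connection_by_switch.values())
    if len(set(connections)) != len(connections):
        raise ValueError("each Fabric Interconnect needs its own connection name")
    return {
        switch: reconnect_commands(profile, connection)
        for switch, connection in connection_by_switch.items()
    }


# Example values taken from this document.
steps = plan("ovcacn26r1", {
    "192.168.4.204": "ovcacn26r1@ExtSw-2128f56755a0a0-Port31",
    "192.168.4.205": "ovcacn26r1@ExtSw-2128f569d0a0a0-Port31",
})
for switch, commands in steps.items():
    print(f"# on {switch}")
    for line in commands:
        print(line)
```

Note that the disconnect command prompts for a (y/n) confirmation on the switch, so the lines are intended to be entered interactively rather than pasted as a batch.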

[root@ovcamn05r1 ~]# ssh 192.168.4.205 -l admin

admin@ovcann22r1[xsigo] show server-profile
name              state          descr        connection                                    def-gw        vnics        vhbas        
-------------------------------------------------------------------------------------------------------------------------------------
ovcacn07r1        up/up                       ovcacn07r1@ExtSw-2128f569d0a0a0-Port6                       4            0            
ovcacn08r1        up/up                       ovcacn08r1@ExtSw-2128f569d0a0a0-Port5                       4            0            
ovcacn09r1        up/up                       ovcacn09r1@ExtSw-2128f569d0a0a0-Port8                       4            0            
ovcacn10r1        up/up                       ovcacn10r1@ExtSw-2128f56755a0a0-Port7                       4            0            
ovcacn11r1        up/up                       ovcacn11r1@ExtSw-2128f569d0a0a0-Port10                      4            0            
ovcacn12r1        up/up                       ovcacn12r1@ExtSw-2128f569d0a0a0-Port9                       4            0            
ovcacn13r1        up/up                       ovcacn13r1@ExtSw-2128f569d0a0a0-Port12                      4            0            
ovcacn14r1        up/up                       ovcacn14r1@ExtSw-2128f569d0a0a0-Port11                      4            0            
ovcacn26r1        up/down                     10e0000128db6a                                              4            0            
ovcacn27r1        up/up                       ovcacn27r1@ExtSw-2128f569d0a0a0-Port34                      4            0            
ovcacn28r1        up/up                       ovcacn28r1@ExtSw-2128f569d0a0a0-Port33                      4            0            
ovcacn29r1        up/up                       ovcacn29r1@ExtSw-2128f56755a0a0-Port36                      4            0            
ovcacn30r1        up/up                       ovcacn30r1@ExtSw-2128f56755a0a0-Port35                      4            0            
ovcacn31r1        up/up                       ovcacn31r1@ExtSw-2128f56755a0a0-Port29                      4            0            
ovcacn32r1        up/up                       ovcacn32r1@ExtSw-2128f56755a0a0-Port30                      4            0            
ovcacn33r1        up/up                       ovcacn33r1@ExtSw-2128f56755a0a0-Port27                      4            0            
ovcacn34r1        up/up                       ovcacn34r1@ExtSw-2128f56755a0a0-Port28                      4            0            
ovcamn05r1        up/up                       ovcamn05r1@ExtSw-2128f569d0a0a0-Port4                       4            0            
ovcamn06r1        up/up                       ovcamn06r1@ExtSw-2128f569d0a0a0-Port3                       4            0            
19 records displayed


admin@ovcann22r1[xsigo] set server-profile ovcacn26r1 disconnect
Disconnecting active servers will stop all virtual I/O on the physical server.  Are you sure you want to disconnect ovcacn26r1
(y/n)?y

admin@ovcann22r1[xsigo] set server-profile ovcacn26r1 connect [press tab key twice]
The output list is long; it has been truncated here.


Possible completions:
  hca-10e0000128f050@ExtSw-2128f56755a0a0-Port2  Connection to host hca-10e0000128f050 (up)
  hca-10e0000128f050@ExtSw-2128f569d0a0a0-Port2  Connection to host hca-10e0000128f050 (up)
[snip]
  ovcacn26r1@ExtSw-2128f56755a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
  ovcacn26r1@ExtSw-2128f569d0a0a0-Port31         Connection to Linux/2.6.39-300.32.5.el5uek/x86_64 host ovcacn26r1 (up)
[snip]

admin@ovcann22r1[xsigo] set server-profile ovcacn26r1 connect ovcacn26r1@ExtSw-2128f569d0a0a0-Port31

admin@ovcann22r1[xsigo] show server-profile
name              state        descr        connection                                    def-gw        vnics        vhbas        
-------------------------------------------------------------------------------------------------------------------------------------
ovcacn07r1        up/up                     ovcacn07r1@ExtSw-2128f569d0a0a0-Port6                       4            0            
ovcacn08r1        up/up                     ovcacn08r1@ExtSw-2128f569d0a0a0-Port5                       4            0            
ovcacn09r1        up/up                     ovcacn09r1@ExtSw-2128f569d0a0a0-Port8                       4            0            
ovcacn10r1        up/up                     ovcacn10r1@ExtSw-2128f56755a0a0-Port7                       4            0            
ovcacn11r1        up/up                     ovcacn11r1@ExtSw-2128f569d0a0a0-Port10                      4            0            
ovcacn12r1        up/up                     ovcacn12r1@ExtSw-2128f569d0a0a0-Port9                       4            0            
ovcacn13r1        up/up                     ovcacn13r1@ExtSw-2128f569d0a0a0-Port12                      4            0            
ovcacn14r1        up/up                     ovcacn14r1@ExtSw-2128f569d0a0a0-Port11                      4            0            
ovcacn26r1        up/up                     ovcacn26r1@ExtSw-2128f569d0a0a0-Port31                      4            0            
ovcacn27r1        up/up                     ovcacn27r1@ExtSw-2128f569d0a0a0-Port34                      4            0            
ovcacn28r1        up/up                     ovcacn28r1@ExtSw-2128f569d0a0a0-Port33                      4            0            
ovcacn29r1        up/up                     ovcacn29r1@ExtSw-2128f56755a0a0-Port36                      4            0            
ovcacn30r1        up/up                     ovcacn30r1@ExtSw-2128f56755a0a0-Port35                      4            0            
ovcacn31r1        up/up                     ovcacn31r1@ExtSw-2128f56755a0a0-Port29                      4            0            
ovcacn32r1        up/up                     ovcacn32r1@ExtSw-2128f56755a0a0-Port30                      4            0            
ovcacn33r1        up/up                     ovcacn33r1@ExtSw-2128f56755a0a0-Port27                      4            0            
ovcacn34r1        up/up                     ovcacn34r1@ExtSw-2128f56755a0a0-Port28                      4            0            
ovcamn05r1        up/up                     ovcamn05r1@ExtSw-2128f569d0a0a0-Port4                       4            0            
ovcamn06r1        up/up                     ovcamn06r1@ExtSw-2128f569d0a0a0-Port3                       4            0            
19 records displayed

You may need to wait a few seconds, but on both switches you should see the node status as "up/up".
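
When the "show server-profile" output has been saved as text, the up/up check can be done mechanically. The sketch below is a minimal Python illustration, assuming the output has the column layout shown in this document (name first, state second); the function names profile_states and not_up are hypothetical.

```python
# Sample rows copied from the "show server-profile" output in this document.
SAMPLE = """\
name              state          descr        connection                                    def-gw        vnics        vhbas
-------------------------------------------------------------------------------------------------------------------------------------
ovcacn14r1        up/up                       ovcacn14r1@ExtSw-2128f56755a0a0-Port11                      4            0
ovcacn26r1        up/down                     10e0000128db69                                              4            0
ovcacn27r1        up/up                       ovcacn27r1@ExtSw-2128f56755a0a0-Port34                      4            0
19 records displayed
"""


def profile_states(output):
    """Return {profile name: state} from "show server-profile" text.

    Assumes the first two whitespace-separated fields of each data row are
    the name and the state, as in the sample above.
    """
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have a state of the form "x/y"; this skips the header,
        # the dashed rule, and the "N records displayed" footer.
        if len(fields) >= 2 and "/" in fields[1]:
            states[fields[0]] = fields[1]
    return states


def not_up(output):
    """List the profiles whose state is anything other than up/up."""
    return [name for name, state in profile_states(output).items() if state != "up/up"]


print(not_up(SAMPLE))
```

An empty list on both switches corresponds to every server-profile showing "up/up"; with the sample rows above, only ovcacn26r1 is listed.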


  Copyright © 2018 Oracle, Inc.  All rights reserved.