Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2401920.1
Update Date:2018-05-28

Solution Type  Technical Instruction Sure

Solution  2401920.1 :   How to Replace an Exadata X7-2 Compute Node Server InfiniBand HCA Card  


Related Items
  • Exadata X7-2 Hardware
  • Zero Data Loss Recovery Appliance X7 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Exadata internal only for Oracle support engineers use and approved HW partners

Applies to:

Exadata X7-2 Hardware - Version All Versions and later
Zero Data Loss Recovery Appliance X7 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace an Exadata X7-2 Compute Node Server InfiniBand HCA Card.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
Exadata X7-2 Training

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS

PROBLEM OVERVIEW: An Exadata X7-2 Compute Node Server InfiniBand HCA card needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

IMPORTANT NOTE TO TSC ENGINEER: CUT & PASTE the “CUSTOMER ACTIVITY” sections of the Pre-Replacement and Post-Replacement steps into an SR Note, and ensure the customer knows to perform these steps before, during, and after the scheduled field engineer activity.

CUSTOMER ACTIVITY:

Shutdown of the database node is required prior to the part replacement:

If running Linux or Solaris native - follow Steps 1 to 9 of MOS Note:
How to shutdown the Exadata database nodes and storage cells in a rolling fashion so certain hardware tasks can be performed. (Doc ID 1539451.1)

If running OVM then follow MOS Note:
How to Shutdown and Startup Exadata compute nodes running OVM (Doc ID 2367609.1)


WHAT ACTION DOES THE ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Prepare the Server for Service

The customer should have already prepared the server and powered it off. If not, provide them with the instructions in the previous section.

1. Extend the server to the maintenance position.
2. Disconnect the power cords from the power supplies.
3. Attach an anti-static wrist strap to your wrist and to a metal area on the chassis or the rack.
4. Remove the server top cover. Use a Torx T10 screwdriver to unlock the release button latch.

Caution - Ensure that all power is removed from the server before removing or installing the InfiniBand HCA. You must disconnect the power cables from the system before performing these procedures.

 

Caution - These procedures require that you handle components that are sensitive to electrostatic discharge. This sensitivity can cause the components to fail. To avoid damage, ensure that you follow anti-static practices.


Removing the InfiniBand HCA PCIe card

The InfiniBand HCA PCIe card is located in Slot 2 of the Compute Node configuration.

1. Disconnect the external InfiniBand cables from the card by pulling on each cable's pull tab to unlock it, then sliding the cable completely out of the card port.

2. Remove the InfiniBand HCA and PCIe Riser from Slot 2:

        a. Lift the green-tabbed latch on the rear of the server's chassis next to the PCIe Slot 2 to release the PCIe card's rear bracket.
        b. Lift the riser release lever with one hand and use your other hand to remove the riser from the motherboard.
        c. Place the riser and card on an anti-static mat.
        d. Remove the PCIe card from the PCIe riser. Hold the riser in one hand and use your other hand to carefully pull the PCIe card connector from the riser.
        e. Disconnect the rear bracket that is attached to the PCIe card from the rear of the PCIe riser.
        f. Place the PCIe card on an anti-static mat.


Installing the InfiniBand HCA PCIe card

1. Remove the replacement InfiniBand HCA card from its anti-static bag and place it on an anti-static mat.

2. Re-install the InfiniBand HCA and PCIe Riser into Slot 2:

       a. Insert the rear bracket that is attached to the PCIe card into the PCIe riser.
       b. Hold the riser in one hand and use your other hand to carefully insert the PCIe card connector into the riser.
       c. Raise the Slot 2 PCIe riser release lever (marked with a green tab) to the open (up) position.
       d. Gently press the riser into the motherboard connector until it seats, then press the green-tabbed riser release lever to the closed (down) position.
       e. Close the green-tabbed latch on the rear of the server's chassis to secure the PCIe card's rear bracket to the server's chassis.

3. Re-install the InfiniBand cables into the replacement InfiniBand HCA ports, ensuring each cable goes back into its original port. Port 1 is on the right, away from the PCIe connector, and its status LEDs are the upper two. Port 2 is on the left, nearest the PCIe connector, and its status LEDs are the lower two.



Return the Server to Operation

1. Install the server top cover. Use a Torx T10 screwdriver to lock the release button latch.
2. Reconnect the power cords to the server power supply and connect any other cables to their original locations.
3. Return the server to the normal rack position.
4. Once the power cords have been re-attached and the ILOM has booted, the green LED on the server blinks slowly. Power on the server by pressing the power button on the front of the unit.
5. Connect to the server console via the ILOM and monitor the boot.
    By default the ILOM serial console displays the primary console output.
    In the event of unexpected boot behavior, it is advisable to connect to both ILOM serial and ILOM graphics consoles at the same time and monitor.

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

FIELD SERVICE ENGINEER and CUSTOMER ACTIVITY:

1. Verify all expected hardware is visible to the server and the fault is cleared. Assistance from the customer for server login access will be required.

2. Verify there are no outstanding faults in ILOM:

# ipmitool sunoem cli 'show faulty'
Connected. Use ^D to exit.
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
-> Session closed
Disconnected
#

If there are faults still outstanding that did not auto-clear in ILOM after replacement, refer to the post-repair procedures section of Doc ID 1155200.1 to clear the fault.
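
The check in step 2 can be scripted as a rough aid. The sketch below is not part of the ILOM tooling: `count_fault_rows` is a hypothetical helper name, and it assumes that each outstanding fault is printed as a `Target | Property | Value` row under the header shown above, so that zero data rows corresponds to the empty table in the example.

```shell
#!/bin/sh
# Hypothetical helper: count data rows in the output of
#   ipmitool sunoem cli 'show faulty'
# Assumption: each fault is printed as a "Target | Property | Value" row.
count_fault_rows() {
    awk -F'|' '
        # Skip the column header. Banner, separator and prompt lines
        # contain no "|" and therefore never have three fields.
        NF >= 3 && $1 !~ /^[ \t]*Target[ \t]*$/ { n++ }
        END { print n + 0 }
    '
}

# Example use on the database node (run as root):
#   ipmitool sunoem cli 'show faulty' | count_fault_rows
```

A result of 0 matches the empty table shown in step 2. Any other value suggests that fault rows are still listed and the output should be reviewed by hand.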

3. Verify there are no outstanding alerts in the Database Node:

# dbmcli -e list alerthistory

4. Verify the InfiniBand HCA ports are linked. The status LEDs should show steady green for the physical link, and amber (steady on or blinking) for the logical IB link. Port 1 is on the right, away from the PCIe connector, and its status LEDs are the upper two. Port 2 is on the left, nearest the PCIe connector, and its status LEDs are the lower two.

The "ibstatus" command should report state "ACTIVE", phys state "LinkUp" and rate "40 Gb/sec (4X QDR)" for both ports:

# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0010:e000:01cb:6761
        base lid:        0x1f
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0010:e000:01cb:6762
        base lid:        0x20
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand
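
The three values to confirm per port (state, phys state, rate) can also be checked with a small filter over the `ibstatus` output. This is only a sketch under stated assumptions: `check_ib_ports` is a hypothetical helper name, the parsing relies on the output layout shown above, and the expected rate prefix `40 Gb/sec` is taken from this procedure.

```shell
#!/bin/sh
# Hypothetical helper: read ibstatus output on stdin and report each port.
# Returns non-zero if any port is not ACTIVE / LinkUp / 40 Gb/sec,
# or if no port blocks were found at all.
check_ib_ports() {
    awk '
        function report() {
            if (port == "") return
            if (state == "ACTIVE" && phys == "LinkUp" && rate ~ /^40 Gb\/sec/) {
                print "port " port ": OK"
            } else {
                print "port " port ": CHECK state=" state " phys=" phys " rate=" rate
                bad = 1
            }
        }
        # A new "Infiniband device ... port N status:" line starts a port block.
        /^Infiniband device/ {
            report()
            port = $5; state = ""; phys = ""; rate = ""; nports++
        }
        $1 == "state:"                 { state = $3 }
        $1 == "phys" && $2 == "state:" { phys = $4 }
        $1 == "rate:"                  { sub(/^[ \t]*rate:[ \t]*/, ""); rate = $0 }
        END {
            report()
            if (nports == 0) { print "no ports found"; bad = 1 }
            exit bad + 0
        }
    '
}

# Example use on the database node:
#   ibstatus | check_ib_ports
```

For the sample output in step 4, this prints an OK line for port 1 and port 2 and returns success. A port that is down or has negotiated a lower rate is printed with its observed values so it can be compared against the LED state and cabling.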

5. Re-enable and restart the Database services:

If running Linux or Solaris native - follow Steps 11 to 14 of MOS Note:
How to shutdown the Exadata database nodes and storage cells in a rolling fashion so certain hardware tasks can be performed. (Doc ID 1539451.1)

If running OVM then follow MOS Note:
How to Shutdown and Startup Exadata compute nodes running OVM (Doc ID 2367609.1)

 

PARTS NOTE:

7092757 [F] Dual 40Gb/Sec (4x) QDR InfiniBand Host Channel Adapter Module M3

 

REFERENCE INFORMATION:

Oracle Exadata Database Machine Maintenance Guide: https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-database-servers.htm#DBMMN22020

Oracle Server X7-2 Documentation https://docs.oracle.com/cd/E72435_01/index.html

How to shutdown the Exadata database nodes and storage cells in a rolling fashion so certain hardware tasks can be performed. (Doc ID 1539451.1)

How to Shutdown and Startup Exadata compute nodes running OVM (Doc ID 2367609.1)

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.