Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

2401920.1 : How to Replace an Exadata X7-2 Compute Node Server InfiniBand HCA Card
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:
Exadata X7-2 Hardware - Version All Versions and later
Zero Data Loss Recovery Appliance X7 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal
How to Replace an Exadata X7-2 Compute Node Server InfiniBand HCA Card.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS

PROBLEM OVERVIEW: An Exadata X7-2 Compute Node Server InfiniBand HCA card needs replacement.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

IMPORTANT NOTE TO TSC ENGINEER: Cut and paste the “CUSTOMER ACTIVITY” sections of the Pre-Replacement and Post-Replacement steps into an SR Note, and ensure the customer knows to perform these steps before the scheduled field engineer activity, and during and after the replacement activity.

CUSTOMER ACTIVITY:
Shutdown of the database node is required prior to the part replacement:
If running Linux or Solaris native - follow Steps 1 to 9 of MOS Note:
If running OVM then follow MOS Note:
Prepare the Server for Service

The customer should have already prepared the server and powered it off. If not, provide them with the instructions in the previous section.

1. Extend the server to the maintenance position.

Caution - Ensure that all power is removed from the server before removing or installing the InfiniBand HCA. You must disconnect the power cables from the system before performing these procedures.
Caution - These procedures require that you handle components that are sensitive to electrostatic discharge. This sensitivity can cause the components to fail. To avoid damage, ensure that you follow anti-static practices.
The InfiniBand HCA PCIe card is located in Slot 2 of the Compute Node configuration.

2. Remove the InfiniBand HCA and PCIe Riser from Slot 2:
1. Remove the replacement InfiniBand HCA card from its anti-static bag and place it on an anti-static mat.

2. Re-install the InfiniBand HCA and PCIe Riser into Slot 2:

3. Re-install the InfiniBand cables into the replacement InfiniBand HCA, ensuring they go back into the correct original ports. Port 1 is on the right, away from the PCIe connector, and its status LEDs are the upper two. Port 2 is on the left, nearest the PCIe connector, and its status LEDs are the lower two.
1. Install the server top cover. Use a Torx T10 screwdriver to lock the release button latch.
OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

FIELD SERVICE ENGINEER and CUSTOMER ACTIVITY:

1. Verify all expected hardware is visible to the server and the fault is cleared. Assistance from the customer for server login access will be required.

2. Verify there are no outstanding faults in ILOM:

# ipmitool sunoem cli 'show faulty'
Connected. Use ^D to exit.
-> show faulty
Target             | Property              | Value
-------------------+-----------------------+-----------------------------------

-> Session closed
Disconnected
#

If there are faults still outstanding that did not auto-clear in ILOM after replacement, refer to the post-repair procedures section of Doc ID 1155200.1 to clear the fault.

3. Verify there are no outstanding alerts in the Database Node:

# dbmcli -e list alerthistory
4. Verify the InfiniBand HCA ports are linked. The status LEDs should show steady green for the physical link, and amber, either steady or blinking, for the logical IB link. Port 1 is on the right, away from the PCIe connector, and its status LEDs are the upper two. Port 2 is on the left, nearest the PCIe connector, and its status LEDs are the lower two.

The "ibstatus" command should report state 'ACTIVE', phys state 'LinkUp' and rate "40 Gb/sec (4X QDR)" for both ports:

# ibstatus
Infiniband device 'mlx4_0' port 1 status:
        default gid:     fe80:0000:0000:0000:0010:e000:01cb:6761
        base lid:        0x1f
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

Infiniband device 'mlx4_0' port 2 status:
        default gid:     fe80:0000:0000:0000:0010:e000:01cb:6762
        base lid:        0x20
        sm lid:          0x2
        state:           4: ACTIVE
        phys state:      5: LinkUp
        rate:            40 Gb/sec (4X QDR)
        link_layer:      InfiniBand

5. Re-enable and restart the Database services:
If running Linux or Solaris native - follow Steps 11 to 14 of MOS Note:
If running OVM then follow MOS Note:
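The checks in steps 2 and 4 above can also be scripted rather than read by eye. The following is a minimal sketch only, not an Oracle-supplied tool: the function names ilom_fault_rows and ib_ports_ok are hypothetical, and the parsing assumes that a fault row in the "show faulty" table begins with an ILOM target path starting with "/", and that "ibstatus" prints one field per line as in the sample output above.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of any Oracle tool.

# Count table rows in `show faulty` output (read from stdin) that look like
# fault entries. ASSUMPTION: a fault row starts with a target path beginning
# with "/"; header, separator, and prompt lines do not. Prints 0 for a clean table.
ilom_fault_rows() {
    awk -F'|' 'NF >= 3 && $1 ~ /^[ \t]*\// { n++ } END { print n + 0 }'
}

# Check `ibstatus` output (read from stdin). Every port must report
# state ACTIVE, phys state LinkUp, and a rate starting with "40 Gb/sec".
# $1 = expected number of ports (default 2). Prints one OK/FAIL line per port;
# returns 0 only if the expected number of ports was seen and all passed.
ib_ports_ok() {
    awk -v want="${1:-2}" '
        function report() {
            if (port == "") return
            ok = (state ~ /ACTIVE/ && phys ~ /LinkUp/ && rate ~ /^40 Gb\/sec/)
            printf "%s: %s\n", port, (ok ? "OK" : "FAIL")
            if (ok) good++
            total++
        }
        # A new port section starts: report the previous one, reset fields.
        /^Infiniband device/ {
            report()
            port = $3 " port " $5
            state = phys = rate = ""
            next
        }
        # Keep only the value after the "name:" label on each field line.
        /^[ \t]*state:/      { sub(/^[^:]*:[ \t]*/, ""); state = $0 }
        /^[ \t]*phys state:/ { sub(/^[^:]*:[ \t]*/, ""); phys  = $0 }
        /^[ \t]*rate:/       { sub(/^[^:]*:[ \t]*/, ""); rate  = $0 }
        END {
            report()
            exit ((total == want && good == total) ? 0 : 1)
        }
    '
}
```

On the database node, the functions would be fed from the same commands used in the steps above, for example: ipmitool sunoem cli 'show faulty' | ilom_fault_rows, and ibstatus | ib_ports_ok 2. A non-zero row count or a non-zero exit status means the manual checks above should be repeated and the result reviewed before handing the system back.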
PARTS NOTE: 7092757 [F] Dual 40Gb/Sec (4x) QDR InfiniBand Host Channel Adapter Module M3
REFERENCE INFORMATION:

Oracle Exadata Database Machine Maintenance Guide:
https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-database-servers.htm#DBMMN22020

Oracle Server X7-2 Documentation:
https://docs.oracle.com/cd/E72435_01/index.html

How to shutdown the Exadata database nodes and storage cells in a rolling fashion so certain hardware tasks can be performed. (Doc ID 1539451.1)

How to Shutdown and Startup Exadata compute nodes running OVM (Doc ID 2367609.1)