Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure

Solution 1383773.1: How to Replace a Failed Sun Network QDR InfiniBand Gateway Switch
Applies to:

Exalogic Elastic Cloud X4-2 Half Rack - Version X4 and later
Exalogic Elastic Cloud X4-2 Full Rack - Version X4 and later
Sun Network QDR InfiniBand Gateway Switch - Version Not Applicable to Not Applicable [Release N/A]
Exalogic Elastic Cloud X3-2 Quarter Rack - Version X3 and later
Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Information in this document applies to any platform.

Goal

Replace a Sun Network QDR InfiniBand Gateway Switch (NM2-GW).

The customer must first be referred to the following document on preparing for the on-site work:

If this is an Exalogic system, Doc ID 2218443.1 How to Prepare an Exalogic Infiniband Switch for Replacement (Pre-checks & Backup)

The customer is requested to upload to the SR-notes a confirmation that the preparation checks have been done. The preparation steps, including a plan for restoring the switch (whether from available backups or by manual restoration), must be completed and documented in the SR-notes (as stated in MOS Note 1636229.1 Steps 3 and 4.4) ** prior to Dispatching the Part and/or FE if applicable **.

An additional goal of this document is to cover the steps that must be taken after the replacement. Certain steps are typically performed by the Field Engineer once the replacement has been performed, including the update of ASR and the NEW process for updating the Installed Base - refer to step D. below. Refer also to the additional post-replacement steps and links in the Customer Acceptance section, which are typically performed by the customer admin or their representatives.

This document has distribution EXTERNAL, since the IB Switch is defined as a Customer-Replaceable-Unit in a limited number of platforms, for example in custom-built solutions outside Engineered Systems.
Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED:
The customer has completed the steps given in Doc ID 2218443.1 (Exalogic Systems) or 1636229.1 (for all other systems), and the checklists (Steps 3 and 4.3) have been confirmed in customer-visible SR-notes; the plan for restoration of the configuration information has been documented in customer-visible SR-notes, and the owner of the configuration restoration actions has been clearly identified.

The Oracle TSE should have ordered the correct Part# for the IB Switch firmware that needs to be used, as per <Document 2187802.1>.

If the customer has not already powered off the Switch (for example, replacement during Production IB Fabric), then the IB Switch should be on and its Subnet Manager should have been disabled (#disablesm).

** If the FE is unable to confirm that the checklist has been followed or is unable to view the SR in SR Viewer, then please phone back in to Support before attending site and ask for this to be confirmed **
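WHAT ACTION DOES THE ENGINEER NEED TO TAKE: ( PLEASE READ ALL INSTRUCTIONS BEFORE PROCEEDING )

This procedure comprises 4 stages:
A. Initial physical replacement in the rack
B. Replacement Switch Firmware Check & Upgrade
C. Cable up the replacement switch and check basic IB Fabric connectivity
D. Final housekeeping, documentation, warm-handover and wrap-up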
A. Initial physical replacement in the rack (no cabling yet - cables will be connected in later Step C)

1. Power off the Switch: The switch being replaced must be powered off (if not already done by the customer team). Power off both power supplies on the switch by removing both power plugs.

If the customer has not taken a full IB Fabric downtime, then check with the customer representative to confirm that everything is working normally on other parts of the IB Fabric after this switch has been powered off. For example, the SM master may have moved to another switch (if this Switch had been the Master and the admin team was unable to move the master earlier). If the configuration follows the standard, powering off this switch will have no effect on the operation of the system. If any problem or anomaly is detected in the running IB Fabric, then work with the customer / Support to get the issue resolved prior to proceeding with the replacement.

2. Now, disconnect the cables from the switch. All InfiniBand cables should have labels at both ends indicating their locations. If any cables do not have labels, then label them; if needed, refer to the cabling tables in the customer's build documentation (for Exalogic, this is the Exalogic Machine Owner's Guide). Then remove the switch being replaced from the rack.

Note: Read "Sun Network QDR InfiniBand Gateway Switch Installation Guide for Firmware Version 2.1". To remove the switch, reverse the installation steps. Refer to the same guide for detailed steps on installing a gateway switch.
Then follow the steps in "Powering On the Gateway", or the "Power on the Gateway" section of the PDF document "Sun Network QDR InfiniBand Gateway Switch Installation Guide for Firmware Version 2.1". In that section you need to complete:
1) Attach the Management Cables
2) Attach the Power Cords
3) Accessing the Management Controller
4) Verify the Gateway Status

Do not perform the steps in the section "Start the subnet Manager".

Note: The default password for root is changeme

Set the Network Management Parameters (CLI). The initial setup of the network management parameters may have to be done by accessing the switch through its USB management port. Make sure that the management IP address assigned to the replacement switch is the same as that of the old switch. If the customer does not know that IP address, or its mask and default gateway, then ask the customer to provide any available IP address in the same subnet as the other IB switches in this rack. A temporary address can be used until then: this switch will get its correct management IP address when its configuration is restored in step 6 below.

Do not connect any Infiniband cables yet. Do not start the subnet manager yet.
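The temporary-address rule above (any free address in the same subnet as the other IB switches in the rack) can be checked before the address is entered on the switch. The following is a minimal Python sketch, not part of the Oracle procedure: the function names are hypothetical, the addresses in the usage note are documentation-range examples, and it assumes the customer has supplied the peer switch addresses and the netmask.

```python
import ipaddress


def same_subnet(candidate, peer_ip, netmask):
    """Return True if candidate lies in the subnet defined by peer_ip and netmask."""
    # strict=False lets us build the network from a host address plus mask.
    network = ipaddress.ip_network(f"{peer_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(candidate) in network


def pick_temporary_address(candidates, peer_ips, netmask):
    """Return the first candidate that shares a subnet with every peer switch
    and is not already used by one of them; return None if none qualifies."""
    in_use = {ipaddress.ip_address(peer) for peer in peer_ips}
    for candidate in candidates:
        if ipaddress.ip_address(candidate) in in_use:
            continue  # already assigned to another IB switch
        if all(same_subnet(candidate, peer, netmask) for peer in peer_ips):
            return candidate
    return None
```

For example, with peer switches at 192.0.2.10 and 192.0.2.11 and mask 255.255.255.0, calling `pick_temporary_address(["192.0.2.10", "198.51.100.5", "192.0.2.77"], ["192.0.2.10", "192.0.2.11"], "255.255.255.0")` skips the address already in use and the address in a different subnet. The sketch only checks subnet membership against the supplied peers; whether a candidate is actually free on the customer network still has to be confirmed by the customer.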
B. Replacement Switch Firmware Check & Upgrade

1. Check the firmware of the other switches in the rack to determine which firmware version the replacement switch should be running. If the firmware is to be downgraded, for example to match an older Engineered Systems PSU, then the Oracle TSE must have referred to the following document when ordering the part: <Document 2187802.1> Infiniband Switch - Firmware Downgrade To 2.1.6 Fails With Error: Cannot proceed with downgrade on this SP.

2. Download that firmware from MOS and upgrade the firmware of the replacement switch to that version:

For the Sun Network QDR InfiniBand Gateway Switch (NM2-GW), be sure to follow the detailed firmware upgrade procedures given in the Product Note document for firmware 2.2, "Upgrading the Gateway Firmware (CLI)" (n.b. these procedures are also on pages 20 to 24 of the PDF version of the Product Note). Use only the download protocols that are supported by the IB Switch firmware, as listed at step 3 of the Update procedure. Also take care to perform the double-upgrade as described at steps 3 and 4. Ensure that all the firmware upgrade steps are completed, the switch is restarted (as at step 6) and the firmware integrity is checked (as at steps 7, 8 and 9).

Note: When upgrading firmware from 1.3.x to 2.1.7 or above, first upgrade to 2.1.6. (Refer to BUG 26735450 - NM2 GW Product Notes FW 2.1 Upgrade/Downgrade table is incorrect.)

Note: If the firmware upgrade fails when upgrading to firmware version 2.1.8, Oracle employees may refer to the following internal document; please call Oracle Support if you need access to it: <Document 2109781.1> How to fix broken InfiniBand Switch after upgrade to 2.1.8 firmware
3. Disable SM on the replacement switch:
# disablesm
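The intermediate-version rule noted in this stage (from 1.3.x to 2.1.7 or above, upgrade to 2.1.6 first) can be expressed as a small planning helper. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, version strings are assumed to be dotted numbers with an optional "-N" build suffix, and only the single rule stated in this document is encoded. The Product Notes and BUG 26735450 remain the authority for any other upgrade path.

```python
def parse_version(text):
    """Turn a dotted version such as '2.1.8' or '2.1.8-1' into a tuple of ints.
    The '-N' build suffix, if present, is ignored (assumed format)."""
    return tuple(int(part) for part in text.split("-")[0].split("."))


def upgrade_path(current, target):
    """Return the ordered list of firmware versions to install.

    Encodes only the rule from this document: going from 1.3.x to
    2.1.7 or above requires an intermediate upgrade to 2.1.6.
    """
    cur, tgt = parse_version(current), parse_version(target)
    if tgt < cur:
        # Downgrades are out of scope here; see Document 2187802.1.
        raise ValueError("downgrade requested; refer to Document 2187802.1")
    if cur == tgt:
        return []  # already at the version of the other switches
    path = []
    if cur[:2] == (1, 3) and tgt >= (2, 1, 7):
        path.append("2.1.6")  # mandatory intermediate step
    path.append(target)
    return path
```

As a usage example, `upgrade_path("1.3.3", "2.1.8")` yields the two-step path through 2.1.6, whereas `upgrade_path("2.1.6", "2.1.8")` yields a single step. The target version passed in is the one found on the other switches in the rack in step 1.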
C. Cable up the replacement switch and check basic IB Fabric connectivity

1. Completely power off the replacement switch now by unplugging both power supplies.

2. Now connect all the Infiniband cables. Refer to the "Connecting Data Cables" section of the PDF document "Sun Network QDR InfiniBand Gateway Switch Installation Guide for Firmware Version 2.1".

3. Power on the replacement switch by connecting the power cords to the InfiniBand switch power supply slots.
# listlinkup
D. Final housekeeping, documentation, warm-handover and wrap-up

1. ASR: Set the serial number, product level identity and ASR of the replacement switch as per the steps in the following document: How to configure Datacenter InfiniBand Switch 36 & QDR InfiniBand Gateway Switches for ASR (Doc ID 1902710.1)
2. Installed Base: Update the Installed Base to ensure that the replacement-part serial# will be properly entitled. Within Oracle, the IB Switch is termed a "SuperFRU", which simply means that it is a whole-chassis replacement including both the chassis and the internal main-board. Therefore, follow the relevant SuperFRU procedure:

a. If the IB Switch has been replaced by the end customer, i.e. as Parts-Only/CRU, then use the following procedure: Oracle Support Document 1575977.1 (How can customers update the System Serial Number after a SuperFru Part Replacement)
3. The Oracle Field Engineer (where an FE has been dispatched) should now document, in a visible note in the Task debrief, whether all of the above steps are complete. Any steps that were missed or skipped, and any anomalies, need to be clearly indicated. If the customer will be working further with Oracle Support at this time on restoration of the configuration, then a ** warm handover from the Field Engineer to the Oracle Support Engineer is required **.
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Refer to Infiniband Switch Replacement - Follow-up Actions (Doc ID 2125203.1). That document lists the critical follow-up actions required to restore the configuration (from a config backup where available), along with customer-specific configuration items such as smnodes, partitions and VNICs.
References

<NOTE:2125242.1> - Infiniband Switch Replacement – Overview and guide to key articles
<NOTE:2140928.1> - How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch
<NOTE:1341658.1> - How to Replace a Failed Sun Datacenter InfiniBand Switch 36
<NOTE:1636229.1> - How to Prepare an Infiniband Switch for Replacement
<NOTE:2125203.1> - Infiniband Switch Replacement - Follow-up Actions

Attachments

This solution has no attachment