Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution
1636229.1: How to Prepare an Infiniband Switch for Replacement
Applies to:
Oracle SuperCluster Specific Software
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.

Purpose
This document helps the user to prepare an Infiniband Switch for replacement and collect information required by Oracle Support / On-site team prior to Dispatch of a replacement IB Switch part.

Scope
- The document distribution is EXTERNAL since it needs to be shared with and used by the Customer-admin, as well as referenced by Partners, Field Engineers, and Oracle Support.
- Where Oracle Support has confirmed the need for an IB Switch replacement (whether that replacement will be performed by Customer-admin, Partner or Oracle Field Engineer), Customer-admin is requested to confirm by updating in MOS SR-notes that all the pre-check actions in this document have been completed - by copying/pasting the requested checklist(s) into the SR in MOS - prior to Oracle Dispatching the Part for replacement.
- Oracle Support will not Dispatch until these checks have been performed unless customer has given a clear statement acknowledging the risks of not performing these checks prior to replacement.

Details
Note: For IB switches within an Exalogic system, use Doc ID 2218443.1 instead of this document.
1. Checks needed prior to Dispatch of Part/Onsite
1.1. Check the IB Fabric to ensure resilience to later booting (refer to companion KB article)
Perform the checks/actions in the following document and confirm to Oracle Support that these checks/actions have been done: How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch (Doc ID 2140928.1)
1.2 Check/update the configuration backup
- If the Infiniband Switch is still responsive on the Management Ethernet port:
  - Use the current IB Switch Firmware Product Guide document to back up the configuration of the Infiniband switch (Switch ILOM “my.config” XML backup). The following links are for Firmware v2.1:
    For Infiniband Switch 36 (nm2-36P): Back Up the Configuration (CLI) or Back Up the Configuration (Web)
    For Infiniband Gateway Switch (nm2-GW): Back Up the Configuration (CLI) or Back Up the Configuration (Web)
    (Note: Regardless of whether a backup was taken using Exabr in the previous step, taking a backup using the ILOM of the switch is recommended in all cases.)
- If this switch is non-responsive or otherwise unable to boot, check whether a recent configuration backup exists (it must have been taken after any previous change in the IB Fabric). Refer above to the relevant backup files to look for in this case.
- If a recent backup is not available, then the Customer-admin will need to manually reconfigure the switch after replacing it. At a minimum, this requires the IB Switch management-port IP host-name/address information and the switch instance number "gwinstance" (if this is a multi-rack cabling with several Exalogic and/or Big Data Appliance racks).
When confirming in the MOS SR-update that this list of pre-checks has been done, the Customer-admin needs to comment specifically on this point and state exactly what the configuration restoration strategy will be: either the path to the configuration backup file that will be used (if any), or the list of commands that will be used to restore the configuration. This plan must be completed before the Part/FE is Dispatched, so that the Customer-Admin is ready to step in and reconfigure during the replacement intervention. Please contact Oracle Support if you have any questions. As with all Oracle products, customers are expected to maintain regular backups.
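As an illustration of the "recent backup" check above, the following is a minimal sketch for the host where the configuration backups are stored (not the switch itself). The function name `backup_newer_than`, the example path, and the date are hypothetical; the sketch assumes GNU findutils is available, since it relies on the `-newermt` test.

```shell
#!/bin/sh
# Hypothetical helper (not part of any Oracle tool): succeed if FILE exists
# and was last modified after DATE (format YYYY-MM-DD).
# Assumption: GNU find, which provides the -newermt test.
backup_newer_than() {
    file=$1
    since=$2
    # A missing backup file can never be "recent".
    [ -f "$file" ] || return 1
    # find prints the path only when its mtime is later than $since.
    [ -n "$(find "$file" -newermt "$since" 2>/dev/null)" ]
}

# Example call (path and date are placeholders for your own values):
#   backup_newer_than /path/to/backups/my.config 2018-01-15 \
#       && echo "backup is newer than the last fabric change"
```

The date passed in should be the date of the last known change to the IB Fabric. A file that passes this check still needs to be confirmed as the correct backup for this switch.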
1.3 Check the firmware version of the Switch
If this is a replacement, check the firmware version of the switch that is being replaced and make sure that this firmware version is available to download (since it will need to be applied to the replacement switch). To check the firmware version, log in to the switch that is being replaced and run the following command:
# version
Here is a sample output:
# version
SUN DCS gw version: 2.1.8-4 <<<<<<<< firmware version
Build time: ...
FPGA version: ...
SP board info: ...
Then, check in MOS (Patches&Updates) the availability of this firmware for download. If this firmware is not available to download, it will not be possible to have this loaded on a new switch after replacing. In that case, upgrading the firmware to the latest or the next available firmware may be required. Inform the Oracle Support engineer prior to the Dispatch by updating this information into a MOS SR-note, so that it can be confirmed whether or not upgrading the firmware after replacing the switch can have any adverse effect on the IB Fabric.
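When recording the firmware version in the SR, it can help to pull it out of saved `version` output rather than retyping it. The following is a minimal sketch that assumes the output has been saved to a file on an administration host and that the version line has the `SUN DCS ... version: <value>` form shown in the sample output. The function name `extract_fw_version` and the file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read saved `version` output on stdin and print the
# firmware version string from the "SUN DCS ... version:" line.
# Assumption: the line format matches the sample output in step 1.3.
extract_fw_version() {
    # Capture the first non-space token after "version:"; ignore any
    # trailing annotation on the same line, and stop at the first match.
    sed -n 's/^SUN DCS .*version: *\([^ ]*\).*$/\1/p' | head -n 1
}

# Example call (file name is a placeholder):
#   extract_fw_version < switch_version_output.txt
```

The printed value is the string to search for in MOS Patches & Updates.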
1.4 Check for presence of workaround firewall rule on port 623
Check whether a workaround firewall rule to block incoming requests on port 623 is implemented on the IB switch. Refer to: IB Switch Messages Wrapping with "Possible SYN Flooding On Port 623" (Doc ID 2023539.1):
# iptables -L -n
If you see an entry for port 623, the workaround was implemented. Make a note that it must be restored after replacement / re-image / restore.
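The port 623 check can also be run against saved `iptables -L -n` output. The following is a minimal sketch, assuming the rule is printed with a `dpt:623` destination-port match, which is how iptables normally lists such rules. The function name `has_port_623_rule` and the file name are hypothetical, and the exact rule used on your switch may be listed differently, so review the full listing if the result is unexpected.

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -L -n` output on stdin and succeed
# if any rule line carries a destination-port match for port 623.
# Assumption: the rule is listed in the usual "dpt:623" form.
has_port_623_rule() {
    # Require a non-digit (or end of line) after 623 so that ports such
    # as 6230 do not match.
    grep -Eq 'dpt:623([^0-9]|$)'
}

# Example call (file name is a placeholder):
#   if has_port_623_rule < iptables_listing.txt; then
#       echo "port 623 workaround present: restore it after replacement"
#   fi
```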
2. Complete the check-list template – IB Switch preparation for Replacement
Answer yes/no, and/or provide plan/comment:
3. Provide a report on the replacement pre-checks to Oracle Support including outage type
Include the check-list template both from the linked IB Fabric outage preparation at step 1.1 and for the replacement itself at step 2 above.
4. Final pre-Dispatch preparation (IB Switch replacement in production IB Fabric)
If there will NOT be a full outage of the IB Fabric, then the following steps MUST be completed now, before the Part/Onsite is Dispatched by Oracle Support.
4.1. Disable SM on the switch being replaced
# disablesm
4.2. Check if running ASR / block alerts if so
4.3. Update in MOS that step 4 actions are completed
Update in MOS that all actions in Steps 4.1 and 4.2 have been completed successfully and that you are ready for Dispatch.
5. Contact Oracle Support for Part/Onsite Dispatch
Once Oracle Support has reviewed and approved the check-lists and plan above at step 3, and when you have updated Oracle Support that the actions in step 4 have also been completed for the case where there will not be a full IB Fabric downtime, Oracle Support will contact you to confirm and will request from you the details of the outage window for the change, so that the Part/FE can be Dispatched. The team responsible for replacing the Switch (whether Customer-Admin, Partner, or FE) will then follow all the steps in the relevant "How to Replace" document for the particular Switch-part involved.
6. Replace the switch and perform Follow-up actions
Once the new Part is received and at the time of the outage window:
Ensure that the On-site team follows the relevant How to Replace action-plan for this model of IB Switch (refer to Infiniband Switch Replacement – Overview and guide to key articles (Doc ID 2125242.1) and click on the How to Replace document relevant to your Switch part#), unless a special action-plan has been provided by Oracle Support.
*After* the replacement, Customer-Admin will need to continue with the Follow-up Actions documented here: Infiniband Switch Replacement - Follow-up Actions (Doc ID 2125203.1)
References
<NOTE:2125203.1> - Infiniband Switch Replacement - Follow-up Actions
<NOTE:2140928.1> - How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch
<NOTE:1341658.1> - How to Replace a Failed Sun Datacenter InfiniBand Switch 36
<NOTE:2125242.1> - Infiniband Switch Replacement – Overview and guide to key articles
<NOTE:1383773.1> - How to Replace a Failed Sun Network QDR InfiniBand Gateway Switch