Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1341658.1
Update Date:2018-05-09
Keywords:

Solution Type  Technical Instruction Sure

Solution  1341658.1 :   How to Replace a Failed Sun Datacenter InfiniBand Switch 36  


Related Items
  • Exadata X4-2 Hardware
  • Exalogic Elastic Cloud X5-2 Hardware
  • Big Data Appliance X3-2 Hardware
  • Exalogic Elastic Cloud X3-2 Eighth Rack
  • Oracle Virtual Compute Appliance X3-2 Hardware
  • Oracle SuperCluster T5-8 Full Rack
  • Exadata X3-2 Hardware
  • Exadata Database Machine X2-2 Qtr Rack
  • Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
  • Oracle SuperCluster M7 Hardware
  • Zero Data Loss Recovery Appliance X6 Hardware
  • Big Data Appliance X4-2 Hardware
  • Exalogic Elastic Cloud X4-2 Full Rack
  • Exalogic Elastic Cloud X6-2 Hardware
  • Big Data Appliance X4-2 Full Rack
  • Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack
  • Exalogic Elastic Cloud X3-2 Half Rack
  • Exadata X5-2 Eighth Rack
  • Exalogic Elastic Cloud X4-2 Hardware
  • Exalogic Elastic Cloud X4-2 Quarter Rack
  • Exadata X3-2 Half Rack
  • Exadata X6-2 Hardware
  • Exadata X6-8 Hardware
  • Exadata X4-2 Quarter Rack
  • Oracle SuperCluster T5-8 Half Rack
  • Exadata X5-2 Hardware
  • Exadata X5-2 Full Rack
  • Exadata X4-2 Half Rack
  • Exadata X5-2 Quarter Rack
  • Exadata Database Machine X2-8
  • Oracle Exalogic Elastic Cloud X2-2 Full Rack
  • Exadata X3-2 Full Rack
  • SPARC SuperCluster T4-4
  • Big Data Appliance X5-2 Hardware
  • Exadata Database Machine X2-2 Full Rack
  • Oracle Database Appliance X5-2
  • Exalogic Elastic Cloud X3-2 Hardware
  • Oracle Virtual Compute Appliance X4-2 Hardware
  • Exadata Database Machine X2-2 Half Rack
  • Big Data Appliance Hardware
  • Exadata X3-8 Hardware
  • Zero Data Loss Recovery Appliance X4 Hardware
  • Exadata X4-8 Hardware
  • Exadata X5-2 Half Rack
  • Big Data Appliance X4-2 Starter Rack
  • Sun Datacenter InfiniBand Switch 36
  • Zero Data Loss Recovery Appliance X5 Hardware
  • Exalogic Elastic Cloud X3-2 Full Rack
  • Exadata X3-2 Eighth Rack
  • Exalogic Elastic Cloud X3-2 Quarter Rack
  • Big Data Appliance X4-2 In-Rack Expansion
  • Exalogic Elastic Cloud X4-2 Half Rack
  • Exalogic Elastic Cloud X4-2 Eighth Rack
  • Exadata Database Machine X2-2 Hardware
  • Exadata X4-2 Full Rack
  • Exadata X3-2 Quarter Rack
  • Oracle Exalogic Elastic Cloud X2-2 Hardware
  • Exadata X4-2 Eighth Rack
  • Exadata Database Machine V2
  • Big Data Appliance X6-2 Hardware
  • SPARC SuperCluster T4-4 Full Rack
  • Oracle SuperCluster M6-32 Hardware
  • Oracle SuperCluster T5-8 Hardware
  • Oracle Exalogic Elastic Cloud X2-2 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SaND-CAP VCAP




In this Document
Goal
Solution
References


Applies to:

Exadata X4-2 Hardware - Version All Versions and later
Oracle Database Appliance X5-2 - Version All Versions and later
Exadata X3-2 Quarter Rack - Version All Versions and later
Exalogic Elastic Cloud X5-2 Hardware - Version X5 and later
Exadata X4-2 Half Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

Replace a Sun Datacenter InfiniBand Switch 36 (36p).

The customer must first be referred to the appropriate document on preparing for the on-site work:

For an Exalogic system: Doc ID 2218443.1 How to Prepare an Exalogic Infiniband Switch for Replacement (Pre-checks & Backup)
For all other systems: Document 1636229.1 How to prepare an Infiniband Switch for a Field Engineer Visit for servicing or replacing

The customer is requested to upload to the SR-notes a confirmation that the preparation checks have been done.  The preparation steps, including a plan for restoring the switch (whether from available backups or by manual restoration), must be completed and documented in the SR-notes (as stated in MOS Note 1636229.1 Steps 3 and 4.4) ** prior to dispatching the part and/or FE, if applicable **.

An additional goal of this document is to cover the post-replacement steps.  Certain steps are typically performed by the Field Engineer once the replacement has been performed, including the ASR update and the NEW process for updating the Installed Base; refer to stage D below.  Refer also to the additional post-replacement steps and links in the Customer Acceptance section, which are typically performed by the customer admin or their representatives.

This document has distribution EXTERNAL, since the IB Switch is defined as a Customer-Replaceable-Unit in a limited number of Platforms, for example in custom-built solutions outside Engineered Systems.

 

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED:
If this switch is part of an Exadata machine, the engineer must be Exadata trained.

If this switch is part of an Exalogic machine, the engineer must be Exalogic trained.
If this switch is not part of an Exadata or Exalogic machine, then the engineer should be familiar with this type of switch.

TIME ESTIMATE: 150 minutes

TASK COMPLEXITY: 3

FIELD ENGINEER INSTRUCTIONS:
PROBLEM OVERVIEW:
Failed Sun Datacenter InfiniBand Switch 36 needs to be replaced.

This CAP document for replacing Sun Datacenter InfiniBand Switch 36 is available live at this link:  Document 1341658.1

 

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

    The customer has completed the steps given in Doc ID 2218443.1 (Exalogic systems) or 1636229.1 (all other systems), and the checklists (Steps 3 and 4.3) have been confirmed in customer-visible SR-notes.
    The plan for restoring the configuration information has been documented in customer-visible SR-notes, and the owner of the configuration restoration actions has been clearly identified.
    Oracle TSE should have ordered the correct Part# for the IB Switch firmware that needs to be used, as per Document 2187802.1.
    If the customer has not already powered off the switch (for example, when the replacement takes place while the IB Fabric is in production), then the IB Switch should be on with the Subnet Manager disabled (# disablesm).
    ** If the FE is unable to confirm that the checklist has been followed, or is unable to view the SR in SR Viewer, then please phone back in to Support before attending site and ask for this to be confirmed **

 

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:   ( PLEASE READ ALL INSTRUCTIONS BEFORE PROCEEDING )

This procedure consists of 4 stages:

A.  Initial physical replacement in the rack
B.  Replacement Switch Firmware Check & Upgrade
C.  Cable up the replacement switch and check basic IB Fabric connectivity
D.  Final housekeeping, documentation, warm-handover and wrap-up

 

A.  Initial physical replacement in the rack (no cabling yet - cables will be connected in later Step C)

 

1. Power off the Switch: The switch needing replacement will need to be powered off (if not already done by customer team). Power off both power supplies on the switch by removing both the power plugs.

If the customer has not taken a full IB Fabric downtime, then check with the customer representative to confirm that everything is working normally on the other parts of the IB Fabric after powering off this switch. For example, the SM master may have moved to another switch (if this switch had been the master and the admin team was unable to move the master earlier). If everything is configured as per the standard, this will have no effect on the operation of the system. If any problem or anomaly is detected in the running IB Fabric, then work with the customer / Support to get the issue resolved before proceeding with the replacement.
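The check that a Subnet Manager master is still present could be scripted roughly as follows. This is only a sketch: it assumes the output of `sminfo`, run on a switch or node that is still on the fabric, is piped in, and that a master is reported with the state name SMINFO_MASTER. The function name `sm_master_visible` and the sample line are hypothetical illustrations, not output captured from a real system.

```shell
# Hypothetical helper, not part of the switch firmware.
# Reads `sminfo` output on stdin and succeeds only if a master SM is reported.
# Assumption: a master is reported with the state name SMINFO_MASTER.
sm_master_visible() {
  grep -q 'SMINFO_MASTER'
}

# Example with an illustrative (made-up) sminfo line:
sample='sminfo: sm lid 1 sm guid 0x0, activity count 100 priority 5 state 3 SMINFO_MASTER'
if printf '%s\n' "$sample" | sm_master_visible; then
  echo "master SM visible"
else
  echo "no master SM reported"
fi
```

On a live system the equivalent call would be `sminfo | sm_master_visible`; any failure should be raised with the customer / Support as described above.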

 

2. Now, disconnect the cables from the switch. All InfiniBand cables should have labels at both ends indicating their locations. If there are any cables that do not have labels, then label them - if needed, refer to cabling tables in customer's build documentation, for example, if Exalogic, then this would be Exalogic Machine Owner's Guide.

Then, remove the switch being replaced, from the rack.

       Note:  Read "Sun Datacenter InfiniBand Switch 36 Installation Guide for Firmware Version 2.1".  To remove the switch, reverse the installation steps.

 

3. Install the new switch in the rack. Do not connect any InfiniBand cables yet.

   Refer to "Sun Datacenter InfiniBand Switch 36 Installation Guide for Firmware Version 2.1"

     If you are replacing a spine switch sitting at the bottom of the rack, the following Otube/video may be viewed to get some tips on replacing the switch.

          Otube/Video : https://otube.oracle.com/media/Exadata+IB+Spine+Install/0_kq2ry0q7

 

4. Connect management port of this IB switch to the Cisco switch within the rack (to the same port where old IB switch's management port was connected).

       Then follow the steps in the "Powering On the Switch" section of the PDF document "Sun Datacenter InfiniBand Switch 36 Installation Guide for Firmware Version 2.1".

             In that section, complete the following: 1) Attach the Management Cables, 2) Attach the Power Cords, 3) Accessing the Management Controller, and 4) Verify the Switch Status.

                    Do not perform the "Start the Subnet Manager" section.

                        Note: The default password for root is changeme

       Set the Network Management Parameters (CLI).  The initial setup of the network management parameters may have to be done by accessing the switch through its USB management port.  Make sure that the management IP address assigned to this switch is the same as that of the old switch.  If the customer does not know that IP address, or its netmask and default gateway, then ask the customer to provide any available IP address in the same subnet as the other IB switches in this rack; this can serve as a temporary address.  The switch will get its correct management IP address when its configuration is restored during the follow-up actions (see the Customer Acceptance section below).

      Do not connect any InfiniBand cables yet.

      Do not start the Subnet Manager yet.
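When a temporary management address is used in step 4, it helps to confirm that it really is in the same subnet as the other IB switches. The sketch below shows one way to check this with plain shell arithmetic. The helper names `ip_to_int` and `same_subnet` are hypothetical, and the addresses are documentation-range placeholders rather than values from any real rack.

```shell
# Hypothetical helpers, not part of the switch firmware.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1          # split the address on dots into $1..$4
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Usage: same_subnet <ip1> <ip2> <netmask>
# Succeeds if both addresses fall in the same subnet.
same_subnet() {
  mask=$(ip_to_int "$3")
  net1=$(( $(ip_to_int "$1") & mask ))
  net2=$(( $(ip_to_int "$2") & mask ))
  [ "$net1" -eq "$net2" ]
}

# Example with placeholder addresses (not real values):
if same_subnet 192.0.2.50 192.0.2.21 255.255.255.0; then
  echo "temporary address is in the management subnet"
fi
```

The first argument would be the proposed temporary address, the second the management address of another IB switch in the rack, and the third the netmask supplied by the customer.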

 

 

B.  Replacement Switch Firmware Check & Upgrade

1.  Check the firmware of the other switches in the rack to know what firmware version this switch should be running.  If firmware is to be downgraded, for example to match an older Engineered Systems PSU, then the following document must have been referred to by the Oracle TSE when ordering the part:

    <Document 2187802.1> Infiniband Switch - Firmware Downgrade To 2.1.6 Fails With Error: Cannot proceed with downgrade on this SP.

2.  Download that firmware from MOS and upgrade the firmware of the replacement switch to that version.  For the Sun Datacenter InfiniBand Switch 36 (NM2-36p), be sure to follow the detailed firmware upgrade procedure given in the Product Note document for firmware 2.2, "Upgrading the Gateway Firmware (CLI)" (these procedures are also on pages 18-24 of the PDF version of the Product Note).  Use only the download protocols that are supported by the IB Switch firmware, as listed in step 3 of the upgrade procedure in the Product Note.  Ensure that all the firmware upgrade steps are completed, that the switch is restarted (step 6), and that firmware integrity is checked (steps 7 through 9).

Note: If the firmware upgrade fails when upgrading to firmware version 2.1.8, Oracle employees may refer to the following internal document (please call Oracle Support if you need access to it):   <Document 2109781.1> How to fix broken InfiniBand Switch after upgrade to 2.1.8 firmware
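After the upgrade, the firmware of the replacement switch can be compared against the version noted from the other switches in step 1. The following is a minimal sketch, assuming the switch `version` command prints a first line of the form "SUN DCS 36p version: <version>". The helper names `fw_version` and `fw_matches` are hypothetical, and the sample text is illustrative rather than captured output.

```shell
# Hypothetical helpers, not part of the switch firmware.

# Reads the output of the switch `version` command on stdin and prints the
# firmware version string. Assumption: the relevant line looks like
# "SUN DCS 36p version: <x.y.z-n>".
fw_version() {
  sed -n 's/^SUN DCS 36p version: *//p' | head -n 1
}

# Succeeds if the version read on stdin equals the expected version in $1.
fw_matches() {
  [ "$(fw_version)" = "$1" ]
}

# Example with illustrative (made-up) output:
sample='SUN DCS 36p version: 2.1.8-1
Build time: (illustrative)'
printf '%s\n' "$sample" | fw_version
```

On the replacement switch this might be used as `version | fw_matches <version seen on the other switches>`; a non-zero exit status would indicate a mismatch that needs to be resolved before cabling.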

 

3. Disable SM on the replacement switch:
 
   # disablesm

 

C.  Cable up the replacement switch and check basic IB Fabric connectivity

1. Completely power off the replacement switch now by unplugging both the power supplies.

 

2. Now connect all the InfiniBand cables.

                       Refer to the "Connecting InfiniBand cables" section of the PDF document "Sun Datacenter InfiniBand Switch 36 Installation Guide for Firmware Version 2.1"

 

 3. Power on the replacement switch by installing power cords to the InfiniBand switch power supply slots.

 4.   On-site team now needs to check basic Infiniband connectivity:   Run the following commands on the replaced switch for verification purposes:

     # listlinkup
            -> Ensure that all cabled ports are in " up (Enabled)" state for all links that are expected to be active with nodes up at the other end of the link.  Otherwise re-seat the cables/transceivers, or check if the cables/transceivers are damaged / need replacement.
     # ibswitches
             ->  Check that all Switches including the replaced Switch are listed
     # getmaster and # sminfo
             -> Ensure that it can see the master
     # service opensmd status
             -> Ensure that opensmd is not running.  If it is still running, disable it using #disablesm command

   With the above commands, basic IB Fabric connectivity is confirmed and the replaced switch is ready for the follow-up actions described in the subsequent document.
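The `listlinkup` check above can be tedious by eye on a fully cabled switch. The sketch below filters the output down to connectors that are present but not reported as "up (Enabled)". It assumes each line names a connector as either present or not present; the helper name `links_not_up` and the sample lines are hypothetical illustrations.

```shell
# Hypothetical helper, not part of the switch firmware.
# Reads `listlinkup` output on stdin and prints every connector that is
# present but whose link is not reported as "up (Enabled)".
# Returns 0 when no such connector is found, 1 otherwise.
links_not_up() {
  bad=$(grep -i -v 'not present' | grep -i 'present' | grep -v 'up (Enabled)' || true)
  if [ -z "$bad" ]; then
    return 0
  fi
  printf '%s\n' "$bad"
  return 1
}

# Example with illustrative (made-up) lines:
sample='Connector 0A Present <-> Switch Port 20 is up (Enabled)
Connector 1A Not present
Connector 2A Present <-> Switch Port 24 is down (Enabled)'
printf '%s\n' "$sample" | links_not_up || echo "re-seat or inspect the cables listed above"
```

A link can legitimately be down when the node at the other end is powered off, so any line printed here is a prompt to investigate rather than proof of a faulty cable.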

5. Check and make sure that you can (Ethernet) ping every IB switch from every other IB switch through its management interface.
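The management-interface ping check in step 5 could be run with a small loop like the one below on each IB switch in turn. This is a sketch under assumptions: the helper name `ping_all` and the `PING_CMD` override are hypothetical, the default `ping -c 1 -W 2` assumes a Linux-style ping, and the addresses shown in the usage comment are placeholders.

```shell
# Hypothetical helper, not part of the switch firmware.
# Pings each management address given as an argument and reports the result.
# PING_CMD may be overridden, for example for a dry run.
# Returns 0 if every address answered, 1 otherwise.
ping_all() {
  failed=0
  for host in "$@"; do
    if ${PING_CMD:-ping -c 1 -W 2} "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (placeholder addresses; run on every IB switch in turn):
#   ping_all 192.0.2.21 192.0.2.22 192.0.2.23
```

Repeating the call from every switch covers the "every switch to every other switch" requirement, since each run tests the paths from one source.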

 

 

D.  Final housekeeping, documentation, warm-handover and wrap-up

1. ASR:  Set the serial number, product level identity and ASR of this replaced switch as per the steps in the following document.

Refer to: How to configure Datacenter InfiniBand Switch 36 & QDR InfiniBand Gateway Switches for ASR (Doc ID 1902710.1)

 

2.  Installed Base: Update Installed Base, to ensure that the replacement-part serial# will be properly entitled. Within Oracle, the IB Switch is termed a "SuperFRU" which simply means that it is a whole chassis replacement including both chassis and internal main-board. Therefore, follow the relevant SuperFRU procedure:

    a. If the IB Switch has been replaced by the end customer, i.e. as Parts-Only/CRU, then use the following procedure: Oracle Support Document 1575977.1 (How can customers update the System Serial Number after a SuperFru Part Replacement)

    b. If the IB Switch has been replaced by an Oracle field engineer, then the Oracle FE should use the *NEW* process in the internal Oracle Global Desk Manual repository, by clicking the ptp.oraclecorp.com link directly here:  How to Update Install Base serial number entitlement for InfiniBand Switch FRU replacements

    c. Partners use the process they already use.

3.   The Oracle Field Engineer (where an FE has been dispatched) should now document, in a visible note in the Task debrief, whether all of the above steps are complete.  Any steps that were missed or skipped, and any anomalies, must be clearly indicated.   If the customer will be working further with Oracle Support at this time on restoration of the configuration, then a ** warm handover from the Field Engineer to the Oracle Support Engineer is required **.  COMPLETED – End of Technical Action Plan.

 

OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Customer-admin is required to now follow the steps given in Document 2218689.1 Exalogic Infiniband Switch Replacement - Follow-up Actions (Restoration), if this is an Exalogic system, or 2125203.1 Infiniband Switch Replacement – Follow-up Actions, for all other systems.   The steps given in these documents are required for critical follow-up actions required to restore the configuration (from config backup where available) along with customer-specific configuration items such as smnodes, partitions and VNICs.

REFERENCE INFORMATION:
Exadata Database Machine Documentation  12c Release 1 (12.1)
http://amomv0115.us.oracle.com/archive/cd_ns/E50790_01/doc/index.htm


Sun Datacenter InfiniBand Switch 36 User's Guide:
http://download.oracle.com/docs/cd/E19197-01/820-7746-13/820-7746-13.pdf

Enterprise Installation Standards (EIS) Checklists:
http://eis.us.oracle.com/checklists/checklists.html Current Systems
http://eis.us.oracle.com/checklists/eolChecklists.html End-of-Life Systems

References

<NOTE:2125242.1> - Infiniband Switch Replacement – Overview and guide to key articles
<NOTE:2140928.1> - How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch
<NOTE:1636229.1> - How to Prepare an Infiniband Switch for Replacement
<NOTE:1383773.1> - How to Replace a Failed Sun Network QDR InfiniBand Gateway Switch
<NOTE:2125203.1> - Infiniband Switch Replacement - Follow-up Actions

  Copyright © 2018 Oracle, Inc.  All rights reserved.