Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1987712.1
Update Date:2017-10-11
Keywords:

Solution Type  Technical Instruction Sure

Solution  1987712.1 :   M7-16 How to replace a Faulty Switch Unit in a Switch Chassis [VCAP]  


Related Items
  • SPARC M7-16
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
  • Microlearning>Video>ML-VID-VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: this is now a FRU

Applies to:

SPARC M7-16 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

CAP PROBLEM OVERVIEW: M7-16 How to replace a Faulty Switch Unit in a Switch Chassis - Switch Unit failure

*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1987712.1

*********************************************************************

ESD Caution:

  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using either the adhesive end or the metal plug, depending on the type of strap.
  • Use an antistatic mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

 

Contamination Caution:

  • Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material, down to the ESD-safe packaging, while you are still outside the datacenter.

Solution

 

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED: M7-16 product training

TASK COMPLEXITY: 2

TIME ESTIMATE: 60 minutes

HOT replacement

A SWU FPGA update may be required during SWU replacement. See below.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

 The Physical Domains must be shut down.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

- Important note: After replacing the SWU, verify that the FPGA version is up to date and update it if necessary. Refer to Doc ID 2076387.1.

Preparing a Switch Unit for Removal

The switch unit replacement kit includes plastic covers to protect switch unit connectors. Return these covers to Oracle.

1. Use one of these Oracle ILOM commands to display faulty components:

  1.  -> show faulty
  2.  -> show /System/Open_Problems
  3.  faultmgmtsp> fmadm faulty

2. Locate the faulty SWU by its amber Service Required LED

3. Use a grounding strap to protect the equipment from ESD damage

4. Verify that the switch unit has been removed from service.

  1. -> show /System/Other_Removable_Devices/Scalability_Switch_Boards/Scalability_Switch_Board_x

5. Determine the next step

  1. If health returns a value of Service Required, go to step 6
  2. If health returns a value of Offline, go to step 7

6. Stop the server.

  1. -> stop /System

7. Prepare the switch unit for removal and verify that it is ready.

  1. -> set /System/Other_Removable_Devices/Scalability_Switch_Boards/Scalability_Switch_Board_x action=prepare_to_remove

 When the SWU is ready to remove, the health property of the switch board displays Offline and the blue Ready to Remove LED lights.
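
The branch in steps 4 through 7 can be sketched as a small helper. This is only an illustration, not an Oracle tool. It assumes that the output of the show command has been captured as text and that the health property appears on a line of the form "health = <value>"; that layout, the sample text, and the function names parse_health and next_step are assumptions made for this sketch.

```python
import re


def parse_health(show_output: str) -> str:
    """Return the health value from captured 'show' output.

    Assumes a line of the form 'health = <value>' (an assumption about
    the output layout, not something stated in this procedure).
    """
    match = re.search(r"^\s*health\s*=\s*(.+?)\s*$", show_output, re.MULTILINE)
    if match is None:
        raise ValueError("no health property found in output")
    return match.group(1)


def next_step(health: str) -> int:
    """Map the health value to the next step of the preparation procedure."""
    if health == "Service Required":
        return 6  # stop the server before preparing the unit
    if health == "Offline":
        return 7  # go straight to action=prepare_to_remove
    raise ValueError(f"unexpected health value: {health!r}")


# Hypothetical captured output; board number 0 is only an example.
sample = """
 /System/Other_Removable_Devices/Scalability_Switch_Boards/Scalability_Switch_Board_0
    Properties:
        health = Service Required
"""

print(next_step(parse_health(sample)))
```

Any health value other than the two named in step 5 raises an error in this sketch, so an unexpected state is not silently treated as ready.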

 

Removing a Switch Unit

1. Unpack the new SWU on a static-safe mat.

2. Remove the plastic covers from the connectors on the new SWU and set them aside. You will install them on the connectors of the old SWU after you have removed it from the system.

3. Unseat the SWU

  1. Pull the ejector arm out to disengage the switch unit from the server
  2. Press the arm back toward the unit to prevent it from being damaged
  3. Pull the switch unit out of the server less than half way
If you are unseating a switch unit so you can replace it with a new one, ensure that you have already removed the fan modules from the switch unit.

4. Remove the fan modules from the switch unit for installation in the replacement switch unit.

  1. Check M7-16 - How to replace a Faulty Fan module in a Switch chassis (Doc ID 1952111.1) for details

5. Remove the SWU

  1. Carefully remove the switch unit from the server, taking care not to bump the rear connectors.

6. Place the switch unit on an antistatic mat.

 

Installing a SWU

1. Insert the SWU in its slot

  1. Open the ejector arm so that it is fully open
  2. Install the new switch unit into its slot in the server until the ejector arm begins to engage
  3. Press the arm back toward the switch unit, and then press the arm firmly against the switch unit to fully seat it back into the server

2. Reinstall the fan modules that you removed from the faulted switch unit.

  1. Check M7-16 - How to replace a Faulty Fan module in a Switch chassis (Doc ID 1952111.1) for details

3. After insertion, check the event log to confirm whether a SWU FPGA update is required.

Example:

-> show /SP/logs/event/list/

Event
ID    Date/Time                Class    Type     Severity
----- ------------------------ -------- -------- --------
1278  Tue Dec 15 16:25:27 2015 Chassis  Log      major
      /SYS/SWU0/FPGA update required.

4. If FPGA update is required, refer to SPARC M7 Series Servers : SWU FPGA firmware update (Doc ID 2085539.1).
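
The log check in step 3 could be scripted along the following lines when the event log has been saved to a text file or captured from a session. This is a minimal sketch, not an Oracle utility. It relies only on the message text shown in the example above ("/SYS/SWU0/FPGA update required."); the function name components_needing_fpga_update and the sample log are hypothetical.

```python
import re

# Matches detail lines such as "/SYS/SWU0/FPGA update required."
FPGA_LINE = re.compile(r"^\s*(/SYS/\S+)/FPGA update required\.?\s*$")


def components_needing_fpga_update(event_log: str) -> list[str]:
    """Return the component paths flagged with 'FPGA update required'."""
    found = []
    for line in event_log.splitlines():
        match = FPGA_LINE.match(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


# Sample text modelled on the event log example in this procedure.
sample_log = """\
Event
ID    Date/Time                Class    Type     Severity
----- ------------------------ -------- -------- --------
1278  Tue Dec 15 16:25:27 2015 Chassis  Log      major
      /SYS/SWU0/FPGA update required.
"""

print(components_needing_fpga_update(sample_log))
```

An empty list means that no such message was found in the captured text. It does not replace the FPGA version check performed from the restricted shell before the PDomains are restarted.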

Return the faulted component to Oracle.

Caution - The removed SWU must be properly repackaged to prevent damage during return transportation to Oracle. Repackage the SWU in the same way that the replacement FRU was delivered. See the Service Manual for details about proper repackaging.
  1. If the replacement included safety covers on the connectors, install the covers on the component that you are returning.
  2. In the shipping container that contained the replacement component:
    1. Using the same material used to pack the replacement component, position the component so that it is not free to move.
    2. Add any required paperwork or other documentation in the container.
    3. Except when packing CMIOUs, include any tools that were loaned to you by Oracle. Do not place tools inside a container that is being used to return a CMIOU.
  3. Close the shipping container and seal it with the packaging tape supplied by Oracle.
  4. Apply the shipping label to the shipping container.
  5. Notify Oracle or an authorized shipper that the shipping container is ready for pickup.

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Verify that there are no faulty components:

  1. -> show faulty
  2.  -> show /System/Open_Problems
  3.  faultmgmtsp> fmadm faulty

Perform one of the following tasks based on your verification results:

  1. If the previous steps did not clear the fault, refer to Doc ID 1309092.1 for information about the tools and methods you can use to diagnose and clear component faults.
  2. If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.

Before restarting the PDomains, confirm from the restricted shell that all components are running their latest FPGA versions:

[(restricted_shell) sp0:~]# hw version | grep "M7_FPGA"

Refer to SPARC M7 Series Servers : SWU FPGA firmware update (Doc ID 2085539.1) if any further FPGA update is required.

If required, change the /HOSTx/diag default_level and hw_change_level values for the affected hosts before starting them. Note the current values first so that you can restore them afterward.

-> set /HOSTx/diag/ default_level=max
-> set /HOSTx/diag/ hw_change_level=max

Restart the PDomains

  1. -> start /System
  2. Restart software applications per applicable administration guides to resume system operation.

After all PDomains have restarted, repeat the steps to verify that there are no faulty components, to ensure that starting the PDomains has not triggered new faults.

Return the /HOSTx/diag default_level and hw_change_level to their original values.
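
Raising and later restoring the diag levels means keeping track of the original values for each host. The sketch below only builds the Oracle ILOM command strings used in this procedure; it does not connect to the SP. The function names and the saved values are hypothetical, and "min" is only a placeholder for whatever values were recorded before the change.

```python
DIAG_PROPERTIES = ("default_level", "hw_change_level")


def raise_diag_commands(host: int) -> list[str]:
    """Build the commands that set both diag levels to max for /HOSTx."""
    return [f"set /HOST{host}/diag/ {prop}=max" for prop in DIAG_PROPERTIES]


def restore_diag_commands(host: int, original: dict[str, str]) -> list[str]:
    """Build the commands that put both diag levels back to recorded values."""
    missing = [prop for prop in DIAG_PROPERTIES if prop not in original]
    if missing:
        raise ValueError(f"no recorded value for: {', '.join(missing)}")
    return [
        f"set /HOST{host}/diag/ {prop}={original[prop]}"
        for prop in DIAG_PROPERTIES
    ]


# Placeholder values; replace with the values noted before the change.
saved = {"default_level": "min", "hw_change_level": "min"}

for command in raise_diag_commands(0) + restore_diag_commands(0, saved):
    print("-> " + command)
```

The restore helper refuses to run when a recorded value is missing, so that a level is never reset to a guessed value.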

 

======================== Other info =====================

REFERENCE INFORMATION:  Service Manual: http://docs.oracle.com/cd/E55211_01/html/E55215/index.html


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.