Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1988126.1
Update Date:2017-10-24
Keywords:

Solution Type  Technical Instruction Sure

Solution  1988126.1 :   M7-16 How to replace a Faulty Service Processor Proxy (SPP) in a CMIOU chassis [VCAP]  


Related Items
  • Oracle SuperCluster M7 Hardware
  • SPARC M7-16
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
  • Microlearning>Video>ML-VID-VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: M7 FRU CAP

Applies to:

Oracle SuperCluster M7 Hardware - Version All Versions and later
SPARC M7-16 - Version All Versions and later
Information in this document applies to any platform.

Goal

CAP PROBLEM OVERVIEW: M7-16 Service Processor Proxy (SPP) in a CMIOU chassis - SPP Failure

This procedure is the same as the replacement procedure for a faulty Service Processor (SP) in a CMIOU chassis or in a switch chassis.

 

*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1988126.1

*********************************************************************

ESD Caution:

  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using the adhesive end or the metal plug, depending on the type of strap.
  • Use an antistatic mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

 

Contamination Caution:

  • Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material, down to the ESD-safe packaging, while you are still outside the datacenter.

 

Solution

 

DISPATCH INSTRUCTIONS 

WHAT SKILLS DOES THE ENGINEER NEED: M7-16 Product Training/Experience

TASK COMPLEXITY: 3

TIME ESTIMATE: 60 minutes

HOT replacement

FIELD ENGINEER INSTRUCTIONS

- SPP FPGA update may be required during SPP replacement. See Below

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? : N/A

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Important note: After replacing the SP, make sure that the FPGA version is up to date and update it if required. Refer to Doc ID 2085572.1.

Determining which SPP requires service

1. Use one of these Oracle ILOM commands to display faulty components:

     -> show faulty
     -> show /System/Open_Problems
     faultmgmtsp> fmadm faulty

2. Locate the faulty SPP by its amber Service Required LED

 

Preparing to remove an SPP

1. Determine Which SPP Is Managing DCU Activity

  1. -> show /System/DCUs/DCU_x sp_name

2. Ensure that the SPMs on the SPP that you are removing are not managing hardware

  1. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_0 state_sp
  2. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_1 state_sp

To list the state_sp for all of the SPP/SPMs :

-> show -t /System/Other_Removable_Devices/Proxy_Service_Processors -l 3 -format nowrap state_sp
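The per-SPM commands above differ only in the SPP number, the SPM number, and the property queried. As a minimal sketch, assuming you want to generate those command lines for pasting into an ILOM session, the helper below builds them from the target paths used in this document. The function names (spm_target, show_commands) are hypothetical and are not part of Oracle ILOM.

```python
from typing import List

# Target root copied from the commands in this procedure.
SPP_ROOT = "/System/Other_Removable_Devices/Proxy_Service_Processors"


def spm_target(spp: int, spm: int) -> str:
    """Return the ILOM target path for one SPM on one SPP (hypothetical helper)."""
    # This procedure only addresses Service_Processor_Module_0 and _1.
    if spm not in (0, 1):
        raise ValueError("this procedure only addresses SPM 0 and SPM 1")
    return f"{SPP_ROOT}/Proxy_Service_Processor_{spp}/Service_Processor_Module_{spm}"


def show_commands(spp: int, prop: str) -> List[str]:
    """Build the 'show' command for both SPMs of the given SPP."""
    return [f"show {spm_target(spp, spm)} {prop}" for spm in (0, 1)]


# Example: print the two state_sp queries for SPP 1 with the ILOM prompt prefix.
for line in show_commands(1, "state_sp"):
    print("-> " + line)
```

The same helper can be reused for the health and state_pcie properties queried later in this procedure.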

3. Determine the next step

  1. If the state reports Running, you must change the DCU-SPM assignment. Go to step 4.
  2. If the state reports OffDuty, neither of the SPMs on SPPx are managing a DCU. Go to step 5.
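The decision in step 3 can be written down as a small function. This is only a sketch of the rule stated above: it assumes you have already read the state_sp value of each SPM on the SPP by hand, and it stops with an error for any value other than Running or OffDuty, because this document does not say how to treat other values. The function name next_step is hypothetical.

```python
def next_step(spm_states):
    """Return the step number to go to, given the state_sp values of both SPMs."""
    states = [s.strip() for s in spm_states]
    # Any SPM still Running is managing a DCU: change the DCU-SPM assignment first.
    if any(s == "Running" for s in states):
        return 4
    # All SPMs OffDuty: none of them manages a DCU, so they can be prepared for removal.
    if states and all(s == "OffDuty" for s in states):
        return 5
    # Other values are not covered by this procedure; do not guess.
    raise ValueError(f"unexpected state_sp values: {states}")


print(next_step(["Running", "OffDuty"]))  # one SPM is still managing hardware
print(next_step(["OffDuty", "OffDuty"]))  # neither SPM is managing hardware
```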

4. Change the DCU-SPM assignment

  1. -> set /System/DCUs/DCU_x initiate_sp_failover=true

NOTE: Wait approximately 4-5 minutes before proceeding to step 5. A message similar to the following appears in the control domain of the owning PDomain when the failover is complete.

       NOTICE: pciehpc (pcieb11): card is inserted in the slot /SYS/SPP1/SPM0

You can also repeat 'show /System/DCUs/DCU_x sp_name' to verify that the new SPM role assignment has completed.

If the PDomain is running and Solaris is earlier than 11.3.4.5.0, the DCU-SPM failover will not complete until 'svcadm restart ilomconfig-interconnect' is run on the PDomain.  See bug 20697238.
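The version condition above is a numeric comparison on each dotted field, not a string comparison (for example, 11.3.10.0.0 is later than 11.3.4.5.0). The sketch below illustrates that comparison. It assumes the Solaris version is available as a purely numeric dotted string; how you obtain that string on the PDomain is not covered here, and the function name is hypothetical.

```python
# Threshold taken from the note above: releases earlier than 11.3.4.5.0 need the restart.
FIXED_IN = (11, 3, 4, 5, 0)


def needs_interconnect_restart(solaris_version: str) -> bool:
    """True if the given version is earlier than 11.3.4.5.0 (hypothetical helper)."""
    parts = [int(p) for p in solaris_version.split(".")]
    # Assumption: missing trailing fields count as zero, so "11.3" means 11.3.0.0.0.
    parts += [0] * (len(FIXED_IN) - len(parts))
    return tuple(parts) < FIXED_IN


if needs_interconnect_restart("11.3.3.6.0"):
    # Command quoted from the note above; run it on the PDomain.
    print("svcadm restart ilomconfig-interconnect")
```

The check is only relevant when the PDomain is running, as stated in the note.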

5. Prepare each SPM for removal (i.e., power off the SPM)

  1. -> set /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_0 action=prepare_to_remove
  2. -> set /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_1 action=prepare_to_remove

NOTE: Wait approximately 3 minutes before proceeding to step 6.

6. Verify that the SPP is ready to remove by verifying that its SPMs have stopped

  1. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_0 health
  2. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_1 health
  3. health should return a value of Offline

7. After ensuring that the SPM has been taken offline, wait a couple of minutes and then verify that the PCIe devices have been taken offline

  1. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_0 state_pcie
  2. -> show /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x/Service_Processor_Module_1 state_pcie
  3. If state_pcie returns a value of Online, log onto the host and prepare the PCIe card for removal by taking the devices offline for the desired SP/SPM slot. See “Service Manual / Servicing PCIe Cards”.

8. Prepare the SPP for removal

  1. -> set /System/Other_Removable_Devices/Proxy_Service_Processors/Proxy_Service_Processor_x action=prepare_to_remove

When the SPP is ready to remove, the health value will display Offline, and the blue Ready to Remove LED will light.
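The software checks in steps 6 through 8 can be summarized as a checklist. The sketch below assumes you have noted the health and state_pcie values returned by the show commands for both SPMs, plus the health of the SPP itself, and it lists anything that still blocks removal. The function name removal_blockers is hypothetical, and the blue Ready to Remove LED must still be confirmed by eye.

```python
def removal_blockers(spm_health, spm_state_pcie, spp_health):
    """Return a list of reasons why the SPP is not yet ready to remove."""
    blockers = []
    # Step 6: each SPM must report health = Offline.
    for i, health in enumerate(spm_health):
        if health != "Offline":
            blockers.append(f"SPM{i} health is {health}, expected Offline")
    # Step 7: a state_pcie of Online means the PCIe devices must be taken offline on the host.
    for i, state in enumerate(spm_state_pcie):
        if state == "Online":
            blockers.append(f"SPM{i} state_pcie is Online, take the PCIe devices offline on the host")
    # Step 8: after prepare_to_remove, the SPP itself must report health = Offline.
    if spp_health != "Offline":
        blockers.append(f"SPP health is {spp_health}, expected Offline")
    return blockers


# Example values only; substitute what the show commands actually returned.
for reason in removal_blockers(["Offline", "Offline"], ["Online", "Offline"], "Offline"):
    print(reason)
```

An empty list means the software-side conditions in this document are met; it does not replace the LED check in the removal steps.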

 

Removing an SPP

Only remove an SPP when you have verified that the blue Ready to Remove LED on the SPP is lit.

1. Use a grounding strap to protect the equipment from ESD damage

2. Locate the lit blue Ready to Remove LED from the rear of the server

3. Label, disconnect and relocate the cables attached to the serial and network ports

4. Pinch the ejector latches and open the ejector arms

5. Pull the SPP halfway out of the SP tray

6. Close the ejector arms

7. Carefully remove the SPP from the SP tray, using two hands, and avoid bumping the rear connectors

8. Place the SPP on an antistatic mat

 

Installing an SPP

1. Insert the SPP into the slot and slide it in until the extraction levers start to close

2. Close the extraction levers fully until they lock into place

3. Reinstall the serial management and network management cables

NOTE: Wait approximately 10 minutes before proceeding to step 4.

4. Refer to SPARC M7 Series Servers : SP or SPP FPGA firmware update (Doc ID 2085572.1) and check if any FPGA update is required. 

 

Return the faulted component to Oracle.

  1. If the replacement included safety covers on the connectors, install the covers on the component that you are returning.
  2. In the shipping container that contained the replacement component:
    1. Using the same material used to pack the replacement component, position the component so that it is not free to move.
    2. Add any required paperwork or other documentation in the container.
    3. Except when packing CMIOUs, include any tools that were loaned to you by Oracle. Do not place tools inside a container that is being used to return a CMIOU.
  3. Close the shipping container and seal it with the packaging tape supplied by Oracle.
  4. Apply the shipping label to the shipping container.
  5. Notify Oracle or an authorized shipper that the container is ready for pickup.

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

 

Verify that there are no faulty components

  1. -> show faulty
  2. -> show /System/Open_Problems
  3. faultmgmtsp> fmadm faulty

Perform one of the following tasks based on your verification results

  1. If the previous steps did not clear the fault, refer to doc 1309092.1 for information about the tools and methods you can use to diagnose and clear component faults.
  2. If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required

The newly installed SP will update its system firmware from the Active-SP in the system.

  1. Confirm that the correct system firmware is running
  2. If needed, download the system firmware
    1. Refer to M7-8 / M7-16 - How to update System firmware (Doc ID 1987771.1)

 

======================== Other info =====================

REFERENCE INFORMATION:  Service Manual: http://docs.oracle.com/cd/E55211_01/html/E55215/index.html


  Copyright © 2018 Oracle, Inc.  All rights reserved.