Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1988106.1
Update Date:2017-10-24
Keywords:

Solution Type  Technical Instruction Sure

Solution  1988106.1 :   M8-8 / M7-8 / M7-16 How to replace a Faulty Service Processor (SP) in a CMIOU chassis or in a Switch chassis [VCAP]  


Related Items
  • Oracle SuperCluster M7 Hardware
  • SPARC M7-16
  • SPARC M8-8
  • SPARC M7-8
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
  • Microlearning>Video>ML-VID-VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: M7 FRU CAP

Applies to:

SPARC M7-16 - Version All Versions and later
SPARC M7-8 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
SPARC M8-8 - Version All Versions and later
Information in this document applies to any platform.

Goal

CAP PROBLEM OVERVIEW:  M8-8 / M7-8 / M7-16 Service Processor (SP) in a CMIOU chassis or in a Switch chassis - SP Failure

*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1988106.1

*********************************************************************

ESD Caution:

  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using the adhesive end or the metal plug depending on the type of strap.
  • Use an antistatic mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

 

Contamination Caution:

  • Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material except the ESD-safe packaging while you are still outside the datacenter.

Solution

 

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED: M8-8 / M7-8 / M7-16 Product Training/Experience

TASK COMPLEXITY: 3

TIME ESTIMATE: 60 minutes

HOT replacement

FIELD ENGINEER INSTRUCTIONS

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? : Depending on server model and populated CMIOU slots it may be necessary to stop the HOST controlled by the SPM on the target SP for removal.

ILOM will not permit SPM role failover for SPMs with a Degraded PCIe path (e.g., a half-populated M7-8 SuperCluster chassis with CMIOUs occupying only slots 0, 3, 5, and 7). The fix for bug 24449320 (9.7.3.b and higher) permits you to fail over the SPM role to a Degraded PCIe path SPM when the HOST is stopped.
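
The version condition above ("9.7.3.b and higher") can be checked with a small comparison. The following is a minimal sketch, not an Oracle tool. It assumes that the version is a string of the form N.N.N with an optional single lowercase letter suffix, that suffixes order alphabetically, and that a version without a suffix sorts before the same version with one. The function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed version layout: three numeric fields plus an optional letter, e.g. "9.7.3.b".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.([a-z]))?")


def parse_version(version: str) -> Tuple[int, int, int, str]:
    """Split a version string into a tuple that compares in release order."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    # An absent suffix becomes "", which sorts before any letter (assumption).
    return (major, minor, patch, match.group(4) or "")


def includes_fix_24449320(version: str) -> bool:
    """True if the version is 9.7.3.b or higher, per the note above."""
    return parse_version(version) >= parse_version("9.7.3.b")


if __name__ == "__main__":
    for candidate in ("9.7.3.a", "9.7.3.b", "9.8.5.c"):
        print(candidate, includes_fix_24449320(candidate))
```

If the installed version string does not match the assumed layout, the sketch raises an error instead of guessing, so the check fails visibly.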

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Important note: After replacing the SP, make sure that the FPGA version is up to date, and update it if necessary. Refer to Doc ID 2085572.1.

Determining which SP requires service

1. Use one of these Oracle ILOM commands to display faulty components:

  1. -> show faulty
  2. -> show /System/Open_Problems
  3. faultmgmtsp> fmadm faulty

2. Locate the faulty SP by its amber Service Required LED

Preparing to remove an SP

It is recommended that you replace one SP in the system at a time.

1. Determine which SP is managing system activity

  1. For M7-8 and M7-16 servers:
    1. -> show /SP/redundancy/ fru_name

SPMs in SPARC M7-8 servers support PCIe connections. On these servers, hosts must either be powered off or must be running the Oracle Solaris OS for an SP to be removed. Do not remove SP PCIe devices when the server is booting or when the host is at the Open Boot prompt.

2. Ensure that the SPM on the SP is not managing hardware

  1. -> show /System/Other_Removable_Devices/Service_Processors/Service_Processor_x/Service_Processor_Module_0 state_sp

3. Determine the next step

  1. If state_sp reports Running, you must first change the Active SP assignment. Go to step 4.
  2. If state_sp reports OffDuty, the SP is not the Active SP and you can continue to step 5.

4. Change the Active SP assignment

  1. -> set /SP/redundancy initiate_failover_action=true

5. Prepare the SPM for removal

  1. -> set /System/Other_Removable_Devices/Service_Processors/Service_Processor_x/Service_Processor_Module_0 action=prepare_to_remove

6. For a SPARC M7-8 server with 2 PDomains only, change DCU1 SP assignment, when necessary

  1. Check which SPM is managing DCU1
    -> show /System/DCUs/DCU_1/ sp_name
  2. If the sp_name listed is on the target SP for removal, initiate role failover
    -> set /System/DCUs/DCU_1 initiate_sp_failover=true

NOTE: It may be necessary to stop the HOST controlled by the SPM on the target SP for removal. ILOM will not permit SPM role failover for SPMs with a Degraded PCIe path (e.g., a half-populated M7-8 SuperCluster chassis with CMIOUs occupying only slots 0, 3, 5, and 7). The fix for bug 24449320 (9.7.3.b and higher) permits you to fail over the SPM role to a Degraded PCIe path SPM when the HOST is stopped.

The following ILOM command shows the status of all SPMs and the health of their PCIe paths:
-> show -t -l 3 -format nowrap /System/Other_Removable_Devices/Service_Processors/ state_sp state_pcie

7. Determine the next step

  1. For a SPARC M7-8 server with 2 PDomains, perform step 5 for the second SPM and then go to step 8.
  2. For a SPARC M7-8 server with one PDomain, go to step 8.
  3. For a SPARC M7-16 server, go to step 9.

8. Verify that the PCIe devices have been taken offline

  1. -> show /System/Other_Removable_Devices/Service_Processors/Service_Processor_x/Service_Processor_Module_0 state_pcie
  2. If state_pcie returns a value of Online, log onto the host and prepare the PCIe card for removal by taking the devices offline for the desired SP/SPM slot. See “Service Manual / Servicing PCIe Cards”.

NOTE: The following ILOM command shows the status of all SPMs and the health of their PCIe paths:
-> show -t -l 3 -format nowrap /System/Other_Removable_Devices/Service_Processors/ state_sp state_pcie

9. Verify that the SPM on the SP is ready for removal

  1. -> show /System/Other_Removable_Devices/Service_Processors/Service_Processor_x/Service_Processor_Module_0 health
  2. health should return a value of Offline
  3. For a SPARC M7-8 server with 2 PDomains, perform this step for the second SPM.

10. Prepare the SP for removal

  1. -> set /System/Other_Removable_Devices/Service_Processors/Service_Processor_x action=prepare_to_remove

When the SP is ready to remove, the health value will display Offline, and the blue Ready to Remove LED will light.

11. If you can access the SP, back up the configuration information

  1. -> cd /SP/config
  2. -> dump -destination uri target

 

Removing an SP

Only remove an SP when you have verified that the blue Ready to Remove LED on the SP is lit.
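
The checks that lead up to this point (state_pcie not Online, SPM health Offline, SP health Offline, blue LED lit) can be collected into one gate. The sketch below is illustrative only. It assumes the property values have already been read from ILOM by hand and passed in as plain strings, and the function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional


def removal_blockers(
    spm_health: Dict[str, str],
    sp_health: str,
    ready_led_lit: bool,
    spm_state_pcie: Optional[Dict[str, str]] = None,
) -> List[str]:
    """List the reasons the SP should not be pulled yet; empty means ready."""
    blockers = []

    # Step 8: PCIe devices must be offline (checked on M7-8 servers).
    for spm, state in (spm_state_pcie or {}).items():
        if state == "Online":
            blockers.append(f"{spm}: state_pcie is Online; take the devices offline from the host")

    # Step 9: every SPM on the SP should report health Offline.
    for spm, health in spm_health.items():
        if health != "Offline":
            blockers.append(f"{spm}: health is {health}, expected Offline")

    # Step 10: the SP itself reports Offline and lights the blue LED.
    if sp_health != "Offline":
        blockers.append(f"SP health is {sp_health}, expected Offline")
    if not ready_led_lit:
        blockers.append("blue Ready to Remove LED is not lit")

    return blockers


if __name__ == "__main__":
    issues = removal_blockers({"Service_Processor_Module_0": "Offline"}, "Offline", True)
    print("ready to remove" if not issues else issues)
```

Returning a list of reasons rather than a single true/false value makes it easier to see which preparation step still needs attention.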

1. Use a grounding strap to protect the equipment from ESD damage

2. Locate the lit blue Ready to Remove LED from the rear of the server

3. Label, disconnect and relocate the cables attached to the serial and network ports

4. Pinch the ejector latches and open the ejector arms

5. Pull the SP halfway out of the SP tray

6. Close the ejector arms

7. Carefully remove the SP from the SP tray, using two hands, and avoid bumping the rear connectors

8. Place the SP on an antistatic mat

 

Installing an SP

1. Insert the SP into the slot and slide it in until the extraction levers start to close

2. Close the extraction levers fully until they lock into place

Note: The FE should connect to the serial port of the newly installed SP to collect POST and ILOM startup output. In case of unexpected FRU behavior, upload this data to the SR.

3. Refer to SPARC M7 Series Servers : SP or SPP FPGA firmware update (Doc ID 2085572.1) and check if any FPGA update is required.

4. Reinstall the serial management and network management cables

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

 

Verify that there are no faulty components

  1. -> show faulty
  2. -> show /System/Open_Problems
  3. faultmgmtsp> fmadm faulty

Perform one of the following tasks based on your verification results

  1. If the previous steps did not clear the fault, refer to doc 1309092.1 for information about the tools and methods you can use to diagnose and clear component faults.
  2. If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.

The newly installed SP will update its system firmware from the Active-SP in the system.

  1. Confirm that the correct system firmware is running
  2. If an update is needed or to upgrade to the latest version available, download the system firmware
    1. Refer to M8-8 / M7-8 / M7-16 - How to update System firmware (Doc ID 1987771.1)

Verify that the SP date is correct.

  1. -> show /SP/clock

If TPM was initialized on the replaced SP, the proper steps should be completed. See "Securing Systems and Attached Devices in Oracle® Solaris 11.3" in the Solaris documentation.

Verify that the Versaboot fallback image is installed.

  1. -> show /SP/firmware/host/miniroot version
  2. If no image is installed and if versaboot is used, reload the appropriate image following "How to Update the Fallback Image" from the Booting and Shutting Down Oracle® Solaris 11.3 Systems in the Solaris documentation.

Return the faulted component to Oracle.

  1. If the replacement included safety covers on the connectors, install the covers on the component that you are returning.
  2. Place the component in the shipping container that contained the replacement component.
  3. Using the same material used to pack the replacement component, position the component so that it is not free to move.
  4. Add any required paperwork or other documentation in the container.
  5. Except when packing CMIOUs, include any tools that were loaned to you by Oracle. Do not place tools inside a container that is being used to return a CMIOU.
  6. Close the shipping container and seal it with the packaging tape supplied by Oracle.
  7. Apply the shipping label to the shipping container.
  8. Notify Oracle or an authorized shipper that the container is ready for pickup.

 

======================== Other info =====================

REFERENCE INFORMATION:  Service Manual: http://docs.oracle.com/cd/E55211_01/html/E55215/index.html


  Copyright © 2018 Oracle, Inc.  All rights reserved.