Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1532486.1
Update Date:2016-01-20
Keywords:

Solution Type  Technical Instruction Sure

Solution  1532486.1 :   Mx-32 - How to Replace a Faulty Service Processor Proxy Module (SPP)  


Related Items
  • SPARC M5-32
  • SPARC M6-32
  • Oracle SuperCluster M6-32 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP


Removing and Replacing Service Processor Proxy modules and servicing SPP battery

In this Document
Goal
Solution
 Removing a SPP
 Install a SPP 


Oracle Confidential INTERNAL - Do not distribute to customer (OracleConfidential).
Reason: FRU CAP

Applies to:

SPARC M5-32 - Version All Versions and later
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
SPARC M6-32 - Version All Versions and later
Information in this document applies to any platform.
In a fully populated four-DCU Mx-32 platform with a single PDomain, all four SPPs operate as a single configured unit. If an SPP fails, the SP attempts to:

■ Automatically halt the server
■ Unconfigure the failed SPP from the set
■ Configure the remaining SPPs into a new set
■ Restart the server with the reduced SPP configuration.

Each SP/SPP has a backup battery installed: BATT,LITH,3V,125MA,1632,DISK, otherwise known as a CR1632.

Goal

CAP PROBLEM OVERVIEW: Mx-32 - SPP Failure

*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1532486.1

********************************************************************* 

 

ESD Caution:
  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using either the adhesive end or the metal plug, depending on the type of strap.
  • Use an Antistatic Mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

 

Contamination Caution:
  • Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material, down to the ESD-safe packaging, while still outside the datacenter.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED: Mx-32 Product Training/Experience

TASK COMPLEXITY: 3

TIME ESTIMATE: 60 minutes WARM replacement

FIELD ENGINEER INSTRUCTIONS

Caution: Do not turn off an SPP while it is configured into an active PDomain. Doing so will cause the associated PDomain to panic.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

The Blue 'ready to remove' LED on the face of the failed SPP should be lit. The physical domain in which the SPP was previously configured may be up or down. As long as the SPP to be replaced is properly configured out of the system it can be replaced. If it is not configured out, then the physical domain into which it is configured must be shut down and 'stopped'.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Removing a SPP

1. Determining if a Service Processor Proxy Module can be removed without further intervention:

a. Use one of these Oracle ILOM commands to display faulty components:

-> show faulty

or

-> show /System/Open_Problems

b. If there is a failed or faulted SPP, it will be listed; continue with Step 2 below. If nothing is listed, then the fault condition for which the service action was filed has already been cleared, and the pending service action to replace an SPP is now suspect.

2. Determine if you need to save the TPM keystore from the faulty SPP.  (See the Service Manual for full details with examples)

a. In Oracle ILOM, see if TPM is activated.

-> show -d properties /Servers/PDomains/PDomain_x/HOST/tpm mode

■ If mode = off, go to Step 3.
■ If mode = activated, continue to Step b.
■ If mode = deactivated, ask the system administrator if you should save the TPM keystore.
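
The three outcomes above amount to a small lookup. The following Python sketch is illustrative only: the function name next_step_for_tpm_mode and the returned strings are hypothetical, and the mode value is assumed to have been read by hand from the ILOM output shown above.

```python
def next_step_for_tpm_mode(mode: str) -> str:
    """Map the ILOM tpm 'mode' value to the next action in this procedure.

    Hypothetical helper: 'mode' is assumed to be copied by the engineer from
    the output of the 'show ... /HOST/tpm mode' command.
    """
    actions = {
        "off": "go to Step 3",
        "activated": "continue to Step b",
        "deactivated": "ask the system administrator whether to save the TPM keystore",
    }
    key = mode.strip().lower()
    if key not in actions:
        # Any other value is outside what this procedure describes.
        raise ValueError(f"unexpected TPM mode: {mode!r}")
    return actions[key]
```

For example, next_step_for_tpm_mode("activated") returns "continue to Step b". An unrecognized value raises ValueError rather than guessing at a next step.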

b. At the PDomain host, determine if TPM is in use.

$ tpmadm keyinfo

■ If you see the migratable root key under the storage root key, TPM is in use. Continue to Step c.
■ If you see an error message, TPM is not in use. Go to Step 3.

c. Verify that TPM is migratable, and the TPM key blob is available.

$ tpmadm keyinfo 00000000-0000-0000-0000-00000000000b

d. Export the TPM key blob and the TPM authorization key to the hard drive.

$ tpmadm migrate export 00000000-0000-0000-0000-00000000000b

The default file name of the TPM key blob is tpm-migration.dat. The default file name of the TPM authorization key is tpm-migration.key. Both files are located in /var/tpm/system/.
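
Because the key identifier is long and consists mostly of zeros, it is easy to drop a digit when retyping it. The sketch below is a minimal illustration, not part of the official procedure. It checks that a string has the standard 8-4-4-4-12 hexadecimal UUID layout before it is pasted into a tpmadm command. The helper name is_well_formed_uuid is hypothetical, and the constant is the identifier shown in Step c.

```python
import uuid

# Key identifier as shown in Step c of this procedure.
MIGRATABLE_KEY_UUID = "00000000-0000-0000-0000-00000000000b"


def is_well_formed_uuid(text: str) -> bool:
    """Return True if text is a hyphenated 8-4-4-4-12 hexadecimal UUID."""
    try:
        # uuid.UUID raises ValueError when the string does not contain
        # exactly 32 hex digits. Comparing with the canonical form also
        # rejects variants such as a UUID typed without hyphens.
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False
```

A value with only eleven digits in the last group, such as 00000000-0000-0000-0000-0000000000b, fails this check.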


e. Record the migration key PIN that you created in Step d.


f. Verify that the files were created today.
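
The check in Step f can be scripted. The following Python sketch assumes that a Python 3 interpreter is available on the PDomain host and that the export used the default file names from Step d. It uses the file modification time as a stand-in for "created", because a creation time is not reliably available on UNIX file systems. The function names are hypothetical.

```python
import datetime
import os
from typing import List, Optional, Sequence

# Default export locations named in Step d.
TPM_EXPORT_FILES = (
    "/var/tpm/system/tpm-migration.dat",
    "/var/tpm/system/tpm-migration.key",
)


def modified_today(path: str, today: Optional[datetime.date] = None) -> bool:
    """Return True if path exists and was last modified today (local time)."""
    if today is None:
        today = datetime.date.today()
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False
    return datetime.date.fromtimestamp(mtime) == today


def stale_exports(paths: Sequence[str] = TPM_EXPORT_FILES) -> List[str]:
    """List the export files that are missing or were not modified today."""
    return [p for p in paths if not modified_today(p)]
```

An empty list from stale_exports() suggests that both files were written today. Any path that is returned should be investigated before the SPP is removed.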

 

3. Review the status and LED states of the affected components

a. Determine to which DCU the SPP belongs by referring to the fault report displayed in Step 1. In the line that lists the name of the faulted SPP (e.g., /SYS/SPPx), 'x' is the DCU number.
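
The naming rule in Step a can be expressed as a short helper. This is a sketch under the assumption that the faulted target is reported exactly in the form /SYS/SPPx with a numeric x. The function names are hypothetical, and the DCU path format is the one used in the 'show /System/DCUs/DCU_x' command in this procedure.

```python
import re

# Matches targets such as /SYS/SPP0 or /SYS/SPP3 and captures the number.
_SPP_TARGET = re.compile(r"^/SYS/SPP(\d+)$")


def dcu_number_from_spp(target: str) -> int:
    """Return the DCU number x encoded in an SPP target named /SYS/SPPx."""
    match = _SPP_TARGET.match(target.strip())
    if match is None:
        raise ValueError(f"not an SPP target: {target!r}")
    return int(match.group(1))


def dcu_target(spp_target: str) -> str:
    """Build the ILOM DCU path that corresponds to the given SPP target."""
    return f"/System/DCUs/DCU_{dcu_number_from_spp(spp_target)}"
```

For example, dcu_target("/SYS/SPP1") returns "/System/DCUs/DCU_1".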

b. Physical inspection of the SPP, the DCU, and the IOU associated with the failed SPP is the preferred check. If all components are off and the blue 'Ready to Remove' LED is lit on the suspect SPP, proceed with the replacement of the failed SPP in Step 4. If physical inspection is not practical, the following commands provide further information.

c. Determine DCU and SPP status

-> show /System/DCUs

d. Determine failed DCU status.

-> show /System/DCUs/DCU_x

If the above command fails on the same DCU_x as the failed SPPx, then it is likely the SPP / DCU is powered off.

e. Review SPP fault status. The 'show /SYS/SPPx' output contains a list of Properties one can use to confirm the fault state, power on status, part and serial number information, etc. Here is an example from a faulted SPP1:

-> show /SYS/SPP1
<snip>

    Properties:
        type = SP Board Module
        ipmi_name = /SPP1
        fru_description = M5-32 SP Proxy-Pilot3
        fru_manufacturer = Celestica Holdings PTE LTD
        fru_part_number = 07045727
        fru_rev_level = 06
        fru_serial_number = 465769T+1212J2002X
        fault_state = Faulted
        clear_fault_action = (none)
        power_state = Off
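
When the 'show' output has been captured to a text file or a terminal log, the properties can be extracted programmatically. The Python sketch below is illustrative only: it assumes that the output uses 'name = value' lines as in the example above, and the function names are hypothetical. It reports only what the properties say and does not replace the LED and configuration checks in this procedure.

```python
from typing import Dict

# Property lines copied from the SPP1 example in this document.
SAMPLE_OUTPUT = """\
    Properties:
        type = SP Board Module
        ipmi_name = /SPP1
        fru_part_number = 07045727
        fru_serial_number = 465769T+1212J2002X
        fault_state = Faulted
        clear_fault_action = (none)
        power_state = Off
"""


def parse_properties(output: str) -> Dict[str, str]:
    """Collect 'name = value' lines from captured ILOM 'show' output."""
    props = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (headings, <snip>) are skipped
            props[name.strip()] = value.strip()
    return props


def is_faulted_and_off(props: Dict[str, str]) -> bool:
    """True if the properties report a faulted, powered-off component."""
    return (props.get("fault_state") == "Faulted"
            and props.get("power_state") == "Off")
```

With the sample above, is_faulted_and_off(parse_properties(SAMPLE_OUTPUT)) evaluates to True.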


f. If the SPP is unconfigured and ready to remove, proceed with Step 4. However, if there is any doubt as to the state of the SPP and how it will affect a running domain, then the domain should be shut down by the customer sysadmins, and then 'stopped' as follows:

-> stop /HOSTx

g. Stop the affected SPP

-> stop /SYS/SPPx

4. Replace failed SPP module

a. Use an approved and tested ground strap to protect the equipment from ESD damage. 

b. Remove the SPP:

    Use a T20 Torx screwdriver to loosen the captive screws (1) on the extraction levers.

c. Swing the extraction levers 90 degrees out.

d. Pull the SPP out of the slot. Place it on a static-safe mat.

 

Install a SPP 


1. Use an approved and tested grounding strap to protect the equipment from ESD damage.
2. Use a T20 Torx screwdriver to loosen the captive screws on the extraction levers.
3. Swing the extraction levers out 90 degrees.
4. Insert the SPP in the slot and push it until the extraction levers touch the edge of the slot and start to swing closed.
5. Push the SPP and close the extraction levers completely to seat the module.
6. Fasten the locking screws.
7. Start the SPP:  -> start /SYS/SPPx
Note: Wait for the SPP to reset, possibly twice, while its SysFW is automatically upgraded to be consistent with the other SP/SPPs. This can take 20 minutes or more. Before starting the PDomain, check that no operation is in progress by reading the operation_in_progress property of /Servers/PDomains/PDomain_x/HOST for the PDomain that contains the inserted SPP. The property value is "none" when no operation is in progress.
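
The wait for operation_in_progress to read "none" is a poll-with-timeout loop. The Python sketch below shows only that loop. How the property value is obtained is left to a caller-supplied function, read_operation_in_progress, which is hypothetical (it could, for example, prompt the operator to type the value shown by ILOM). The 30-minute default timeout and 60-second interval are assumptions, not values from this procedure.

```python
import time
from typing import Callable


def wait_until_idle(
    read_operation_in_progress: Callable[[], str],
    timeout_s: float = 30 * 60,      # assumed upper bound, not from the procedure
    poll_interval_s: float = 60.0,   # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the property reads 'none' or the timeout expires.

    Returns True if 'none' was observed, False if the timeout was reached.
    """
    deadline = clock() + timeout_s
    while True:
        if read_operation_in_progress().strip().lower() == "none":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

The sleep and clock parameters exist so that the loop can be exercised without real delays. In normal use they can be left at their defaults.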

8. Restart the PDomain.  -> start /HOSTx
9. Determine your next step.
■ If you determined in “Removing a SPP” that TPM is in use on the system, continue at Step 10. (See the Service Manual for full details with examples)
■ If TPM is not in use, go to Step 11.
10. Set up TPM if it is not already running in the PDomain.
a. Enable the TPM tcsd daemon.
$ svcadm enable svc:/application/security/tcsd
b. Initialize TPM.
$ tpmadm init
c. Verify that TPM has been initialized and the TPM keystore is available.
$ tpmadm keyinfo
■ If you see the migratable root key under the storage root key, the TPM keystore is available on the PDomain-SPP. Go to Step 11.
■ If you do not see the migratable root key, the replacement SPP is the PDomain-SPP in a single-DCU PDomain. Continue at Step d.
d. Copy TPM data and the TPM key to the SPP.
$ tpmadm migrate import
e. Verify that the information was written to the SPP.
$ tpmadm keyinfo
11. Restart the PDomain.
12. Return the faulted component to Oracle.

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Restart software applications per applicable administration guides to resume system operation.

Use one of the Oracle ILOM commands listed in Step 1a of "Removing a SPP" ('show faulty' or 'show /System/Open_Problems') to display faulty components.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.