Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2289554.1
Update Date:2018-03-20

Solution Type  Technical Instruction Sure

Solution  2289554.1 :   How to Replace a SPARC T8-1 Service Processor Module (SPM) [VCAP]  


Related Items
  • SPARC T8-1
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP




Applies to:

SPARC T8-1 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace a SPARC T8-1 Service Processor Module (SPM)

Solution

 


*****************************************************************************************
To report errors or request improvements on this procedure, please add a comment on Doc ID: 2289554.1
*****************************************************************************************

ESD Caution:

  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using the adhesive end or the metal plug depending on the type of strap.
  • Use an Antistatic Mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

Contamination Caution:

  • Dust particles from packaging material are the number one cause of datacenter contamination. Remove all packaging material, except the ESD-safe packaging, while still outside the datacenter.
Warning:
  • Ensure the replacement SP is of the correct type for the T8 system. The T8 SP can be identified by a black plastic cover, and the label: T8 Only Service Processor (SP). The previous versions of the Service Processor with the grey plastic cover, are NOT compatible with the T8 server and will not function. If the replacement Service Processor is not the correct type, STOP, and do not continue with the replacement procedure.

 

DISPATCH INSTRUCTIONS

WHAT SKILLS ARE REQUIRED?: No special skills required, Customer Replaceable Unit (CRU) procedure

Time Estimate: 30 minutes

TASK COMPLEXITY: 0

REMOVAL/REPLACEMENT INSTRUCTIONS:

PROBLEM OVERVIEW: SPARC T8-1 Service Processor Module (SPM) Replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Note:
  • A data backup is not a prerequisite but is a wise precaution.
  • If the system is still up and functioning, the Customer should perform an orderly and graceful shutdown of the applications and operating system.
  • Restoring the SP configuration after replacing the service processor is much simpler if the configuration has been saved using the Oracle ILOM backup utility. If the configuration has not been backed up, do so now, prior to SP replacement, if possible. Refer to the Oracle Integrated Lights Out Manager (ILOM) 4.0 Documentation for instructions on "Backing up and Restoring the Oracle ILOM configuration".
  • If you want to retain the same version of the system firmware with the new service processor, note the current version before you remove the service processor.
  • Then power off the server and remove the AC power cords from the system. For ALL scenarios where an AC power down or AC power cycle is required for a T8-x server, please always use the steps in doc 1571054.1 prior to physically removing AC power cables from the server.


WHAT ACTIONS ARE REQUIRED?:

Damage Alert:
  • Visually inspect the replacement part for damage that may have occurred during shipping (damaged components or connectors, bent pins, damaged packages, etc.). If the part is damaged, do not install it in the system; order a new part. Handle the returned part with care and package it carefully to avoid damage during shipping.
Note:
  • System firmware consists of two components, an SP component and a host component. The service processor (SP) firmware component is located on the SP, and the host component is located on the motherboard. In order for the server to operate correctly, these two components must be compatible.
  • After replacing the service processor (SP), the new service processor (SP) firmware component must be compatible with the existing host firmware component. If the firmware components are incompatible, you need to load new system firmware.
  • The amber SP OK/Fault LED on the front panel will be lit when an SP fault is detected.

 

Replace the Service Processor Module (SPM)

1. Log into the ILOM and check the fruid container values and sync them if needed.

    a. Mismatched fruid values can cause a failure after a Service Processor Module (SPM) replacement. To avoid this, confirm that the fruid data matches in at least the Primary (DBP) and Backup2 (MB) containers, so that the SPM container is updated automatically after the replacement. Enter restricted mode and use the showpsnc command to check this.

           -> set SESSION mode=restricted

           WARNING: The "Restricted Shell" account is provided solely to allow Services to perform diagnostic tasks.

           [(restricted_shell) t8-1-bur09-a-sp:~]# showpsnc
           Primary: fruid:///SYS/DBP
           Backup 1: file:///persist/psnc_backup1.xml
           Backup 2: fruid:///SYS/MB

           Element      | Primary                  | Backup1                  | Backup2
           -------------+--------------------------+--------------------------+--------------------------
           PPN            35129165+1+1               35129165+1+1               35129165+1+1
           PSN            1733NN80PF                 1733NN80PF                 1733NN80PF
           MACADDR        00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8
           HOSTID         86d5bae8                   86d5bae8                   86d5bae8
           Product Name   SPARC T8-1                 SPARC T8-1                 SPARC T8-1
           RFID SN        341A583DE5800000000225BB   341A583DE5800000000225BB   341A583DE5800000000225BB
           [(restricted_shell) t8-1-bur09-a-sp:~]# exit 

    b. The above example shows a system with all three containers properly in sync. If the output from your system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset will attempt to auto-populate the matching values if one container is out of sync.

           -> reset /SP
           Are you sure you want to reset /SP (y/n)? y
           Performing reset on /SP 

    c. After an SP reset if the Primary and Backup2 containers match then proceed with the following steps to replace the SPM. If these two containers do not match then DO NOT proceed with the replacement yet.
    d. If the containers do not match you will need to use the copypsnc command from service or escalation mode to copy the data from the good container so that the Primary and Backup2 containers match (Backup1 is the SPM and we are about to replace this so it is not as important at this step).

If you are unfamiliar with this process and require assistance, refer to the copypsnc steps for fixing the serial number in "How to update product serial number on systems which implement TLI functionality (Doc ID 1280913.1)" and contact the TSC if needed. For access instructions, see "How to access service mode and escalation mode on ILOM 3.x and later platforms (Doc ID 1019946.1)".
   
    e. After the fruid data in the Primary and Backup2 containers have been confirmed to match proceed with the following steps.
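The comparison in step 1 can also be run against a saved copy of the showpsnc output. The following Python sketch is illustrative only and is not an Oracle tool. The names parse_showpsnc and mismatched are hypothetical, and the sketch assumes that each data row starts with one of the element names shown in the example above and that all three containers on a row hold values with the same number of whitespace-separated words.

```python
# Element names taken from the showpsnc example in this procedure.
ELEMENTS = ("PPN", "PSN", "MACADDR", "HOSTID", "Product Name", "RFID SN")
CONTAINERS = ("Primary", "Backup1", "Backup2")


def parse_showpsnc(text):
    """Map each element name to its (Primary, Backup1, Backup2) values.

    Assumption: every container value on a row has the same number of
    whitespace-separated words (for example "SPARC T8-1" three times).
    Rows that cannot be split evenly into three values are left out, so
    check that every expected element is present in the result.
    """
    rows = {}
    for line in text.splitlines():
        stripped = line.strip()
        for name in ELEMENTS:
            if stripped.startswith(name + " "):
                tokens = stripped[len(name):].split()
                if tokens and len(tokens) % 3 == 0:
                    size = len(tokens) // 3
                    rows[name] = tuple(
                        " ".join(tokens[i * size:(i + 1) * size])
                        for i in range(3)
                    )
                break
    return rows


def mismatched(rows, first="Primary", second="Backup2"):
    """Return the element names whose values differ between two containers."""
    a, b = CONTAINERS.index(first), CONTAINERS.index(second)
    return [name for name, values in rows.items() if values[a] != values[b]]


if __name__ == "__main__":
    sample = """
    PPN            35129165+1+1   35129165+1+1   35129165+1+1
    PSN            1733NN80PF     0000000000     1733NN80PF
    Product Name   SPARC T8-1     SPARC T8-1     SPARC T8-1
    """
    rows = parse_showpsnc(sample)
    # An empty list means Primary and Backup2 agree, as step 1 requires.
    print(mismatched(rows))
    # Compare Primary against Backup1 (the SPM container) as well.
    print(mismatched(rows, "Primary", "Backup1"))
```

An empty list from mismatched(rows) corresponds to the condition in step 1.c for proceeding with the replacement. Any other result means the containers still need to be brought into sync first.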

2. Prepare the server for service.

    a. Power off the server and disconnect the power cords from the power supplies.
    b. Extend the server to the maintenance position in the rack.
    c. Attach an anti-static wrist strap.

3. Remove the Top Cover of system.

4. Open the clear plastic air duct assembly cover by lifting the edge of the cover closest to the rear of the server. The cover can stand in its upright position but perform the next steps if you need to remove the cover:

    a. Open the fan cover.
    b. Pull open the plastic tabs to release the hinged edge of the clear plastic air duct assembly cover, and remove the cover from the server.

5. Locate the SPM toward the back of the system between the PCIe cards.

6. Remove the SPM

    a. Grasp the SPM by the two grasp points and lift up to disengage the SPM from the connectors on the motherboard.
    b. Lift the SPM up and away from the motherboard.

7. Install the replacement SPM. Holding the SPM at an angle, lower the side with the Align Tab sticker onto the SPM tab on the motherboard.

8. Press the SPM straight down until it is fully seated in its socket.

Caution:
  • If the module does not slide into the socket with relative ease, do not force it. It may be that the module’s pins are not perfectly aligned with the socket. Excessive force could damage the pins, socket, or both.

9. Close the clear plastic air duct assembly cover by rotating it down over the motherboard / two memory risers, then press the two tabs in slightly to secure it. Perform the following steps if you need to reattach the cover:

    a. Open the fan cover.
    b. Align the clear plastic air duct assembly cover on its hinge edge.
    c. Attach the hinged plastic tabs to the server by pulling them open and letting them seat into the holes so that the clear plastic air duct assembly cover can hinge up and down.
    d. Lower the clear plastic air duct assembly cover (making sure the two pairs of side tabs clear the sides of the server).

10. Install the top cover on the system

11. Return the Server to operation.

    a. Remove any anti-static measures that were used.
    b. Return the server to its normal operating position within the rack.
    c. Re-install the AC power cords and any data cables that were removed.


12. Prior to powering on the server, connect a terminal or a terminal emulator (PC or workstation) to the SP SER MGT port. If the SP detects that its firmware component is incompatible with the existing host firmware component, further action is suspended and the following message is delivered over the SER MGT port:

Unrecognized Chassis: This module is installed in an unknown or unsupported chassis. You must upgrade the firmware to a newer version that supports this chassis.

If you see this message, go on to Step 13. Otherwise, skip to Step 14.
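If the serial console session is being logged to a file, the decision above can be expressed as a simple text check. This is a minimal Python sketch, not part of the Oracle procedure. The function name next_step is hypothetical, and the only string it relies on is the "Unrecognized Chassis" message quoted above.

```python
# Leading text of the message shown on the SER MGT port when the SP
# firmware does not support the chassis (quoted from this procedure).
UNRECOGNIZED_CHASSIS = "Unrecognized Chassis"


def next_step(console_output):
    """Return 13 if the firmware must be reloaded, otherwise 14."""
    if UNRECOGNIZED_CHASSIS in console_output:
        return 13
    return 14


if __name__ == "__main__":
    captured = (
        "Unrecognized Chassis: This module is installed in an unknown "
        "or unsupported chassis."
    )
    print("Continue with step", next_step(captured))
```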

Note:
  • Whenever you replace the SPM or the motherboard, update the firmware on the server so the portions of firmware in the two components remain consistent.

13. Download the system firmware.
    a. Configure the SP’s network port to enable the firmware image to be downloaded.
        Refer to the Oracle ILOM documentation for network configuration instructions.
    b. Download the system firmware.
        Follow the firmware download instructions in the Oracle ILOM documentation.

Note:
  • You can load any supported system firmware version, including the firmware revision that had been installed prior to the replacement of the service processor.

    c. If a backup file was created, use the Oracle ILOM restore utility to restore the configuration of the replacement service processor.

14. Set the ILOM time/date (-> set /SP/clock datetime=MMDDhhmmYYYY.ss)
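The MMDDhhmmYYYY.ss value is easy to mistype by hand. The following Python sketch builds the string from a datetime object using the field order given in step 14. The helper names ilom_datetime and clock_command are hypothetical, and the sketch only formats the text of the command; it does not connect to the SP.

```python
from datetime import datetime


def ilom_datetime(moment):
    """Format a datetime as MMDDhhmmYYYY.ss (month, day, hour, minute, year, seconds)."""
    return moment.strftime("%m%d%H%M%Y.%S")


def clock_command(moment):
    """Build the ILOM CLI command text from step 14 for the given time."""
    return "set /SP/clock datetime=" + ilom_datetime(moment)


if __name__ == "__main__":
    # 20 March 2018, 14:05:09
    print(clock_command(datetime(2018, 3, 20, 14, 5, 9)))
    # prints: set /SP/clock datetime=032014052018.09
```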

15. Power on the server. Verify that the Power/OK indicator LED lights steady on.

16. Verify that the SP Status LED is illuminated green.

Note:
  • The LED will flash green while the SPM initializes the Oracle ILOM firmware. 
Important:
  • Prior to booting your system, the Solaris fallback image (miniroot) must be reloaded. This step can be performed while connected to the Service Processor through the web browser-based network management connection.
  • How to obtain the fallback image (miniroot):
    http://docs.oracle.com/cd/E53394_01/html/E54742/gplct.html

Note:

  • In step "c" of the linked procedure, enter "SPARC T8-1" in the product field (or "SPARC T8-2" or "SPARC T8-4" for those servers).
  • In step "d", select "SPARC T8-1 Fallback Boot 11.3" from the drop-down list, or whichever version is correct for the system.
  • In step "e", click Search, select the correct fallback image version, and download the file to a location available to the browser that will be used to load it.
  • How to update the fallback image via the BUI:
    https://docs.oracle.com/cd/E37444_01/html/E37446/gqcim.html

17. Set the system serial number/fruid data if needed.

    a. The SPM is not the primary fruid container in this server so when it is replaced you should not normally need to fix the serial number information (TLI).
    b. Log in to the ILOM as root and then enter the "restricted shell" to check the fruid values. Follow the example below to enter the restricted shell and use the showpsnc command:

     -> set SESSION mode=restricted

     WARNING: The "Restricted Shell" account is provided solely to allow Services to perform diagnostic tasks.

     [(restricted_shell) t8-1-bur09-a-sp:~]# showpsnc
     Primary: fruid:///SYS/DBP
     Backup 1: file:///persist/psnc_backup1.xml
     Backup 2: fruid:///SYS/MB

     Element      | Primary                  | Backup1                  | Backup2
     -------------+--------------------------+--------------------------+--------------------------
     PPN            35129165+1+1               35129165+1+1               35129165+1+1
     PSN            1733NN80PF                 0000000000                 1733NN80PF
     MACADDR        00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8
     HOSTID         86d5bae8                   86d5bae8                   86d5bae8
     Product Name   SPARC T8-1                 SPARC T8-1                 SPARC T8-1
     [(restricted_shell) t8-1-bur09-a-sp:~]#  

    c. When the SPM is replaced, the Backup1 fruid container will likely not match the Primary entry. If it does not, you must enter escalation or service mode to fix it. If all three entries match, this step is done.
    d. Contact the TSC to request an escalation password. Service mode is sufficient if only the copypsnc command is needed; escalation mode is required if the setpsnc command is needed. setpsnc is not covered in this procedure.
    e. Provide your TSC contact with the output from the following ILOM commands: "version", "show /SYS product_serial_number", and "show /SP/clock". If the product_serial_number output is not valid, also provide the showpsnc output from step b above.
    f. The TSC will provide an escalation password that is made up of 32 short words. The Service role is required to access service or escalation modes. Follow the example below to create a user named 'escuser' with the Service role assigned.
 

     -> cd /SP/users
     /SP/users
     -> create escuser
     Creating user...
     Enter new password: ********
     Enter new password again: ********
     Created /SP/users/escuser
     -> set escuser role=aucros
     Set 'role' to 'aucros'
     -> show escuser
     /SP/users/escuser
     Targets:
     ssh
     Properties:
     role = aucros
     password = *****

    g. Set the check_physical_presence to false and then exit from the ILOM so that you can login as the newly created user.            

     -> set /SP check_physical_presence=false
     Set 'check_physical_presence' to 'false'
     -> show /SP check_physical_presence
     /SP
     Properties:
     check_physical_presence = false

     -> exit 

     h. Log in as escuser and enter escalation mode using the password that was provided by the TSC.

     t8-1-bur09-a-sp login: escuser
     Password:

     Oracle(R) Integrated Lights Out Manager

     Version 3.2.4.34 r95732

     Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.

     Warning: The system appears to be in manufacturing test mode.
     Contact Service immediately.

     Hostname: t8-1-bur09-a-sp

     -> cd /SP/users/escuser/escalation
     -> set SESSION mode=escalation                            
     Password:**** **** **** **** **** *** *** **** **** **** **** **** **** **** **** **** *** *** **** *** **** **** **** *** **** **** *** **** *** *
     Short form password is:  NOSE HAAG MED 

     [(escalation_mode) t8-1-bur09-a-sp:~]# 

    i. Use the showpsnc command to confirm the current container values. Confirm that the Primary container has a serial number (the value on the PSN line) that matches the system serial number, which can be read from the serial number RFID tag on the front left-hand side of the server. After confirming that the Primary fruid container is valid, use the copypsnc command to write the good data from the Primary to the Backup1 container on the SPM. The following example shows copying from Primary to Backup1, but you could also copy from Backup2 if needed.

     [(escalation mode) t8-1-bur09-a-sp:~]# showpsnc
     Primary: fruid:///SYS/DBP
     Backup 1: file:///persist/psnc_backup1.xml
     Backup 2: fruid:///SYS/MB

     Element      | Primary                  | Backup1                  | Backup2
     -------------+--------------------------+--------------------------+--------------------------
     PPN            35129165+1+1               35129165+1+1               35129165+1+1
     PSN            1733NN80PF                 0000000000                 1733NN80PF
     MACADDR        00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8
     HOSTID         86d5bae8                   86d5bae8                   86d5bae8
     Product Name   SPARC T8-1                 SPARC T8-1                 SPARC T8-1

     [(escalation mode) t8-1-bur09-a-sp:~]# copypsnc Primary Backup1

     [(escalation mode) t8-1-bur09-a-sp:~]# showpsnc
     Primary: fruid:///SYS/DBP
     Backup 1: file:///persist/psnc_backup1.xml
     Backup 2: fruid:///SYS/MB

     Element      | Primary                  | Backup1                  | Backup2
     -------------+--------------------------+--------------------------+--------------------------
     PPN            35129165+1+1               35129165+1+1               35129165+1+1
     PSN            1733NN80PF                 1733NN80PF                 1733NN80PF
     MACADDR        00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8
     HOSTID         86d5bae8                   86d5bae8                   86d5bae8
     Product Name   SPARC T8-1                 SPARC T8-1                 SPARC T8-1
     [(escalation mode) t8-1-bur09-a-sp:~]# exit
 

     j. At this point, if all of the fruid containers match and have the correct serial number data, this step is done. If more than one of the fruid containers had invalid entries, use the copypsnc command to copy the valid data to the other invalid container (for example, "copypsnc Primary Backup2"). After confirming that all fruid data is correct, reset the ILOM to confirm that the fruid data persists through a reboot, and remove the escalation user if needed.

     -> reset /SP
     Are you sure you want to reset /SP (y/n)? y
     Performing reset on /SP
     ..........

     ***login as the root user again and check the fruid data***

     -> set SESSION mode=restricted

     WARNING: The "Restricted Shell" account is provided solely to allow Services to perform diagnostic tasks.

     [(restricted_shell) t8-1-bur09-a-sp:~]# showpsnc
     Primary: fruid:///SYS/DBP
     Backup 1: file:///persist/psnc_backup1.xml
     Backup 2: fruid:///SYS/MB

     Element      | Primary                  | Backup1                  | Backup2
     -------------+--------------------------+--------------------------+--------------------------
     PPN            35129165+1+1               35129165+1+1               35129165+1+1
     PSN            1733NN80PF                 1733NN80PF                 1733NN80PF
     MACADDR        00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8          00:10:E0:D5:BA:E8
     HOSTID         86d5bae8                   86d5bae8                   86d5bae8
     Product Name   SPARC T8-1                 SPARC T8-1                 SPARC T8-1
     [(restricted_shell) t8-1-bur09-a-sp:~]# exit

     -> cd /SP/users
     /SP/users
     -> delete escuser
     Are you sure you want to delete /SP/users/escuser (y/n)? y
     Deleted /SP/users/escuser

     k. If trouble is encountered during any of the steps of accessing escalation mode and fixing the fruid containers please contact the TSC for assistance.
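The decision logic in steps 17.i and 17.j can be summarized in a short Python sketch. It is illustrative only: the function name plan_copies is hypothetical, it only prints command text for review and does not run anything on the SP, and it treats the serial number read from the chassis RFID tag as the reference value. The "copypsnc Primary Backup1" and "copypsnc Primary Backup2" forms appear in this procedure; using Backup2 as the first argument is an assumption based on the statement that Backup2 can also be the source.

```python
CONTAINERS = ("Primary", "Backup1", "Backup2")
# Backup1 lives on the replaced SPM, so it is never used as a source here.
ALLOWED_SOURCES = ("Primary", "Backup2")


def plan_copies(psn_values, chassis_serial):
    """Suggest copypsnc commands that bring all containers in line.

    psn_values is the (Primary, Backup1, Backup2) tuple from the PSN row
    of showpsnc; chassis_serial is the value on the chassis RFID tag.
    """
    values = dict(zip(CONTAINERS, psn_values))
    good = [name for name in ALLOWED_SOURCES if values[name] == chassis_serial]
    if not good:
        # Neither Primary nor Backup2 is valid: outside this procedure.
        raise ValueError("no valid source container; contact the TSC")
    source = good[0]
    targets = [name for name in CONTAINERS if values[name] != chassis_serial]
    return ["copypsnc {} {}".format(source, target) for target in targets]


if __name__ == "__main__":
    # PSN row from the example in step 17.b.
    psn_row = ("1733NN80PF", "0000000000", "1733NN80PF")
    for command in plan_copies(psn_row, "1733NN80PF"):
        print(command)
```

With the PSN row from the example in step 17.b, the only suggestion is "copypsnc Primary Backup1", which matches the command used in step 17.i. An empty list means all three containers already hold the chassis serial number.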

 

How to verify the SPM is working properly

1. Log into ILOM to check SPM status.

Sample:

-> show /SYS/MB/SPM

/SYS/MB/SPM
Targets:
NETMGMT
V_+1V0
V_+1V5
V_+1V8
V_+3V3
V_VBAT

Properties:
type = SP Board Module
ipmi_name = MB/SPM
fru_description = ASY,SP,T8/M8,8Gbit
fru_manufacturer = Oracle Corporation
fru_part_number = 7343771
fru_rev_level = 02
fru_serial_number = 465769T+17191N02E5
fault_state = OK
clear_fault_action = (none)

Commands:
cd
set
show

->

 
2. Check the ILOM event log for any errors related to the SPM.

-> show /SP/faultmgmt
-> show /SP/logs/event/list
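The Properties block in the sample output from step 1 can be checked with a few lines of Python when the output has been saved to a file or pasted into a script. This is a minimal sketch, not an Oracle utility. The names parse_properties and spm_is_healthy are hypothetical, and the sketch assumes that properties appear as "name = value" lines as in the sample above.

```python
def parse_properties(show_output):
    """Collect 'name = value' lines from ILOM show output into a dict."""
    props = {}
    for line in show_output.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep:  # only lines that actually contain " = "
            props[key] = value
    return props


def spm_is_healthy(show_output):
    """True when the output reports fault_state = OK, as in the sample."""
    return parse_properties(show_output).get("fault_state") == "OK"


if __name__ == "__main__":
    sample = """
    Properties:
    type = SP Board Module
    fru_part_number = 7343771
    fault_state = OK
    clear_fault_action = (none)
    """
    props = parse_properties(sample)
    print(props["fru_part_number"], spm_is_healthy(sample))
```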

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FE/CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Boot system and monitor boot sequence for errors. Test functionality of system:

  1. Run the Solaris "fmadm faulty" command and the SP/ILOM "show faulty" command to verify that the fault has been cleared.
  2. Perform one of the following tasks based on your verification results:
    • If the previous steps did not clear the fault, refer to doc 1004229.1 for information about the tools and methods you can use to diagnose and clear component faults.
    • If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.
  3. Restart software applications per applicable administration guides to resume system operation.

PARTS NOTE:
https://support.oracle.com/handbook_private/Systems/SPARC_T8_1/components.html#SystemServiceProcessor

REFERENCE INFORMATION:
SPARC T8-1 Service Manual: https://docs.oracle.com/cd/E79179_01/html/E80510/index.html

 

References

<NOTE:1571054.1> - Performing an AC power cycle on the T3/T4/T5/S7/T7/T8 Servers
<NOTE:1280913.1> - How to update System, Chassis, and Product level Key Identity Properties on ILOM based systems which implement Top Level Identifier (TLI) functionality
<NOTE:1019946.1> - How to access service mode and escalation mode on ILOM 3.x and later platforms

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.