Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2346485.1
Update Date:2018-05-16
Keywords:

Solution Type  Technical Instruction Sure

Solution  2346485.1 :   How to Replace an Oracle Server X7-2 Motherboard [VCAP]  


Related Items
  • Oracle Server X7-2
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
  • Microlearning>Video>ML-VID-VCAP




Oracle Confidential PARTNER - Available to partners (SUN).
Reason: This is FRU

Applies to:

Oracle Server X7-2 - Version All Versions to All Versions [Release All Releases]
x86_64

Goal

How to Replace an Oracle Server X7-2 Motherboard.

Solution

 

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
Oracle Server X7-2 Training

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Oracle Server X7-2 Motherboard needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

If the system is still up and functioning, the customer should perform an orderly, graceful shutdown of the applications and OS.  Then power off the server and remove the AC power cords from the system.

A data backup is not a prerequisite but is a wise precaution.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Reference Doc:
Oracle Server X7-2 Remove the Motherboard:
https://docs.oracle.com/cd/E72435_01/html/E72445/gqvfa.html#scrolltoc

 

1. Log in to the ILOM, check the fruid container values, and sync them if needed.

  1. Mismatched fruid values can cause a failure after a motherboard replacement. To avoid this, confirm that at least the Primary (DBP) and Backup2 (PS0) containers hold matching fruid data, so that the container on the new motherboard is updated automatically after replacement. Enter restricted mode and use the showpsnc command to check this.  
    -> set SESSION mode=restricted

    WARNING: The "Restricted Shell" account is provided solely
    to allow Services to perform diagnostic tasks.

    [(restricted_shell) x7-2-sp:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 35112724+1+1        35112724+1+1        35112724+1+1
    PSN                 1733XC300M          1733XC300M          1733XC300M
    Profile             0x00000000          0x00000000          0x00000000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914
    [(restricted_shell) x7-2-sp:~]# exit

  2. The above example shows a system with all three containers properly in sync. If the output from your system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset attempts to auto-populate the matching values if one container is out of sync.  
    -> reset /SP
    Are you sure you want to reset /SP (y/n)? y
    Performing reset on /SP
     
  3. After an ILOM reset, if the Primary and Backup2 containers match, proceed with the following steps to replace the motherboard. If these two containers do not match, DO NOT proceed with the replacement yet.
  4. If the containers do not match, use the copypsnc command from service or escalation mode to copy the data from the good container so that the Primary and Backup2 containers match. (Backup1 is on the motherboard, which is about to be replaced, so it is less important at this step.) If you are unfamiliar with this process, follow the copypsnc steps for fixing the serial number in "How to update product serial number on systems which implement TLI functionality (Doc ID 1280913.1)" and contact the TSC if needed. To access these modes, see "How to access service mode and escalation mode on ILOM 3.x and later platforms (Doc ID 1019946.1)"; that document also applies to ILOM 4.x.
  5. After the fruid data in the Primary and Backup2 containers has been confirmed to match, proceed with the following steps.
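
The Primary/Backup2 comparison in step 1 can also be done mechanically on captured showpsnc output. The following Python sketch is illustrative only and is not part of the ILOM tooling: the names parse_showpsnc and primary_and_backup2_match are hypothetical, and the parser assumes the column layout shown in the sample output above.

```python
import re

# Sample showpsnc table, copied from the all-in-sync example in step 1.
SAMPLE = """\
Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 35112724+1+1        35112724+1+1        35112724+1+1
PSN                 1733XC300M          1733XC300M          1733XC300M
Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914
"""


def parse_showpsnc(text):
    """Return {element: (primary, backup1, backup2)} parsed from showpsnc output."""
    rows = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, the header row, and the separator row.
        if not line or "|" in line or line.startswith("---"):
            continue
        # Columns are normally separated by two or more spaces.
        parts = re.split(r"\s{2,}", line)
        # Wide values (such as RFID SN) are separated by single spaces.
        if len(parts) == 2:
            parts = [parts[0]] + parts[1].split()
        # Anything that is not "name + three values" (prompts, URIs) is ignored.
        if len(parts) == 4:
            rows[parts[0]] = tuple(parts[1:])
    return rows


def primary_and_backup2_match(rows):
    """True only if rows were parsed and every element matches in Primary and Backup2."""
    return bool(rows) and all(vals[0] == vals[2] for vals in rows.values())


if __name__ == "__main__":
    rows = parse_showpsnc(SAMPLE)
    if primary_and_backup2_match(rows):
        print("Primary and Backup2 match: proceed with the replacement")
    else:
        print("Primary and Backup2 differ: do NOT proceed yet")
```

An empty parse result is treated as a mismatch, so a truncated or unexpected capture never reads as "safe to proceed".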

2. Make sure to back up the ILOM and BIOS configurations before replacing the motherboard.

  1. See the ILOM Administrator's Guide for Configuration and Maintenance Firmware Release 4.0.x for instructions:
    1. Back up the BIOS configuration: https://docs.oracle.com/cd/E81115_01/html/E86149/z40001541481533.html#scrolltoc
    2. Back up the ILOM configuration: https://docs.oracle.com/cd/E81115_01/html/E86149/z40048b81489311.html#scrolltoc

3. Prepare the server for service.

  1. Power off the server and disconnect the power cords from the power supplies.
  2. Extend the server to the maintenance position in the rack.
  3. Attach an antistatic wrist strap to your wrist, and then to a grounded area on the rack.

4. Remove the top cover, air baffles, and all of the Fan Modules.

  1. If the release button latch is in the locked position, use a Torx T10 screwdriver to turn the release button latch clockwise to the unlocked position.
  2. Unlatch the top cover.  Lift up on the release button on top of the server cover.  Lifting the release button causes the server cover to slide toward the rear of the chassis for easy removal.
  3. Lift up and remove the top cover.
  4. Lift the air baffles up and out of the server and set them aside.
  5. Remove the fan modules.  Using your thumb and forefinger, grasp the fan module in the finger recesses located in the plastic between the fans.  Lift the fan module straight up and out of the chassis.
  6. Repeat for all 4 fan modules.

5. Remove the power supplies.

  1. If the cable management arm (CMA) is installed, disconnect both CMA left-side connectors (on the PSU side) and move the CMA out of the way.
    Caution  -  When disconnecting the CMA left-side connectors, use something to support the CMA so that it does not hang down under its own weight and stress the right-side connectors; otherwise, the CMA might be damaged. You must continue to support the CMA until you have reconnected both of the left-side connectors.
  2. Grasp the power supply handle and push the power supply latch to the left.
  3. Pull the power supply out of the chassis.  Repeat steps 2-3 for the second power supply.
    Caution  -  When removing the power supplies it is important to label power supplies with the slot numbers from which they were removed (PS0, PS1). This is required because the power supplies must be reinstalled into the slots from which they were removed; otherwise, the server key identity properties (KIP) data might be lost.

6. Remove the PCIe risers and PCIe cards.

  1. PCIe cards in all slots are installed on vertical risers. You must remove the associated riser to remove and replace a PCIe card. You must remove all three PCIe risers when replacing the motherboard. 
  2. See the Service Manual for instructions: https://docs.oracle.com/cd/E72435_01/html/E72445/gqvft.html#scrolltoc

7. Disconnect all the cables from the motherboard.

  1. Remove the M.2 SATA cable between the PCIe slot 3 riser and the motherboard SATA connector and set it aside.
  2. Disconnect the disk backplane power cable from the motherboard by pressing in on the connector latch and then pulling out the cable connector.
  3. Disconnect the disk backplane data cable by opening the ejectors and pulling out the cable connector.
  4. Disconnect the front indicator module (FIM) cable connector by opening the ejectors and pulling out the cable connector.
  5. Disconnect the NVMe cables (if present) and carefully lift them from the center cable trough and set them aside.
  6. Remove the SAS cables and the super capacitor cable that were connected to the internal HBA card, and then carefully lift them from the left-side cable trough and set them aside.

8. Remove the motherboard from the server chassis.

  1. Using a Torx T25 screwdriver, loosen the two green captive screws that secure the motherboard bracket/handle to the server chassis.
  2. Grasp the metal bracket located just to the rear of the DIMM sockets and the finger loop, and then slide the motherboard toward the front of the server and lift it slightly to disengage it from the eight mushroom-shaped standoffs located on the server chassis under the motherboard.
  3. Lift the motherboard out of the server chassis and place it on an antistatic mat and next to the replacement motherboard.

9. Remove the following reusable components from the motherboard and install them onto the replacement motherboard.

  1. Remove the internal USB flash drive.
  2. Remove the DIMMs from the motherboard.  Note - Install the DIMMs only in the sockets (connectors) that correspond to the sockets from which they were removed. Performing a one-to-one replacement of the DIMMs significantly reduces the possibility that the DIMMs will be installed in the wrong slots. If you do not reinstall the DIMMs in the same sockets, server performance might suffer and some DIMMs might not be used by the server.
  3. Remove the processors from the failed motherboard.
    1. See the Service Manual for instructions to remove the processor: https://docs.oracle.com/cd/E72435_01/html/E72445/grblm.html#scrolltoc
    Note - A 12.0 in-lbs (inch-pounds) torque driver (part number 7352217) with a Torx T30 bit is required. It must be ordered separately, together with the motherboard FRU.
  4. Check the replacement motherboard to ensure it has the rear locate lightpipe installed.  If not, then it may be in a separate plastic bag in the replacement motherboard package.  If there is no rear locate lightpipe installed, or in a separate bag in the packaging, then transfer the rear locate lightpipe from the faulty motherboard to the replacement board.
    1. To remove the lightpipe, use tweezers or a small flat screwdriver to lift the small, fragile lightpipe retainer clip, then turn the lightpipe slightly to release it from the triangle fins, and pull the lightpipe off.
    2. To install the lightpipe match the rectangular retaining clip holes so they fit over the triangle fins on the lightpipe button.  Push the lightpipe onto the button until the fins latch in the holes.

10. Remove the processor socket covers from the replacement motherboard and install the processors.

  1. Grasp the processor socket cover finger grips (labeled REMOVE) and lift the socket cover up and off the processor socket.
  2. Install a processor into the socket from which you removed the processor socket cover.
    1. See the Service Manual for instructions to install the processor: https://docs.oracle.com/cd/E72435_01/html/E72445/grbln.html#scrolltoc
    Note - A 12.0 in-lbs (inch-pounds) torque driver (part number 7352217) with a Torx T30 bit is required. It must be ordered separately, together with the motherboard FRU.

    If a 12.0 in-lbs torque driver is not available, the CPU heatsink can be safely installed using the following guidelines. Using a Torx T30 hand tool (not electric) with a simple screwdriver-type handle, gently tighten each of the 4 screws fully, in the order 1-4, before moving to the next screw. Tighten each screw until it bottoms out, at which point a sharp increase in resistance will be felt. At that point, apply only a modest tightening torque by hand, such as you would apply when turning a doorknob to open a door.

  3. Repeat Step 10.1 and Step 10.2 to remove the second processor socket cover from the replacement motherboard and install the second processor.

11. Install the processor socket covers on the faulty motherboard.

Caution - The processor socket covers must be installed on the faulty motherboard; otherwise, damage might result to the processor sockets during handling and shipping.
  1. Align the processor socket cover over the processor socket alignment posts. Install the processor socket cover by firmly pressing down on all four corners (labeled INSTALL) on the socket cover.  You will hear an audible click when the processor socket cover is securely attached to the processor socket.
  2. Repeat Step 11.1 to install the second processor socket cover on the faulty motherboard.

12. Install the motherboard into the server chassis.

  1. Before starting, ensure that the rear locate lightpipe is installed as described in Step 9.4.
  2. Grasp the metal bracket located to the rear of the DIMMs and the finger grasp, and then tilt the front of the motherboard up slightly and push it into the opening in the rear of the server chassis.
  3. Lower the motherboard into the server chassis and slide it to the rear until it engages the eight mushroom-shaped standoffs located on the server chassis under the motherboard.
  4. Ensure that the indicators, controls, and connectors on the rear of the motherboard fit correctly into the rear of the server chassis.
  5. Using a Torx T25 screwdriver, tighten the two green captive screws to secure the motherboard bracket/handle to the server chassis.

13. Reinstall cables on to the motherboard.

  1. Carefully reinstall the SAS cables and super capacitor cable along the left-side cable trough.
  2. Carefully reinstall the NVMe cables (if present) into the center cable trough and then reconnect the cables to the motherboard NVMe connectors.
  3. Reconnect the front indicator module (FIM) cable to the motherboard connector.
  4. Reconnect the disk backplane data cable to the motherboard connector.
  5. Reconnect the disk backplane power cable to the motherboard connector.
  6. Reinstall the M.2 SATA cable between the PCIe slot 3 riser and the motherboard SATA connector.

14. Reinstall the following reusable components.

  1. PCIe risers and attached PCIe cards.
    1. See the Service Manual for instructions to install PCIe cards: https://docs.oracle.com/cd/E72435_01/html/E72445/gqvft.html#scrolltoc
  2. Power supplies.
    1. Align the replacement power supply with the empty power supply slot.
    2. Slide the power supply into the bay until it is fully seated.  You will hear an audible click when the power supply fully seats.
    Caution - When reinstalling power supplies, it is important to reinstall them into the slots from which they were removed during the motherboard removal procedure; otherwise, the server key identity properties (KIP) data might be lost.
  3. Fan modules.
    1. Position the replacement fan module into the server.  The fan modules are keyed to ensure that they are installed in the correct orientation.
    2. Press down on the fan module and apply firm pressure to fully seat the fan module.
  4. Air baffles.

15. Return the Server to operation.

  1. Install the server top cover.
    1. Place the top cover on the chassis.  Set the cover down so that it hangs over the back of the server by about 1 inch (25 mm) and the side latches align with the cutouts in the chassis.
    2. Check both sides of the chassis to ensure that the top cover is fully down and flush with the chassis.  If the cover is not fully down and flush with the chassis, slide the cover towards the back of the chassis to position the cover in the correct position.
    3. Gently slide the cover toward the front of the chassis until it latches into place with an audible click.  As you slide the cover toward the front of the server, the release button on the top of the server automatically rotates downward to the closed position.  Latch the top cover by pushing down on the button until it is flush with the cover and you hear an audible click. An audible click indicates that the cover is latched.
    4. Use a Torx T10 screwdriver to turn the release button latch counter-clockwise to the locked position.
  2. Return the server to its normal operating position within the rack.
  3. Remove any antistatic measures that were used.
  4. Reconnect the data cables to the server and reconnect the power cords to the server power supplies.
  5. Power on the server. Verify that the Power/OK indicator LED lights steady on.

16. Set the system serial number/fruid data if needed.

  1. The motherboard is not the primary fruid container in this server, so when it is replaced you should not normally need to fix the serial number information.
  2. Log in to the ILOM as root and then enter the restricted shell to check the fruid values. Follow the example below to enter the restricted shell and use the showpsnc command:  
    -> set SESSION mode=restricted

    WARNING: The "Restricted Shell" account is provided solely
    to allow Services to perform diagnostic tasks.

    [(restricted_shell) x7-2-sp:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 35112724+1+1        35112724+1+1        35112724+1+1
    PSN                 1733XC300M          0000000000          1733XC300M
    Profile             0x00000000          0x00000000          0x00000000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914
    [(restricted_shell) x7-2-sp:~]# exit

  3. When the motherboard is replaced, the Backup1 fruid container will likely not match the Primary entry. If it does not, you must enter escalation or service mode to fix it. If all three entries match, this step is done.
  4. Contact the TSC to request an escalation password. Service mode also works if only the copypsnc command is needed; escalation mode is required if the setpsnc command is needed (setpsnc is not covered in this procedure).
  5. Provide your TSC contact with the output from the following ILOM commands: "version", "show /SYS product_serial_number", and "show /SP/clock". If product_serial_number does not return good output, also provide the showpsnc output that was seen in step 2 above.
  6. The TSC will provide an escalation password that is made up of 32 short words. Follow the example below to create a new user with the 'Service' role assigned. The Service role is required to access service or escalation modes. The following example creates a user named 'escuser' with the Service role.
    -> cd /SP/users
    /SP/users
    -> create escuser
    Creating user...
    Enter new password: ********
    Enter new password again: ********
    Created /SP/users/escuser
    -> set escuser role=aucros
    Set 'role' to 'aucros'
    -> show escuser
    /SP/users/escuser
    Targets:
    ssh
    Properties:
    role = aucros
    password = *****
  7. Set check_physical_presence to false, and then exit from the ILOM so that you can log in as the newly created user.
    -> set /SP check_physical_presence=false
    Set 'check_physical_presence' to 'false'
    -> show /SP check_physical_presence
    /SP
    Properties:
    check_physical_presence = false

    -> exit
  8. Log in as escuser and enter escalation mode using the password that was provided by the TSC.
    x7-2-sp login: escuser
    Password:

    Oracle(R) Integrated Lights Out Manager

    Version 4.0.0.28 r121827

    Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

    Warning: HTTPS certificate is set to factory default.

    Hostname: x7-2-sp


    -> cd /SP/users/escuser/escalation
    -> set SESSION mode=escalation                            
    Password:**** **** **** **** **** *** *** **** **** **** **** **** **** **** **** **** *** *** **** *** **** **** **** *** **** **** *** **** *** *
    Short form password is:  NOSE HAAG MED

    [(escalation_mode) x7-2-sp:~]#
  9. Use the showpsnc command to confirm the current container values. Confirm that the Primary container has a serial number (the value on the PSN line) that matches the system serial number. The system serial number can be checked against the serial number RFID tag on the front left-hand side of the server. After confirming that the Primary fruid container is valid, use the copypsnc command to write the good data from the Primary to the Backup1 container on the MB. The following example shows copying from Primary to Backup1, but you could also copy from Backup2 if needed.
    [(escalation_mode) x7-2-sp:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 35112724+1+1        35112724+1+1        35112724+1+1
    PSN                 1733XC300M          0000000000          1733XC300M
    Profile             0x00000000          0x00000000          0x00000000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914

    [(escalation_mode) x7-2-sp:~]# copypsnc Primary Backup1
    [(escalation_mode) x7-2-sp:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 35112724+1+1        35112724+1+1        35112724+1+1
    PSN                 1733XC300M          1733XC300M          1733XC300M
    Profile             0x00000000          0x00000000          0x00000000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914

    [(escalation_mode) x7-2-sp:~]# exit

  10. At this point, if all of the fruid containers match and have the correct serial number data, this step is done. If more than one of the fruid containers had invalid entries, use the copypsnc command to copy the valid data to the other invalid container (for example, "copypsnc Primary Backup2" to copy Primary to Backup2). After confirming that all fruid data is correct, reset the ILOM to confirm that the fruid data persists through a reboot, and remove the escalation user if needed.
    -> reset /SP
    Are you sure you want to reset /SP (y/n)? y
    Performing reset on /SP
    ..........

    ***login as the root user again and check the fruid data***

    -> set SESSION mode=restricted

    WARNING: The "Restricted Shell" account is provided solely
    to allow Services to perform diagnostic tasks.

    [(restricted_shell) x7-2-sp:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 35112724+1+1        35112724+1+1        35112724+1+1
    PSN                 1733XC300M          1733XC300M          1733XC300M
    Profile             0x00000000          0x00000000          0x00000000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE580000000022914 341A583DE580000000022914 341A583DE580000000022914

    [(restricted_shell) x7-2-sp:~]# exit

    -> cd /SP/users
    /SP/users
    -> delete escuser
    Are you sure you want to delete /SP/users/escuser (y/n)? y
    Deleted /SP/users/escuser
  11. If trouble is encountered during any of the steps of accessing escalation mode and fixing the fruid containers please contact the TSC for assistance.
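
The decision logic in step 16 (verify the Primary against the chassis label, then copy to any container that differs) can be sketched as follows. This is a hedged illustration, not an Oracle tool: plan_copypsnc is a hypothetical helper, it only looks at the PSN value, and it only emits the two command forms that appear in this procedure ("copypsnc Primary Backup1" and "copypsnc Primary Backup2"). When the Primary itself is not valid it refuses to plan anything, because that case needs TSC assistance.

```python
def plan_copypsnc(psn, label_serial):
    """Return the copypsnc commands needed to sync the backups from the Primary.

    psn          -- PSN value per container, e.g. {"Primary": ..., "Backup1": ..., "Backup2": ...}
    label_serial -- serial number read from the RFID tag on the front of the server
    """
    # The Primary must match the chassis label before it is used as the source.
    if psn["Primary"] != label_serial:
        raise ValueError("Primary PSN does not match the chassis label; contact the TSC")
    # Copy only to containers whose PSN differs from the verified serial number.
    return [
        f"copypsnc Primary {name}"
        for name in ("Backup1", "Backup2")
        if psn[name] != label_serial
    ]


if __name__ == "__main__":
    # Values taken from the post-replacement showpsnc example in step 16.
    observed = {"Primary": "1733XC300M", "Backup1": "0000000000", "Backup2": "1733XC300M"}
    for command in plan_copypsnc(observed, "1733XC300M"):
        print(command)
```

The commands are only printed for the engineer to review; nothing here connects to the ILOM or runs them.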

17. Make sure to restore the ILOM and BIOS configurations after replacing the motherboard.

  1. See the ILOM Administrator's Guide for Configuration and Maintenance Firmware Release 4.0.x for instructions:
    1. Restore the BIOS configuration https://docs.oracle.com/cd/E81115_01/html/E86149/z40001541481533.html#scrolltoc.
    2. Restore the ILOM configuration https://docs.oracle.com/cd/E81115_01/html/E86149/z40048b81489452.html#scrolltoc.

 

How to verify that the motherboard is working properly.

     1.  Log in to the ILOM to confirm that the motherboard is working properly.

Sample

-> show /SYS/MB

/SYS/MB
   Targets:
       BIOS
       CPLD
       FM0
       FM1
       FM2
       FM3
       NET0
       NET1
       NET2
       P0
       P1
       RISER1
       RISER2
       RISER3
       T_IN_SLOT1
       T_IN_SLOT2
       T_IN_SLOT3
       T_OUT_SLOT1
       T_OUT_SLOT2
       T_OUT_SLOT3

   Properties:
       type = Motherboard
       ipmi_name = MB
       fru_description = ASM, MB, X7-2
       fru_manufacturer = Oracle Corporation
       fru_part_number = 7317636
       fru_rev_level = 00
       fru_serial_number = 465136N+1732P5005A
       fru_macaddress = 00:10:e0:c3:c7:aa
       fault_state = OK
       clear_fault_action = (none)

   Commands:
       cd
       set
       show

->



    2.  Check the ILOM fault management output and event log for any errors related to the motherboard.

-> show /SP/faultmgmt
-> show /SP/logs/event/list
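
If the "show /SYS/MB" output is captured to a file or over a scripted session, the fault_state check can be automated. The Python sketch below is an assumption-laden illustration: parse_properties is a hypothetical helper, and it relies only on the "Properties:" layout shown in the sample above (one "name = value" pair per line).

```python
# Abbreviated capture of "show /SYS/MB", based on the sample above.
SAMPLE = """\
/SYS/MB
   Targets:
       BIOS
       CPLD

   Properties:
       type = Motherboard
       fru_part_number = 7317636
       fru_macaddress = 00:10:e0:c3:c7:aa
       fault_state = OK
       clear_fault_action = (none)

   Commands:
       cd
       set
       show
"""


def parse_properties(text):
    """Return the name/value pairs listed under the Properties: heading."""
    props = {}
    in_props = False
    for raw in text.splitlines():
        line = raw.strip()
        # Section headings (Targets:, Properties:, Commands:) end with a colon.
        if line.endswith(":"):
            in_props = (line == "Properties:")
            continue
        if in_props and " = " in line:
            key, value = line.split(" = ", 1)
            props[key.strip()] = value.strip()
    return props


if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    state = props.get("fault_state", "unknown")
    print(f"Motherboard fault_state: {state}")
```

Any value other than OK, including a missing fault_state property, should be followed up in the fault management output and event log.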

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Boot up the system and verify full functionality.

REFERENCE INFORMATION:

Oracle Server X7-2 Documentation:
https://docs.oracle.com/cd/E72435_01/index.html

Oracle Integrated Lights Out Manager (ILOM) 4.0 Documentation:
https://docs.oracle.com/cd/E81115_01/index.html



  Copyright © 2018 Oracle, Inc.  All rights reserved.