Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1986007.1
Update Date:2018-03-12
Keywords:

Solution Type  Technical Instruction Sure

Solution  1986007.1 :   FS System: How to Remove and Replace a Motherboard in an FS1-2 Controller  


Related Items
  • Oracle FS1-2 Flash Storage System
  • Oracle FS1-2 Cloud System
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: DISK-CAP VCAP


Instructions on how to replace the motherboard in an FS1-2 Controller.

In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU

Applies to:

Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Oracle FS1-2 Cloud System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

Outline the steps required to replace an FS1-2 Controller motherboard using Guided Maintenance.

 

Solution

DISPATCH INSTRUCTIONS

- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:

Product knowledge, FS1-2 Flash Storage System

TIME ESTIMATE: 120 minutes

TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

NOTE: This Action Plan requires the following 3 items.  They come as part of the motherboard FRU or can be ordered separately by the FE:

  • CPU Installation/Removal Tool - part # 7026168
  • Thermal Grease Kit - part # 350-1271, which contains:
    • Thermal Grease Syringe - 310-0065
    • Alcohol wipes - 250-1802

If you are not very familiar with servicing the Netra Server X3-2/Sun Netra X4270 M3 Server upon which the Controller is based, it is highly recommended that you watch the animation videos that detail the replacement procedures covered in this CAP.  They are available at Oracle's Sun Server X3-2 Animations.

QRC for this procedure: 

Controller Motherboard Replacement


PROBLEM OVERVIEW: 

FS1-2 Controller motherboard.

What: A Controller motherboard in an FS1-2 has failed and needs to be replaced. 

Where: A failed motherboard will have a System Alert for the affected Controller. 

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?

The Controller with the motherboard failure will likely have a warning status but depending on how severe the damage is, the entire Controller itself may be in a missing state.  The other Controller must have a normal status as this procedure will require a Controller failover so that the problem Controller can be powered off in order to replace the failed motherboard.

 

NOTE: Please review Document 1942676.1 FS System: How to Disable Call-Home to Prevent Automatic Service Request (ASR) Generation before proceeding with the procedure below. The steps contained therein are provided to allow an administrator to deactivate a particular ASR enabled array while performing maintenance or troubleshooting. This will prevent any additional Service Requests from being created unnecessarily.

 

NOTE: The FS1-2 Controller uses a quorum mechanism for Key Identity Properties (KIP).  The quorum consists of the motherboard, disk backplane and power distribution board, which are all encoded with the Product Serial Number (PSN) of the Controller (not the FS1-2).  At least two of these must agree on the correct PSN or the Controller will NOT boot.  To avoid this problem, this procedure has the user confirm the PSNs are in sync before attempting the replacement.  NEVER replace one of these quorum devices if the PSNs are not in sync and NEVER replace two of these items at the same time.
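
The quorum rule in the note above can be illustrated with a minimal Python sketch.  The function names and the dictionary keys (MB, DBP, PDB) are hypothetical and chosen for illustration only; this is not how the Controller firmware implements the check.  It assumes the three PSNs have already been read by hand from the ILOM.

```python
from collections import Counter
from typing import Dict, Optional

# Hypothetical labels for the three quorum devices named in the note:
# motherboard (MB), disk backplane (DBP), power distribution board (PDB).
QUORUM_DEVICES = ("MB", "DBP", "PDB")


def quorum_psn(psns: Dict[str, str]) -> Optional[str]:
    """Return the PSN that at least two quorum devices agree on, else None."""
    values = [psns[device] for device in QUORUM_DEVICES]
    psn, count = Counter(values).most_common(1)[0]
    return psn if count >= 2 else None


def safe_to_replace_motherboard(psns: Dict[str, str]) -> bool:
    """The motherboard may be replaced only if the other two devices agree.

    After the swap, the disk backplane and power distribution board are the
    only two members left that can form the quorum, so they must match.
    """
    return psns["DBP"] == psns["PDB"]


if __name__ == "__main__":
    example = {"MB": "0000XX0000", "DBP": "1315FMXXXX", "PDB": "1315FMXXXX"}
    print(quorum_psn(example))
    print(safe_to_replace_motherboard(example))
```

In this sketch a motherboard whose PSN differs from the other two devices is still replaceable, while any disagreement between the disk backplane and the power distribution board is treated as a stop condition.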

  

NOTE: If the FS1-2 is running software R6.2.10 or lower, it is advised that the system be upgraded to R6.2.11 or higher before attempting the replacement.  If the surviving Controller reboots for any reason after the bad Controller's motherboard is replaced but before the replacement process completes, the system will end up with 3 Controllers and be unable to recover.  Systems with only one Controller can be upgraded by checking the Ignore Hardware Status box, but this will impact data access.  Even if the option to Update software without restarting the system is selected, the one surviving Controller must warm start as part of the upgrade.  As a result, the FS1-2 will not be accessible until the warm start completes (5-10 seconds).
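
The release threshold in the note above is easy to misjudge when release strings are compared as text (for example, "R6.2.9" sorts after "R6.2.10" alphabetically).  The following Python sketch compares releases numerically.  The function names are hypothetical, and the sketch assumes the release is written in the R<major>.<minor>.<patch> form used in this document.

```python
import re
from typing import Tuple

# Lowest release the note recommends before replacing a motherboard.
MINIMUM_RELEASE = (6, 2, 11)


def parse_release(release: str) -> Tuple[int, ...]:
    """Turn a release string such as 'R6.2.10' into the tuple (6, 2, 10)."""
    match = re.fullmatch(r"R?(\d+(?:\.\d+)*)", release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def upgrade_advised(release: str) -> bool:
    """Return True when the release is lower than R6.2.11."""
    return parse_release(release) < MINIMUM_RELEASE


if __name__ == "__main__":
    for candidate in ("R6.1.12", "R6.2.9", "R6.2.10", "R6.2.11"):
        print(candidate, upgrade_advised(candidate))
```

Comparing integer tuples rather than strings keeps R6.2.9 below R6.2.10, which matches the intent of the note.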

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

  1. Confirm Product Serial Number Containers (PSNCs) are currently synchronized.
    1. Use ssh to access the Pilot, login/password is root/a1s2d3f$.
      1. Software versions prior to R6.1.12 had ssh enabled from the factory. For versions R6.1.12 and newer, it can be enabled using fscli (30 minutes in this example):
        # fscli system -modify -enableSsh 30
         
    2. Use ssh to access the good Controller.
      [root@pilot1 ~]# cat /etc/nodenames
      172.30.80.2 WN2008fffffffffff2 WN2008000101000000 mgmtnode
      172.30.80.128 WN508002000XXXXXX0 WN2008000101000001  <==== Controller 1
      172.30.80.129 WN508002000XXXXXX1                     <==== Controller 2
      172.30.80.3 WN2009fffffffffffa
      [root@pilot1 ~]# ssh 172.30.80.128
      WN508002000XXXXXX0 #
       
      NOTE: the example above is the output expected when you have two good working Controllers in order to show the IP addresses of both Controllers.  Depending on the severity of the Controller motherboard issue being addressed, you may only see one Controller.  For the purposes of this document, Controller 1 (172.30.80.128) will be the good Controller.  The WWN of that Controller will be the prompt and end with a 0 (Controller 2 would end in a 1).
       
      If no ILOM prompt is seen in the next step and the connection times out, restart sp_config on the good Controller:
      WN508002000XXXXXX0 # /etc/init.d/sp_config restart
      sp_config Acquired the lock on /var/run/sp.lock !
      udpsvd: listening on 169.254.2.6:69, starting
      RTNETLINK answers: File exists
      udpsvd: listening on 169.254.2.10:69, starting
      sp_config Released the lock on /var/run/sp.lock !
      WN508002000XXXXXX0 #
       
    3. Use ssh to access the bad Controller's ILOM (default ILOM password is changeme):
      WN508002000XXXXXX0 # ssh 169.254.2.9
      Password:

      Oracle(R) Integrated Lights Out Manager

      Version 3.1.2.40 r93718

      Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.

      Warning: password is set to factory default.

      ->
       
      NOTE: using the IP address of 169.254.2.9 will ALWAYS connect you to the other Controller's ILOM.  In the example above, starting from Controller 1, the connection is being made to Controller 2's ILOM.
       
    4. Enter restricted session mode and run the showpsnc command.
      -> set SESSION mode=restricted

      WARNING: The "Restricted Shell" account is provided solely
      to allow Services to perform diagnostic tasks.

      [(restricted_shell) ORACLESP-1315FMXXXX:~]# showpsnc
      Primary: fruid:///SYS/DBP
      Backup 1: fruid:///SYS/MB
      Backup 2: fruid:///SYS/PDB

      Element           | Primary           | Backup1           | Backup2
      ------------------+-------------------+-------------------+-------------------
      PPN                 70893370            70893370            70893370
      PSN                 1315FMXXXX          1315FMXXXX          1315FMXXXX  <=== Product Serial Numbers must match
      Product Name        FS1 Controller      FS1 Controller      FS1 Controller
      [(restricted_shell) ORACLESP-1315FMXXXX:~]#
       
    5. If all 3 PSNs match, exit all the way out of the FS1-2 and proceed to step 2.
    6. If the PSNs of the Disk BackPlane 0 (DBP0) and Power Distribution Board (PDB) are the same but the MotherBoard (MB) PSN is different, it is safe to proceed to step 2 since the motherboard will be replaced.
    7. If any other condition exists, STOP!! and re-engage Oracle Support for corrective steps before proceeding to replace the failed motherboard.

  2. Prepare FS1-2 for service procedure.
    1. Disable Call-Home to prevent spurious alerts (see Document 1942676.1 FS System: How to Disable Call-Home to Prevent Automatic Service Request (ASR) Generation).
    2. Use ESD precautions.
    3. Log into Oracle FS System Manager to access Guided Maintenance:
      1. Select System tab
      2. In the navigation tree, expand Hardware and select Controllers
      3. In the main window, right click on the Controller with the failed motherboard and select View.
      4. In the pop-up View window, select Motherboard Assembly followed by the Replace Component button.
      5. Follow the steps in Guided Maintenance to identify and place the Controller offline.
        NOTE:  At this time, if the Controller motherboard issue causes the Controller itself not to boot, Guided Maintenance will not be available for the motherboard replacement.  In these cases, the failover will have already taken place and it is safe to proceed.
         
  3. Access the Controller motherboard.
    1. Deploy the anti-tip legs in the front of the rack.
    2. Slide the Controller into the service position.
    3. Unplug both power cords.
    4. Loosen the two captive screws in the rear of the top cover.
    5. Slide the top cover back and remove.
    6. Disengage both power supplies from motherboard.
      NOTE: It is recommended to unseat the power supplies rather than completely remove them.
       
    7. Lift Drive Compartment into service position.
      1. Unplug the Drive signal and NVDIMM cables connected to the Drive Compartment.  Labeling them first is suggested to make reconnection easier.
      2. Unplug the Control Unit ID (CUID) cable on the front right side of the Controller.
      3. Loosen the four captive screws that secure the Drive Compartment to the Controller chassis and tilt it up into the service position.
    8. Remove air duct.
    9. Remove fan tray.
      1. Remove all 5 fan modules.
      2. Remove fan tray.
    10. Remove risers.
      1. Loosen 2 captive screws in each riser.
      2. Lift the riser up and clear of the Controller chassis.
      NOTE: it is not necessary to remove the PCIe cards from the risers nor is it required to unplug any cables connected to the PCIe cards in those risers.  This will make reassembly easier.
       
    11. Disconnect PDB.
      1. Loosen the single captive screw that secures the PDB duct to the Controller chassis and lift it free.  Be sure to guide the CUID cable out of the slot in the PDB duct.
      2. Remove the PDB signal cable from the PDB.
      3. Remove the 4 screws that secure the PDB GND and +12V buses to the motherboard.

    12. Remove the remaining cable connections from the motherboard:
      1. Drive backplane power and monitoring cable (left side).
      2. LED board/Alarm cable (left side).
      3. Drive signal cable (right side).
      4. CUID cable (right side).

  4. Remove failed motherboard.
    1. Loosen the single captive screw at the front of the motherboard.
    2. Using the two green plastic handles, gently push the motherboard towards the rear until it is free of the chassis.
    3. Lift the motherboard out of the server's chassis and place it on an anti-static mat.  

  5. Install replacement motherboard.
    1. Using the 2 green plastic handles, insert the replacement motherboard from the rear and slide it forward to its original position.  Take care to align the four holes that connect it to the PDB as well as the alignment tabs in the rear of the chassis.
    2. Secure the motherboard by tightening the captive screw at the front of the motherboard.
    3. Secure the PDB buses to the motherboard using the 4 screws previously removed.
    4. Reconnect the PDB signal cable to the PDB.
    5. Install the PDB duct in place and secure to chassis with captive screw.  Be sure to route the CUID cable through the slot on the top.


  6. Transfer DIMMs, NVDIMMs, and CPUs with their heatsinks from the old motherboard to the new one.
    1. CPUs and heatsinks.
      1. While gently pushing down on the heatsink, loosen the 4 screws 1.5 turns at a time using a crossing pattern until all the screws are free.
      2. Gently rotate the heatsink back and forth slightly while pulling up to free it from the CPU.
      3. Using a supplied alcohol pad, clean the heatsink bottom and CPU top.  Be VERY careful not to damage the CPU pins or socket by applying too much pressure.  Avoid spreading the thermal grease to other surfaces.
      4. You must use the CPU Installation/Removal Tool (part # 7026168) to extract the CPU.
      5. Viewing from the front, disengage the right CPU cover retaining lever by pushing down on it and then pushing it away from the CPU.  Repeat for the left lever.  Be sure to move the levers all the way back to allow better access to the CPU cover and the CPU itself.
      6. Open the CPU cover towards the right side to expose the CPU.
      7. Press the round button at the top center of the tool to unlock it.  Then place it over the CPU using the green arrow to align it properly.
      8. Lock the tool to the CPU by pushing the tab next to the center button on the top of the tool.  Once CPU is locked to the tool lift the tool straight up.
      9. Position the CPU in the replacement motherboard in the same location it was removed from and carefully align it with the CPU socket on the motherboard.
      10. Press the center round button on top of the tool to unlock the CPU.  Do NOT press on the CPU itself.
      11. Lift the CPU tool free and clear of the Controller.
      12. Lower the CPU Cover back into place.
      13. Re-engage the left lever back into place and then the right lever.
      14. Using the syringe it comes in, apply ~0.1 ml of thermal grease in the center of the top of the CPU.  Do NOT spread it around.
      15. Verify the underside of the heatsink is clean and if not clean it with an alcohol pad.
      16. Carefully position the heatsink over the CPU by aligning the captive mounting screws to their holes in the motherboard.
      17. Once the heatsink has made contact with the thermal grease, keep any sideways movement to a minimum.
      18. Using a crossing pattern, tighten each screw 0.5 turns at a time until all four are securely fastened.
      19. Repeat for second CPU.

    2. DIMMs/NVDIMMs
      1. One at a time, remove a DIMM/NVDIMM from the failed motherboard and install it in the same slot on the replacement motherboard.
      2. To ensure proper cooling, also transfer the plastic fillers in the unused DIMM/NVDIMM slots from the failed motherboard to its replacement.
      3. Route the cables from the NVDIMMs to the left so that they will clear the air duct when it is reinstalled.

  7. Reassemble remaining Controller components.
    1. Reconnect to the motherboard all the remaining cables that were disconnected in step 3.12.
    2. Reinstall the risers.
      1. Each riser has 1 alignment hole, 2 screws and a PCIe connector.  Align the PCIe connector to the motherboard first and then adjust for the others before pushing the PCIe connector into the PCIe slot.
      2. Secure in place using the two captive screws.
    3. Reinstall fan tray and Fans.  Be sure to properly route the LED board and drive backplane power & monitoring cables (left side) and the CUID, drive signal and PDB cables (right side).
    4. Reinstall the air duct.
    5. Return the Drive Compartment to its normal position.
      1. Tilt the Drive Compartment back into its normal position, being careful not to pinch any cables.
      2. Secure in place using the 4 captive screws.
      3. Reconnect (left to right) the drive backplane power & monitoring cable from the motherboard, the cables from NVDIMMs to ESMs, the SAS cable and the CUID cable.
    6. Reseat both power supplies.
    7. Close the top cover.
    8. Tighten the two captive screws in the rear of the top cover.

  8. Return Controller to FS1-2 System.
    1. Plug in both power cords.
    2. Return the Controller to the rack position.
    3. Return the anti-tip legs to their normal position.
    4. Within about 3 minutes, the SP LED should be solid green and the OK LED should blink about every second indicating the Controller is booting.  If the OK LED indicates it is in Standby mode (blinking about every 3-5 seconds), push the power button located between the OK and SP LEDs to manually boot the Controller.
    5. Once the Controller has completed booting, confirm the proper motherboard BIOS:
      1. ssh to the active Pilot and run the ver command with the -v option:
        [root@pilot1 ~]# ver -v

        pilot1(Active):
        Pilot Apps Build:        060215-052400
        Pilot OS version:        060215-052200
        ...
        Controllers:
        172.30.80.128 :  OS version: 2060-00004-060215-052450
        Kernel version: 3.0.16-200.29.3.el6uek-axnp.ndebug.060215.050300
        BIOS version: American Megatrends Inc. 21000227 03/11/2016
        SP Firmware version: 3.2.9.21 r117708
        CPLD version: FW:2.5

        172.30.80.129 :  OS version: 2060-00004-060215-052450
        Kernel version: 3.0.16-200.29.3.el6uek-axnp.ndebug.060215.050300
        BIOS version: American Megatrends Inc. 21000227 03/11/2016
        SP Firmware version: 3.2.9.21 r117708
        CPLD version: FW:2.5

        [root@pilot1 ~]#
         
      2. If the BIOS or SP Firmware versions are different, please refer to KM Document 1939732.1 FS System: How to Access the Internal Service Guide, to upgrade the BIOS.
        NOTE: For FS1-2 systems running R6.1.18 or R6.1.19, please see KM Document 2064481.1 FS System: How to Load Missing Controller BIOS Files After Motherboard Replacement.
         
      3. Confirm that the HOST_AUTO_POWER_ON ILOM setting is enabled, and if not, enable it manually:
        [root@pilot1 ~]# ssh root@172.30.80.129 ipmitool -H 169.254.2.9 -U root -P changeme -I lanplus sunoem cli "'show /SP/policy HOST_AUTO_POWER_ON'"
        Connected. Use ^D to exit.
        -> show /SP/policy HOST_AUTO_POWER_ON

          /SP/policy
            Properties:
                HOST_AUTO_POWER_ON = disabled


        -> Session closed
        Disconnected
        [root@pilot1 ~]# ssh root@172.30.80.129 ipmitool -H 169.254.2.9 -U root -P changeme -I lanplus sunoem cli "'set /SP/policy HOST_AUTO_POWER_ON=enabled'"
        Connected. Use ^D to exit.
        -> set /SP/policy HOST_AUTO_POWER_ON=enabled
        Set 'HOST_AUTO_POWER_ON' to 'enabled'

        -> Session closed
        Disconnected
        [root@pilot1 ~]# ssh root@172.30.80.129 ipmitool -H 169.254.2.9 -U root -P changeme -I lanplus sunoem cli "'show /SP/policy HOST_AUTO_POWER_ON'"
        Connected. Use ^D to exit.
        -> show /SP/policy HOST_AUTO_POWER_ON

          /SP/policy
            Properties:
                HOST_AUTO_POWER_ON = enabled


        -> Session closed
        Disconnected
        [root@pilot1 ~]#
         
    6. Repeat step 1 to verify that the PSN of the replacement motherboard is synchronized to the other two quorum devices.
      NOTE: If 3 Controllers are observed in the GUI or CLI after the Controller motherboard replacement, one of the following issues may have been encountered:
      1. KM Document 2017890.1 FS System: Improper Controller Motherboard Replacement in an FS1-2 Controller Results in 3rd Controller.
      2. Bugs 24524311 or 24501157
        Contact Oracle Support for further assistance.  Recovery may require a disruptive upgrade to R6.2.11.
       
    7. When finished, re-enable Call-Home.
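
The BIOS and SP firmware comparison in step 8.5 can be sketched in Python as follows.  This is a minimal illustration that assumes the Controllers section of the ver -v output has the layout shown in that step; the function names are hypothetical and the parser has not been validated against other software releases.

```python
import re
from typing import Dict, List

# A Controller entry starts with its IP address followed by a colon,
# for example "172.30.80.128 :  OS version: ...".
CONTROLLER_LINE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s*:")

# Fields that step 8.5 asks to compare between the two Controllers.
FIELDS = ("BIOS version", "SP Firmware version")


def parse_controller_versions(ver_output: str) -> Dict[str, Dict[str, str]]:
    """Map each Controller IP address to its BIOS and SP firmware strings."""
    versions: Dict[str, Dict[str, str]] = {}
    current = None
    for raw_line in ver_output.splitlines():
        line = raw_line.strip()
        match = CONTROLLER_LINE.match(line)
        if match:
            current = match.group(1)
            versions[current] = {}
            continue
        if current is None:
            continue  # still in the Pilot part of the output
        for field in FIELDS:
            prefix = field + ":"
            if line.startswith(prefix):
                versions[current][field] = line[len(prefix):].strip()
    return versions


def mismatched_fields(versions: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the fields whose values differ between Controllers."""
    return [
        field
        for field in FIELDS
        if len({entry.get(field) for entry in versions.values()}) > 1
    ]
```

An empty list from mismatched_fields means both Controllers report the same BIOS and SP firmware strings; any field named in the list points to the documents referenced in step 8.5.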


OBTAIN CUSTOMER ACCEPTANCE


WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

 Confirm that the System Alert previously associated with the Controller is gone and the FS1-2 status is normal/green.

NOTE: Because the Controller must cold start, it may take as long as 20 minutes for the boot process to complete and the Controller to return to a normal status.


REFERENCE INFORMATION:

 From the Oracle Help Center (http://docs.oracle.com/en/storage/#fla), select the Oracle Flash System Documentation Library for more information.

References

<NOTE:1939732.1> - FS System: How to access Internal Field Service Guides
<NOTE:2064481.1> - FS System: How to Load Missing Controller BIOS Files After Motherboard Replacement

  Copyright © 2018 Oracle, Inc.  All rights reserved.