Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2347200.1
Update Date:2018-05-10
Keywords:

Solution Type  Technical Instruction Sure

Solution  2347200.1 :   How to Replace an Oracle Server X7-8 Memory DIMM  


Related Items
  • Oracle Server X7-8
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution


Applies to:

Oracle Server X7-8 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace an Oracle Server X7-8 Memory DIMM.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
No special skills required. This is a Customer Replaceable Unit (CRU) procedure.

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 0

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Oracle Server X7-8 Memory DIMM needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?

Note - Oracle Server X7-8 can be configured as one 4-socket server, two independent 4-socket servers, or one 8-socket server. Whether you need to bring the whole system down depends on the customer's configuration:
  • An Oracle Server X7-8 configured as either a single 4-socket server or a single 8-socket server requires the system OS to be shut down and the DC voltage powered off ("-> stop /System" ["bring down into standby mode"] on SMOD0/SMOD1) for CMOD, CPU, or DIMM replacement.
  • An Oracle Server X7-8 configured as two 4-socket servers requires only the OS on the 4-socket server (SMOD0 or SMOD1) whose CMOD needs service to be shut down and its DC voltage powered off ("-> stop /System" ["bring down into standby mode"] on that SMOD) for CMOD, CPU, or DIMM replacement.
    The DC power of one server module (SMOD0 or SMOD1) can be powered off and the CMODs related to that SMOD safely removed from the chassis without affecting the OS on the other operating SMOD and its associated CMOD(s).
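
The configuration rule in the note above can be sketched as a small lookup. This is an illustrative sketch only, not an Oracle tool: the names Configuration and shutdown_scope are hypothetical, and the function simply restates the rule from the note.

```python
from enum import Enum


class Configuration(Enum):
    """The three configurations named in the note (labels are illustrative)."""
    SINGLE_4_SOCKET = "one 4-socket server"
    DUAL_4_SOCKET = "two independent 4-socket servers"
    SINGLE_8_SOCKET = "one 8-socket server"


def shutdown_scope(config: Configuration) -> str:
    """Return which part of the system must be in standby before service."""
    if config is Configuration.DUAL_4_SOCKET:
        # Only the SMOD that owns the affected CMOD needs "-> stop /System".
        return "affected SMOD only"
    # A single 4-socket or single 8-socket server must be brought down entirely.
    return "whole system"


if __name__ == "__main__":
    for config in Configuration:
        print(f"{config.value}: stop {shutdown_scope(config)}")
```

In the dual 4-socket case, the other SMOD and its CMODs can keep running while the affected SMOD is in standby.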

A data backup is not a prerequisite but is a wise precaution.

If the server module (SMOD) with the affected CMOD, CPU, or DIMM is still up and functioning, the customer should perform an orderly and graceful shutdown of the applications and OS on that server module. This includes performing a "-> stop /System" on this SMOD to bring it down into "standby mode". In Standby power mode, the Power/OK LED on the front panel of the server module flashes.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Caution - Loss of service or component damage. Do not replace any components except for CMODs and internal CMOD subcomponents while the server is in warm service mode.
Caution - Data Loss. Do not remove more than one fan module from a column while the system is in Main power mode. This action removes power from the CMODs and causes an immediate shutdown. On an eight-CMOD system, this applies to all fan modules. On a four-CMOD system, this applies to the fan modules in the left-hand fan frame.

This procedure describes how to prepare the server for warm service. Use warm service to remove and replace CMOD, DIMM, and processor components without accessing the server back panel to disconnect AC power cords or shutting down Oracle ILOM.

When Oracle ILOM detects that two fan modules in a single cooling zone (a vertical column) have been removed, the SP removes power from the CMODs, allowing you to service CMODs and their subcomponents without removing the power cords. Oracle ILOM remains available in warm service mode.

Preparing a CPU Module (CMOD) for Removal

Note - This procedure can also be completed as a cold service procedure.
Caution - Data loss. Removing the power cords when the server is in Main power mode results in an immediate shutdown of the server. Do not remove the power cords if the server is in Main power mode. Power off the server to Standby power mode first.

1. Log in to the SP Oracle ILOM CLI.

    Log in as a user with root or administrator privileges. For example, open an SSH session, and at the command line type:

           ssh root@sp_ip_address
           where sp_ip_address is the IP address of the server module SP.
           Type the password.

2. Power off the server gracefully to Standby power mode.
    At the prompt, type the following command:

-> stop /System


Removing the CPU Module (CMOD) that contains the Failed DIMM

1. Identify which group of fan modules (left or right) to remove to access the CMOD.
    To access CMODs 0-3, remove the left fan module (FM) group. To access CMODs 4-7, remove the right FM group. In a server configured with four CMODs, remove the left FM group.

2. Remove the fan modules.

        a. To unlock a fan module, push in the green release button.

Caution - Data Loss. Do not remove more than one fan module from a column while the system is in Main power mode. This action removes power from the CMODs and causes an immediate shutdown. On an eight-CMOD system, this applies to all fan modules. On a four-CMOD system, this applies to the fan modules in the left-side fan frame.

        b. To remove the fan module, pull it out of the slot.
           When the fan is removed from the slot, a hinged air vane drops down to close the slot. The vane maintains system cooling and prevents a disruption of server airflow during hot service.

3. Remove a fan frame.

        a. To remove the fan frame, hold it by the green labels at the center of the frame and pull it out of the server.
           The center of the fan frame is marked with green labels. The labels indicate where to hold the frame when you want to install or remove it.

4. Identify the CMOD.
    If you are removing a CMOD in a failed state, the lit fault indicator for the CMOD on the FIM shows you the CMOD number and the group to which it belongs.

5. To unlock the CMOD, squeeze together the green tabs on the end of the CMOD lever.

6. To disconnect the CMOD from the connector on the midplane, rotate the CMOD lever down and away from the CMOD.

Caution - Pinch point. Keep your fingers clear of the underside of the lever.

    The lever disconnects the CMOD from the midplane and its DPCC.

7. Use the lever to slide the CMOD partially out of the server until you can grab it with two hands.

Caution - The CMOD is heavy. Be prepared to hold it firmly when it is clear of the slot.

8. To remove the CMOD from the chassis, slide the CMOD completely out of the server.

9. Remove the CMOD top cover.
    To remove the CMOD top cover, push the release button, and slide the CMOD cover toward the back of the CMOD.

10. Lift the CMOD top cover away from the CMOD.

Caution - These procedures require that you handle components that are sensitive to electrostatic discharge. This sensitivity can cause the components to fail. To avoid damage, ensure that you follow antistatic practices as described in Electrostatic Discharge and Static Prevention Measures.
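
The fan group selection in step 1 of this procedure can be expressed as a short function. This is a minimal sketch with a hypothetical name (fan_group_for_cmod). It assumes that a four-CMOD server numbers its CMODs 0-3, which the source does not state explicitly.

```python
def fan_group_for_cmod(cmod: int, cmod_count: int = 8) -> str:
    """Return the fan module group ("left" or "right") to remove for a CMOD."""
    if cmod_count not in (4, 8):
        raise ValueError("cmod_count must be 4 or 8")
    # Assumption: CMODs are numbered from 0 up to cmod_count - 1.
    if not 0 <= cmod < cmod_count:
        raise ValueError(f"CMOD number must be between 0 and {cmod_count - 1}")
    # CMODs 0-3 sit behind the left group, CMODs 4-7 behind the right group.
    # A four-CMOD server only has CMODs behind the left group.
    return "left" if cmod <= 3 else "right"


if __name__ == "__main__":
    for number in (0, 3, 4, 7):
        print(f"CMOD {number}: remove the {fan_group_for_cmod(number)} FM group")
```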

 

Identifying and Removing a DIMM

For background, see the following topics in the service manual:
  • DIMM Physical Layout
  • DIMM Population Rules
  • DIMM Population Scenarios
  • DIMM Operating Speeds
  • DIMM Rank Classification Labels

1. Identify and note the location of the faulty DDR4 DIMM by pressing the Fault Remind button on the motherboard.
    Faulty DIMMs are identified with a corresponding amber LED on the motherboard.

  • If the DIMM Fault LED is off, then the DIMM is operating properly.
  • If the DIMM Fault LED is on (amber), then the DIMM is faulty and should be replaced.

    To locate the failed DIMM, press and hold the Fault Remind button.

    This procedure uses the DIMM fault remind test circuit in the CMOD to identify the failed DIMM. The circuit is a charged, time-limited circuit. Once power is removed from the CMOD, you have about 10 minutes to use the circuit for troubleshooting.

2. Verify that the green Fault Remind Power LED indicator is lit.
    The Charge Status indicator lights if the Fault Remind circuit is operational.

     The Fault Remind circuit remains charged for about 10 minutes after power is removed from the CMOD, either by disconnecting power from the server, or by removing the CMOD from the chassis. When you press the Fault Remind button, the Charge Status indicator lights if there is enough power to use the fault remind circuit. Otherwise it remains unlit.

3. With the Fault Remind button pressed, look for a lit DIMM fault LED indicator.
    Twelve DIMM fault LED indicators are located next to the DIMM slots.

4. Rotate both DIMM socket ejectors outward as far as they will go.
    This action partially ejects the DIMM from the socket and releases it from its connector.

5. Remove the faulty DIMM from the CMOD.
    Carefully lift the DIMM straight up to remove it from the socket.

Note - Either replace each faulty DIMM with another DIMM of the same rank (quad-rank or dual-rank) or leave the socket empty.
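
The rank rule in the note above can be written as a simple pre-check before installing a part. This is an illustrative sketch: the function name and the rank labels are hypothetical, and it only covers the rank rule, not the full DIMM population rules.

```python
from typing import Optional

VALID_RANKS = {"dual-rank", "quad-rank"}


def is_valid_replacement(faulty_rank: str, replacement_rank: Optional[str]) -> bool:
    """Check a replacement against the rank rule.

    Pass replacement_rank=None to represent leaving the socket empty.
    """
    if faulty_rank not in VALID_RANKS:
        raise ValueError(f"unknown rank: {faulty_rank}")
    if replacement_rank is None:
        # Leaving the socket empty is allowed by the note.
        return True
    if replacement_rank not in VALID_RANKS:
        raise ValueError(f"unknown rank: {replacement_rank}")
    # A dual-rank DIMM must not replace a quad-rank DIMM, or vice versa.
    return replacement_rank == faulty_rank


if __name__ == "__main__":
    print(is_valid_replacement("dual-rank", "dual-rank"))
    print(is_valid_replacement("dual-rank", "quad-rank"))
    print(is_valid_replacement("quad-rank", None))
```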

 

Installing a DIMM

Note - Use this procedure to install DIMMs for a memory upgrade or a configuration change, or as part of a DIMM reseat (removal and installation).

1. Unpack the replacement DDR4 DIMM and place it on an antistatic mat.

2. Ensure that the replacement DDR4 DIMM matches the size of the DIMM it is replacing.
    You must not replace a dual-rank DIMM with a quad-rank DIMM, or vice versa. If you violate this rule, the performance of the server might be adversely affected. For DIMM socket population rules, see DIMM Population Rules.

3. Locate the DIMM slot.

4. Install a DIMM.

      a. Ensure that the ejector tabs are in the open position.
      b. Align the notch in the replacement DIMM with the connector key in the connector socket.
          The notch ensures that the DIMM is oriented correctly.
      c. Push the DDR4 DIMM into the connector socket until the ejector tabs lock the DIMM in place.
          If the DIMM does not easily seat into the connector socket, verify that the notch in the DIMM is aligned with the connector key in the connector socket. If the notch is not aligned, damage to the DIMM might occur.

5. To install the DIMM in the slot, simultaneously press down on both edges of the DIMM.
    This action forces the DIMM into the slot and causes the two slot levers to rise and lock the DIMM in the slot.

6. Verify that the DIMM sits evenly in the slot and is locked.
    Both levers should be in their fully closed and vertical position. In this position the levers lock the DIMM in the slot.

7. Install the CMOD cover.


Installing a CPU Module (CMOD) and Returning it to Operation

1. Ensure that the CMOD lever is in the fully-open position.

      a. Squeeze together the green tabs on the end of the lever.
      b. Rotate the lever down and away from the CMOD.

2. Position the CMOD in the slot.

Caution - The CMOD is heavy. Be prepared to hold it firmly until it is securely supported in its slot.

3. On the front-facing side, ensure that the hinge for the lever is at the bottom.

4. Slide the CMOD into the slot until it stops.
    In this position, the pawl at the lever hinge is aligned with the slot in the server.

5. To install the CMOD, rotate the lever up until it locks into place and is flush with the front of the CMOD.

Caution - Pinch point. When operating the lever, keep your fingers clear of the back side and hinged end of the lever.

6. Install the fan frame.

     a. Position the fan frame at the opening in the front of the server with the air vane hinges at the top.

Note - The center of the fan frame is marked with green labels. The labels indicate where to grab the frame when you want to install or remove it.

     b. Slide the fan frame into the server until it stops and is flush with the front of the server.

7. Install the four fan modules.

     a. Align the fan module with the slot.
         Access this component directly from the front of the server. Ensure the handle is positioned at the bottom of the slot with the green release button to the left and that the air vane for the slot swings freely.

Caution - Component damage. Do not apply excessive force when sliding the fan module into the server. Ensure that the connector on the CMOD and the connector on the fan module are aligned correctly.

     b. To install the fan module, slide it into the slot until it stops and gently push it inward until the fan module locks into place.
         The locking action is accompanied by a click sound.
     c. Verify that the green Fan OK indicator on the fan module lights and is steady on.

8. Ensure that all external front and back components are fully installed.

9. Ensure that all cables are connected to the back of the server.

10. Connect all AC power cables to their inlets on the back of the server and verify that they are locked.
     The retaining clips lock the power cables in place and prevent accidental disconnection.

11. If necessary, connect the other end of the AC power cables to the supply outlet.

12. Ensure that the server is powering into Standby power mode.
     When AC power is applied to the server power inlets, the server boots into Standby power mode.

13. Power on the server module.

-> start /System

14. Log out of Oracle ILOM.

15. Verify that the Power/OK LED indicator is steady on.


How to verify the DIMM is working properly

Log in to the Oracle ILOM CLI and start a Fault Management Shell session:

-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y
faultmgmtsp>

Use the "fmadm faulty -a" command to list all active faulty components:

faultmgmtsp> fmadm faulty -a

If the DIMM that was just replaced is still listed as faulty, indicate that it has been replaced by using the command "fmadm replaced <fru|cru|uuid>". For example:

faultmgmtsp> fmadm replaced /SYS/MB/P0/D2

Confirm that the faults are cleared, and then exit the Fault Management Shell:

faultmgmtsp> fmadm faulty -a
No faults found
faultmgmtsp> exit
->

Enter the following command to confirm that the DIMM status is normal:

-> show /System/Memory/DIMMs/DIMM_x
Note - The "x" represents the number of the DIMM that was replaced.

Example:

-> show /System/Memory/DIMMs/DIMM_0
/System/Memory/DIMMs/DIMM_0
Targets:
Properties:
health = OK
health_details = -
part_number = 001-0003-01,M393B2G70DB0-YK0
serial_number = 00CE011412225F9C48
location = P0/D0 (CPU 0 DIMM 0)
manufacturer = Samsung
memory_size = 16 GB
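
If the output of the show command is captured as text, the health check can be automated. The following is a minimal sketch, not part of Oracle ILOM: the function names are hypothetical, the sample text is the example output above, and the parser assumes every property is printed as "name = value" on its own line.

```python
from typing import Dict

# Sample text copied from the example output above.
SAMPLE_OUTPUT = """\
/System/Memory/DIMMs/DIMM_0
Targets:
Properties:
health = OK
health_details = -
part_number = 001-0003-01,M393B2G70DB0-YK0
serial_number = 00CE011412225F9C48
location = P0/D0 (CPU 0 DIMM 0)
manufacturer = Samsung
memory_size = 16 GB
"""


def parse_properties(output: str) -> Dict[str, str]:
    """Collect "name = value" lines from captured show output."""
    properties = {}
    for line in output.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:  # skip lines such as "Targets:" that carry no property
            properties[name.strip()] = value.strip()
    return properties


def dimm_is_healthy(output: str) -> bool:
    """Return True only when the health property is reported as OK."""
    return parse_properties(output).get("health") == "OK"


if __name__ == "__main__":
    print(dimm_is_healthy(SAMPLE_OUTPUT))
```

A missing health line is treated as not healthy, so truncated or unexpected output does not pass the check by accident.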

Check the event log for any error output:

-> show /SP/logs/event/list

 

PARTS NOTE:

7330697 [C] 16GB DDR4-2666 Registered DIMM
7330698 [C] 32GB DDR4-2666 Registered DIMM
7330699 [C] 64GB DDR4-2666 Load Reduced DIMM

REFERENCE INFORMATION:

Oracle Server X7-8 Service Manual
https://docs.oracle.com/cd/E71925_01/html/E71936/index.html

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.