How to replace memory (DIMMs) in VSM6 T4-2 server:ATR:1539061.1:3

Asset ID:	1-71-1539061.1
Update Date:	2018-02-11

Solution Type:	Technical Instruction Sure

Applies to:

StorageTek Virtual Storage Manager System 6 (VSM6) - All Versions
Oracle Solaris on SPARC (64-bit)

Goal

How to replace a SPARC T4-2 DIMM on a VSM6 server

Solution

DISPATCH INSTRUCTIONS
   WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: VSM6 trained, T4 server, Solaris 11
   TIME ESTIMATE: 60 minutes
   TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
   PROBLEM OVERVIEW: How to replace a SPARC T4-2 DIMM on a VSM6 server

   WHAT SKILLS DOES THE ENGINEER NEED:(IS A SITE ENGINEER AVAILABLE?) SPARC T4-2 Server product training and VSM6 product training.

   WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

       Prepare affected VSM6 server for servicing:

1. Before the VSM6 node can be serviced, the customer must stop all activity to the one node that must be repaired. All Host channel activity and single-path RTD activity to the node must be stopped and taken offline so that the node can be powered off. If an RTD is configured for dual-path communication through both nodes, it does not have to be taken offline since communication with the other node will not be interrupted. It is also recommended to stop all VSM IP communication while the node is shut down.

2. Log into the node that does NOT have the failed DIMM.

3. Move resources to the other node, then put the node with the failed DIMM into maintenance mode and shut it down.
$ cli "shutdown node -maint -node <1 or 2>"

Note: <1 or 2> is the number of the node with the failed DIMM. (Example: If the failed DIMM is on node 1, enter the command from node 2: cli "shutdown node -maint -node 1".) Some older documents reference the -force option. Do NOT use the -force parameter unless instructed by TSC or engineering.

4. Attach an antistatic wrist strap and verify that the node is completely shut down before removing power.

5. Unplug power cords from the power supplies.

6. Extend the server to maintenance position.
Note: Be careful when pulling out the server, as cables in the cable management arm (CMA) may bind.

7. Remove the top cover.
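
As an illustration of the node selection in step 3, the following shell sketch prints which node to log into and the command to enter there. It only prints text and does not call cli. The function name build_shutdown_cmd is hypothetical and is not part of the VSM6 software; the command string itself is the one given in step 3.

```shell
#!/bin/sh
# Sketch only: prints the command for review, it does NOT run it.
# build_shutdown_cmd is a hypothetical helper name (an assumption for
# illustration), not a VSM6 utility.
build_shutdown_cmd() {
    failed_node="$1"
    # The command is entered from the node that does NOT have the failed DIMM.
    case "$failed_node" in
        1) run_from=2 ;;
        2) run_from=1 ;;
        *) echo "usage: build_shutdown_cmd <1|2>" >&2; return 1 ;;
    esac
    echo "Run from node ${run_from}: cli \"shutdown node -maint -node ${failed_node}\""
}

# Example: the failed DIMM is on node 1.
build_shutdown_cmd 1
```

For a failed DIMM on node 1, this prints a reminder to log into node 2 and enter the shutdown command with -node 1, which matches the example in the note to step 3.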

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

CPU, Memory Riser, and DIMM Physical Layout
The name of each DIMM is based on the name of the memory riser and the DIMM slot location on the memory riser.

The following reference is an example of the full FRU name for a single DIMM: /SYS/MB/CMP1/MR0/BOB0/CH0/D0

Each memory riser slot in the server chassis must be filled with either a memory riser or filler panel, and each memory riser must be filled with DIMMs and/or DIMM filler panels. For example, empty CPU sockets (P1 and P3) must have associated memory riser slots populated with two riser filler panels per CPU.
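
To make the FRU naming scheme easier to read against the chassis labels, the following shell sketch splits a full DIMM FRU name into its path components. The function name split_dimm_fru and the field labels (cpu, riser, buffer, channel, dimm) are assumptions made for illustration; they reflect one reading of the CMP, MR, BOB, CH, and D segments and should be checked against the SPARC T4-2 Service Manual.

```shell
#!/bin/sh
# Sketch only: split_dimm_fru is a hypothetical helper, and the field
# labels it prints are assumed interpretations of the path segments.
split_dimm_fru() {
    fru="$1"
    # Strip the leading slash, then split the rest on '/'.
    old_ifs=$IFS
    IFS=/
    set -- ${fru#/}
    IFS=$old_ifs
    # A full DIMM name is expected to have 7 components:
    # SYS MB CMPn MRn BOBn CHn Dn
    if [ "$#" -ne 7 ]; then
        echo "unexpected FRU name: $fru" >&2
        return 1
    fi
    echo "cpu=$3 riser=$4 buffer=$5 channel=$6 dimm=$7"
}

# Example using the FRU name quoted in the text.
split_dimm_fru /SYS/MB/CMP1/MR0/BOB0/CH0/D0
```

The riser field (MR0 in the example) identifies the memory riser to pull, and the remaining fields narrow the location down to a single slot on that riser.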

A) Locate and Remove a Memory Riser and DIMM.

Caution - These procedures require that you handle components that are sensitive to ESD. This sensitivity can cause the component to fail. To avoid damage, ensure that you follow antistatic practices as described in ESD Measures.

1. Identify the memory riser with the faulty DIMM by pressing the Fault Remind button located on the air divider.
           - If the memory riser Service Action Required LED is off, all DIMMs on this riser are operating properly.
           - If the memory riser Service Action Required LED is on (amber), one or more of the DIMMs installed on this riser is faulty or misconfigured.
2. Lift the memory riser that has its Service Action Required LED lit straight up to remove the memory riser from the memory module socket.
3. Identify the faulty or misconfigured DIMM(s) by pressing the Remind button on the memory riser.
4. Remove each DIMM that displays an amber Fault LED:
           a. Press down both DIMM slot ejector tabs as far as they will go.
           b. Carefully lift the DIMM straight up.

Caution - Whenever you remove a memory riser or DIMM, you should replace it with another memory riser or a DIMM or a filler panel; otherwise, the server might overheat due to improper airflow.

B) Install a DIMM and a Memory Riser

1. Attach an antistatic wrist strap, unpack the DIMMs, and place them on an antistatic mat.
2. Install the DIMMs into the memory riser by performing the following tasks.
           a. Ensure that the ejector levers at both ends of the memory module slot are in a fully open position.
           b. Align each DIMM with the empty connector slot, aligning the notch in the DIMM with the key in the connector.
              The notch ensures that the DIMM is oriented correctly.
           c. Gently press the DIMM into the slot until the ejector tabs lock the DIMM in place.
              Repeat these steps until each DIMM has been installed.
3. Push the memory riser module into the associated CPU memory riser slot until the riser module locks in place.
4. Return the server to operation:
           a. Install the top cover.
           b. Return the server to the normal rack position.
           c. Reinstall the power cords to the power supplies and power on the server.
           d. The server will boot but will not join the VSM cluster, because it was put into maintenance mode when it was shut down.
5. Use the following section of the T4-2 Service Guide as a guideline for verifying the replacement DIMMs: http://docs.oracle.com/cd/E23075_01/html/E23076/z40012f81428990.html#scrolltoc

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FE/CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

     Boot the system and monitor the boot sequence for errors. Test the functionality of the system:
     1. Run the Solaris "fmadm faulty" command and the SP/ILOM "show faulty" command (if only ILOM is supported, run the "showfaults -v" command) to verify that the fault has been cleared.
     2. Perform one of the following tasks based on your verification results:
          * If the previous steps did not clear the fault, refer to doc 1004229.1 for information about the tools and methods you can use to diagnose and clear component faults.
          * If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.
     3. Reboot the VSM node to take it out of maintenance mode:
          # sudo uadmin 2 1
     4. Check that the node came up correctly in the cluster:
          # /usr/cluster/bin/scstat -g
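
The decision in steps 1 and 2 can be sketched as a small shell filter that reads captured command output and reports which branch applies. This is an illustration only: check_fault_output is a hypothetical helper, and it assumes that "fmadm faulty" prints nothing when no faults are present, which should be confirmed on the system. It is not meant for ILOM "show faulty" output, whose format is not described in this document.

```shell
#!/bin/sh
# Sketch only: check_fault_output is a hypothetical helper.
# Assumption: empty (or whitespace-only) input means no faults are listed.
check_fault_output() {
    # grep succeeds if any non-whitespace character arrives on stdin.
    if grep -q '[^[:space:]]'; then
        echo "Faults still listed: refer to doc 1004229.1"
        return 1
    fi
    echo "No faults listed: continue with the node reboot"
}

# Intended use on the repaired node (not run here):
#   fmadm faulty | check_fault_output
```

The helper returns a non-zero status when any output is present, so it can gate the reboot in step 3 if the checks are scripted.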

PARTS NOTE:
   For VSM6, refer to: https://support.us.oracle.com/handbook_internal/Systems/VSM6/component.memory.html

REFERENCE INFORMATION:

   SPARC T4-2 Service Manual
   http://docs.oracle.com/cd/E23075_01/pdf/E23078.pdf

   Oracle Integrated Lights Out Manager (ILOM) 3.0 Maintenance and Diagnostics - CLI and Web Guide
   http://download.oracle.com/docs/cd/E19860-01/E21449-01/E21449-01.pdf

   See also Oracle Integrated Lights Out Manager (ILOM) 3.0 Daily Management - CLI Procedures Guide
   http://download.oracle.com/docs/cd/E19860-01/E21445-01/E21445-01.pdf

   VSM6 Install, Configuration and Service Guide
   http://download-adc.oracle.com/archive/cd_ns/E38194_01/
