Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1548740.1
Update Date:2017-03-28

Solution Type  Technical Instruction Sure

Solution  1548740.1 :   How to Replace a SPARC T3-4 or T4-4 PCI Express Module


Related Items
  • SPARC T3-4
  • SPARC SuperCluster T4-4 Full Rack
  • SPARC SuperCluster T4-4 Half Rack
  • SPARC T4-4
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP


How to remove and replace a SPARC T3-4 or T4-4 PCI Express Module

In this Document
Goal
Solution
References


Applies to:

SPARC T3-4 - Version All Versions and later
SPARC T4-4 - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

 Remove and replace a SPARC T3-4 or T4-4 PCI Express Module.

*********************************************************************
To report errors or request improvements on this procedure,
please Add a Comment on Doc ID: 1548740.1
*********************************************************************

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS ARE REQUIRED:

No special skills required, Customer Replaceable Unit (CRU) procedure

Time Estimate: 30 minutes
Task Complexity: 0

REMOVAL/REPLACEMENT INSTRUCTIONS:

PROBLEM OVERVIEW: Replace a PCI Express Module

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

The express module is a hot-service component.

Special Instructions: The SuperCluster T4-4 does not support hot-service replacement of PCI expansion cards. Servicing a failed PCIe card requires the affected server node to be shut down. On SuperCluster T4-4, these components are FRUs and require an onsite FE engagement.
DAMAGE ALERT: Visually inspect the replacement part for damage that may have occurred during shipping (damaged components, connectors, bent pins, damaged packages, etc.). If the part is damaged, do not install it in the system; order a new part. Handle the return part with caution and package it carefully to avoid damage during shipping.
TSE Special Instructions: When replacing an InfiniBand HCA, make the customer aware of the following procedure and post it in a customer-visible note in the SR:
Updating IB partitions after replacing an Infiniband HCA in any nodes within IB network - steps to do after replacing HCA (Doc ID 1985159.1)
Note for OVM (LDOM): To remove a PCIe card that is assigned to an I/O domain, first remove the device from the I/O domain. Then add the device to the root domain before you physically remove it from the system. These steps avoid a configuration that is unsupported by the Direct I/O or SR-IOV feature. For more information about making hardware changes to an I/O domain, refer to the Oracle VM for SPARC documentation. Dynamic reconfiguration (DR) removal of the PCI module is also not supported, so 'cfgadm' will not work on physical I/O devices bound to an LDOM configuration; see the OVM Administration Guide (DR).
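
As a rough illustration of the order described in the OVM note, the following dry-run sketch prints the two ldm commands instead of running them. The device name /SYS/PCI-EM1, the I/O domain name iodom1, and the root domain name primary are hypothetical placeholders. The exact steps (for example, whether the I/O domain must be stopped first) depend on the Oracle VM for SPARC release, so check the administration guide before running anything.

```shell
#!/bin/sh
# Dry run only: print the ldm commands in the order the note describes.
# Arguments: device name, I/O domain name, root domain name.
# The default root domain name "primary" is an assumption.
plan_io_return() {
    dev=$1
    io_domain=$2
    root_domain=${3:-primary}
    # 1. Remove the device from the I/O domain.
    printf 'ldm remove-io %s %s\n' "$dev" "$io_domain"
    # 2. Add the device to the root domain before pulling the module.
    printf 'ldm add-io %s %s\n' "$dev" "$root_domain"
}

# Hypothetical device and domain names, for illustration only.
plan_io_return /SYS/PCI-EM1 iodom1
```

Printing the commands first lets them be reviewed against the output of ldm list-io before anything is changed.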

Note for QLogic HBA: There is a known issue with some QLogic HBAs failing during a cfgadm or hot-swap operation; see Doc ID 1527488.1.

 
WHAT ACTIONS ARE REQUIRED:

Remove a PCI Express Module

1. Take the necessary ESD precautions.

2. At the rear of the server, locate the express module that you want to remove.

3. Determine if you are removing an express module from a running server.

- If you are removing an express module from a server that is running (if you are hot-swapping the express module), go to Step 4.
- If you are removing an express module from a powered-down server, go to Step 5.

4. Determine if the express module has an Attention button.

If the express module has an Attention button, you can use that button to hot-swap the card from the server. If not, you can use the CLI to hot-swap the express module.

 - If the express module has an Attention button, press the button to bring the express module offline. The express module's Power OK LED should go off, indicating that the module is ready to be removed. Go to Step 5.
 - If the express module does not have an Attention button, bring the module offline using the CLI:
        At the Oracle Solaris prompt, type the cfgadm -al command to list all devices in the device tree, including express modules:

        # cfgadm -al

        This command lists dynamically reconfigurable hardware resources and shows their operational status. In this case, look for the status of the express module you plan to remove. This information is listed in the Occupant column.

        Example:

        Ap_id                       Type         Receptacle   Occupant        Condition
        PCI-EM0                     sas/hp       connected    configured      ok
        PCI-EM1                     sas/hp       connected    configured      ok
        ...

        Disconnect the express module using the cfgadm -c disconnect command.

        Example:

        # cfgadm -c disconnect Ap_id

        Replace Ap_id with the ID of the express module that you want to remove.
        Verify that the express module's green Power LED is off.

NOTE: The cfgadm -al command may take a long time to display output. The command (or the Attention button, if available) may also fail to recognize a new PCI card. In that case, refer to the workaround that uses the "hotplug" command, as shown in the service manual (works only for PCIe devices).
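
The CLI check in Step 4 can be sketched as a small filter over the cfgadm -al listing. This is a minimal sketch, not part of the service manual: the function names em_state and em_ready_to_pull are hypothetical, and it assumes the column order shown in the example above (Ap_id, Type, Receptacle, Occupant, Condition) and that a disconnected module no longer reports a "connected" receptacle. The LED check in the procedure is still required.

```shell
#!/bin/sh
# Print "receptacle occupant" for one attachment point, reading
# cfgadm -al style text on stdin. Exits non-zero if the Ap_id is absent.
em_state() {
    awk -v id="$1" '$1 == id { print $3, $4; found = 1 }
                    END { exit (found ? 0 : 1) }'
}

# Succeed only if the receptacle no longer reports "connected".
# Returns 2 if the Ap_id is not listed at all.
em_ready_to_pull() {
    state=$(em_state "$1") || return 2
    case "$state" in
        "connected "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Sample listing for illustration; on a live system the input would
# come from:  cfgadm -al | em_ready_to_pull PCI-EM1
sample='Ap_id    Type    Receptacle    Occupant      Condition
PCI-EM0  sas/hp  connected     configured    ok
PCI-EM1  sas/hp  disconnected  unconfigured  unknown'

if printf '%s\n' "$sample" | em_ready_to_pull PCI-EM1; then
    echo "PCI-EM1 reports disconnected; confirm the Power LED is off"
fi
```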

5. Disconnect any cables connected to the card.

Tip - Label the cables to ensure proper connection to the replacement card.

6. Pull the express module handle down to disengage the card from the card cage.

7. Remove the express module from the server.

Install a PCI Express Module

1. Take the necessary ESD precautions.

2. Insert the express module into the empty express module slot.

3. Close the express module latch to lock the card in place.

4. Reconnect the cables to the express module, if necessary.

5. Determine if you replaced or installed an express module in a running server.

 - If you replaced or installed an express module in a server that is running (if you hot-swapped the express module), go to Step 6.
 - If you replaced or installed an express module in a powered-down server, power on the server using the instructions provided in Returning the Server to Operation, then go to Step 7.

6. Determine if the express module has an Attention button.

If the express module has an Attention button, you can use that button to bring the express module online. If not, you can use the CLI to bring the express module online.

 - If the express module has an Attention button, press the button to bring the express module online. The express module's Power OK LED should go on, indicating that the module is now online. Go to Step 7.

 - If the express module does not have an Attention button, bring the module online using the CLI:
        At the Oracle Solaris prompt, type the cfgadm -al command to list all devices in the device tree, including the express modules:

        # cfgadm -al

        This command helps you identify the express module you installed. For example:

        Ap_id                       Type         Receptacle   Occupant        Condition
        PCI-EM0                     sas/hp       connected    configured      ok
        PCI-EM1                     unknown      empty        unconfigured    unknown
        ...

        Connect the express module using the cfgadm -c connect command.

        Example:

        # cfgadm -c connect Ap_id

        Replace Ap_id with the ID of the express module that you want to connect.
        Verify that the green Power LED is lit on the express module that you installed.
        At the Oracle Solaris prompt, type the cfgadm -al command to list all devices in the device tree:

        # cfgadm -al

        The replacement express module is now listed as connected. For example:

        Ap_id                       Type         Receptacle   Occupant        Condition
        PCI-EM0                     sas/hp       connected    configured      ok
        PCI-EM1                     sas/hp       connected    configured      ok
        ...

7. Verify the express module functionality.

NOTE: The cfgadm -al command may take a long time to display output. The command (or the Attention button, if available) may also fail to recognize a new PCI card. In that case, refer to the workaround that uses the "hotplug" command, as shown in the service manual (works only for PCIe devices).
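
One way to confirm which slot changed in Step 6 is to save the cfgadm -al listing before and after the connect operation and compare the two. The sketch below is illustrative only: em_changed is a hypothetical helper name, the file paths are placeholders, and it assumes the five-column layout shown in the examples above.

```shell
#!/bin/sh
# Compare two saved cfgadm -al listings and print each Ap_id whose
# Receptacle or Occupant state differs between them.
# Usage: em_changed BEFORE_FILE AFTER_FILE
em_changed() {
    awk 'NR == FNR { before[$1] = $3 " " $4; next }   # first file: remember states
         FNR > 1 && NF >= 4 && before[$1] != ($3 " " $4) {
             print $1 ": " before[$1] " -> " $3 " " $4
         }' "$1" "$2"
}

# On a live system (placeholder paths):
#   cfgadm -al > /tmp/em_before.txt
#   cfgadm -c connect Ap_id
#   cfgadm -al > /tmp/em_after.txt
#   em_changed /tmp/em_before.txt /tmp/em_after.txt
```

With the two example listings from Step 6 as input, the only line reported is the one for PCI-EM1, which moves from empty/unconfigured to connected/configured.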

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTIONS ARE REQUIRED TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Verify Express Module Functionality

    Verify that the Fault LED is not lit on the express module.
    Verify that the System Service Required LEDs on the front panel and rear I/O module are not lit.

Boot system and monitor boot sequence for errors. Test functionality of system:
1. Run the Solaris "fmadm faulty" command and the SP/ILOM "show faulty" command (if only ALOM is supported, run the "showfaults -v" command) to verify that the fault has been cleared.
2. Perform one of the following tasks based on your verification results:
* If the previous steps did not clear the fault, refer to doc 1004229.1 for information about the tools and methods you can use to diagnose and clear component faults.
* If the previous steps indicate that no faults have been detected, the component has been replaced successfully. No further action is required.
3. Restart software applications per applicable administration guides to resume system operation.
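
The Solaris side of the fault check in step 1 could be scripted roughly as follows. This is a sketch under one stated assumption: that "fmadm faulty" prints nothing when no faults are recorded, which should be confirmed on the installed Solaris release. The helper name no_faults_listed is hypothetical, and the SP/ILOM "show faulty" check still has to be done separately.

```shell
#!/bin/sh
# Succeed (exit 0) when stdin contains no non-blank lines.
# Assumption: "fmadm faulty" prints nothing when no faults are recorded.
no_faults_listed() {
    ! grep -q '[^[:space:]]'
}

# On a live system:
#   if fmadm faulty | no_faults_listed; then
#       echo "fmadm reports no faults"
#   else
#       echo "faults still listed; see doc 1004229.1"
#   fi
```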


PARTS NOTE:
for T4-4 refer to: https://support.oracle.com/handbook_private/Systems/SPARC_T4_4/components.html

for T3-4 refer to: https://support.oracle.com/handbook_private/Systems/SPARC_T3_4/components.html

REFERENCE INFORMATION:
SPARC T4-4 Service Manual: http://docs.oracle.com/cd/E23411_01/index.html

SPARC T3-4 Service Manual: http://docs.oracle.com/cd/E19417-01/index.html

See also:  Oracle Integrated Lights Out Manager (ILOM) 3.0 Documentation Collection: http://docs.oracle.com/cd/E19860-01/index.html


  Copyright © 2018 Oracle, Inc.  All rights reserved.