Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1529344.1
Update Date:2017-06-19
Keywords:

Solution Type  Technical Instruction Sure

Solution  1529344.1 :   Mx-32 - How to Replace a Faulty PCIe Card  


Related Items
  • SPARC M5-32
  • SPARC M6-32
  • Oracle SuperCluster M6-32 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP

Applies to:

SPARC M5-32 - Version All Versions and later
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
SPARC M6-32 - Version All Versions and later
Information in this document applies to any platform.

Goal

CAP PROBLEM OVERVIEW: Mx-32 -  PCIe Card Failure

*********************************************************************
To report errors or request improvements on this procedure, please go to
My Oracle Support, and put a comment on Doc ID: 1529344.1

********************************************************************* 

ESD Caution:
  • Circuit boards and drives contain electronic components that are extremely sensitive to static electricity. Ordinary amounts of static electricity from clothing or the work environment can destroy the components located on these boards. Do not touch the components along their connector edges.
  • Use an antistatic wrist strap. Attach one end of the strap to your wrist and the other end to the chassis, using the adhesive end or the metal plug, depending on the type of strap.
  • Use an antistatic mat. Place ESD-sensitive components such as motherboards, memory, and other PCBs on an antistatic mat.

 

Contamination Caution:
  • Dust particles from packaging material are the number one cause of data center contamination. Remove all packaging material, down to the ESD-safe packaging, before entering the data center.

 

HOT Replacement Caution:
  • Oracle SuperCluster M6-32 Server does not support PCIe card hot replacement (also known as hotplug). The Physical Domain (PDom) that owns the PCIe slot must be stopped to replace the PCIe card.
    To maintain continuous service, RAC and Clusterware for the database and Solaris Cluster for applications are required.
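
The caution above, combined with the requirement stated later in this procedure that Solaris be running in the domain that owns the slot, can be summarized as a small decision function. This is only an illustrative sketch: the function name, the platform strings, and the set name are assumptions made for the example and do not correspond to any Oracle tool.

```python
# Illustrative sketch; the function name and platform strings are hypothetical.

# Platforms that, per the caution above, do not support PCIe hot replacement.
HOTPLUG_UNSUPPORTED = {"Oracle SuperCluster M6-32"}


def hot_replacement_allowed(platform: str, solaris_running: bool) -> bool:
    """Return True only when the card may be replaced without stopping the PDom."""
    if platform in HOTPLUG_UNSUPPORTED:
        # The PDom that owns the PCIe slot must be stopped instead.
        return False
    # The ATTN button only functions while Solaris is running in the
    # domain that owns the PCIe slot.
    return solaris_running


if __name__ == "__main__":
    for name in ("SPARC M5-32", "SPARC M6-32", "Oracle SuperCluster M6-32"):
        print(name, hot_replacement_allowed(name, solaris_running=True))
```

A False result means the hot replacement path in this procedure does not apply as written and the owning PDom has to be considered for shutdown.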

 

Oracle SuperCluster M6-32 InfiniBand HCA Responsibility:
  • When replacing an InfiniBand HCA in an Oracle SuperCluster M6-32, additional Field Engineering responsibilities must be completed. Perform the steps in the following document:
    Updating IB partitions after replacing an Infiniband HCA in any nodes within IB network - steps to do after replacing HCA (Doc ID 1985159.1)

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED: Mx-32 Product Training/Experience

TASK COMPLEXITY: 2

TIME ESTIMATE: 20 minutes

HOT replacement (Restrictions apply - See HOT Replacement Cautions in the header)

FIELD ENGINEER INSTRUCTIONS

Prepare labeling materials sufficient for all IO cables attached to the IOU.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? : Solaris must be running in the domain which owns the target PCIe slot in order to use HOT replacement and the ATTN button

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

Required preparation of PCIe slots assigned to Oracle VM Server guest domains:

a. A PCIe slot assigned to an Oracle VM Server I/O domain must be reassigned to the primary domain before the card is replaced. See the following reference for the necessary preparatory steps:

The Oracle® VM Server for SPARC 3.2 Administration Guide describes how to minimize guest domain outages when removing a PCIe card.
See https://docs.oracle.com/cd/E48724_01/html/E48732/minimizedomainoutageswhenremovecard.html

b. For a PCIe slot on a root complex owned by a non-primary root domain, follow instead How to Replace PCIe Direct I/O Cards Assigned to an Oracle VM Server for SPARC Guest Domain (Doc ID 1684273.1).
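
The two preparation cases above amount to a lookup keyed on which kind of domain owns the slot. The following sketch expresses that routing. The enum, its member names, and the function name are hypothetical, and the fall-through for a slot already owned by the primary domain is an assumption inferred from the heading, not something the procedure states explicitly.

```python
from enum import Enum


class SlotOwner(Enum):
    """Hypothetical labels for the kind of domain that owns the PCIe slot."""
    PRIMARY = "primary domain"
    IO_DOMAIN = "Oracle VM Server I/O domain"
    NON_PRIMARY_ROOT = "non-primary root domain"


def preparation_for(owner: SlotOwner) -> str:
    """Return the preparation the procedure calls for before pressing ATTN."""
    if owner is SlotOwner.IO_DOMAIN:
        # Case a: move the slot back to the primary domain first.
        return ("Reassign the slot to the primary domain; see the Oracle VM "
                "Server for SPARC 3.2 Administration Guide")
    if owner is SlotOwner.NON_PRIMARY_ROOT:
        # Case b: a separate document covers this configuration.
        return "Follow Doc ID 1684273.1"
    # Assumption: no Oracle VM Server preparation is listed for this case.
    return "No Oracle VM Server preparation listed; continue with step 1"


if __name__ == "__main__":
    for owner in SlotOwner:
        print(f"{owner.value}: {preparation_for(owner)}")
```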



1. Press the ATTN button on the carrier that contains the I/O card that you wish to remove.
    The LEDs on the carrier flash for approximately 10 seconds as the PDomain disables the I/O card. When the LEDs on both the carrier and the card turn off, the carrier and card are ready to remove.
    If the green OK LED fails to extinguish, the I/O card cannot safely be removed. Investigate from Solaris why the card has not been taken offline. A domain shutdown may be required to safely remove the I/O card.

NOTE: The ATTN button will only function if Solaris is running in the domain which owns the PCIe slot.


2. Label and remove any I/O cables from the I/O card.
3. Remove the carrier from the slot:
   a. Pull the carrier’s extraction lever. (The lever is held in place by friction.)
   b. Swing the extraction lever out 90 degrees until the far end of the lever begins to push the carrier out of the slot.
   c. Remove the carrier from the slot.
   d. Place the carrier on a static-safe workspace.

4. Remove the I/O card from the carrier.
   a. Lift the green tab to unlock and open the top of the PCIe hot-plug carrier.
   b. Pull the I/O card or filler panel out of the carrier.
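
The LED check in step 1 of the removal is the gate for everything that follows: the carrier should come out only when the LEDs on both the carrier and the card are off. A minimal sketch of that check is shown below. The function name, the boolean inputs, and the returned wording are illustrative assumptions, and the engineer's visual observation remains the actual source of truth.

```python
def removal_action(carrier_leds_off: bool, card_leds_off: bool) -> str:
    """Map the LED observation after pressing ATTN to the next action.

    Both inputs are what the engineer sees once the roughly 10 seconds of
    flashing have finished.
    """
    if carrier_leds_off and card_leds_off:
        # The PDomain has disabled the card.
        return "Ready: label and remove the I/O cables, then pull the carrier"
    # An LED is still lit, so the card has not been taken offline.
    return ("Not safe to remove: investigate from Solaris; "
            "a domain shutdown may be required")


if __name__ == "__main__":
    print(removal_action(carrier_leds_off=True, card_leds_off=True))
    print(removal_action(carrier_leds_off=False, card_leds_off=True))
```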


Install an I/O Card in a PCIe Hot-Plug Carrier

1. Seat the I/O card or filler panel in the PCIe hot-plug carrier.
2. Close the top of the carrier.
   The green latch should click into place.  If the top is difficult to close, verify that the notch of the card bracket fits around the guide post.

Install the carrier in the server.

1. Insert the PCIe hot-plug carrier with the I/O card in the slot and lock the carrier’s extraction lever.
   a. Push evenly on both sides of the carrier so that the carrier slides straight into the slot.
      If the carrier slides correctly into the slot, you should feel a slight resistance as the carrier starts to seat in the connector.
      Do not push the extraction lever while you insert the carrier into the slot. The carrier can enter at an angle and damage the connections.
   b. Lock the carrier’s extraction lever.
      The LEDs on the carrier and the card should remain off at this point.

2. Attach I/O cables to the card.
3. Press the ATTN button on the carrier to reconfigure the I/O card into the PDomain.
   The carrier’s LEDs should flash for a few seconds until the PDomain enables the I/O card. The card’s LEDs will show activity when the card is enabled.
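
The installation steps above describe an expected LED state at each stage: off after the lever is locked, flashing after ATTN is pressed, and activity once the card is enabled. The sketch below records those expectations and reports the first stage at which an observation differs. It is a simplification under stated assumptions: the stage descriptions, the state strings, and the function name are hypothetical, and carrier and card LEDs are folded into a single observation per stage.

```python
from typing import List, Optional

# Expected LED state at each stage, in order, as described in the steps above.
EXPECTED_LED_STATES = [
    ("carrier inserted and lever locked", "off"),
    ("ATTN pressed, PDomain enabling the card", "flashing"),
    ("card enabled", "activity"),
]


def first_unexpected_stage(observed: List[str]) -> Optional[str]:
    """Return the first stage whose observed LED state differs, else None."""
    for (stage, expected), seen in zip(EXPECTED_LED_STATES, observed):
        if seen != expected:
            return stage
    return None


if __name__ == "__main__":
    print(first_unexpected_stage(["off", "flashing", "activity"]))
    print(first_unexpected_stage(["off", "off", "off"]))
```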

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Restart software applications per applicable administration guides to resume system operation.

If the PCIe slot is not available once Solaris has booted, the following issue may apply. Consult
FMA I/O retirement : PCI devices can be seen from OBP but disappear when System Boots up into Solaris (Doc ID 1614738.1)
NOTE: Be sure to check for /etc/devices/retire_store in the control domain (the primary domain), which consumes the subscribed I/O fault diagnosis.
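
The retire store check in the note above can be sketched as a simple file test. The example only reports whether the file exists and whether it is empty. It assumes a Python interpreter is available in the control domain, the function name and status strings are hypothetical, and the file's contents are deliberately not interpreted here: Doc ID 1614738.1 remains the reference for what an entry means and how to clear it.

```python
import os

# Path named in the note above; check it in the control (primary) domain.
RETIRE_STORE = "/etc/devices/retire_store"


def retire_store_status(path: str = RETIRE_STORE) -> str:
    """Report whether the retire store file is absent, empty, or non-empty."""
    if not os.path.exists(path):
        return "absent"
    if os.path.getsize(path) == 0:
        return "present-empty"
    return "present-nonempty"


if __name__ == "__main__":
    print(f"{RETIRE_STORE}: {retire_store_status()}")
```

A "present-nonempty" result is only a prompt to follow the referenced document, not proof that the replaced slot is the retired device.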

======================== Other info =====================

REFERENCE INFORMATION:  Service Manual: https://www.oracle.com/technetwork/documentation/oracle-sparc-ent-servers-189996.html


References

<NOTE:1985159.1> - Updating IB partitions after replacing an Infiniband HCA in any nodes within IB network - steps to do after replacing HCA
<NOTE:1614738.1> - [SPARC T4/T5/M5 and M6] FMA I/O retirement : PCI devices can be seen from OBP but disappear when System Boots up into Solaris
<NOTE:1684273.1> - How to replace PCIe Direct I/O cards assigned to a OVM guest domain

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.