Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1970278.1
Update Date:2018-04-10

Solution Type  Technical Instruction Sure

Solution  1970278.1 :   How to Replace an Exadata X5-2/X6-2 Compute Node 10GbE NIC Card  


Related Items
  • Zero Data Loss Recovery Appliance X6 Hardware
  • Exadata X5-2 Hardware
  • Exadata X5-2 Eighth Rack
  • Exadata X5-2 Full Rack
  • Exadata X6-2 Hardware
  • Exadata X5-2 Quarter Rack
  • Exadata X5-2 Half Rack
  • Zero Data Loss Recovery Appliance X5 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU replacement on Engineered system

Applies to:

Exadata X5-2 Hardware - Version All Versions and later
Exadata X5-2 Eighth Rack - Version All Versions and later
Exadata X5-2 Half Rack - Version All Versions and later
Zero Data Loss Recovery Appliance X5 Hardware - Version All Versions and later
Exadata X5-2 Full Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

 How to replace the 10GbE NIC card in an Exadata X5-2/X6-2 compute node.

Solution

 DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Exadata Trained


TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 3


FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:


WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The server that contains the faulty 10GbE Ethernet card should have its services offline and the system powered off.


WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

The instructions below assume the customer DBA is available and working with the field engineer (FE) onsite to manage the host OS and
DB/ASM services. They are provided here so that the FE has all the steps needed when onsite, and they can be performed by the
FE if the customer DBA requests, allows, or needs help with their steps.


Step A. Pre-Steps to shutdown the node for servicing:


1. For Extended information on this section, check MOS Note:
ID 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration.


For a documentation reference, see the Exadata Maintenance Guide, chapter 1 "General Maintenance Information",
section "Non-Emergency Power Procedures", sub-section "Powering Off Oracle Exadata Rack", topic "Powering off Database Servers".
The guide is available on the customer's cell server image in the /opt/oracle/cell/doc directory, or internal to Oracle here:
http://amomv0115.us.oracle.com/archive/cd_ns/E50790_01/doc/doc.121/e51951/general.htm#DBMMN21014


If the node is running OVM, go to the section "For Compute Node running OVM". For a non-OVM node, proceed as follows:


Shutdown crs


i. As root user do the following to stop crs and disable autostart of crs on reboot:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
# $ORACLE_HOME/bin/crsctl disable crs
# $ORACLE_HOME/bin/crsctl stop crs
or
# <GI_HOME>/bin/crsctl disable crs
# <GI_HOME>/bin/crsctl stop crs

where the GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid” but will depend on the customer's environment.
In the above output, the “1” of “+ASM1” refers to the DB node number. For example, for DB node #3 the value would be +ASM3.

ii. Validate CRS is down cleanly. There should be no processes running.

# ps -ef | grep css
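
Note that a plain "grep css" can list the grep process itself, which may be mistaken for a surviving CSS process. A minimal sketch of a filter that avoids this is shown below. The function name css_running is hypothetical and is not part of any Oracle tooling; it only filters "ps -ef" output supplied on standard input.

```shell
# Hypothetical helper (not an Oracle utility): reads `ps -ef` output on stdin
# and prints any line containing "css". The [c] bracket expression keeps the
# grep process itself from matching its own pattern.
# Exit status is 0 if a CSS process was found, 1 if none was found.
css_running() {
    grep '[c]ss'
}
```

On the node this could be run as "ps -ef | css_running"; no output and a non-zero exit status would suggest CRS is down cleanly.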

For Compute Node running OVM proceed as follows:


If there are any concerns, engage an EEST engineer.

The customer should perform the following:

(a) See what user domains are running (record the result).
Connect to the management domain (domain zero, or dom0).
This is an example with just two user domains and the management domain Domain-0:

# xm list
Name            ID   Mem VCPUs  State   Time(s)
Domain-0         0  8192     4  r-----  409812.7
dm01db01vm01     8  8192     2  -b----  156610.6
dm01db01vm02     9  8192     2  -b----  152169.8

Connect to each domain using the command:

# xm console domainname

where domainname would be dm01db01vm01 or dm01db01vm02 if using the above examples.

Shut down any instances of CRS in all user domains; refer to the example in the previous section "Shutdown crs".

Note: Omit the following command for OVM as it is not required.
# $ORACLE_HOME/bin/crsctl disable crs

Press CTRL+] to disconnect from the console.

(b) Shut down all user domains from dom0:

# xm shutdown -a -w

(c) See what user domains are running (should be only Domain-0)

(d) Disable user domains from auto starting during dom0 boot after the 10GbE NIC card has been replaced.

# chkconfig xendomains off
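
The check in step (c) can be sketched as a small filter over "xm list" output. This is an illustrative sketch only: the function name running_user_domains is hypothetical, and it assumes the column layout shown in the example in step (a), with a single header line and the domain name in the first column.

```shell
# Hypothetical helper: reads `xm list` output on stdin and prints the name of
# every domain other than Domain-0. It prints nothing when only the
# management domain remains.
running_user_domains() {
    # NR > 1 skips the header line; $1 is the domain name column.
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}
```

On dom0 this could be run as "xm list | running_user_domains"; empty output would indicate that all user domains have shut down.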


The customer can now shut down the server operating system:

# shutdown -hP now

 

Reference links for Service Manual:

X5-2 DB nodes: ( http://docs.oracle.com/cd/E41059_01/html/E48312/napsm.gnriy.html#scrolltoc )

 The field engineer can now slide out the server for maintenance. Do not remove any cables prior to sliding the server forward, or the
loose cable ends will jam in the cable management arms (CMA). Ensure all customer-added data network cables are properly dressed
into the CMA. Take care to ensure the cables and CMA are moving properly.
Remember to disconnect the power cords before opening the top of the server.

Locate and Remove the PCIe card.

(a) There are three external PCIe slots in the system. The external PCIe slots are numbered 1, 2, and 3 from left to right when you
view the server from the rear. The 10G Ethernet card is always installed in PCIe slot 2.

(b) Locate the 10G Ethernet card in PCIe slot 2 and unplug any cables from the PCIe card making note of their locations so that they
can be re-installed in the same configuration (label if needed).

(c) Lift the green-tabbed latch on the rear of the server's chassis next to the PCIe slot to release the PCIe card's rear bracket.

(d) Lift the riser release lever with one hand and use your other hand to remove the riser from the motherboard.
Place the riser and card on an anti-static mat.

(e) Remove the 10G Ethernet card from the PCIe riser. Hold the riser in one hand and use your other hand to carefully pull the PCIe
card connector out of the riser.

(f) Disconnect the rear bracket that is attached to the 10G Ethernet card from the rear of the PCIe riser.


Replace the PCIe card.


(a) Pull out any SFP modules installed in the original 10G Ethernet card and install them into the replacement 10G Ethernet card,
making sure to orient them properly so that they seat fully into the card.

(b) Insert the rear bracket that is attached to the 10G Ethernet card into the PCIe riser.

(c) Hold the riser in one hand and use your other hand to carefully insert the PCIe card connector into the riser.

(d) Install the PCIe riser with the installed PCIe cards into the server.

(e) Raise the PCIe riser release lever (marked with a green tab) to the open (up) position.
Making sure to replace the riser into the same position from which it was removed (PCIe slot 2), gently press the riser into the
motherboard connector until it seats, then press the green-tabbed riser release lever to the closed (down) position.

(f) Close the green-tabbed latch on the rear of the server's chassis next to the applicable PCIe slot to secure the PCIe card's rear
bracket to the server's chassis.

(g) Reconnect any cables to the PCIe card that were unplugged during the removal procedure making sure to connect them in the
same configuration as when they were disconnected.

 

Server Services Startup Validation:


DB Node Startup:

Verify the new card is detected and the interfaces are functioning. Check both eth4 and eth5 if both ports are used, and ensure the link is detected
and the speed and duplex are correct. The example below is for eth5:

 # ethtool eth5

Settings for eth5:
        Supported ports: [ FIBRE ]
        Supported link modes:   10000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: No
        Advertised link modes:  10000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: Other
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Supports Wake-on: d
        Wake-on: d
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
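
The three fields that matter in the output above (Speed, Duplex, and Link detected) can be checked with a short filter. This is a minimal sketch under stated assumptions: the function name check_link is hypothetical, and the expected values (10000Mb/s, Full, yes) are taken from the example output above rather than from any Oracle specification.

```shell
# Hypothetical helper: reads `ethtool <iface>` output on stdin, prints the
# three fields of interest, and succeeds only if they match the example
# values above (10000Mb/s, Full duplex, link detected).
check_link() {
    awk -F': *' '
        { gsub(/^[ \t]+|[ \t\r]+$/, "") }      # trim leading/trailing whitespace
        $1 == "Speed"         { speed  = $2 }
        $1 == "Duplex"        { duplex = $2 }
        $1 == "Link detected" { link   = $2 }
        END {
            printf "speed=%s duplex=%s link=%s\n", speed, duplex, link
            exit !(speed == "10000Mb/s" && duplex == "Full" && link == "yes")
        }'
}
```

For example, "ethtool eth4 | check_link && ethtool eth5 | check_link" would cover both ports when both are in use.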

 

CRS services should now be started.

"DB Node Startup Verification" - for a compute node NOT running OVM. For OVM, refer to the next section.

Start up CRS and re-enable autostart of CRS. After the OS is up, the customer DBA should validate that CRS is running. As root, execute:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
# $ORACLE_HOME/bin/crsctl start crs
# $ORACLE_HOME/bin/crsctl check crs
Now re-enable autostart
# $ORACLE_HOME/bin/crsctl enable crs
or
# <GI_HOME>/bin/crsctl start crs
# <GI_HOME>/bin/crsctl check crs
# <GI_HOME>/bin/crsctl enable crs

where the GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid” but will depend on the customer's environment.
In the above output, the “1” of “+ASM1” refers to the DB node number. For example, for DB node #3 the value would be +ASM3.

Example output when all is online is:

# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
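
The comparison against this example output can be sketched as a filter that confirms every CRS- line reports "is online". The function name crs_all_online is hypothetical, and the sketch assumes the message format shown in the example above.

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin.
# Succeeds only if at least one CRS- line is present and every CRS- line
# ends with "is online".
crs_all_online() {
    awk '/^CRS-/ { total++; if ($0 ~ /is online[ \t\r]*$/) online++ }
         END     { exit !(total > 0 && total == online) }'
}
```

On the node this could be run as "$ORACLE_HOME/bin/crsctl check crs | crs_all_online && echo OK".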

Validate that instances are running:

# ps -ef |grep pmon

It should return a record for the ASM instance and a record for each database.

For Compute Node running OVM


If the customer requires assistance, please ask them to contact an EEST engineer or the parent case owner.

Once the compute node has booted, re-enable user domains to autostart during Domain-0 boot.

# chkconfig xendomains on

Start up all user domains that are marked for auto start:

# service xendomains start

See what user domains are running (compare against result from previously collected data)

# xm list
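
The comparison with the previously recorded list can be sketched as follows, assuming the "xm list" output from before the shutdown and after the startup were each saved to a file. The function name missing_domains and both file arguments are hypothetical; the sketch relies only on the domain name being in the first column, as in the earlier example.

```shell
# Hypothetical helper: prints the domains that were running before the
# shutdown but are absent after the startup.
#   $1  file holding `xm list` output recorded before shutdown
#   $2  file holding `xm list` output captured after startup
missing_domains() {
    awk 'FNR == 1  { next }                      # skip each header line
         NR == FNR { before[$1] = 1; next }      # first file: remember names
                   { delete before[$1] }         # second file: tick them off
         END       { for (d in before) print d }' "$1" "$2" | sort
}
```

Any name printed would be a user domain that did not auto-start and needs to be started individually.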

If any domain did not auto-start, start up that single user domain:

# xm create -c /EXAVMIMAGES/GuestImages/DomainName/vm.cfg

Check that CRS has started in the user domains; refer to the previous section "DB Node Startup Verification".

 

OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

Verify that HW components and SW components are returned to a properly functioning state, with the server up and database services
operating on the DB servers.


PARTS NOTE: 7051223 - Dual 10-Gigabit Ethernet

REFERENCE INFORMATION:

1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration


  Copyright © 2018 Oracle, Inc.  All rights reserved.