Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2117430.1
Update Date:2018-05-10
Keywords:

Solution Type  Technical Instruction Sure

Solution  2117430.1 :   How to Replace an Exadata X5-8, X6-8 Compute Node Internal RAID HBA SuperCap and/or Cable  


Related Items
  • Exadata X5-8 Hardware
  • Exadata X6-8 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU replacement on Engineered system

Applies to:

Exadata X5-8 Hardware - Version All Versions and later
Exadata X6-8 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal

 How to Replace an Exadata X5-8, X6-8 Compute Node Internal RAID HBA SuperCap and/or Cable.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
Exadata X5-8/X6-8 Training

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Exadata X5-8/X6-8 compute node HBA ESM (SuperCap) or ESM cable needs replacement.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

The server that contains the faulty super capacitor or cable should have its services offline and be powered off.


WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

The instructions below assume the customer DBA is available and working with the field engineer onsite to manage the host OS and
DB/ASM services. They are included so that the FE has all the necessary steps available when onsite; the FE can perform them if the
customer DBA prefers, allows it, or needs help with these steps.


Step A. Pre-Steps to shut down the node for servicing:


1. For extended information on this section, check MOS Note:
ID 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration.


For a documentation reference, see the Exadata Maintenance Guide, chapter 1 "General Maintenance Information", section
"Non-Emergency Power Procedures", sub-section "Powering Off Oracle Exadata Rack" > "Powering off Database Servers". The guide is available on the customer's
cell server image in the /opt/oracle/cell/doc directory, or online here:
https://docs.oracle.com/cd/E80920_01/DBMMN/exadata-general-maintenance.htm#DBMMN20984


If the node is running OVM, go to the section "For Compute Node running OVM"; for non-OVM nodes, proceed as follows:

2. Shut down CRS


i. As the root user, do the following to stop CRS and disable CRS autostart on reboot:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
# $ORACLE_HOME/bin/crsctl disable crs
# $ORACLE_HOME/bin/crsctl stop crs
or
# <GI_HOME>/bin/crsctl disable crs
# <GI_HOME>/bin/crsctl stop crs

where the GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid”, but this depends on the customer's environment.
In the above output, the “1” of “+ASM1” refers to the DB node number. For example, on DB node 3 the value would be +ASM3.

ii. Validate that CRS is down cleanly. There should be no CSS processes running (the grep command itself is filtered out):

# ps -ef | grep css | grep -v grep
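The two checks in sub-steps i and ii can be sketched as shell functions. These are hypothetical helpers, not part of this MOS procedure: asm_sid_from_ps assumes ASM is still running, so that a background process named asm_pmon_+ASM<N> is visible in ps, and crs_is_down applies the same "css" pattern used above. Both read ps -ef output on standard input, so they can be tried against saved output before being used on the node.

```shell
#!/bin/sh
# Hypothetical helpers for Step A (not part of the MOS note).

# Print the local ASM SID (e.g. +ASM3) from `ps -ef` output on stdin.
# Assumes ASM is running, so an asm_pmon_+ASM<N> process is present.
asm_sid_from_ps() {
  grep -o 'asm_pmon_+ASM[0-9][0-9]*' | head -n 1 | sed 's/^asm_pmon_//'
}

# Return 0 if no CSS processes (ocssd.bin, cssdagent, cssdmonitor)
# appear in the `ps -ef` output on stdin; the grep command itself is ignored.
crs_is_down() {
  if grep 'css' | grep -v 'grep' | grep -q .; then
    echo "CSS processes still running; CRS is not fully down" >&2
    return 1
  fi
  echo "No CSS processes found; CRS is down"
}

# Usage on the node, as root:
#   ps -ef | asm_sid_from_ps          # before stopping CRS, e.g. +ASM1
#   ps -ef | crs_is_down              # after "crsctl stop crs"
```

With ORAENV_ASK=NO exported and ORACLE_SID set to the printed value, ". oraenv" uses that SID without prompting.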

 

For Compute Node running OVM proceed as follows:

If there are any concerns, engage an EEST engineer.


The customer should perform the following:
(a) Connect to the management domain (domain zero, or dom0) and see which user domains are running (record the result).
This is an example with two user domains and the management domain, Domain-0:

# xm list
Name ID Mem VCPUs State Time(s)
Domain-0 0 8192 4 r----- 409812.7
dm01db01vm01 8 8192 2 -b---- 156610.6
dm01db01vm02 9 8192 2 -b---- 152169.8

Connect to each user domain using the command:

# xm console domainname

where domainname would be dm01db01vm01 or dm01db01vm02 if using the above examples.

In each user domain, shut down CRS as shown in the non-OVM steps above (sub-step i).

Note: Omit the following command for OVM, as it is not required:
# $ORACLE_HOME/bin/crsctl disable crs

Press CTRL+] to disconnect from the console.

(b) Shut down all user domains from dom0:

# xm shutdown -a -w

(c) Confirm that no user domains are still running (only Domain-0 should be listed):

# xm list

(d) Prevent user domains from starting automatically when dom0 boots after the super capacitor has been replaced:

# chkconfig xendomains off
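Recording the domain list from step (a) in a file makes the comparison after the replacement easier. A minimal sketch, assuming the xm list column layout shown above (a header line, then one domain per line with the name in the first column); the function name and the file path in the usage comment are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print user domain names from `xm list` output on
# stdin, skipping the header line and the management domain Domain-0.
user_domains() {
  awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# Usage in dom0 before shutting the domains down (example path):
#   xm list | user_domains > /root/user_domains_before.txt
# After "xm shutdown -a -w", this should print nothing:
#   xm list | user_domains
```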

3. Switch all RAID logical volumes to WriteThrough mode to ensure that all data in the RAID cache memory is flushed to disk and not lost
when the supercapacitor is replaced. Set the cache policy of all logical volumes to WriteThrough:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0

Verify that the current cache policy for all logical volumes is now WriteThrough:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU

4. The customer can now shut down the server operating system:

# shutdown -hP now
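The WriteThrough verification in step 3 relies on reading the "Current Cache Policy" lines by eye. A hedged sketch that checks them automatically, assuming the MegaCli64 output format shown later in this note ("Current Cache Policy: WriteBack, ReadAheadNone, ..."). check_cache_policy is a hypothetical name; the same function can be reused with WriteBack for the verification after the replacement.

```shell
#!/bin/sh
# Hypothetical helper: verify that every logical drive in MegaCli64
# -LdInfo / -ldpdinfo output (stdin) reports the expected current write
# policy. $1 = WriteThrough (before shutdown) or WriteBack (after).
# Returns 1 if any drive differs or no policy lines are found.
check_cache_policy() {
  awk -F': *' -v want="$1" '
    /Current Cache Policy:/ {
      n++
      split($2, f, ",")          # first comma-separated field = write policy
      if (f[1] != want) { bad++; print "Mismatch: " $2 }
    }
    END {
      if (n == 0) { print "No Current Cache Policy lines found"; exit 1 }
      if (bad > 0) exit 1
      print n " logical drive(s) using " want
    }'
}

# Usage (as root):
#   /opt/MegaRAID/MegaCli/MegaCli64 -LdInfo -Lall -a0 | check_cache_policy WriteThrough
```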

 

Step B. Physical ESM (SuperCap) and/or Cable Replacement

Reference links for Service Manual:
Oracle Server X5-8: ( https://docs.oracle.com/cd/E56301_01/html/E56311/index.html )

Remove the SMOD from the Server

1. Disconnect the AC power cords from the server. This is a cold-service item.

2. Access the rear of the server.

3. Label and disconnect all cables from the SMOD.

4. Disengage the SMOD from the server midplane.

    a. To unlock the SMOD handles, squeeze together the release latches on the end of both handles. The unlocking action is accompanied by an audible click.

    b. To disengage the SMOD from the server midplane, simultaneously rotate both handles downward toward their fully open positions. This action disengages the connectors on the SMOD from the connectors on the server midplane.


Caution - Physical harm or component damage. Do not use the SMOD handles to remove the SMOD from the server.

5. To remove the SMOD, grasp it by its sides and slide it out of the server.

6. Set the SMOD on a flat surface rotated 180 degrees so the rear (connector) side is facing toward you.


Caution - Component Damage - SMOD components are extremely sensitive to electrostatic discharge. Wear a wrist strap and use an anti-static mat.

7. Close the SMOD handles.

For ESM Replacement:

1. The ESM is located between the two disk drive enclosures. Remove as follows:

a. Disengage the ESM cable connector from the ESM extension cable.

b. Remove the ESM by lifting it straight up and out of position.

2. Install the new ESM:

a. Insert the ESM into the holder between the disk drive enclosures, so that the cable extends out from the top.

b. Connect the ESM cable connector to the ESM extension cable.

For ESM Cable Replacement:

1. Disengage the ESM cable connector from the ESM extension cable.

2. Slide the ESM extension cable out of the routing slot.

3. Turn the SMOD over and locate the HBA.  The SMOD rear (connector) side should still be facing you.

4. Disconnect the ESM extension cable from the HBA connector.

5. Disconnect the ESM extension cable from the routing clips around the SMOD enclosure wall.

6. Remove the old ESM extension cable.

7. Connect the new ESM extension cable to the HBA connector.

8. Route the new ESM extension cable under the support beam and into the routing clips on the SMOD enclosure wall, and down through the routing slot.

9. Turn the SMOD over. The SMOD rear (connector) side should still be facing you.

10. Connect the ESM extension cable to the ESM cable connector.

 

Install the SMOD into the Server

1. Ensure that the handles on the SMOD are in their fully open position.

    a. To unlock the SMOD handles, squeeze together the release latches on the end of both handles.  The unlocking action is accompanied by an audible click.

    b. To open, rotate both handles downward until they are at a 90 degree angle to the SMOD (fully-open position).


Caution - Physical harm or component damage. Do not use the SMOD handles to install the SMOD into the server.

2. Orient the SMOD with the handles facing away from the server and the connectors facing toward the open slot in the server.  The handles should be at the bottom front and the disks should be at the top front.

3. Align the SMOD in the slot.

4. Slide the SMOD into the slot until it stops.  This leaves the SMOD protruding slightly from the back of the server. Do not attempt to push the SMOD inward beyond this point.

5. To install the SMOD, simultaneously rotate both handles upward until they lock into place.  This action draws the SMOD inward engaging the SMOD connectors with the connectors on the server midplane.


Caution - Pinch point. When operating the lever, keep your fingers clear of the back side and hinged end of the lever.

6. Connect the cables to the SMOD.

Power On:

1. Re-attach the AC Power cords to the server.
2. Once the ILOM has booted, the green LED for the server blinks slowly. Power on the server by pressing the Power
button on the front of the unit.

 

Server Services Startup Validation:

DB Node Startup:


After the OS is up, log in as root and validate that the physical and logical volumes are seen in the OS and that the supercap is seen:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdInfo -Lall -a0


Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :DBSYS
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 1.633 TB
Physical Sector Size: 512
Logical Sector Size : 512
VD has Emulated PD : No
Parity Size : 557.861 GB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 4
Span Depth : 1
Creation Date : 25-12-2014
Creation Time : 08:32:46 AM
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No

# /opt/MegaRAID/MegaCli/MegaCli64 -PdList -a0 | grep "Slot\|Firmware\|Inq"

Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: A690
Inquiry Data: HITACHI H109060SESUN600GA6901446BZMTTX
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: A690
Inquiry Data: HITACHI H109060SESUN600GA6901446BZMW0X
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: A690
Inquiry Data: HITACHI H109060SESUN600GA6901446B01TBX
Slot Number: 3
Firmware state: Online, Spun Up
Device Firmware Level: A690
Inquiry Data: HITACHI H109060SESUN600GA6901446BZN1KX


# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0
BBU status for Adapter: 0
BatteryType: CVPM02
truncated.....
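The three checks above (virtual drives Optimal, all disks online, supercap detected) can be sketched as filters over the MegaCli64 output. These are hypothetical helpers; they assume the field labels shown in the example output ("State :", "Firmware state:", "BatteryType:") and treat a BatteryType line in the -AdpBbuCmd output as evidence that the adapter sees the ESM, which is an assumption rather than a documented guarantee.

```shell
#!/bin/sh
# Hypothetical post-boot checks over MegaCli64 output read from stdin.

# All virtual drives (-LdInfo -Lall) must report "State : Optimal".
check_vd_optimal() {
  awk -F': *' '
    /^State[ \t]*:/ {
      n++; s = $2; sub(/[ \t\r]+$/, "", s)
      if (s != "Optimal") { bad++; print "Virtual drive state: " s }
    }
    END {
      if (n == 0) { print "No virtual drives found"; exit 1 }
      if (bad > 0) exit 1
      print n " virtual drive(s) Optimal"
    }'
}

# All physical disks (-PdList) must be "Online, Spun Up".
check_pds_online() {
  awk -F': *' '
    /^Slot Number/ { slot = $2 }
    /^Firmware state/ {
      n++; s = $2; sub(/[ \t\r]+$/, "", s)
      if (s != "Online, Spun Up") { bad++; print "Slot " slot ": " s }
    }
    END {
      if (n == 0) { print "No physical disks found"; exit 1 }
      if (bad > 0) exit 1
      print n " disk(s) Online, Spun Up"
    }'
}

# Assumption: a BatteryType line in -AdpBbuCmd output means "ESM detected".
check_supercap_seen() {
  awk -F': *' '
    /^BatteryType/ { t = $2; sub(/[ \t\r]+$/, "", t) }
    END {
      if (t == "") { print "No BatteryType reported"; exit 1 }
      print "ESM detected: " t
    }'
}

# Usage (as root):
#   M=/opt/MegaRAID/MegaCli/MegaCli64
#   $M -LdInfo -Lall -a0 | check_vd_optimal
#   $M -PdList -a0 | check_pds_online
#   $M -AdpBbuCmd -a0 | check_supercap_seen
```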

Once the logical drives, physical disks, and supercap have been confirmed, set the cache policy of all logical drives back to WriteBack:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0

Verify that the current cache policy for all logical drives is now WriteBack:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU

CRS services should now be started.


"DB Node Startup Verification" - for compute nodes NOT running OVM (for OVM, refer to the next section):


Start CRS and re-enable CRS autostart. After the OS is up, the customer DBA should validate that CRS is running. As root, execute:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle
# $ORACLE_HOME/bin/crsctl start crs
# $ORACLE_HOME/bin/crsctl check crs

Now re-enable autostart:

# $ORACLE_HOME/bin/crsctl enable crs
or
# <GI_HOME>/bin/crsctl start crs
# <GI_HOME>/bin/crsctl check crs
# <GI_HOME>/bin/crsctl enable crs

where the GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid”, but this depends on the customer's environment.
In the above output, the “1” of “+ASM1” refers to the DB node number. For example, on DB node 3 the value would be +ASM3.
Example output when all is online is:

# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
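The same check can be made non-interactive by testing that every CRS-nnnn status line ends in "is online". A minimal sketch, assuming the four-line crsctl check crs output shown above; check_crs_online is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl check crs` output on stdin and
# return 0 only if there are at least four CRS-nnnn status lines and
# every one of them reports "is online".
check_crs_online() {
  awk '
    /^CRS-[0-9]+:/ {
      n++
      if ($0 !~ /is online[ \t\r]*$/) { bad++; print "Not online: " $0 }
    }
    END {
      if (n < 4) { print "Expected 4 CRS status lines, found " (n + 0); exit 1 }
      if (bad > 0) exit 1
      print "All " n " Clusterware services online"
    }'
}

# Usage (as root, with GI_HOME set for the customer environment):
#   "$GI_HOME"/bin/crsctl check crs | check_crs_online
```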

Validate that instances are running:

# ps -ef | grep pmon | grep -v grep

It should return a record for the ASM instance and a record for each database.
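Counting the pmon processes can be sketched the same way. This assumes the usual background process names asm_pmon_<ASM SID> and ora_pmon_<database SID>; pmon processes with other prefixes are not counted, and the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count ASM and database pmon processes in `ps -ef`
# output on stdin. Returns 1 if no ASM pmon process is found.
count_pmon() {
  awk '
    /asm_pmon_/ { asm++ }
    /ora_pmon_/ { db++ }
    END {
      printf "ASM pmon: %d, database pmon: %d\n", asm, db
      exit (asm > 0 ? 0 : 1)
    }'
}

# Usage (compare the database count with the databases expected on this node):
#   ps -ef | count_pmon
```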


For Compute Node running OVM

If the customer requires assistance, ask them to contact the EEST engineer or the parent case owner.

Once the compute node has booted, re-enable user domains to autostart during Domain-0 boot:

# chkconfig xendomains on

Start all user domains that are marked for autostart:

# service xendomains start

See which user domains are running (compare against the list recorded before shutdown):

# xm list

If any user domain did not start automatically, start it individually:

# xm create -c /EXAVMIMAGES/GuestImages/DomainName/vm.cfg
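If the domain list was saved to a file before the shutdown, the domains that did not auto-start can be listed by comparing it with the current xm list output. A minimal sketch; the function name and file path are hypothetical, and it assumes the saved file holds one domain name per line:

```shell
#!/bin/sh
# Hypothetical helper: $1 = file of domain names recorded before shutdown
# (one per line); stdin = current `xm list` output. Prints each recorded
# domain that is not currently listed, in the order it was recorded.
domains_not_running() {
  awk -v saved="$1" '
    BEGIN {
      while ((getline line < saved) > 0)
        if (split(line, a) > 0) want[++n] = a[1]   # first word of each line
    }
    NR > 1 { running[$1] = 1 }                     # skip the xm list header
    END {
      for (i = 1; i <= n; i++)
        if (!(want[i] in running)) print want[i]
    }'
}

# Usage in dom0 (example path):
#   xm list | domains_not_running /root/user_domains_before.txt
```

Each name it prints is a candidate for the xm create command above, using that domain's own vm.cfg path.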

Check that CRS has started in each user domain; refer to the previous section "DB Node Startup Verification".

 

OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

Verify that the hardware and software components are returned to a properly functioning state, with the server up and database services
running on the DB servers.


PARTS NOTE:

7086345 - 13.5V 6.4F Super Capacitor

7086346 - Super Capacitor Cable

 


REFERENCE INFORMATION:

Oracle Server X5-8 Service Manual:  https://docs.oracle.com/cd/E56301_01/html/E56311/gownr.html#scrolltoc

1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration.


  Copyright © 2018 Oracle, Inc.  All rights reserved.