How to Replace an Exadata Database Machine X5-2/X6-2 Compute Node Motherboard

Asset ID:	1-71-1966059.1
Update Date:	2018-04-10
Keywords:

Solution Type Technical Instruction Sure

Solution 1966059.1 : How to Replace an Exadata Database Machine X5-2/X6-2 Compute Node Motherboard

Applies to:

Exadata X6-2 Hardware - Version All Versions and later
Zero Data Loss Recovery Appliance X6 Hardware - Version All Versions and later
Exadata X5-2 Full Rack - Version All Versions and later
Exadata X5-2 Half Rack - Version All Versions and later
Exadata X5-2 Eighth Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

Canned Action Plan for replacing a Motherboard in a compute node in an Exadata Database Machine [X5-2/X6-2]

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Exadata Trained

TIME ESTIMATE: 120 minutes

TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: A server in an Exadata Database Machine requires the motherboard to be replaced. This procedure is specific to
Exadata X5-2/X6-2 systems based on Oracle Server X5-2/X6-2.

Connectivity to the rack will depend on the customer's access requirements. The following procedure partially requires serial connection
and network access to ILOM which assumes using a laptop attached to the Cisco management switch. If no port is available in a full
rack, then temporarily disconnect a port used for another host's ILOM (e.g. port 2). If the customer does not allow login access to the
host ILOM, then they will need to run the commands given below.

When connecting to ILOM via serial cable remember that the baud rate is 9600 for replacement boards. This will get changed during
the post-install procedure to the Exadata default which is 115200 for installed boards.

Abbreviation used throughout this document: MB = Motherboard

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Pre-Install Steps:

1. Back up ILOM Settings.
Assuming the ILOM is not the reason for the replacement of the system MB, then take a current backup of the ILOM SP
configuration using a browser under “ILOM Administration → Configuration Management” tab on the left menu list.

This can also be done from the ILOM CLI as follows:-

-> cd /SP/config
-> set passphrase=welcome1
-> set dump_uri=scp://root:password@laptop_IP/var/tmp/SP.config

2. Obtain the correct Serial Numbers required.

(a) Make a note of the System Serial Number from the front label of the server.
(b) Make a note of the Rack Master Serial Number from the front label of the rack (left-side vertical wall, half way up the rack).

3. If the system is not down already due to whatever problem is causing the MB to be replaced, then have the customer DBA shut the
node down.

(a) For Extended information on this section, check MOS Note 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS
Services and Cell/Compute Nodes On An Exadata Configuration. (https://support.oracle.com/epmos/faces/ui/km
/SearchDocDisplay.jspx?id=1093890.1&type=DOCUMENT )

For a documentation reference, in the Exadata Owner’s Guide (E13874), use the section of chapter 7 titled “Non-Emergency Power
Procedures” section “Powering Off Oracle Exadata Rack” sub-section “Powering off Database Servers” available on the customer's
cell server image in the /opt/oracle/cell/doc.

http://amomv0115.us.oracle.com/archive/cd_ns/E13877_01/doc/doc.112/e13874/maintenance.htm#autoId18

(b) The customer should shut down CRS services prior to powering down the DB node.

If running OVM, go to the section "For Compute Node running OVM". For non-OVM, proceed as follows:

Shutdown crs

i. As root user do the following to stop crs and disable autostart of crs on reboot:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle

# $ORACLE_HOME/bin/crsctl disable crs

# $ORACLE_HOME/bin/crsctl stop crs
or
# <GI_HOME>/bin/crsctl stop crs

where GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid” but will depend on the customer's environment.

In the above output the “1” of “+ASM1” refers to the DB node number. For example, for DB node #3 the value would be +ASM3.

ii. Validate CRS is down cleanly. There should be no processes running.

# ps -ef | grep css
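A plain "ps -ef | grep css" usually lists the grep command itself, which can be mistaken for a running CSS process. The following is a minimal sketch of a filter that ignores that line. The function name css_procs is hypothetical and not part of any Oracle tool; it only reads "ps -ef" style text on standard input.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print only the
# lines that mention css, ignoring the "grep css" command itself.
css_procs() {
    grep css | grep -v 'grep css'
}

# Usage on the node (shown as a comment so the sketch has no side effects):
#   ps -ef | css_procs
# No output means no CSS-related process was found.
```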

(c) Check to see if CoD (Capacity on Demand) is configured.

Issue the following command and make a note of the total number of active physical cores. If the count is 36 then CoD is not in use.

If the compute node which requires the motherboard replacement is down then run this command on another compute node in the same cluster.

Record the result.

# dbmcli -e LIST DBSERVER attributes coreCount detail

coreCount: 36

Check to see if IaaS is configured. If ON, this will have to be enabled after rebooting. If a blank line is returned then IaaS is not configured.

# dbmcli -e list dbserver ATTRIBUTES iaasMode detail
iaasMode: ON
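Because both values must be compared again after the replacement, it can help to capture them in a form that is easy to save. The sketch below assumes the "name: value" layout shown in the sample output above; the helper name get_attr is hypothetical and it only parses text on standard input.

```shell
# Hypothetical helper: extract the value of one attribute from
# dbmcli "detail" output supplied on stdin (layout assumed to be "name: value").
get_attr() {
    # $1 = attribute name, for example coreCount or iaasMode
    awk -v name="$1" '
        { sub(/^[ \t]+/, "") }                  # drop leading indentation
        index($0, name ":") == 1 {              # line starts with "name:"
            val = substr($0, length(name) + 2)  # text after the colon
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim surrounding blanks
            print val
            exit
        }
    '
}

# Usage on the node (needs dbmcli, so shown as comments only):
#   dbmcli -e LIST DBSERVER attributes coreCount detail | get_attr coreCount
#   dbmcli -e list dbserver ATTRIBUTES iaasMode detail | get_attr iaasMode
# An empty result for iaasMode corresponds to the blank line described above.
```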

(d) The customer or the field engineer can now shutdown the server operating system:

Linux:

# shutdown -hP now

(e) The field engineer can now slide out the server for maintenance. Remember to disconnect the power cords before opening the top of the server.

For Compute Node running OVM proceed as follows:

If there are any concerns engage EEST engineer.

The customer should perform the following:

(a) See what user domains are running (record the result)

Connect to the management domain (domain zero, or dom0).

This is an example with just two domains and the management domain Domain-0

# xm list
Name            ID   Mem  VCPUs  State   Time(s)
Domain-0         0  8192      4  r-----  409812.7
dm01db01vm01     8  8192      2  -b----  156610.6
dm01db01vm02     9  8192      2  -b----  152169.8

Connect to each domain using the command

# xm console domainname

where domainname would be dm01db01vm01 or dm01db01vm02 if using the above examples.
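To record the user domains for the later comparison, the names can be pulled out of the "xm list" output. This is a minimal sketch assuming the column layout of the example above; the function name user_domains is hypothetical.

```shell
# Hypothetical helper: read "xm list" output on stdin and print the names
# of all user domains, skipping the header line and Domain-0.
user_domains() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# Usage in dom0 (shown as comments; /var/tmp/domains.before is an example path):
#   xm list | user_domains > /var/tmp/domains.before
# After "xm shutdown -a -w" the same pipeline should print nothing.
```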

Shut down any instances of crs in all user domains; refer to the example in the previous section "Shutdown crs".

Note: Omit the following command for OVM as it is not required.

# $ORACLE_HOME/bin/crsctl disable crs

Press CTRL+] to disconnect from the console.

(b) Shut down all user domains from dom0

# xm shutdown -a -w

(c) See what user domains are running (should be only Domain-0)

(d) Disable user domains from auto starting during dom0 boot after motherboard has been replaced.

# chkconfig xendomains off

(e) Check to see if CoD (Capacity on Demand) is configured.

Issue the following command and make a note of the total number of active physical cores. If the count is 36 then CoD is not in use.

If the compute node which requires the motherboard replacement is down then run this command on another compute node in the same cluster.

Record the result.

# dbmcli -e LIST DBSERVER attributes coreCount detail

coreCount: 36

Check to see if IaaS is configured. If ON, this will have to be enabled after rebooting. If a blank line is returned then IaaS is not configured.

# dbmcli -e list dbserver ATTRIBUTES iaasMode detail
iaasMode: ON

(f) The customer or the field engineer can now shutdown the server operating system:

# shutdown -hP now

(g) The field engineer can now slide out the server for maintenance. Remember to disconnect the power cords before opening the top of the server.

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Note: The removal/insertion CPU tool is new for the Ivy Bridge M3 product lines. If you have not used this new tool before please make yourself familiar before
attempting to use on-site. The tool is not intuitive so reference the service manual before attempting this service action.

Reference links for Service Manual:
X5-2 DB’s: ( http://docs.oracle.com/cd/E41059_01/html/E48312/napsm.html#scrolltoc )

Physical Replacement Steps:

1. Replace the MB as per MOS Note "TO BE ENTERED" migrating existing CPUs, DIMMs, PCI Cards and risers.

NOTE:- Pull power cords before opening the top cover to avoid a SP degraded condition.

2. Carefully follow the port numbers on the cables when re-attaching so they are not reversed. It is easiest to plug cables in while the server is in the fully extended
maintenance position.

3. Do not power up the system yet, just ILOM.

Post-Installation Steps:

1. Update the Serial Number on the new MB, to that of the server chassis. This is REQUIRED in order for ASR to continue to work on the unit, and is required
for all servers that are part of Exadata racks that may have a future Service Request, whether ASR is configured now or not.

These platforms use the Top Level Indicator (TLI) feature in ILOM to perform the MB serial number update automatically. In certain circumstances this may
not work correctly and will need to be manually corrected. For more information on TLI and restricted shell please refer to the following 2 MOS notes for these
systems:-

TLI MOS Note 1280913.1
Restricted Shell MOS Note 1302296.1

NOTE: The serial numbers of each server can be found at the front on the left hand side.

(a) Connect to ILOM via serial port, and login as “root” with default password “changeme”.
(b) Enter Restricted Mode.

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exdx5-tvp-a-db1-sp:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7090664             7090664             7090664
PSN                 1450NM104V          1450NM104V          1450NM104V
Product Name        ORACLE SERVER X5-2  ORACLE SERVER X5-2  ORACLE SERVER X5-2
[(restricted_shell) exdx5-tvp-a-db1-sp:~]#

If the replacement has the correct product serial number, in all 3 containers including “/SYS/MB” then skip to step 2 of the post-replacement procedures. If
the replacement does not have the product serial number populated correctly, then “exit” out of restricted shell mode and continue:

[(restricted_shell) db02-ilom:~]# exit
exit
->
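If the showpsnc output has been captured to a file on the laptop (for example from the serial terminal log), the comparison against the chassis serial number can be done mechanically. The sketch below assumes the three-column PSN row shown above; psn_ok is a hypothetical name and the check is only as good as the captured text.

```shell
# Hypothetical helper: succeed only if the PSN row of captured showpsnc
# output (stdin) shows the expected serial in Primary, Backup1 and Backup2.
psn_ok() {
    # $1 = chassis serial number noted in the pre-install steps
    awk -v want="$1" '
        $1 == "PSN" {
            found = 1
            # Appending "" forces a string comparison, so a serial such as
            # 00000000 is never treated as the number 0.
            for (i = 2; i <= 4; i++)
                if (NF != 4 || ($i "") != (want "")) bad = 1
        }
        END { exit (found && !bad) ? 0 : 1 }
    '
}

# Usage on the laptop (showpsnc.txt is an example capture file):
#   psn_ok 1450NM104V < showpsnc.txt && echo "all three containers match"
```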

(d) Where there is at least one container which still contains valid TLI information (usually the primary disk backplane DBP0), a service mode command
copypsnc can be used to update the product serial number.

i. Login as root and create escalation mode user with service role:

-> cd /SP/users
-> create sunny role=aucros (will ask for password)

ii. Gather “version”, “show /SYS” and “show /SP/clock” outputs needed for generating the service mode password:

-> version
SP firmware 3.2.4.10
SP firmware build number: 94551
SP firmware date: Fri Nov 14 18:42:04 EST 2014
SP filesystem version: 0.2.10

-> show /SYS

..............
    Properties:
        type = Host System
        ipmi_name = SYS
        product_name = ORACLE SERVER X5-2
        product_part_number = 7090664
        product_serial_number = 00000000

..............
-> show /SP/clock

..............
    Properties:
        datetime = Mon Feb 2 13:38:08 2015
        timezone = GMT (Europe/London)
        uptime = 11 days, 04:19:54
        usentpserver = enabled

iii. Generate a service mode password using “http://modepass.us.oracle.com/” Login is via Oracle Single-Sign-On. Example output of the tool is:

BRAND : sun
MODE : service
VERSION : 3.2.4.10
SERIAL : 00000000
UTC DATE : 05/20/2013 16:00
POP DOLL PHI TOW BRAN TAUT FEND PAW SKI SCAR BURG CEIL MINT DRAB KAHN FIR MAGI LEAF LIMB EM LAWS BRAE DEAL BURN GOAL HEFT
HEAR KEY SEE A

iv. Logout of root and log back in as 'sunny' user that you created, and enter Service mode:

-> set SESSION mode=service

Password:*** **** *** *** **** **** **** *** *** **** **** **** **** **** **** *** **** **** **** ** **** **** **** ****
**** **** **** *** *** *
Short form password is: ARMY ULAN HULL
Currently in service mode.

v. Correct the invalid containers using the “copypsnc” command:
-> copypsnc
Number of arguments is incorrect.
Usage:
copypsnc [-n] <src> <dest>
where <src> is PRIMARY|BACKUP1|BACKUP2
<dest> is PRIMARY|BACKUP1|BACKUP2
-n: If src is a bilingual FRU, copy from new-style record.
PRIMARY: fruid:///SYS/DBP0
BACKUP1: fruid:///SYS/MB
BACKUP2: fruid:///SYS/PS0
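-> copypsnc PRIMARY BACKUP1

Here <src> must be a container that showpsnc reports as valid (usually PRIMARY, the disk backplane) and <dest> the invalid one (after a MB replacement this is normally BACKUP1, /SYS/MB). Reversing the two would overwrite the valid data. The copypsnc command produces no output upon success.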

vi. After running copypsnc, the service processor should be rebooted.

-> reset /SP

vii. Log in again as 'root' user with default password 'changeme' and verify the SN is now populated correctly using 'show /SYS' and 'showpsnc' as
shown above.

viii. Remove the 'sunny' user:

-> delete /SP/users/sunny

If there are any issues with programming the serial number with “copypsnc” then an escalation mode password and instructions will need to be provided by
the TSC x86 engineer assigned to the SR.

2. Re-flash the ILOM/BIOS to the correct levels required for Exadata if required.

Exadata X5-2 requires a minimum image of 12.1.2.1.0. This image automatically checks the version of ILOM/BIOS on the motherboard and will attempt to correct it if required.

Before powering up, it is necessary to configure the ILOM network settings. Obtain the ILOM network values from the customer or from the previously saved SP.config file. Log in to the ILOM, then "-> cd /SP/network" and apply the settings. If this step is not performed the ILOM will fail to re-flash.

i. Power up system using front button or from ILOM "-> start /SYS"
ii. During the boot validation phase, CheckHWnFWProfile will run and determine the ILOM version. If the version is not correct it will give a WARNING and then attempt to reflash, which will result in a reboot.

Note: If you see the following warning then the firmware update has failed. You will need to manually update the firmware.

[WARNING] Firmware updates were already tried once and likely failed.
[WARNING] To retry the firmware update reboot or try to update the firmware manually.

To update manually, delete /opt/oracle.cellos/TRIED_FW_UPDATE_ONCE and re-run /opt/oracle.SupportTools/CheckHWnFWProfile -U /opt/oracle.cellos/iso/cellbits

# /opt/oracle.SupportTools/CheckHWnFWProfile -U /opt/oracle.cellos/iso/cellbits

NOTE: The above command will do a similar update to the Cell automatic update method. The server will be powered off during this, ILOM will reset,
and after 10 minutes of being off to allow ILOM reset and BIOS flash update, the server host will be automatically powered back on.
See Example output below:-

[root@gmpadb04 cellbits]# /opt/oracle.SupportTools/CheckHWnFWProfile -U /opt/oracle.cellos/iso/cellbits

Now updating the ILOM and the BIOS ...
[INFO] Start ILOM firmware upgrade to version 3.1.2.10 r74387. Attempt 1 of 2.
Connected. Use ^D to exit.
...
Waiting for upgrade to start..
Broadcast message from root (Thu Dec 9 00:57:48 2010):
The system is going down for system halt NOW!
..Connection to 10.7.7.24 closed by remote host.
Connection to 10.7.7.24 closed.

3. Restore the backed up SP configuration done during the pre-installation steps.

(a) Using a browser under Maintenance Tab or from ILOM cli:-

-> cd /SP/config
-> set passphrase=welcome1
-> set load_uri=scp://root:password@laptop_IP/var/tmp/SP.config

If SP backup was not possible, check with the customer for network information and use another ILOM within the rack for general settings. The primary
Exadata-specific settings are:

i. Serial Baud rate is 115200 for external and host
-> show -l all /SP/serial
speed = 115200 for "external" and "host"
if this is not 115200 (ILOM default is 9600, Exadata default is 115200), then they should be set:

-> set /SP/serial/external pendingspeed=115200
-> set /SP/serial/external commitpending=true
-> set /SP/serial/host pendingspeed=115200
-> set /SP/serial/host commitpending=true

ii. /SP system_identifier is set to the appropriate rack type string and master Rack Serial Number. This is critical for ASR deployments. The Master Rack
Serial number can be obtained top left inside the cabinet or from show /SP on any other ILOM. The string should be of the following format:
X5-2 - “Exadata Database Machine X5-2 <Rack SN>”
For Example:

-> show /SP
    Properties:
        check_physical_presence = false
        current_hostname = exdx5-tvp-a-db1-sp
        hostname = exdx5-tvp-a-db1-sp
        reset_to_defaults = none
        system_contact = (none)
        system_description = ORACLE SERVER X5-2, ILOM v3.2.4.10, r94551
        system_identifier = Exadata Database Machine X5-2 AK00268428
        system_location = (none)

iii. /SP hostname is setup
iv. /SP/network settings
v. /SP/alertmgmt rules that may have been previously setup by ASR or cell configuration
vi. /SP/clock timezone, datetime, and /SP/clients/ntp NTP settings
vii. /SP/clients/dns Name service settings
viii. root account password. If the root password has not been provided you can have the customer do this, or do this manually:

-> set /SP/users/root password=welcome1 (or customers password)
Changing password for user /SP/users/root...
Enter new password again: ********
New password was successfully set for user /SP/users/root

(b) Reset the ILOM under the Maintenance Tab or from ILOM cli:

-> reset /SP

(c) Check you can login to all interfaces and ILOM can be accessed using a browser and ssh from another system on the customer's management network.

4. Power-on the host server, and go into BIOS setup and check BIOS settings against EIS checklist, in particular make sure USB is first in boot order if this is a
Storage Cell (the original USB stick should have been moved from the old board to the new board), and check date and time is correct. Use ILOM cli to set
the boot device to BIOS "-> set /HOST boot_device=bios" or press F2 (Ctrl-E) during BIOS at the right time to get into BIOS setup menu.

5. Power-on and boot the system, monitoring the graphics java console through ILOM (or local video if there is a crash cart available). As the system boots the
hardware/firmware profile will be checked, and either a green “Passed” will be displayed, or a red “Warning” that something with the hardware or firmware does
not match what is expected. If the check passes, then everything is correct and up, and the boot will continue up to the OS login prompt. If the check fails, then
the issue being flagged should be investigated and rectified before continuing.

6. Additional OS checks:-

(a) Verify the network interfaces have correctly picked up the new MAC addresses of the new system board:

# ifconfig eth0 (for each eth1/bondeth0 etc.)

# ipmitool sunoem cli "show /SYS/MB/NET0" (for each NIC NET0/1/2/3)

OR from ILOM

-> show /SYS/MB/NET0

/SYS/MB/NET0
Targets:
Properties:
type = Network Interface
ipmi_name = MB/NET0
fru_description = 10G Ethernet Controller
fru_manufacturer = INTEL
fru_part_number = X540
fru_macaddress = 00:10:e0:3e:51:ec
fault_state = OK
clear_fault_action = (none)

Verify that the management network is working:

# ethtool eth0 | grep det
Link detected: yes

Verify that the ILOM management network is working:

# ipmitool sunoem cli 'show /SP/network' | grep ipadd
ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108
[root@db01 ~]# ping -c 3 192.168.1.108
PING 192.168.1.108 (192.168.1.108) 56(84) bytes of data.
64 bytes from 192.168.1.108: icmp_seq=1 ttl=64 time=0.625 ms
64 bytes from 192.168.1.108: icmp_seq=2 ttl=64 time=0.601 ms
64 bytes from 192.168.1.108: icmp_seq=3 ttl=64 time=0.606 ms
--- 192.168.1.108 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 3199ms
rtt min/avg/max/mdev = 0.601/0.608/0.625/0.026 ms

Verify that all memory is present in Linux. Sun Server X5-2 DB nodes have 256GB.

# grep MemTotal /proc/meminfo
MemTotal: 264152848 kB

- this may vary depending on BIOS version.
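The kernel reports slightly less than the installed 256GB, so a quick conversion makes the figure easier to judge. The following is a minimal sketch; mem_gib is a hypothetical helper, and any pass/fail threshold applied to its result would be an assumption rather than an Oracle-defined limit.

```shell
# Hypothetical helper: convert the MemTotal line (kB) read from stdin
# into whole GiB, rounding down.
mem_gib() {
    awk '$1 == "MemTotal:" { printf "%d\n", $2 / 1024 / 1024 }'
}

# Usage on the node (shown as a comment):
#   mem_gib < /proc/meminfo
# With the sample value above (264152848 kB) this prints 251.
```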

Verify the disks are visible and online:

# /opt/MegaRAID/MegaCli/MegaCli64 -Pdlist -a0 | grep "Slot\|Firmware state"
Output from Exadata X5-2 DB node with dual-boot option:
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
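To avoid reading the slot list by eye, the same output can be filtered for any disk that is not in the state shown above. This sketch assumes the two-line "Slot Number" / "Firmware state" layout of the sample; bad_slots is a hypothetical name, and any state other than "Online, Spun Up" is simply printed for review rather than judged.

```shell
# Hypothetical helper: read MegaCli64 -PdList output on stdin and print
# "slot: state" for every disk whose state is not "Online, Spun Up".
bad_slots() {
    awk '
        /^Slot Number:/    { slot = $3 }
        /^Firmware state:/ {
            state = $0
            sub(/^Firmware state:[ \t]*/, "", state)  # keep only the state text
            sub(/[ \t\r]+$/, "", state)               # drop trailing blanks
            if (state != "Online, Spun Up") print slot ": " state
        }
    '
}

# Usage on the node (shown as a comment):
#   /opt/MegaRAID/MegaCli/MegaCli64 -Pdlist -a0 | bad_slots
# No output means every listed slot reported "Online, Spun Up".
```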

Verify the hardware logical volume is correctly set up:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdInfo -lAll -a0
Output for Exadata X5-2 DB nodes:
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :DBSYS
RAID Level : Primary-5, Secondary-0, RAID Level Qualifier-3
Size : 1.633 TB
Physical Sector Size: 512
Logical Sector Size : 512
VD has Emulated PD : No
Parity Size : 557.861 GB
State : Optimal
Strip Size : 1.0 MB
Number Of Drives : 4
Span Depth : 1
Creation Date : 14-11-2014
Creation Time : 04:14:18 PM
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No
Exit Code: 0x00

Verify HW Profile is operating correctly.

# /opt/oracle.SupportTools/CheckHWnFWProfile
[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2

If there are any errors, they will need to be corrected.

(b) Verify the InfiniBand connections are up and actively seen in the fabric:

If possible to login to DB01, then check InfiniBand connections are ok by running the following from DB01:

# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify-topology

(options to verify-topology may be required depending on configuration)
If not possible for security reasons, then on this local node, verify the IB connection status with:

# ibstatus (Looking for both link ports up and active at 40Gb/s (4X QDR))
# ibdiagnet (Looking for any fabric errors that might suggest a link or cabling failure)
# ibnetdiscover (Looking for ability to see all expected switches and other DB nodes and cells in the IB fabric)

(d) If dcli is set up for password-less SSH, then the SSH keys need to be updated for the new MAC addresses. The customer should be able to do this using their
root password.

7. [Eighth Rack Only] For DB Nodes in Eighth Rack configurations, the configuration requires limiting CPU cores for licensing as follows. Refer to MOS Note
1538561.1 for more details if required.

To confirm if this is an Eighth Rack config, ask the customer to log in to a storage cell which is part of this cluster and issue the following command:

# cellcli -e list cell attributes eighthRack detail
eighthRack: TRUE

If a blank line is displayed then this is not an Eighth Rack config. The output above shows that an Eighth Rack has been detected.

(a) Verify the current configuration with the following command:

(root)# /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of cores active per socket: 18
[SHOW] Total number of cores active: 36
(root)#

For an eighth rack configuration, you should see 18 cores enabled. If that's what you see, then there are no configuration changes needed and the rest of this
procedure should not be used.

(b) If the output shows 36 cores enabled as shown above, we need to change the configuration with the following command:

(root)# /opt/oracle.SupportTools/resourcecontrol -core 18

Reboot the host:

# reboot

(root)# /opt/oracle.SupportTools/resourcecontrol -show
[INFO] Validated hardware and OS. Proceed.
[SHOW] Number of cores active per socket: 9
[SHOW] Total number of cores active: 18
(root)#
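The before and after core counts can be compared without reading the full output. The sketch below assumes the "[SHOW] Total number of cores active:" line appears exactly as in the samples above; total_cores is a hypothetical helper name.

```shell
# Hypothetical helper: read "resourcecontrol -show" output on stdin and
# print the total number of active cores.
total_cores() {
    sed -n 's/^\[SHOW\] Total number of cores active: *\([0-9][0-9]*\).*$/\1/p'
}

# Usage on the node (shown as a comment):
#   /opt/oracle.SupportTools/resourcecontrol -show | total_cores
# For an Eighth Rack DB node the expected result after the change is 18.
```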

If the compute node is configured for CoD, restore the config if required. Check that the reported value is the same as previously recorded.

# dbmcli -e LIST DBSERVER attributes coreCount detail
coreCount: 36

If the value is incorrect then adjust the core count

# dbmcli -e ALTER DBSERVER pendingCoreCount = new_number_of_active_physical_cores

Verify the pending number of active physical cores using the following command:

# dbmcli -e LIST DBSERVER attributes pendingCoreCount

Now reboot the compute node.

If IaaS is configured:

Verify that all physical cores are active using the following command:

# dbmcli -e LIST DBSERVER attributes coreCount detail

Enable IaaS

# dbmcli

DBMCLI> ALTER DBSERVER iaasMode = "on"

DBMCLI> list dbserver ATTRIBUTES iaasMode detail
iaasMode: ON

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

The following is for compute nodes NOT running OVM. If running OVM, see the later section "For Compute Node running OVM".

You can now hand the system back to the customer DBA to check all ASM or DB CRS services can be brought up and are online before obtaining sign-off. This step
may take more than 10 minutes to complete based on the current load on the database. See detailed information below. If the customer DBA requires assistance
beyond this, then you should direct them to callback the parent SR owner in EEST.

DB Node Startup Verification:

1. Startup CRS and re-enable autostart of crs. After the OS is up, the Customer DBA should validate that CRS is running. As root execute:

# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base for ORACLE_HOME=/u01/app/11.2.0/grid is /u01/app/oracle

# $ORACLE_HOME/bin/crsctl start crs
# $ORACLE_HOME/bin/crsctl check crs

Now re-enable autostart

# $ORACLE_HOME/bin/crsctl enable crs
or
# <GI_HOME>/bin/crsctl check crs

# <GI_HOME>/bin/crsctl enable crs

where GI_HOME environment variable is typically set to “/u01/app/11.2.0/grid” but will depend on the customer's environment.
In the above output the “1” of “+ASM1” refers to the DB node number. For example, for DB node #3 the value would be +ASM3.
Example output when all is online is:

# /u01/app/11.2.0/grid/bin/crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
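The "all online" condition can also be tested in a script, for example while waiting for the stack to come up. This is a minimal sketch that treats every CRS- message line not ending in "is online" as a failure; crs_all_online is a hypothetical name and the message format is assumed from the example above.

```shell
# Hypothetical helper: read "crsctl check crs" output on stdin and succeed
# only if at least one CRS- line is present and all of them end in "is online".
crs_all_online() {
    awk '
        /^CRS-[0-9]+:/ { n++; if ($0 !~ /is online[ \t\r]*$/) bad++ }
        END { exit (n > 0 && bad == 0) ? 0 : 1 }
    '
}

# Usage on the node (shown as a comment):
#   $ORACLE_HOME/bin/crsctl check crs | crs_all_online && echo "CRS stack online"
```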

2. Validate that instances are running:

# ps -ef |grep pmon

It should return a record for the ASM instance and a record for each database.

For Compute Node running OVM

If the customer requires assistance please ask them to contact EEST engineer or parent case owner.

Once the compute node has booted, re-enable user domains to autostart during Domain-0 boot.

# chkconfig xendomains on

Startup all user domains that are marked for auto start

# service xendomains start

See what user domains are running (compare against result from previously collected data)

# xm list

If any were not auto-started, then start up a single user domain:

# xm create -c /EXAVMIMAGES/GuestImages/DomainName/vm.cfg

Check that crs has started in the user domains; refer to the previous section "DB Node Startup Verification".

PARTS NOTE:

7098505 System Board Assembly for Exadata X5-2 DB Nodes (Sun Server X5-2)

REFERENCE INFORMATION:

Service Manuals:
X5-2 DB’s: ( http://docs.oracle.com/cd/E41059_01/html/E48312/napsm.html#scrolltoc )
X5-2 Motherboard Replacement Procedure: MOS Note TO BE INSERTED when available.
MOS Note 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration.
MOS Note ID 1188080.1 Steps to shut down or reboot an Exadata storage cell without affecting ASM

MB Serial Number Reprogramming:
TLI MOS Note 1280913.1
Restricted Shell MOS Note 1302296.1
Exadata Database Machine Maint Guide: http://amomv0115.us.oracle.com/archive/cd_ns/E50790_01/doc/doc.121/e51951/toc.htm
EIS Checklist: http://eis.us.oracle.com/checklists/pdf/Exadata-X5-2-8B.pdf
