Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1968764.1
Update Date:2018-04-05
Keywords:

Solution Type  Technical Instruction Sure

Solution  1968764.1 :   How to Replace Motherboard in Storage Cell in Exadata Database Machine [X5-2/X6-2]  


Related Items
  • Oracle SuperCluster T5-8 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Zero Data Loss Recovery Appliance X6 Hardware
  • Exadata SL6 Hardware
  • Oracle SuperCluster T5-8 Half Rack
  • Exadata X5-2 Eighth Rack
  • Exadata X5-2 Full Rack
  • Exadata X6-2 Hardware
  • Exadata X6-8 Hardware
  • Exadata X5-2 Hardware
  • Exadata X5-2 Quarter Rack
  • Exadata X4-8 Hardware
  • Exadata X5-2 Half Rack
  • Zero Data Loss Recovery Appliance X5 Hardware
  • Oracle SuperCluster T5-8 Hardware
  • Oracle SuperCluster M6-32 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: CAP is for field to use

Applies to:

Oracle SuperCluster T5-8 Hardware - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Oracle SuperCluster T5-8 Half Rack - Version All Versions and later
Exadata X5-2 Eighth Rack - Version All Versions and later
Exadata X5-2 Half Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

  Canned Action Plan for replacing a Motherboard in Storage Cell in an Exadata Database Machine [X5-2/X6-2]

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Exadata trained

TIME ESTIMATE: 120 Minutes

TASK COMPLEXITY: 3

 

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: A server in an Exadata Database Machine requires the motherboard to be replaced. This procedure is specific to Exadata X5-2/X6-2 systems based on Oracle Server X5-2/X6-2 and Oracle Server X5-2L/X6-2L.

Connectivity to the rack will depend on the customer's access requirements. The following procedure partially requires serial connection and network access to ILOM which assumes using a laptop attached to the Cisco management switch. If no port is available in a full rack, then temporarily disconnect a port used for another host's ILOM (e.g. port 2). If the customer does not allow login access to the host ILOM, then they will need to run the commands given below.  

When connecting to ILOM via serial cable remember that the baud rate is 9600 for replacement boards. This will get changed during the post-install procedure to the Exadata default which is 115200 for installed boards. 

 For brevity, MB = Motherboard throughout this document.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

Pre-Install Steps:

1. Backup ILOM Settings.

Assuming the ILOM is not the reason for the replacement of the system MB, then take a current backup of the ILOM SP configuration using a browser under “ILOM Administration → Configuration Management” tab on the left menu list.

This can also be done from the ILOM CLI as follows:-

-> cd /SP/config
-> set passphrase=welcome1
-> set dump_uri=scp://root:password@laptop_IP/var/tmp/SP.config

  

2.  Obtain the correct Serial Numbers required.

(a) Make a note of the System Serial Number from the front label of the server.
(b) Make a note of the Rack Master Serial Number from the front label of the rack (left-side vertical wall, half way up the rack).

3. If the system is not down already due to whatever problem is causing the MB to be replaced, then have the customer DBA shut the node down. 

(a) For Extended information on this section check MOS Note ID 1188080.1 Steps to shut down or reboot an Exadata storage cell without affecting ASM

This is also documented in the Exadata Owner's Guide in chapter 7 section titled “Maintaining Exadata Storage Servers” subsection “Shutting Down Exadata Storage Server” available on the customer's cell server image in the /opt/oracle/cell/doc.

http://amomv0115.us.oracle.com/archive/cd_ns/E13877_01/doc/doc.112/e13874/maintenance.htm#autoId33

In the following examples, the SQL commands should be run by the customer's DBA prior to doing the hardware replacement. The field engineer should run them only if the customer directs them to, or is unable to run them. The CellCLI commands will need to be run as root.

(b) ASM drops a disk once DISK_REPAIR_TIME has elapsed after the disk is taken offline. The default DISK_REPAIR_TIME attribute value of 3.6 hours should be adequate for replacing components, but may have been changed by the customer. To check this parameter, have the customer log into ASM and perform the following query:

SQL> select dg.name,a.value from v$asm_attribute a, v$asm_diskgroup dg where a.name = 'disk_repair_time' and a.group_number = dg.group_number;

 

As long as the value is large enough to comfortably replace the components being replaced, then there is no need to change it.
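The comparison in step (b) can be sketched as a small helper. This is only an illustration, not an Oracle tool: the function names, the 1.5x safety factor, and the example rows are assumptions, and the 120-minute figure is taken from the TIME ESTIMATE above. Values are assumed to carry an "h" or "m" suffix as in the 3.6h default; a value without a unit is treated as hours, which is also an assumption.

```python
import re

def repair_time_minutes(value: str) -> float:
    """Convert a disk_repair_time string such as '3.6h' or '216m' to minutes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([hHmM]?)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised disk_repair_time value: {value!r}")
    number = float(match.group(1))
    unit = match.group(2).lower() or "h"  # assumption: no unit means hours
    return number * 60 if unit == "h" else number

def has_enough_margin(value: str, planned_minutes: float = 120,
                      safety_factor: float = 1.5) -> bool:
    """True if the repair window covers the planned work plus a safety margin."""
    return repair_time_minutes(value) >= planned_minutes * safety_factor

# Hypothetical (diskgroup, value) rows copied by hand from the query output.
rows = [("DATA", "3.6h"), ("RECO", "3.6h")]
for name, value in rows:
    verdict = "OK" if has_enough_margin(value) else "discuss with the DBA"
    print(name, value, verdict)
```

If a diskgroup falls short, the decision to change the attribute remains with the customer DBA.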

(c) Check if ASM will be OK if the grid disks go OFFLINE.

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
...snippet...
DATA_CD_09_cel01 ONLINE Yes
DATA_CD_10_cel01 ONLINE Yes
etc....

 

If one or more disks return asmdeactivationoutcome='No', then wait for some time and repeat step (c). Once all disks return asmdeactivationoutcome='Yes', proceed to the next step.
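With many grid disks, the listing from step (c) is easy to misread. The following is a minimal sketch, assuming the command output has been saved as text; the function name is hypothetical and the second sample row (with "No") is invented for illustration. Anything other than "Yes" in the last column is reported.

```python
from typing import List

def disks_not_safe_to_deactivate(listing: str) -> List[str]:
    """Return the names of grid disks whose asmdeactivationoutcome is not 'Yes'."""
    blocked = []
    for line in listing.splitlines():
        # Split into name, asmmodestatus, and the rest of the line, because
        # the outcome column may contain more than one word.
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip blank lines and truncation markers
        name, _mode, outcome = fields
        if outcome.strip().lower() != "yes":
            blocked.append(name)
    return blocked

# Illustrative sample: the first row is from the example above,
# the second row is made up to show a disk that is not yet safe.
sample = """\
DATA_CD_09_cel01 ONLINE Yes
DATA_CD_10_cel01 ONLINE No
"""
print(disks_not_safe_to_deactivate(sample))
```

An empty list means every listed disk reported "Yes".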

(d) Run the CellCLI command to inactivate all grid disks on the cell that needs to be powered down for maintenance. This can take 10 minutes or longer.

# cellcli
...
CellCLI> ALTER GRIDDISK ALL INACTIVE
GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
...etc...

 

(e) Execute the command below and the output should show asmmodestatus='UNUSED' or 'OFFLINE' and asmdeactivationoutcome=Yes for all griddisks once the disks are offline and inactive in ASM.

CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
...
DATA_CD_00_dmorlx8cel01 inactive OFFLINE Yes
DATA_CD_01_dmorlx8cel01 inactive OFFLINE Yes
...etc...

  

(f) Once all disks are offline and inactive, the customer or field engineer may shut down the cell using the following command:

# shutdown -hP now

  

(g) The field engineer can now slide out the server for maintenance.  Remember to disconnect the power cords before opening the top of the server.  Do not remove any cables prior to sliding the server forward, or the loose cable ends will jam in the cable management arms.

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Note: You must use the removal/insertion CPU tool for the X5-2L server.  If you have not used this new tool before please make yourself familiar before attempting to use on-site.  The tool is not intuitive so reference the service manual before attempting this service action.

 
Caution  -  During the MB removal procedure, it is important to label power supplies with the slot numbers from which they were removed (PS0, PS1). This is required because the power supplies must be reinstalled into the slots from which they were removed; otherwise, the server key identity properties (KIP) data might be lost. When a server requires service, the KIP is used by Oracle to verify that the warranty on the server has not expired. For more information on KIP, see FRU Key Identity Properties (KIP) Automated Update.

Reference links for Service Manuals:

X5-2L Cells: (http://docs.oracle.com/cd/E41033_01/html/E48325/cnpsm.z40001d31037512.html#scrolltoc)

 

Physical Replacement Steps:

1. Replace the MB as per MOS Note "How to Remove and Replace a Motherboard Assembly in an Oracle Server X5-2" (Doc ID 1992420.1).

 

NOTE:- On Storage Cells, remember to move the internal USB stick onto the new board.

  

  NOTE:- Pull power cords before opening the top cover to avoid an SP degraded condition.

  

2. Carefully follow the port numbers on the cables when re-attaching so they are not reversed. It is easiest to plug cables in while the server is in the fully extended maintenance position.

3. Do not power on the host yet; bring up ILOM only (the SP boots on standby power once the AC power cords are connected).

 

Post-Installation Steps:

1. Update the Serial Number on the new MB, to that of the server chassis. This is REQUIRED in order for ASR to continue to work on the unit, and is required for all servers that are part of Exadata racks that may have a future Service Request, whether ASR is configured now or not.

These platforms use the Top Level Indicator (TLI) feature in ILOM to perform the MB serial number update automatically. In certain circumstances this may not work correctly and will need to be manually corrected. For more information on TLI and restricted shell please refer to the following 2 MOS notes for these systems:-

TLI MOS Note 1280913.1

Restricted Shell MOS Note 1302296.1

NOTE: The serial numbers of each server can be found at the front on the left hand side.

  

(a) Connect to ILOM via serial port, and login as “root” with default password “changeme”.

(b) Enter Restricted Mode.

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

  

 (c) Review the current PSNC containers with “showpsnc” command: 

[(restricted_shell) exdx5-tvp-a-db1-sp:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7090664             7090664             7090664
PSN                 1450NM104V          1450NM104V          1450NM104V
Product Name        ORACLE SERVER X5-2  ORACLE SERVER X5-2  ORACLE SERVER X5-2
[(restricted_shell) exdx5-tvp-a-db1-sp:~]#

  

If the replacement has the correct product serial number in all 3 containers, including “/SYS/MB”, then skip to step 2 of the Post-Installation Steps. If the replacement does not have the product serial number populated correctly, then “exit” out of restricted shell mode and continue:

[(restricted_shell) exdx5-tvp-a-db1-sp:~]# exit
exit
-> 
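The three-container comparison from step (c) can be sketched as follows. This is a hypothetical helper, not part of ILOM: it assumes the showpsnc output has been captured as text and that the table columns are separated by two or more spaces, as in the example above.

```python
import re
from typing import Dict, List

def parse_psnc_table(output: str) -> Dict[str, List[str]]:
    """Map each element (PPN, PSN, Product Name) to its three container values."""
    table = {}
    for line in output.splitlines():
        if "|" in line:
            continue  # skip the column header row
        # Columns are separated by runs of two or more spaces.
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) == 4:
            table[cells[0]] = cells[1:]
    return table

def psn_matches(output: str, chassis_serial: str) -> bool:
    """True if Primary, Backup1 and Backup2 all hold the chassis serial number."""
    return parse_psnc_table(output).get("PSN") == [chassis_serial] * 3

sample = """\
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7090664             7090664             7090664
PSN                 1450NM104V          1450NM104V          1450NM104V
Product Name        ORACLE SERVER X5-2  ORACLE SERVER X5-2  ORACLE SERVER X5-2
"""
print(psn_matches(sample, "1450NM104V"))
```

The serial number passed in should be the one noted from the front label of the server in the Pre-Install Steps.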

  

(d) Where there is at least one container which still contains valid TLI information (usually the primary disk backplane DBP0), a service mode command copypsnc can be used to update the product serial number.

i. Login as root and create escalation mode user with service role:

-> cd /SP/users
-> create sunny role=aucros (will ask for password)

  

ii. Gather “version”, “show /SYS” and “show /SP/clock” outputs needed for generating the service mode password:

-> version
SP firmware 3.2.4.12
SP firmware build number: 94599
SP firmware date: Mon Nov 17 13:07:41 EST 2014
SP filesystem version: 0.2.10


-> show /SYS
..............
    Properties:
        type = Host System
        ipmi_name = SYS
        product_name = ORACLE SERVER X5-2L
        product_part_number = 7090697
        product_serial_number = XXXXXXXXXX
        product_manufacturer = Oracle Corporation
        fault_state = OK
        clear_fault_action = (none)
        power_state = On


-> show /SP/clock
..............
    Properties:
        datetime = Wed Feb  4 05:35:50 2015
        timezone = PST (America/Los_Angeles)
        uptime = 5 days, 07:03:01
        usentpserver = disabled

  

iii. Generate a service mode password using “http://modepass.us.oracle.com/” Login is via Oracle Single-Sign-On. Example output of the tool is:

BRAND : sun
MODE : service
VERSION : 3.2.4.10
SERIAL : 00000000
UTC DATE : 05/20/2013 16:00
POP DOLL PHI TOW BRAN TAUT FEND PAW SKI SCAR BURG CEIL MINT DRAB KAHN FIR MAGI LEAF LIMB EM LAWS BRAE DEAL BURN GOAL HEFT
HEAR KEY SEE A

  

 iv. Logout of root and log back in as 'sunny' user that you created, and enter Service mode:

-> set SESSION mode=service
Password:*** **** *** *** **** **** **** *** *** **** **** **** **** **** **** *** **** **** **** ** **** **** **** **** **** **** **** *** *** *
Short form password is: ARMY ULAN HULL

Currently in service mode.

  

v. Correct the invalid containers using the “copypsnc” command:

-> copypsnc
Number of arguments is incorrect.
Usage:
copypsnc [-n] <src> <dest>
where <src> is PRIMARY|BACKUP1|BACKUP2  
      <dest> is PRIMARY|BACKUP1|BACKUP2
      -n: If src is a bilingual FRU, copy from new-style record.

PRIMARY: fruid:///SYS/DBP0
BACKUP1: fruid:///SYS/MB
BACKUP2: fruid:///SYS/PS0

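-> copypsnc PRIMARY BACKUP1

  

The copypsnc command produces no output upon success. The first argument is the source container (one that showpsnc reported with the correct serial number, usually PRIMARY) and the second is the destination to be corrected (BACKUP1 is /SYS/MB, the replaced motherboard).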

vi. After running copypsnc, the service processor should be rebooted.

-> reset /SP

  

vii. Log in again as 'root' user with default password 'changeme' and verify the SN is now populated correctly using 'show /SYS' and 'showpsnc' as shown above.

viii. Remove the 'sunny' user:

-> delete /SP/users/sunny

 

If there are any issues with programming the serial number with “copypsnc” then an escalation mode password and instructions will need to be provided by the TSC x86 engineer assigned to the SR.
 

2. Re-flash the ILOM/BIOS to the correct levels required for Exadata.

The image on the Exadata Storage Cells contains the firmware and will automatically re-flash it if it is not correct during boot. You do not need to do any flash updates manually.

(a) Power up system using front button or ILOM  "-> start /SYS"

(b) During the boot validation phase, CheckHWnFWProfile will run and determine the ILOM is not correct, and automatically flash it. The server will be powered off during this, ILOM will reset, and after 10 minutes of being off to allow ILOM reset and BIOS flash update, the server host will be automatically powered back on. It is recommended to be connected to the serial console and monitor the host console through ILOM to verify this completes successfully.

 

3. Restore the backed up SP configuration done during the pre-installation steps.

(a) Using a browser under Maintenance Tab or from ILOM cli:-

-> cd /SP/config
-> set passphrase=welcome1
-> set load_uri=scp://root:password@laptop_IP/var/tmp/SP.config

  

If an SP backup was not possible, check with the customer for network information and use another ILOM within the rack as a reference for general settings. The primary Exadata-specific settings are:

i. Serial Baud rate is 115200 for external and host
-> show -l all /SP/serial
speed  = 115200  for  "external" and "host"
If this is not 115200 (ILOM default is 9600, Exadata default is 115200), then set both:

-> set /SP/serial/external pendingspeed=115200
-> set /SP/serial/external commitpending=true
-> set /SP/serial/host pendingspeed=115200
-> set /SP/serial/host commitpending=true

  

ii. /SP system_identifier is set to the appropriate rack type string and Master Rack Serial Number. This is critical for ASR deployments. The Master Rack Serial Number can be obtained top left inside the cabinet or from show /SP on any other ILOM. The string should be of the following format:

X5-2 - “Exadata Database Machine X5-2 <Rack SN>”

For Example:

        check_physical_presence = false
        current_hostname = ORACLESP-1449NM702E
        hostname = (none)
        reset_to_defaults = none
        system_contact = svcid pn|Exadata X5-2| sn|AK00000000| name|Exadata
                         X5-2|
        system_description = ORACLE SERVER X5-2L, ILOM v3.2.4.12, r94599
        system_identifier = Exadata Database Machine X5-2 AK00000000
        system_location = (none)

  

iii. /SP hostname is set up

iv. /SP/network settings

v. /SP/alertmgmt rules that may have been previously setup by ASR or cell configuration

vi. /SP/clock timezone, datetime, and /SP/clients/ntp NTP settings

vii. /SP/clients/dns Name service settings

viii. root account password. If the root password has not been provided you can have the customer do this, or do this manually:

-> set /SP/users/root password=welcome1 (or customers password)
Changing password for user /SP/users/root...
Enter new password again: ********
New password was successfully set for user /SP/users/root

  

(b) Reset the ILOM under the Maintenance Tab or from ILOM cli:

-> reset /SP

  

(c) Check you can login to all interfaces and ILOM can be accessed using a browser or ssh from another system on the customer's management network.

4. Once the SP is re-configured, power on the server by pressing the power button at the front and enter the BIOS Setup Utility.  To enter the BIOS Setup Utility, press the F2 key (Ctrl+E from a serial connection) when prompted and while the BIOS is running the power-on self-tests (POST). Check BIOS settings against EIS checklist, in particular make sure USB is first in boot order (the original USB stick should have been moved from the old board to the new board), and check date and time is correct.


5. This step is only required on Extreme Flash Storage Cells, if this is a High Capacity Storage Cell skip to step 6:

Important - When NVMe cables are removed or replaced between the storage drive backplane and NVMe switch cards, you must perform the procedure in this section to confirm that all NVMe cable connections are correct. If the NVMe cable connections are not correct, the storage server operating system should not be allowed to boot, as it could cause a problem with disk drive mapping.

Log into the ILOM CLI and enter restricted mode and run the NVMe cable connection test.

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exdx5-tvp-a-cel3-sp:~]#
[(restricted_shell) exdx5-tvp-a-cel3-sp:~]#
[(restricted_shell) exdx5-tvp-a-cel3-sp:~]# hwdiag io nvme_test
HWdiag (Restricted Mode) - Build Number 94599 (Nov 17 2014, 18:59:38)
         Current Date/Time: Apr 11 2015, 01:21:34
    Checking NVME drive fru contents...
        checking fru on drive NVMe 0              OK
        checking fru on drive NVMe 1              OK
        checking fru on drive NVMe 3              OK
        checking fru on drive NVMe 4              OK
        checking fru on drive NVMe 6              OK
        checking fru on drive NVMe 7              OK
        checking fru on drive NVMe 9              OK
        checking fru on drive NVMe 10             OK
    NVME drives fru check:                        PASSED

    Checking NVME drive pcie links...
        checking pcie link on drive NVMe 0        OK
        checking pcie link on drive NVMe 1        OK
        checking pcie link on drive NVMe 3        OK
        checking pcie link on drive NVMe 4        OK
        checking pcie link on drive NVMe 6        OK
        checking pcie link on drive NVMe 7        OK
        checking pcie link on drive NVMe 9        OK
        checking pcie link on drive NVMe 10       OK
    NVME drives pcie link check:                  PASSED

    Checking NVME drive DSN...
        checking DSN on drive NVMe 0              OK
        checking DSN on drive NVMe 1              OK
        checking DSN on drive NVMe 3              OK
        checking DSN on drive NVMe 4              OK
        checking DSN on drive NVMe 6              OK
        checking DSN on drive NVMe 7              OK
        checking DSN on drive NVMe 9              OK
        checking DSN on drive NVMe 10             OK
    NVME drives DSN check:                        PASSED

    Checking NVME cabling...
        Cables associated with Switch Card 3 in PCIe Slot 6 verified
        Cables associated with Switch Card 2 in PCIe Slot 5 verified
        Cables associated with Switch Card 1 in PCIe Slot 2 verified
        Cables associated with Switch Card 0 in PCIe Slot 1 verified
    NVME cable check:                             PASSED

NVME test PASSED
[(restricted_shell) exdx5-tvp-a-cel3-sp:~]#


If everything PASSED as shown above, then continue to step 6. If there are any failed statuses, the cable issue must be resolved before going past this step: remove the AC power cords again and correct the cabling. Repeat the test after each cable change until everything passes. Once everything passes, exit the restricted shell and return to the BIOS setup menu on the serial console.
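The pass criteria above can be sketched as a conservative check over a saved copy of the hwdiag output. The function name is hypothetical. Because the wording of a failing result is not shown in this document, the sketch treats anything other than an explicit PASSED on every summary line as not passed.

```python
def nvme_test_passed(output: str) -> bool:
    """True only if every 'check:' summary and the final result say PASSED."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    summaries = [line for line in lines if "check:" in line]
    overall = [line for line in lines if line.startswith("NVME test")]
    if not summaries or not overall:
        return False  # truncated or unexpected output is never a pass
    return all(line.endswith("PASSED") for line in summaries + overall)

# Abbreviated from the example output above.
sample = """\
    NVME drives fru check:                        PASSED
    NVME drives pcie link check:                  PASSED
    NVME drives DSN check:                        PASSED
    NVME cable check:                             PASSED

NVME test PASSED
"""
print(nvme_test_passed(sample))
```

A False result only indicates that the output needs to be read carefully; it does not identify which cable is at fault.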

6. In the BIOS setup menu go to "Exit" tab and select "Save Changes and Exit"

7. As the system boots the hardware/firmware profile will be checked, and either a green “Passed” will be displayed, or a red “Warning” that something with the hardware or firmware does not match what is expected.  If the check passes, then everything is correct and up, and the boot will continue up to the OS login prompt.  If the check fails, then the issue being flagged should be investigated and rectified before continuing.

8. Additional OS checks:-

(a) Verify the network interfaces have correctly picked up the new MAC addresses of the new system board: 

# ifconfig eth0 (for each eth1/bondeth0 etc.)

OR

# ipmitool sunoem cli "show /SYS/MB/NET0" (for each NIC NET0/1/2/3)

OR from ILOM

-> show /SYS/MB/NET0
/SYS/MB/NET0
Targets:
Properties:
type = Network Interface
ipmi_name = MB/NET0
fru_description = 10G Ethernet Controller
fru_manufacturer = INTEL
fru_part_number = X540
fru_macaddress = 00:10:e0:3e:51:ec
fault_state = OK
clear_fault_action = (none)

 

Compare this to the following network configuration files under:

/etc/sysconfig/network-scripts/ifcfg-ethX

  

where X is a 0 (ifcfg-eth0=NET0), 1 (ifcfg-eth1=NET1), 2 (NET2), 3 (NET3), or “ifcfg-bondeth0” if a db node with bonding.

Example of file output :-

#### DO NOT REMOVE THESE LINES ####
#### %GENERATED BY CELL% ####
DEVICE=eth0
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.167.166.90
NETMASK=255.255.252.0
NETWORK=10.167.164.0
BROADCAST=10.167.167.255
GATEWAY=10.167.164.1
HOTPLUG=no
IPV6INIT=no
HWADDR=00:21:28:46:ef:8a

 

If there is any inconsistency with the new MAC addresses or IP, there should be backup files ending .bak in the same directory so use these and the ILOM information to update the files with the correct information.
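The MAC address comparison in step (a) can be sketched as below, assuming the ifcfg file contents and the ILOM "show /SYS/MB/NET0" output have both been captured as text. The function names are hypothetical. Note that the two example outputs above come from different systems, so their addresses intentionally do not match.

```python
import re
from typing import Optional

MAC_PATTERN = r"([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"

def find_mac(text: str, key: str) -> Optional[str]:
    """Return the lower-cased MAC address assigned to `key`, or None."""
    pattern = rf"^\s*{re.escape(key)}\s*=\s*{MAC_PATTERN}\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1).lower() if match else None

def macs_agree(ifcfg_text: str, ilom_text: str) -> bool:
    """True if HWADDR in the ifcfg file equals fru_macaddress from ILOM."""
    host = find_mac(ifcfg_text, "HWADDR")
    ilom = find_mac(ilom_text, "fru_macaddress")
    return host is not None and host == ilom

# Values copied from the two example outputs above.
ifcfg_sample = "DEVICE=eth0\nHWADDR=00:21:28:46:ef:8a\n"
ilom_sample = "fru_part_number = X540\nfru_macaddress = 00:10:e0:3e:51:ec\n"
print(macs_agree(ifcfg_sample, ilom_sample))
```

The same check would be repeated for each NETn port and its corresponding ifcfg-ethX file.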

(b) Verify that the management network is working:

# ethtool eth0 | grep det
Link detected: yes

 
(c) Verify that the ILOM management network is working:

# ipmitool sunoem cli 'show /SP/network' | grep ipadd
ipaddress = 192.168.1.108
pendingipaddress = 192.168.1.108
[root@db01 ~]# ping -c 3 192.168.1.108
PING 192.168.1.108 (192.168.1.108) 56(84) bytes of data.
64 bytes from 192.168.1.108: icmp_seq=1 ttl=64 time=0.625 ms
64 bytes from 192.168.1.108: icmp_seq=2 ttl=64 time=0.601 ms
64 bytes from 192.168.1.108: icmp_seq=3 ttl=64 time=0.606 ms
--- 192.168.1.108 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 3199ms
rtt min/avg/max/mdev = 0.601/0.608/0.625/0.026 ms

 

(d) Verify that all memory is present in Linux. High Capacity X5-2 Storage Cells have 96GB in total. Extreme Flash X5-2 Storage Cells have 64GB in total.

 # grep MemTotal /proc/meminfo
MemTotal:       65583428 kB

  - this may vary depending on BIOS version.
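Because MemTotal is reported in kB and is always somewhat below the installed capacity, a quick conversion helps when comparing against the 64GB or 96GB figures. The sketch below is illustrative only: the function names are hypothetical and the 4 GB tolerance is an assumption, not an Oracle-specified threshold.

```python
def mem_total_gib(meminfo_text: str) -> float:
    """Return MemTotal from /proc/meminfo text, converted from kB to GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemTotal line not found")

def memory_looks_complete(meminfo_text: str, expected_gb: float,
                          tolerance_gb: float = 4.0) -> bool:
    """True if the reported total is within the assumed tolerance of expected."""
    return expected_gb - mem_total_gib(meminfo_text) <= tolerance_gb

# Value taken from the example output above.
sample = "MemTotal:       65583428 kB\n"
print(round(mem_total_gib(sample), 1))
print(memory_looks_complete(sample, expected_gb=64))
```

A shortfall larger than the tolerance suggests a DIMM that is missing, unseated, or disabled and should be checked in ILOM.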

 

(e) Verify HW Profile is operating correctly.

# /opt/oracle.SupportTools/CheckHWnFWProfile
[SUCCESS] The hardware and firmware matches supported profile for server=ORACLE_SERVER_X5-2

 

(f) Verify the InfiniBand connections are up and actively seen in the fabric:

If possible to login to DB01, then check InfiniBand connections are ok by running the following from DB01:

# cd /opt/oracle.SupportTools/ibdiagtools
# ./verify-topology 

  

 (options to verify-topology may be required depending on configuration)

If not possible for security reasons, then on this local node, verify the IB connection status with:

# ibstatus (Looking for both link ports up and active at 40Gb/s (4X QDR))
# ibdiagnet (Looking for any fabric errors that might suggest a link or cabling failure)
# ibnetdiscover (Looking for ability to see all expected switches and other DB nodes and cells in the IB fabric)

  

(g) Verify server functionality per the EIS checklist server and common check sections

  

(h) If dcli is set up for password-less SSH, then the SSH keys need to be updated for the new MAC address. The customer should be able to do this using their root password.

 

(i) This step is only required on the Extreme Flash Storage Cell:

Once the storage cell is booted up check that all the NVMe devices are present:

The following command should show 8 or 12 NVMe devices on an Extreme Flash storage cell:

[root@exdx5-tvp-a-cel3 ~]# nvmecli --identify --all | grep /dev
/***************** NVMe Device /dev/nvme0n1 ******************/
/***************** NVMe Device /dev/nvme1n1 ******************/
/***************** NVMe Device /dev/nvme2n1 ******************/
/***************** NVMe Device /dev/nvme3n1 ******************/
/***************** NVMe Device /dev/nvme4n1 ******************/
/***************** NVMe Device /dev/nvme5n1 ******************/
/***************** NVMe Device /dev/nvme6n1 ******************/
/***************** NVMe Device /dev/nvme7n1 ******************/


Also check that the OS can see the NVMe devices:

[root@exdx5-tvp-a-cel3 ~]# lspci | grep 0953
05:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
07:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
25:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
27:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
86:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
88:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
96:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
98:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)


If this is not correct, then there is a problem with the disk volumes that may need additional assistance to correct. The server should be re-opened and the device connections and boards checked to be sure they are secure and well seated BEFORE the following CellCLI commands are issued.

 

 OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?

You can now hand the system back to the customer DBA to check all ASM or DB CRS services can be brought up and are online before obtaining sign-off.  This step may take more than 10 minutes to complete based on the current load on the database. See detailed information below. If the customer DBA requires assistance beyond this, then you should direct them to call back the parent SR owner in EEST.

Cell Node Startup Verification:

1. Activate the grid disks: 

 

# cellcli

CellCLI> alter griddisk all active
GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
...etc...

Issue the command below and all disks should show 'active': 

CellCLI> list griddisk
DATA_CD_00_dmorlx8cel01 active
DATA_CD_01_dmorlx8cel01 active
...etc...

  

2. Verify all grid disks have been successfully put online using the following command. Wait until asmmodestatus is ONLINE for all grid disks. The following is an example of the output early in the activation process.

CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
DATA_CD_00_dmorlx8cel01 active ONLINE Yes
DATA_CD_01_dmorlx8cel01 active ONLINE Yes
DATA_CD_02_dmorlx8cel01 active ONLINE Yes
RECO_CD_00_dmorlx8cel01 active SYNCING Yes
...etc...

  

Notice in the above example that RECO_CD_00_dmorlx8cel01 is still in the 'SYNCING'  process.

Oracle ASM synchronization is only complete when ALL grid disks show asmmodestatus=ONLINE.  This process can take some time depending on how busy the machine is, and has been while this individual server was down for repair.
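The completion condition above can be sketched as a filter over a saved copy of the listing. The function name is hypothetical; the sample rows are taken from the example output above.

```python
from typing import Dict

def disks_not_online(listing: str) -> Dict[str, str]:
    """Return {disk name: 'status/asmmodestatus'} for disks not yet active and ONLINE."""
    pending = {}
    for line in listing.splitlines():
        stripped = line.strip()
        # Ignore prompts, blank lines and truncation markers.
        if not stripped or stripped.startswith(("CellCLI>", "#", "...")):
            continue
        fields = stripped.split()
        if len(fields) < 4:
            continue
        name, status, mode = fields[0], fields[1], fields[2]
        if status != "active" or mode != "ONLINE":
            pending[name] = f"{status}/{mode}"
    return pending

sample = """\
CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
DATA_CD_00_dmorlx8cel01 active ONLINE Yes
DATA_CD_01_dmorlx8cel01 active ONLINE Yes
DATA_CD_02_dmorlx8cel01 active ONLINE Yes
RECO_CD_00_dmorlx8cel01 active SYNCING Yes
...etc...
"""
print(disks_not_online(sample))
```

An empty result means every listed grid disk is active and ONLINE; otherwise re-run the CellCLI listing later and check again.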

 

PARTS NOTE:

7098504 System Board Assembly for Exadata X5-2 Storage Cells (Sun Server X5-2L)

REFERENCE INFORMATION:

Service Manuals:
X5-2L Cells: (http://docs.oracle.com/cd/E41033_01/html/E48325/cnpsm.z40001d31037512.html#scrolltoc)

X5-2L Motherboard Replacement Procedure:1993394.1

MOS Note 1093890.1 Steps To Shutdown/Startup The Exadata & RDBMS Services and Cell/Compute Nodes On An Exadata Configuration.

MOS Note ID 1188080.1 Steps to shut down or reboot an Exadata storage cell without affecting ASM

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.