Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution

761868.1: Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues
Applies to:
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-8 - Version All Versions and later
Exadata X3-8 Hardware - Version All Versions and later
Oracle Exadata Storage Server Software - Version 11.1.0.3.0 and later
Exadata Database Machine X2-2 Hardware
Information in this document applies to any platform.

Purpose
This document provides the set of commands required to collect diagnostic information for disk failures on the Exadata Storage Servers. The same commands can also be used on DB nodes or Storage Servers for some other hardware issues.
Note: sundiag is an Exadata node/cell tool. It works on both Linux and Solaris installations. For issues on Solaris platforms, additional OS data may be collected by the Explorer tool - <Document 1006990.1> Oracle Explorer Data Collector Implementation Best Practice. For issues on Linux platforms, additional OS data may be collected by the sosreport tool - <Document 1500235.1>.
Troubleshooting Steps

--- Software Requirements/Prerequisites ---
The script collects diagnostic information required by Sun or HP.

--- Configuring the Script ---
--- Running the Script ---
For Sun Oracle Exadata Environments:
For image versions 12.1.2.2.0 or later, use the sundiag.sh already included at /opt/oracle.SupportTools/sundiag.sh. For systems running any image version earlier than 12.1.2.2.0, update the existing sundiag.sh to the latest version, v12.1.2.2.0_150917.
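Deciding whether a node needs the updated script comes down to a version comparison against 12.1.2.2.0. The following is a minimal bash sketch, assuming GNU `sort -V` is available and that you already have the node's image version string (for example, as reported by imageinfo); the function name `needs_sundiag_update` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: returns success (0) if IMAGE_VERSION is older than
# 12.1.2.2.0, i.e. the bundled sundiag.sh should be replaced.
# Assumes GNU sort with -V (version sort).
needs_sundiag_update() {
    local v=$1 min=12.1.2.2.0
    # Equal versions do not need an update; otherwise the older of the two
    # sorts first, so v is older exactly when it comes out on top.
    [ "$v" != "$min" ] &&
        [ "$(printf '%s\n%s\n' "$v" "$min" | sort -V | head -n1)" = "$v" ]
}
```

For example, `needs_sundiag_update 11.2.3.3.1 && echo "update sundiag.sh"` prints the message, while 12.1.2.2.0 or any later version does not.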
nodedb03: Archive: sundiag.zip
nodedb03: inflating: sundiag_12.1.2.2.0_150917.sh
nodedb03: -r-xr-xr-x 1 root root 54919 Sep 17 19:49 sundiag_12.1.2.2.0_150917.sh
nodedb03: 0e6fa48b54d7881b9fc8a252a9b068aa sundiag_12.1.2.2.0_150917.sh

3. Copy the new version of sundiag.sh to the default location
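The output above lists a 32-character value next to the unpacked file, which looks like an md5sum. Assuming that is the file's MD5 checksum, a bash sketch that refuses to install a copy whose checksum does not match could look like this; `verify_md5` and `install_sundiag` are hypothetical names, and the destination default is the path named in this document.

```bash
#!/bin/bash
# Hypothetical helpers: check a file's MD5 before copying it into place.
# Assumes md5sum (GNU coreutils) and awk are available.

# verify_md5 FILE EXPECTED_MD5 -> 0 if the file's md5sum matches
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# install_sundiag NEW_SCRIPT EXPECTED_MD5 [DEST]
# Copies NEW_SCRIPT to DEST (default: the standard sundiag.sh location)
# only when the checksum matches.
install_sundiag() {
    local new=$1 expected=$2
    local dest=${3:-/opt/oracle.SupportTools/sundiag.sh}
    if ! verify_md5 "$new" "$expected"; then
        echo "checksum mismatch for $new; not installing" >&2
        return 1
    fi
    cp -p "$new" "$dest"
}
```

On a node, `install_sundiag sundiag_12.1.2.2.0_150917.sh 0e6fa48b54d7881b9fc8a252a9b068aa` would copy the script only if the checksum agrees with the value shown above.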
# /opt/oracle.SupportTools/sundiag.sh -h

Oracle Exadata Database Machine - Diagnostics Collection Tool
Version: 12.1.2.2.0.150917

By default sundiag will collect OSWatcher/ExaWatcher, Cell Metrics and traces, if there was an alert in the last 7 days.
If there is more than one alert, latest alert is chosen to set the time range for data collection.
Time range is 8hrs prior to and 1hr after the latest alert, for the total of 9 hrs
e.g: latest alert timestamp = 2014-03-29T01:20:04-05:00
echo Time range = 2014-03-28_16:00:00 and 2014-03-29_01:00:00
User can also specify time ranges (as explained in usage below), which takes precedence over default behavior of checking for alerts

Usage: /opt/oracle.SupportTools/sundiag.sh [ilom | snapshot] [osw <time ranges>]

osw - This argument when used expects value of one or more comma separated time ranges.
OSWatcher/ExaWatcher, cell metrics and traces will be gathered in those time ranges.
The format for time range(s) is <from>-<to>,<from>-<to> and so on without spaces
where <from> and <to> format is <date>_<time>
<date> and <time> format should be any valid format that can be recognized by 'date' command.
The command 'date -d <date>' or 'date -d <time>' should be valid
e.g: /opt/oracle.SupportTools/sundiag.sh osw 2014/03/31_15:00:00-2014/03/31_18:00:00
Note: Total time range should not exceed 9 hrs. Only the time ranges that fall within this limit are considered for the collection of above data

ilom - User level ILOM data gathering option via ipmitool, in place of separately using root login to get ILOM snapshot over the network.

snapshot - Collects node ILOM snapshot- requires host root password for ILOM to send snapshot data over the network.
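The default window described in the help text (8 hours before to 1 hour after the latest alert) can be reproduced as plain arithmetic. This bash sketch assumes GNU `date` (for `-d` and `@epoch` input) and prints the window in the `<from>-<to>` format used by the osw example; `default_osw_window` is a hypothetical name. The help text's own example shows rounded hour values, so sundiag's exact rounding may differ from this unrounded calculation.

```bash
#!/bin/bash
# Hypothetical helper: print "<from>-<to>" covering 8h before to 1h after
# an alert timestamp, in the YYYY/MM/DD_HH:MM:SS form used by the osw example.
# Assumes GNU date; output is in the local time zone (TZ).
default_osw_window() {
    local alert_epoch from to
    alert_epoch=$(date -d "$1" +%s) || return 1
    from=$(date -d "@$((alert_epoch - 8 * 3600))" +%Y/%m/%d_%H:%M:%S)
    to=$(date -d "@$((alert_epoch + 1 * 3600))" +%Y/%m/%d_%H:%M:%S)
    echo "${from}-${to}"
}
```

With `TZ=UTC`, the alert timestamp from the help text, 2014-03-29T01:20:04-05:00, yields `2014/03/28_22:20:04-2014/03/29_07:20:04`, which could then be passed explicitly as `sundiag.sh osw <range>`.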
The following files are created by version 12.1.2.2.0_150917 of sundiag.sh:
asr cell disk etc_configs etc_sysconfig_net fru-print_ipmitool.out ilom imagehistory-all.out imageinfo-all.out messages mrdiag net osw RackMasterSN raid SerialNumbers stderr.txt sysconfig var_log_cellos .version_sundiag
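After extracting a sundiag tar.bz2, the list above can serve as a quick completeness check. The bash sketch below simply reports which of those names are absent from an extracted directory; the array and function names are hypothetical, and not every entry may be expected on every node type, so a missing name is a prompt to look closer rather than proof of a failed run.

```bash
#!/bin/bash
# Hypothetical check: names produced by sundiag.sh v12.1.2.2.0_150917,
# as listed in this document.
SUNDIAG_EXPECTED=(asr cell disk etc_configs etc_sysconfig_net
    fru-print_ipmitool.out ilom imagehistory-all.out imageinfo-all.out
    messages mrdiag net osw RackMasterSN raid SerialNumbers stderr.txt
    sysconfig var_log_cellos .version_sundiag)

# missing_sundiag_entries DIR -> prints each expected name not found in DIR
missing_sundiag_entries() {
    local dir=$1 entry
    for entry in "${SUNDIAG_EXPECTED[@]}"; do
        [ -e "$dir/$entry" ] || echo "$entry"
    done
}
```

For example, `missing_sundiag_entries sundiag_exacel01_<serial#>_<date/time>` prints nothing when every listed entry is present.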
By default, sundiag collects OSWatcher/ExaWatcher data, cell metrics and traces if there was an alert in the last 7 days. If there is more than one alert, the latest alert sets the time range for data collection: from 8 hours before to 1 hour after that alert, 9 hours in total (for example, around a latest alert timestamp of 2014-03-29T01:20:04-05:00).

Users can also specify time ranges with the osw argument, which takes precedence over the default behavior of checking for alerts. The osw argument expects one or more comma-separated time ranges, without spaces, in the format <from>-<to>,<from>-<to>, where <from> and <to> are each in the format <date>_<time>. OSWatcher/ExaWatcher data, cell metrics and traces are gathered for those time ranges. Note: the total time range should not exceed 9 hours; only the time ranges that fall within this limit are considered for collection. This limits the amount of data gathered to what is appropriate for the problem being analyzed.

Execution creates a date-stamped tar.bz2 file, /var/log/exadatatmp/sundiag_<hostname>_<serial#>_<date/time>.tar.bz2 (in /tmp on v1.4-1.5.1), which includes the OS Watcher archive logs. These logs may be very large.
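Because only ranges within the 9-hour total are considered, it can help to check an osw argument before running sundiag. The bash sketch below sums the length of each `<from>-<to>` range, assuming GNU `date` and dates written with slashes as in the help example (the `-` is used only as the range separator). Both function names are hypothetical, and this is a pre-check of my own, not sundiag's internal logic.

```bash
#!/bin/bash
# Hypothetical pre-check for an osw time-range argument.
# Assumes GNU date and <date>_<time> values like 2014/03/31_15:00:00.

# osw_total_seconds SPEC -> prints the summed length of all ranges in seconds
osw_total_seconds() {
    local spec=$1 range from to f t total=0
    local IFS=','
    for range in $spec; do
        from=${range%%-*}
        to=${range#*-}
        # Normalize 2014/03/31_15:00:00 to 2014-03-31 15:00:00 for date -d
        f=$(date -d "$(echo "$from" | tr '_/' ' -')" +%s) || return 1
        t=$(date -d "$(echo "$to" | tr '_/' ' -')" +%s) || return 1
        total=$((total + t - f))
    done
    echo "$total"
}

# osw_spec_ok SPEC -> 0 if the ranges add up to no more than 9 hours
osw_spec_ok() {
    local secs
    secs=$(osw_total_seconds "$1") || return 1
    [ "$secs" -le $((9 * 3600)) ]
}
```

The help text's example, `2014/03/31_15:00:00-2014/03/31_18:00:00`, totals 10800 seconds (3 hours) and passes; two ranges of 4 and 6 hours would fail.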
If there are concerns about entering the host 'root' password (required by the snapshot option), an alternative is "# /opt/oracle.SupportTools/sundiag.sh ilom", which uses IPMI to gather user-level ILOM outputs. This is usually sufficient, but the ILOM snapshot can provide more detailed ILOM outputs for troubleshooting ILOM issues and system faults that the user-level data may not capture.
To gather sundiag.sh outputs from versions of sundiag.sh where the filename is unique for each node (v1.4 and later), use the following from DB01:

1. [root@exadb01 ~]# cd /opt/oracle.SupportTools/onecommand (or wherever the all_group file with the list of the rack hostnames is located)
3. Verify there is output in /tmp or /var/log/exadatatmp/ on each node:
[root@exadb01 onecommand]# dcli -g all_group -l root --serial 'ls -l /var/log/exadatatmp/sundiag* ' (v12.1.2.2.0_150917)
4. Make a temporary directory to copy for zipping:
It is recommended that the date be in YYMMDD (year, month, day) format for SRs where multiple days of analysis may be required.
5. Copy the generated sundiag files from the nodes to the temporary directory (/tmp on v1.4-1.5.1, /var/log/exadatatmp on v12.1.2.2.0_150917):
[root@exadb01 onecommand]# for H in `cat all_group`; do scp -p $H:/tmp/sundiag*.tar.bz2 dbm01_sundiags_date ; done
[root@exadb01 onecommand]# for H in `cat all_group`; do scp -p $H:/var/log/exadatatmp/sundiag*.tar.bz2 dbm01_sundiags_date ; done
6. Bundle them into a single file for upload to Oracle:
[root@exadb01 ~]# tar jcvf exa_rack_sundiag_date.tar.bz2 dbm01_sundiags_date
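Steps 4 and 5 above can be wrapped in two small bash functions: one that builds the recommended YYMMDD-stamped directory name and one that copies each host's bundle, trying the newer location first and falling back to /tmp for v1.4-1.5.1. This is a sketch under assumptions: the function names are hypothetical, the `dbm01` prefix is just the example used above, and passwordless scp from DB01 to every host in all_group is assumed.

```bash
#!/bin/bash
# Hypothetical helpers for collecting sundiag bundles on DB01.

# sundiag_collect_dir PREFIX [YYMMDD] -> e.g. dbm01_sundiags_140329
sundiag_collect_dir() {
    local prefix=$1 day=${2:-$(date +%y%m%d)}
    echo "${prefix}_sundiags_${day}"
}

# collect_sundiags GROUP_FILE DEST -> copy sundiag*.tar.bz2 from every host
collect_sundiags() {
    local group=$1 dest=$2 h
    mkdir -p "$dest" || return 1
    for h in $(cat "$group"); do
        # Newer images write to /var/log/exadatatmp; older ones to /tmp
        scp -p "$h:/var/log/exadatatmp/sundiag*.tar.bz2" "$dest"/ ||
            scp -p "$h:/tmp/sundiag*.tar.bz2" "$dest"/
    done
}
```

From the onecommand directory, `dir=$(sundiag_collect_dir dbm01); collect_sundiags all_group "$dir"` followed by `tar jcvf "exa_rack_sundiag_${dir##*_}.tar.bz2" "$dir"` mirrors steps 4 to 6.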
For HP Oracle Exadata Environments:

#!/bin/ksh
# Dead-disk diagnostics for HP-based Exadata storage cells.
# Stops MS, dumps HP array controller/disk data, restarts MS, then zips the results.

mdate=$(/bin/date '+%Y-%m-%d-%H:%M:%S')
mkdir -p /tmp/info_deaddisk_$mdate
cd /tmp/info_deaddisk_$mdate || exit 1

# Stop the Management Server (MS) before running the HP array tools
cellcli -e alter cell shutdown services ms
MSSTATUS='cellcli -e list cell attributes msstatus'
# Strip whitespace from the cellcli output before comparing
status=$($MSSTATUS | tr -d '[:space:]')
if [ "$status" != 'stopped' ]
then
    echo 'MS did not stop!!!!'
    printf 'Continuing may not be safe, do you wish to continue (y|N)? '
    read Answer
    if [ "$Answer" = 'Y' ] || [ "$Answer" = 'y' ]
    then
        echo 'Continuing to dump diagnostics'
    else
        exit 1
    fi
fi

echo "Starting to collect disk information....."
host=$(hostname -a)
hpacucli ctrl all show config detail > "${host}_$mdate.hpacucli.txt"
hpaducli -f "${host}_$mdate.hpaducli.txt"
hpaducli -x -f "${host}_$mdate.hpaducli.xml"

# Restart MS and confirm it is running again
cellcli -e alter cell startup services ms
status=$($MSSTATUS | tr -d '[:space:]')
if [ "$status" != 'running' ]
then
    echo 'MS did not start!!!!'
    echo 'If this is unexpected, please contact Oracle Customer Support'
    echo 'Dead disk diagnostics were collected but the state of MS is unknown'
fi

# Add the system log and the HP ASM serial output, then zip the directory
cp /var/log/messages .
cp /var/spool/compaq/hpasm/registry/serial_output/* .
cd ..
zip -r info_deaddisk_$mdate info_deaddisk_$mdate

References
<NOTE:1006990.1> - Oracle Explorer Data Collector Implementation Best Practice
<NOTE:1500235.1> - How To Collect an Sosreport on Oracle Linux