Asset ID: 1-72-2119546.1
Update Date: 2016-03-24
Keywords:
Solution Type: Problem Resolution Sure Solution
2119546.1: SuperCluster - RDSinfoExaWatcher.sh in Exawatcher does not collect rds-ping information
Related Items
- SPARC SuperCluster T4-4 Full Rack
- Solaris Operating System
- Oracle SuperCluster T5-8 Full Rack
- Oracle SuperCluster M7 Hardware
- Oracle Database - Enterprise Edition
- Oracle SuperCluster T5-8 Half Rack
- SPARC SuperCluster T4-4 Half Rack
- Oracle SuperCluster T5-8 Hardware
- Oracle SuperCluster M6-32 Hardware
Related Categories
- PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
Applies to:
Oracle SuperCluster T5-8 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Solaris SPARC Operating System - Version 10 1/13 U11 to 11.3 [Release 10.0 to 11.0]
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
Symptoms
ExaWatcher collections on SuperCluster are missing rds-ping output, a key diagnostic data point. Without it, it can be hard to narrow down the exact time at which RDS activity started experiencing issues. This applies to all LDoms (global zones) and local zones.
Changes
NA
Cause
The script was modified years ago because of RDS defects that have long since been fixed.
Solution
#cd /opt/oracle.ExaWatcher
#mv RDSinfoExaWatcher.sh
#./StopExaWatcher.sh
#vi RDSinfoExaWatcher.sh
#svcadm enable ExaWatcher
Insert the following code. Note that although the header comments show a changed script name (SuperClusterRDSinfoExaWatcher.sh), the file you create must keep the old name (RDSinfoExaWatcher.sh). The new name in the comments only differentiates this version from the Exadata version.
#!/bin/ksh
#
# SuperClusterRDSinfoExaWatcher.sh
#
# Copyright (c) 2013, 2016, Oracle and/or its affiliates. All rights reserved.
#
# NAME
# SuperClusterRDSinfoExaWatcher.sh
#
# DESCRIPTION
# This is a script to collect information related to RDS on SuperCluster
# DB domains and zones. Useful for diagnosis following RAC node evictions
#
# NOTES
# This script is heavily modified from the original script designed to run
# on Linux Exadata DB & cell nodes. Because of the highly virtualized
# nature of SuperCluster, you can't easily determine cluster members from
# things such as ibhosts command or cellip.ora file. So it uses Solaris 11
# networking commands plus the output of rds-info to identify the RDS
# connections established by RAC/grid, and rds-pings those addresses.
# This script is designed only to run in Solaris 11 (or later) DB domains &
# zones. It is not meant to run in Solaris 10 application domains. If this
# script is run in Solaris 11 app or root domains it will do nothing since
# no RDS connections will have been established by RAC/grid.
#
# MODIFIED (MM/DD/YY)
# jamgates 01/27/16 - re-write for SuperCluster
# jamgates 01/29/16 - minor fixes
#
umask 0037
echodo() { echo "# $@" ; "$@" ; }
check_os()
{
    # If this is an S10 domain or zone then exit
    if [[ `uname -r` == "5.10" ]]; then
        echo "[ERROR:`date +'%F %T %Z'`] This domain is running Solaris 10. RDS info is not relevant in a domain running anything less than Solaris 11. Exiting ...."
        exit
    fi
}
do_dlstat_on_ib_links()
{
    # This function gets all links on the Exadata (FFFF) IB partition
    # and runs dlstat 4 times with an interval of 1 second. The first
    # row output shows the total numbers since the creation of the link.
    # The subsequent rows show the normalized (per second) statistics.
    for LINK in `dladm show-part -p -o LINK,PKEY | grep ":FFFF$" | cut -d: -f1`
    do
        echodo dlstat -Z $LINK 1 4
    done
}
get_my_local_exadata_ib_ip_addresses()
{
    # This function grabs local IP addresses on all IPMP groups on the
    # Exadata (FFFF) IB partition. These will have connectivity to all
    # cells and should be specified as the "cluster_interconnects"
    # parameter in all DB init.ora files
    IPMPSTAT=`ipmpstat -o INTERFACE,GROUP,ACTIVE -P -i`
    for LINK in `dladm show-part -p -o LINK,PKEY | grep ":FFFF$" | cut -d: -f1`
    do
        for GROUP in `echo "$IPMPSTAT" | grep "^$LINK:" | grep ":yes$" | cut -d: -f2`
        do
            LOCAL_ADDR="$LOCAL_ADDR "`ipadm show-addr -p -o ADDR $GROUP | cut -d/ -f1`
            # Simultaneous calls to ipmpstat can overload in.mpathd.
            # Sleep between each call to
        done
    done
    if [[ "$LOCAL_ADDR" == "" ]]; then
        echo "[WARNING:`date +'%F %T %Z'`] No local IB IP addresses are configured on the Exadata (FFFF) partition. Either this domain/zone is mis-configured or this isn't a DB domain"
        echo ""
    else
        echo "[INFO:`date +'%F %T %Z'`] My (`hostname`) local IB IP addresses:"
        echo ""
        echo $LOCAL_ADDR
        echo ""
    fi
}
get_remote_exadata_ib_ip_addresses()
{
    # This function uses rds-info to identify remote addresses of all
    # current RDS connections. These will correspond to the other DB
    # nodes & cells in the same RAC clusters as this domain or zone
    # and will have been established by the RAC DBs on this domain or
    # zone
    rds-info -n | while read LOC REM TOS NEXTTX NEXTRX FLGS
    do
        if [[ "$FLGS" == *"C-" ]]; then
            # Flags containing --C- means the remote host is
            # successfully connected, so add it to the list
            REMOTE_ADDR="$REMOTE_ADDR "$REM
        fi
    done
    if [[ "$REMOTE_ADDR" == "" ]]; then
        echo "[WARNING:`date +'%F %T %Z'`] No established RDS connections (is RAC running?)"
        echo ""
    else
        # Sort the remote address list into unique addresses
        REMOTE_ADDR=`echo $REMOTE_ADDR | tr " " "\n" | sort -u`
    fi
}
split_remote_ib_ip_addresses()
{
    # This function splits the list of remote IP addresses into cell
    # nodes and DB nodes. This can be deduced a number of ways, not all
    # reliable though. Probably the simplest & most reliable is to check
    # for the address in the cellip.ora file. If the address isn't in the
    # file, we assume it's a DB node.
    CELLIP=/etc/oracle/cell/network-config/cellip.ora
    if [[ -r $CELLIP && -s $CELLIP ]]; then
        for ADDR in $REMOTE_ADDR
        do
            grep -q "cell=\"$ADDR\"" $CELLIP
            if [[ $? -eq 0 ]]; then
                CELLS="$CELLS "$ADDR
            else
                DBNODES="$DBNODES "$ADDR
            fi
        done
    else
        # cellip.ora is empty or doesn't exist? Plan B is to check the
        # output of 'ibhosts' which identifies storage cells with
        # "hostname C IP address[,IP address...] HCA-#" in the node
        # descriptor field.
        IBHOSTS=`ibhosts`
        for ADDR in $REMOTE_ADDR
        do
            echo "$IBHOSTS" | grep -q " C.*[ ,]$ADDR[ ,].*HCA-"
            if [[ $? -eq 0 ]]; then
                CELLS="$CELLS "$ADDR
            else
                DBNODES="$DBNODES "$ADDR
            fi
        done
    fi
    echo "[INFO:`date +'%F %T %Z'`] Connected remote IB IP addresses:"
    echo ""
    echo "Cells: "$CELLS
    echo "DB Nodes: "$DBNODES
    echo ""
}
do_rds_ping()
{
    # This function gets all local & remote IB IP addresses and rds-pings
    # each remote address from each local address. The ping is performed
    # 4 times with a (default) 1 second timeout. This is so we see a
    # reasonable sample of response times (since the first rds-ping can
    # often take a lot longer than subsequent ones). Note we don't ping local
    # addresses from local addresses because a) That doesn't really tell
    # us much about the health of the IB transport and b) RAC doesn't
    # establish loopback connections to itself anyway.
    get_my_local_exadata_ib_ip_addresses
    get_remote_exadata_ib_ip_addresses
    split_remote_ib_ip_addresses
    echo ""
    echo "[INFO:`date +'%F %T %Z'`] rds-ping to cells"
    for I_ADDR in $LOCAL_ADDR
    do
        for R_ADDR in $CELLS
        do
            echodo rds-ping -c 4 -I $I_ADDR $R_ADDR
            if [[ $? != 0 ]]; then
                echo "[WARNING:`date +'%F %T %Z'`] rds-ping to $R_ADDR failed"
                echodo ibdiagnet
            fi
        done
    done
    echo "[INFO:`date +'%F %T %Z'`] rds-ping to DB nodes"
    for I_ADDR in $LOCAL_ADDR
    do
        for R_ADDR in $DBNODES
        do
            echodo rds-ping -c 4 -I $I_ADDR $R_ADDR
            if [[ $? != 0 ]]; then
                echo "[WARNING:`date +'%F %T %Z'`] rds-ping to $R_ADDR failed"
                echodo ibdiagnet
            fi
        done
    done
}
########======Main=====#######
CounterLimit=6
ExaWatcherDir="/opt/oracle.ExaWatcher"
RDSinfoCounterFile="$ExaWatcherDir/tmp/RDSinfoCounter"
check_os
DATE=`date "+%F %T %Z"`
echo " <$DATE>"
echo " ==========================="
echo " This is zone - `zonename`"
echo ""
# Check if an rds-info command is already running. This might indicate
# another ExaWatcher is already running and/or wedged. Running multiple
# rds-info commands can burden the system.
pgrep -f "rds-info"
if [[ $? -ne 0 ]]; then
    # rds-info (with no arguments) prints all data, which includes
    # socket & queue information, which can be large. kstat produces
    # a lot of output too. So these commands are only run once every
    # six times.
    if [[ ! -f $RDSinfoCounterFile ]]; then
        RDSinfoCounter=1
    else
        RDSinfoCounter=`cat $RDSinfoCounterFile`
    fi
    if [[ $RDSinfoCounter == 1 ]]; then
        # Full rds-info & kstats
        echo "===/usr/bin/rds-info==="
        echodo rds-info
    else
        # Just rds connections & counters
        echo "===/usr/bin/rds-info -Icn==="
        echodo rds-info -Icn
    fi
    let RDSinfoCounter=$RDSinfoCounter+1
    if [[ $RDSinfoCounter -gt $CounterLimit ]]; then
        RDSinfoCounter=1
    fi
    echo $RDSinfoCounter > $RDSinfoCounterFile
    echo "===/bin/netstat -rpn==="
    echodo netstat -rpn
    echo "===All nodes rds-ping==="
    do_rds_ping
    echo "===dlstat==="
    do_dlstat_on_ib_links
else
    echo "[WARNING:`date +'%F %T %Z'`] ExaWatcher has found another rds-info process running. This turn of collection will be skipped."
fi
exit 0
Save the file
#svcadm enable ExaWatcher
#./ExaWatcher
You may have to hit enter twice to get back to the prompt.
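If the collection reports "No established RDS connections", it can help to check offline which peers the script would select. The following is a minimal sketch of the same flag test used in get_remote_exadata_ib_ip_addresses, written as a standalone filter over saved connection rows. The function name established_peers and the sample rows are made up for illustration; the rows are not real rds-info output, and the six-column layout is assumed from the read statement in the script.

```shell
#!/bin/sh
# Print the unique remote addresses of established RDS connections.
# Input (stdin): connection rows in the six-column layout the script reads
# (local address, remote address, TOS, next TX, next RX, flags).
established_peers() {
    while read -r loc rem tos nexttx nextrx flgs; do
        case "$flgs" in
            *C-) echo "$rem" ;;   # same test as [[ "$FLGS" == *"C-" ]]
        esac
    done | sort -u
}

# Hypothetical sample rows, invented for this sketch.
sample='192.168.10.1 192.168.10.5 0 100 200 --C-
192.168.10.1 192.168.10.6 0 10 20 -----
192.168.10.2 192.168.10.5 0 7 9 --C-
192.168.10.1 192.168.10.9 0 3 4 --C-'

printf '%s\n' "$sample" | established_peers
```

On a live domain the same filter could be fed from rds-info -n instead of the sample text. Rows whose flags do not end in C- are skipped, and sort -u collapses peers reached from more than one local address.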
Please note that if you run a pkg fix on osc-exawatcher, or apply any QFSDP prior to APR 2016, you will have to repeat these steps, because the repair or upgrade activity will put the original file back.
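One way to tell whether the original file has been put back is to look for the SuperCluster name in the script header. This is only a sketch: the helper name is_supercluster_version is hypothetical, and it assumes the original script does not contain the string SuperClusterRDSinfoExaWatcher.sh anywhere in its text.

```shell
#!/bin/sh
# Return 0 if the given file carries the SuperCluster header name,
# non-zero otherwise (including when the file does not exist).
is_supercluster_version() {
    grep -qF 'SuperClusterRDSinfoExaWatcher.sh' "$1" 2>/dev/null
}

# Default path taken from the steps in this note; pass another path to override.
script="${1:-/opt/oracle.ExaWatcher/RDSinfoExaWatcher.sh}"
if is_supercluster_version "$script"; then
    echo "SuperCluster version in place: $script"
else
    echo "Original version (or no file) at $script; repeat the steps"
fi
```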
This will be corrected permanently in the APR 2016 QFSDP.
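With the replacement script collecting rds-ping data, the time at which RDS problems began can be narrowed down from the warning lines it writes. The sketch below extracts the timestamp and peer address from each "rds-ping to ... failed" warning, using the message format in do_rds_ping. The function name failed_pings and the sample log text are made up for illustration, and the location of the collected output is not assumed here, so the filter reads from standard input.

```shell
#!/bin/sh
# Print "<timestamp> <peer address>" for every failed rds-ping warning
# read from stdin, in the order the warnings appear.
failed_pings() {
    sed -n 's/^\[WARNING:\(.*\)] rds-ping to \(.*\) failed$/\1 \2/p'
}

# Hypothetical excerpt in the format the script echoes; not real output.
log='[INFO:2016-03-24 10:15:00 UTC] rds-ping to cells
[WARNING:2016-03-24 10:15:05 UTC] rds-ping to 192.168.10.5 failed
[INFO:2016-03-24 10:15:09 UTC] rds-ping to DB nodes
[WARNING:2016-03-24 10:15:14 UTC] rds-ping to 192.168.10.9 failed'

# The first line of output is the earliest recorded failure.
printf '%s\n' "$log" | failed_pings | head -n 1
```

Other warnings from the script, such as the one about no local IB IP addresses, do not match the pattern and are ignored.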
Attachments
This solution has no attachment