Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2119546.1
Update Date:2016-03-24
Keywords:

Solution Type  Problem Resolution Sure

Solution  2119546.1 :   SuperCluster - RDSinfoExaWatcher.sh in Exawatcher does not collect rds-ping information  


Related Items
  • SPARC SuperCluster T4-4 Full Rack
  • Solaris Operating System
  • Oracle SuperCluster T5-8 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Oracle Database - Enterprise Edition
  • Oracle SuperCluster T5-8 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster T5-8 Hardware
  • Oracle SuperCluster M6-32 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST




In this Document
Symptoms
Changes
Cause
Solution


Applies to:

Oracle SuperCluster T5-8 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Solaris SPARC Operating System - Version 10 1/13 U11 to 11.3 [Release 10.0 to 11.0]
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)

Symptoms

ExaWatcher collections on SuperCluster are missing rds-ping output, a key diagnostic data point. Without it, it can be hard to narrow down the exact time at which RDS activity started experiencing issues. This applies to all LDoms (global zones) and local zones.

Changes

 NA

Cause

The script was modified years ago because of RDS defects that have long since been fixed.

Solution

#cd /opt/oracle.ExaWatcher
#mv RDSinfoExaWatcher.sh <backup file name>

#./StopExaWatcher.sh

#vi RDSinfoExaWatcher.sh

#svcadm enable ExaWatcher
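
The backup step above can be sketched as a small helper. This is only an illustration and not part of the original note: the function name backup_script and the ".orig" suffix are assumptions chosen for the example, and any destination name works as long as the shipped file is preserved.

```shell
#!/bin/sh
# Hypothetical helper: set the shipped script aside before replacing it.
# The ".orig" suffix is an assumption; the note does not name the backup file.
backup_script() {
    src=$1
    dst="$src.orig"
    if [ ! -f "$src" ]; then
        echo "backup_script: $src not found" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        # Never overwrite an earlier backup of the shipped version.
        echo "backup_script: $dst already exists, leaving it alone" >&2
        return 0
    fi
    mv "$src" "$dst"
}
```

It would be called as "backup_script /opt/oracle.ExaWatcher/RDSinfoExaWatcher.sh" before the new file is created with vi.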

 

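Insert the following code into the new file. Note that although the header comments show a changed script name (SuperClusterRDSinfoExaWatcher.sh), the file you create must keep the old name, RDSinfoExaWatcher.sh. The new name in the comments only serves to differentiate this script from the Exadata version.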

#!/bin/ksh
#
# SuperClusterRDSinfoExaWatcher.sh
#
# Copyright (c) 2013, 2016, Oracle and/or its affiliates. All rights reserved.
#
#    NAME
#      SuperClusterRDSinfoExaWatcher.sh
#
#    DESCRIPTION
#      This is a script to collect information related to RDS on SuperCluster
#      DB domains and zones. Useful for diagnosis following RAC node evictions
#
#    NOTES
#      This script is heavily modified from the original script designed to run
#      on Linux Exadata DB & cells nodes. Because of the highly virtualized
#      nature of SuperCluster, you can't easily determine cluster members from
#      things such as ibhosts command or cellip.ora file. So it uses Solaris 11
#      networking commands plus the output of rds-info to identify the RDS
#      connections established by RAC/grid, and rds-pings those addresses.
#      This script is designed only to run in Solaris 11 (or later) DB domains &
#      zones. It is not meant to run in Solaris 10 application domains. If this
#      script is run in Solaris 11 app or root domains it will do nothing since
#      no RDS connections will have been established by RAC/grid.
#
#    MODIFIED   (MM/DD/YY)
#    jamgates    01/27/16 - re-write for SuperCluster
#    jamgates    01/29/16 - minor fixes
#

umask 0037

echodo() { echo "# $@" ; "$@" ; }

check_os()
{
    # If this is an S10 domain or zone then exit
    if [[ `uname -r` == "5.10" ]]; then
        echo "[ERROR:`date +'%F %T %Z'`] This domain is running Solaris 10. RDS info is not relevant in a domain running anything less than Solaris 11. Exiting ...."
        exit
    fi
}

do_dlstat_on_ib_links()
{
    # This function gets all links on the Exadata (FFFF) IB partition
    # and runs dlstat 4 times with an interval of 1 second. The first
    # row output shows the total numbers since the creation of the link.
    # The subsequent rows show the normalized (per second) statistics.

    for LINK in `dladm show-part -p -o LINK,PKEY | grep ":FFFF$" | cut -d: -f1`
    do
        echodo dlstat -Z $LINK 1 4
    done
}

get_my_local_exadata_ib_ip_addresses()
{
    # This function grabs local IP addresses on all IPMP groups on the
    # Exadata (FFFF) IB partition. These will have connectivity to all
    # cells and should be specified as the "cluster_interconnects"
    # parameter in all DB init.ora files
    IPMPSTAT=`ipmpstat -o INTERFACE,GROUP,ACTIVE -P -i`

    for LINK in `dladm show-part -p -o LINK,PKEY | grep ":FFFF$" | cut -d: -f1`
    do
        for GROUP in `echo "$IPMPSTAT" | grep "^$LINK:" | grep ":yes$" | cut -d: -f2`
        do
            LOCAL_ADDR="$LOCAL_ADDR "`ipadm show-addr -p -o ADDR $GROUP | cut -d/ -f1`
            # Simultaneous calls to ipmpstat can overload in.mpathd.
            # Sleep between each call to
        done
    done

    if [[ "$LOCAL_ADDR" == "" ]]; then
        echo "[WARNING:`date +'%F %T %Z'`] No local IB IP addresses are configured on the Exadata (FFFF) partition. Either this domain/zone is mis-configured or this isn't a DB domain"
        echo ""
    else
        echo "[INFO:`date +'%F %T %Z'`] My (`hostname`) local IB IP addresses:"
        echo ""
        echo $LOCAL_ADDR
        echo ""
    fi
}

get_remote_exadata_ib_ip_addresses()
{
    # This function uses rds-info to identify remote addresses of all
    # current RDS connections. These will correspond to the other DB
    # nodes & cells in the same RAC clusters as this domain or zone
    # and will have been established by the RAC DBs on this domain or
    # zone

    rds-info -n | while read LOC REM TOS NEXTTX NEXTRX FLGS
    do
        if [[ "$FLGS" == *"C-" ]]; then
            # Flags ending in C- (for example --C-) mean the remote host
            # is successfully connected, so add it to the list
            REMOTE_ADDR="$REMOTE_ADDR "$REM
        fi
    done

    if [[ "$REMOTE_ADDR" == "" ]]; then
        echo "[WARNING:`date +'%F %T %Z'`] No established RDS connections (is RAC running?)"
        echo ""
    else
        # Sort the remote address list into unique addresses
        REMOTE_ADDR=`echo $REMOTE_ADDR | tr " " "\n" | sort -u`
    fi
}

split_remote_ib_ip_addresses()
{
    # This function splits the list of remote IP addresses into cell
    # nodes and DB nodes. This can be deduced a number of ways, not all
    # reliable though. Probably the simplest & most reliable is to check
    # for the address in the cellip.ora file. If the address isn't in the
    # file, we assume it's a DB node.

    CELLIP=/etc/oracle/cell/network-config/cellip.ora

    if [[ -r $CELLIP && -s $CELLIP ]]; then
        for ADDR in $REMOTE_ADDR
        do
            grep -q "cell=\"$ADDR\"" $CELLIP
            if [[ $? -eq 0 ]]; then
                CELLS="$CELLS "$ADDR
            else
                DBNODES="$DBNODES "$ADDR
            fi
        done
    else
        # cellip.ora is empty or doesn't exist? Plan B is to check the
        # output of 'ibhosts' which identifies storage cells with
        # "hostname C IP address[,IP address...] HCA-#" in the node
        # descriptor field.

        IBHOSTS=`ibhosts`
        for ADDR in $REMOTE_ADDR
        do
            echo "$IBHOSTS" | grep -q " C.*[ ,]$ADDR[ ,].*HCA-"
            if [[ $? -eq 0 ]]; then
                CELLS="$CELLS "$ADDR
            else
                DBNODES="$DBNODES "$ADDR
            fi
        done
    fi

    echo "[INFO:`date +'%F %T %Z'`] Connected remote IB IP addresses:"
    echo ""
    echo "Cells: "$CELLS
    echo "DB Nodes: "$DBNODES
    echo ""
}

do_rds_ping()
{
    # This function gets all local & remote IB IP addresses and rds-pings
    # each remote address from each local address. The ping is performed
    # 4 times with a (default) 1 second timeout. This is so we see a
    # reasonable sample of response times (since the first rds-ping can
    # often take a lot longer than subsequent ones). Note we don't ping local
    # addresses from local addresses because a) That doesn't really tell
    # us much about the health of the IB transport and b) RAC doesn't
    # establish loopback connections to itself anyway.

    get_my_local_exadata_ib_ip_addresses
    get_remote_exadata_ib_ip_addresses
    split_remote_ib_ip_addresses

    echo ""

    echo "[INFO:`date +'%F %T %Z'`] rds-ping to cells"
    for I_ADDR in $LOCAL_ADDR
    do
        for R_ADDR in $CELLS
        do
            echodo rds-ping -c 4 -I $I_ADDR $R_ADDR
            if [[ $? != 0 ]]; then
                echo "[WARNING:`date +'%F %T %Z'`] rds-ping to $R_ADDR failed"
                echodo ibdiagnet
            fi
        done
    done

    echo "[INFO:`date +'%F %T %Z'`] rds-ping to DB nodes"
    for I_ADDR in $LOCAL_ADDR
    do
        for R_ADDR in $DBNODES
        do
            echodo rds-ping -c 4 -I $I_ADDR $R_ADDR
            if [[ $? != 0 ]]; then
                echo "[WARNING:`date +'%F %T %Z'`] rds-ping to $R_ADDR failed"
                echodo ibdiagnet
            fi
        done
    done
}

########======Main=====#######

CounterLimit=6
ExaWatcherDir="/opt/oracle.ExaWatcher"
RDSinfoCounterFile="$ExaWatcherDir/tmp/RDSinfoCounter"

check_os

DATE=`date "+%F %T %Z"`
echo "     <$DATE>"
echo "     ==========================="
echo "     This is zone - `zonename`"
echo ""

# Check if an rds-info command is already running. This might indicate
# another ExaWatcher is already running and/or wedged. Running multiple
# rds-info commands can burden the system.

pgrep -f "rds-info"
if [[ $? -ne 0 ]]; then
    # rds-info (with no arguments) prints all data, which includes
    # socket & queue information, which can be large. kstat produces
    # a lot of output too. So these commands are only run once every
    # six times.

    if [[ ! -f $RDSinfoCounterFile ]]; then
        RDSinfoCounter=1
    else
        RDSinfoCounter=`cat $RDSinfoCounterFile`
    fi

    if [[ $RDSinfoCounter == 1 ]]; then
        # Full rds-info & kstats
        echo "===/usr/bin/rds-info==="
        echodo rds-info
    else
        # Just rds connections & counters
        echo "===/usr/bin/rds-info -Icn==="
        echodo rds-info -Icn
    fi

    let RDSinfoCounter=$RDSinfoCounter+1
    if [[ $RDSinfoCounter -gt $CounterLimit ]]; then
        RDSinfoCounter=1
    fi
    echo $RDSinfoCounter > $RDSinfoCounterFile

    echo "===/bin/netstat -rpn==="
    echodo netstat -rpn

    echo "===All nodes rds-ping==="
    do_rds_ping

    echo "===dlstat==="
    do_dlstat_on_ib_links
else
    echo "[WARNING:`date +'%F %T %Z'`] ExaWatcher has found another rds-info process running. This turn of collection will be skipped."
fi

exit 0

  

Save the file

#svcadm enable ExaWatcher

#./ExaWatcher

 

You may have to hit enter twice to get back to the prompt.
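
The target selection in the script above (keep only peers whose flags show an established connection, then decide whether each peer is a cell or a DB node) can be tried on canned input on any system, without RDS. The sketch below is an illustration under stated assumptions: the function names connected_peers and classify_peer are hypothetical, the input is assumed to have the six columns the script reads from "rds-info -n" (real output may differ), grep -F is used here instead of the script's pattern match so that dots in addresses are taken literally, and the ibhosts fallback is left out, so a missing cellip.ora simply yields "dbnode".

```shell
#!/bin/sh
# Print the unique remote addresses whose flags end in "C-" (connected).
# Input lines are assumed to carry the six fields the script reads:
# LOC REM TOS NEXTTX NEXTRX FLGS. Header lines do not match and are skipped.
connected_peers() {
    while read -r LOC REM TOS NEXTTX NEXTRX FLGS; do
        case "$FLGS" in
            *C-) echo "$REM" ;;
        esac
    done | sort -u
}

# Print "cell" if the address appears as cell="ADDR" in the given
# cellip.ora-style file, otherwise "dbnode".
classify_peer() {
    addr=$1
    cellip=$2
    if grep -F "cell=\"$addr\"" "$cellip" >/dev/null 2>&1; then
        echo cell
    else
        echo dbnode
    fi
}
```

On a DB domain or zone this could be driven with "rds-info -n | connected_peers", passing each resulting address to classify_peer together with the path to cellip.ora.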

 

 

Please note that if you run a pkg fix on osc-exawatcher or apply any QFSDP prior to APR 2016, you will have to repeat these steps, because the repair or upgrade activity will put the original file back.
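
Because a repair or upgrade restores the shipped file without warning, it can help to check which version is currently installed. A minimal sketch, assuming the shipped (pre-APR 2016) file does not contain the name SuperClusterRDSinfoExaWatcher.sh that appears in the header comments of the script in this note; the function names are hypothetical and this is a heuristic, not an Oracle-provided tool.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the file carries the header name used by
# the SuperCluster version of the script given in this note.
# Assumption: the shipped file does not contain this name.
is_supercluster_version() {
    grep -F 'SuperClusterRDSinfoExaWatcher.sh' "$1" >/dev/null 2>&1
}

# Report whether the steps in this note need to be repeated.
check_installed_script() {
    if is_supercluster_version "$1"; then
        echo "OK: $1 is the SuperCluster version"
    else
        echo "REAPPLY: $1 looks like the shipped version or is missing"
        return 1
    fi
}
```

For example, "check_installed_script /opt/oracle.ExaWatcher/RDSinfoExaWatcher.sh" could be run after any pkg fix or QFSDP application.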

 

 This will be corrected permanently in the APR 2016 QFSDP.

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.