Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2151140.1
Update Date:2017-06-15

Solution Type  Technical Instruction Sure

Solution  2151140.1 :   How to Check the Current Health Status of the Oracle Private Cloud Appliance (PCA) Nodes  


Related Items
  • Oracle Virtual Compute Appliance X4-2 Hardware
  • Oracle Virtual Compute Appliance X3-2 Hardware
  • Private Cloud Appliance X5-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x86: OVCA




In this Document
Goal
Solution
References


Applies to:

Private Cloud Appliance X5-2 Hardware - Version All Versions and later
Oracle Virtual Compute Appliance X3-2 Hardware - Version All Versions and later
Oracle Virtual Compute Appliance X4-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal

The Oracle PCA controller software contains a monitoring service, which is started and stopped with the ovca service on the active management node.
The appliance administrator can retrieve the current health status of the components at any time through the Oracle PCA CLI by using the diagnose command.

Solution

This document shows steps to check the health status on the PCA.

   #1 How to check the health status on the compute nodes and the zfs heads
   #2 How to identify the location of the faulted node
   #3 How to check the health status on the two management nodes
Optional steps if needed
   #4 An ILOM snapshot from the faulted node may be needed for additional troubleshooting.
   #5 How to find additional information for any faults found


#1 How to check the health status on the compute nodes and zfs heads

Using ssh and an account with superuser privileges, log into the active management node.

# ssh root@xxx.xxx.xxx.xxx
root@xxx.xxx.xxx.xxx's password:
[root@ovcamn05r1 ~]#

Launch the Oracle PCA command line interface.

# pca-admin
Welcome to PCA! Release: 2.1.1
PCA>

Check the current status of the rack nodes by querying their ILOMs.

PCA> diagnose ilom

Checking ILOM health............please wait..

IP_Address      Status          Health_Details
----------      ------          --------------
192.168.4.129   Not Connected   None
192.168.4.128   Not Connected   None
192.168.4.127   Not Connected   None
192.168.4.126   Not Connected   None
192.168.4.125   Not Connected   None
192.168.4.124   Not Connected   None
192.168.4.123   Not Connected   None
192.168.4.122   Not Connected   None
192.168.4.121   Not Connected   None
192.168.4.120   Not Connected   None
192.168.4.101   OK              None
192.168.4.102   OK              None
192.168.4.105   Faulty          Mon Nov 25 14:17:37 2013 Power PS1 (Power Supply 1)
                                A loss of AC input to a power supply has occurred.
                                (Probability: 100, UUID: 2c1ec5fc-ffa3-c768-e602-ca12b86e3ea1,
                                Part Number: 07047410, Serial Number: 476856F+1252CE027X,
                                Reference Document: http://www.sun.com/msg/SPX86-8003-73)
192.168.4.107   OK              None
192.168.4.106   OK              None
192.168.4.109   OK              None
192.168.4.108   OK              None
192.168.4.112   Not Connected   None
192.168.4.113   Not Connected   None
192.168.4.110   OK              None
192.168.4.111   Not Connected   None
192.168.4.116   Not Connected   None
192.168.4.117   Not Connected   None
192.168.4.114   Not Connected   None
192.168.4.115   Not Connected   None
192.168.4.118   Not Connected   None
192.168.4.119   Not Connected   None
-----------------
27 rows displayed

Status: Success

PCA> exit

[root@ovcamn05r1 ~]#

For each possible server node slot location, the Status column shows "OK", "Not Connected", or "Faulty".
For each possible server node slot location, the Health_Details column shows either "None" or the actual ILOM fault text.
The output is not sorted and does not show the rack location.
In the example shown above, there is an SPX86-8003-73 fault on the node whose ILOM is at 192.168.4.105.
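Because the listing is unsorted, it can help to pull out only the faulty entries. The following is a minimal sketch, assuming the diagnose ilom output has first been saved to a text file; the function name faulty_iloms and the file path are hypothetical and are not part of pca-admin.

```shell
#!/bin/sh
# Sketch: print the ILOM IP addresses whose Status column reads "Faulty",
# sorted numerically by the last octet.
# Assumption: $1 is a file holding saved "diagnose ilom" output.
faulty_iloms() {
    # Data rows start with an IPv4 address; the status is the second field.
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $2 == "Faulty" { print $1 }' "$1" |
        sort -t . -k 4,4n
}

# Usage (hypothetical path):
#   faulty_iloms /tmp/diagnose_ilom.txt
```

Rows with a status of "OK" or "Not Connected" are skipped, as are the continuation lines of the fault text, since they do not start with an IP address.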

#2 How to identify the location of the faulted node

The output from pca-admin diagnose ilom gives the fault by node ILOM IP address.

There is no correlation between a compute node's IP address and its rack location. The compute node IP addresses are set at provisioning time and can change if and when a node is reprovisioned.
There is a constant correlation between a node's IP address and its ILOM IP address. They are always 100 apart: the last octet of the node's IP address is always 100 less than the last octet of its ILOM IP address.
The faulted node reported at ILOM 192.168.4.105 is therefore the same physical node as the one at IP address 192.168.4.5.
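The offset of 100 can be applied mechanically. The sketch below is only an illustration of that arithmetic; the function name ilom_to_node_ip is hypothetical, and it assumes a well-formed IPv4 address whose last octet is 100 or greater.

```shell
#!/bin/sh
# Sketch: derive the node IP address from its ILOM IP address by
# subtracting 100 from the last octet, as described above.
ilom_to_node_ip() {
    ilom_ip=$1
    prefix=${ilom_ip%.*}     # first three octets, e.g. 192.168.4
    last=${ilom_ip##*.}      # last octet, e.g. 105
    echo "${prefix}.$((last - 100))"
}

ilom_to_node_ip 192.168.4.105   # prints 192.168.4.5
```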

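Use the following command to search for the node IP (not the ILOM IP) to determine the physical location. The -w and -F options make grep match the address as a whole, literal string, so that 192.168.4.5 does not also match addresses such as 192.168.4.50.

[root@ovcamn05r1 ~]# pca-admin list compute-node | grep -wF 192.168.4.5 | cut -f1 -d" "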
ovcacn07r1

The node name ovcacn07r1 is built from four parts:

   ovca = always ovca
   cn   = the node type: cn = compute node, mn = management node, sn = storage node (ZFS storage)
   07   = the numerical ID of the rack unit (RU), also known as the "slot" location, counting upwards from number 1 at the bottom of the rack
   r1   = the rack: r1 = the base rack, r2 = the first expansion rack, r3 = the second expansion rack, etc.

In this example, the fault SPX86-8003-73 reported at 192.168.4.105 is located in the base rack, on the compute node that is the 7th RU location up from the bottom.
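The naming convention above can also be decoded with plain shell parameter expansion. This is a minimal sketch, assuming names follow the ovca + type + two-digit RU + rack pattern shown in this document; the function name decode_node_name is hypothetical.

```shell
#!/bin/sh
# Sketch: split a node name such as ovcacn07r1 into node type,
# rack unit and rack number. Names that do not fit the assumed
# pattern are rejected with a non-zero return code.
decode_node_name() {
    name=$1
    case $name in
        ovca[cms]n[0-9][0-9]r[0-9]*) ;;
        *) echo "unrecognized node name: $name" >&2; return 1 ;;
    esac
    rest=${name#ovca}          # cn07r1
    code=${rest%%[0-9]*}       # cn
    rest=${rest#"$code"}       # 07r1
    ru=${rest%%r*}             # 07
    rack=${rest#*r}            # 1
    case $code in
        cn) kind="compute node" ;;
        mn) kind="management node" ;;
        sn) kind="storage node" ;;
    esac
    echo "$kind, rack unit $ru, rack $rack"
}

decode_node_name ovcacn07r1   # prints: compute node, rack unit 07, rack 1
```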

Also note these two IP addresses for the zfs heads are static:
ZFS appliance ilom-ovcasn01r1 192.168.4.101
ZFS appliance ilom-ovcasn02r1 192.168.4.102


#3 How to check the health status on the two management nodes

Note:  The pca-admin diagnose ilom command currently does not check the ILOM health on either management node.

To check the management node (MN) ILOMs for faults, ssh to each MN ILOM, at 192.168.4.103 and 192.168.4.104, and type 'show faulty' to list any faults.

For example:

[root@ovcamn05r1 ~]# ssh root@192.168.4.103
Password:
-> show faulty

Target   | Property   | Value
---------+------------+-------
                                <<<<<<<<< If blank, no faults are present.

-> exit

[root@ovcamn05r1 ~]# ssh root@192.168.4.104
Password:
-> show faulty

Target   | Property   | Value
---------+------------+-------
                               <<<<<<<<< If blank, no faults are present.

-> exit
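If the ILOM session is captured to a file, the "blank table" check can be scripted. The following is a sketch only: it assumes the saved text has the header and separator layout shown above and that any fault appears as a non-empty row beneath the separator. The function name has_ilom_faults and the file paths are hypothetical.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if saved "show faulty" output
# contains at least one row below the table separator, fail otherwise.
# Assumption: $1 is a file holding a captured ILOM session.
has_ilom_faults() {
    awk '
        /^-+[+]/ { in_table = 1; next }   # separator row under the header
        /^->/    { in_table = 0 }         # next ILOM prompt ends the table
        in_table && NF > 0 { found = 1 }  # any non-blank row is a fault entry
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Usage (hypothetical paths):
#   for f in /tmp/mn103_faulty.txt /tmp/mn104_faulty.txt; do
#       if has_ilom_faults "$f"; then echo "$f: faults present"; fi
#   done
```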

#4 An ILOM snapshot from the faulted node may be required to assist with troubleshooting.

If needed, see How to Collect an ILOM Snapshot from a Private Cloud Appliance (PCA) or Virtual Cloud Appliance (VCA) Node (Doc ID 1591199.1)

#5 How to find additional information for any faults found.

Search My Oracle Support (MOS) for the fault code. In this example the fault code is SPX86-8003-73.
Searching the MOS knowledge base for SPX86-8003-73 finds:
   SPX86-8003-73 - Lack of AC input power. (Doc ID 1429954.1)
MOS documents for fault codes normally provide the following additional information:
   Type, Severity, Description, Automated Response, Impact, and Suggested Action for the System Administrator.
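The fault code to search for is the last component of the Reference Document URL in the Health_Details text. As a rough sketch, assuming the diagnose ilom output was saved to a file and that the URLs have the www.sun.com/msg/ form shown in the example, the codes could be extracted as follows. The function name fault_codes and the file path are hypothetical.

```shell
#!/bin/sh
# Sketch: list the unique fault codes referenced in saved
# "diagnose ilom" output, taken from the Reference Document URLs.
# Assumption: $1 is a file holding the saved output.
fault_codes() {
    # Keep only the code that follows www.sun.com/msg/ on each matching line.
    sed -n 's|.*www\.sun\.com/msg/\([A-Za-z0-9-]*\).*|\1|p' "$1" | sort -u
}

# Usage (hypothetical path):
#   fault_codes /tmp/diagnose_ilom.txt
```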


 

References

<BUG:23125774> - PCA-ADMIN LIST OR SHOW COMPUTE-NODE --ILOM
<NOTE:1591197.1> - How to determine OVCA or PCA Management Node, Compute Node, Switch and ILOM service processor IP addresses

  Copyright © 2018 Oracle, Inc.  All rights reserved.