Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-2079645.1
Update Date: 2018-03-06

Solution Type  Predictive Self-Healing Sure

Solution  2079645.1 :   Health Check (ZCheck) for Oracle ZFS Storage Appliance on Oracle Enterprise Manager 12c and 13c  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA

Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS


This document provides general information, an installation guide, and remediation information for ZCheck (Health Check) for Oracle ZFS Storage Appliance on Oracle Enterprise Manager 12c and 13c.

In this Document
Purpose
Scope
Details
 Overview
 Installation Guide
 Prerequisite
 Installation Steps
 Troubleshooting
 Uninstallation Steps
 Upgrade Steps
 Health Check Monitoring Features and Customization
 Collection Schedule and Customization
 Alert threshold and Customization
 Stateful Alerts
 Remediation For Health Check
 Repair Action for Cluster Links
 Repair Action for Analytics Retention Policy
 Repair Action for DNS Configuration
 Repair Action for Datasets Check
 Repair Action for Backend Check
 Repair Action for L2Arc Header Size Check
 Repair Action for Locked Server Check


Applies to:

Sun ZFS Storage 7420 - Version All Versions and later
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose

This document provides general information, an installation guide, and remediation information for Health Check for Oracle ZFS Storage Appliance on Oracle Enterprise Manager 12c or Oracle Enterprise Manager 13c.

 

Scope

This document is intended for administrators and Oracle support engineers who monitor Oracle ZFS Storage Appliance on Oracle Enterprise Manager.

 

Details

Overview

Health Check (ZCheck) is a monitoring metric on Oracle Enterprise Manager that is designed to assess the health of Oracle ZFS Storage Appliance.

Health Check periodically monitors configuration settings in the ZFS Storage Appliance to check whether they follow Oracle's recommended best practices for performance and resilience.

Health Check generates a stateful warning or alert if any setting does not meet the Oracle recommendation, and automatically clears the warning or alert once the setting meets the recommendation again.

Users can customize both the Health Check collection schedule and alert threshold type.

In this release, Health Check includes seven individual categories. Additional items will be added in future releases. Note that Health Check is kept in synchronization with ORAchk (Oracle Configuration Audit Tool) for ZFS Storage Appliance.

 

The following table summarizes the seven checks for Oracle ZFS Storage Appliance.

Check Name: Check Detail
  • Cluster Check: Examines the health of the cluster links in a clustered appliance. Skipped for non-cluster heads.
  • Analytics Retention Policy: ZFSSA Analytics stores datasets and retains them for a specified period of time. You must consider the available disk space when setting up the retention policy, to ensure that analytics datasets do not use up too much disk space.
  • DNS Configuration: To allow proper network configuration within the ZFS Storage Appliance, ensure that you use a host name that matches the DNS.
  • Dataset Check: Examines the size of the datasets of the appliance.
  • Backend Check: Verifies that chassis and disks are present, not faulted, and not showing spurious timeout issues. Checks that both paths to the disks are reporting properly (not single path) and that the firmware revision meets the requirements on the chassis and disks.
  • L2ARC Header Size Check: Reports the L2ARC header size as a percentage of memory size, to examine whether the performance of the overall appliance is affected.
  • Locked Server Check: Older versions of the Oracle ZFS Storage Appliance software have low locked server settings. This check verifies that the settings are sufficient.

The table below shows the Benefit/Impact and Risk for each check.

Cluster Check
  • Benefit/Impact: Examines the cluster link of the appliance and verifies the cable configuration.
  • Risk: Any faults within the cluster link may lead to problems and outages.
Analytics Retention Policy
  • Benefit/Impact: Keeping the retention policy within the limits recommended by Oracle helps keep storage consumption from analytics information contained to a reasonable amount without intervention by administrators.
  • Risk: If the retention policy is not set carefully within the recommended values, dataset growth may exceed the available disk space, which could cause significant performance degradation on the ZFS Storage Appliance.
DNS Configuration
  • Benefit/Impact: By keeping the DNS configuration up to date, the appliance can conduct reverse name lookups and reach critical services in a timely fashion.
  • Risk: An incorrect host name that does not match the DNS can cause network configuration issues.
Dataset Check
  • Benefit/Impact: Examines each enabled analytics dataset on the ZFS Storage Appliance and flags any that are larger than 2 GB.
  • Risk: An excessive number of large datasets can cause performance degradation.
Backend Check
  • Benefit/Impact: Keeping the backend health check successful is critical to both the performance of the system and to ensuring there is little to no downtime for applications.
  • Risk: Any faults, single paths, or mismatches in firmware version can lead to disk problems and outages.
L2ARC Header Size Check
  • Benefit/Impact: Early versions of the Oracle ZFS Storage Appliance could grow metadata in primary memory to be larger than it should be as a percentage of total memory. This check helps ensure that memory consumption is optimal in the system for the best performance and response times overall.
  • Risk: Leaving this issue unaddressed can impact overall appliance performance. The software for the appliance is likely out of date, or the appliance may be responding to management requests slowly.
Locked Server Check
  • Benefit/Impact: Lower limits for locked servers may impact client applications that hold large numbers of locks.
  • Risk: Leaving this issue unaddressed impacts client scalability.

Installation Guide

Prerequisite

1. OEMCC plugin version 2.1 is deployed and running on Oracle Enterprise Manager 12c, or OEMCC plugin version 2.1.3 is deployed and running on Oracle Enterprise Manager 13c.

2. On all monitored ZFS targets, grant the "shell" and "audit" privileges to the role of the monitoring user (if that user is not "root"). Usually the user is "oracle_agent". To do so:

  • Log in to ZFS Appliance Kit as administrator
  • Go to "Configuration" - "Users"
  • Select the role (usually "oracle_agent")
  • For AK2011:
    • Add the scope to: system
    • Select appliance name: shell
  • For OS 8.3:
    • Add the scope to: system
    • Select appliance name: shell, audit
  • For OS 8.4 or later:
    • Add the scope to: appliance
    • Select appliance name: shell, audit

Installation Steps

The attached deliverable in this note includes two metric extension .zip files and two Java .jar files.

ME$HealthCheckWindows.zip is for the Enterprise Manager agent running on Windows platform.

ME$HealthCheckNonWindows.zip is for Enterprise Manager agent running on all other platforms.

Oracle_Grid.jar works on all OS platforms, but there are two versions of Oracle_Grid.jar for the different OEMCC plugin versions: one for version 2.1.3 only and the other for version 2.1. Select the Oracle_Grid.jar that matches the OEMCC plugin version deployed on Enterprise Manager.

  • ME$HealthCheckWindows.zip: Windows agent
  • ME$HealthCheckNonWindows.zip: non-Windows agent
  • Oracle_Grid.jar for v.2.1.3: generic for Windows and non-Windows, OEMCC plugin version 2.1.3
  • Oracle_Grid.jar for v.2.1: generic for Windows and non-Windows, OEMCC plugin version 2.1

1. Log in as a user with the Enterprise Manager administrator role.

2. Replace the existing Oracle_Grid.jar with the new one.

The Health Check script is included in Oracle_Grid.jar. Unlike the common installation procedure, the user does not need to redeploy the .opar file or restart the Enterprise Manager Agent/OMS. The user can replace Oracle_Grid.jar while EM is running.

To Replace Oracle_Grid.jar:

  • Go to the directory where the EM Agent is installed, for instance, on a Linux 64 OS:
    cd /u01/app/oracle/agent
     
  • Go to the directory of OEMCC plugin scripts:

    If the OEMCC plugin version is 2.1, then go to:

    cd plugins/oracle.sun.oss7.agent.plugin_12.1.0.6.0/scripts/emx/sun_storage_7000

    If the OEMCC plugin version is 2.1.3, then go to:

    cd plugins/oracle.sun.oss7.agent.plugin_12.1.0.7.0/scripts/emx/sun_storage_7000
     
  • Replace the existing Oracle_Grid.jar with the new Oracle_Grid.jar from the deliverable package. (Important: do not change the name of this jar. Make sure the new Oracle_Grid.jar overwrites the existing one.)
  • Change mode to -rw-r--r--. For example, on Linux 64 OS:   
    chmod 644 Oracle_Grid.jar

  

Note: If the user has multiple agents monitoring ZFS targets, depending on the configuration and need, the user may need to repeat the steps above to replace all Oracle_Grid.jar under all running agents.
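
The replacement steps above can be scripted when several agents need the new jar. The following is a minimal sketch, assuming a POSIX shell on a non-Windows agent host. The function names plugin_scripts_dir and replace_grid_jar and the backup name Oracle_Grid.jar.orig are illustrative and are not part of the deliverable; keeping a backup copy is an added precaution, not a step required by this note.

```shell
#!/bin/sh
# Map the OEMCC plugin version to the plugin scripts directory named in this note.
# Prints the path relative to the agent home; returns 1 for an unknown version.
plugin_scripts_dir() {
    case "$1" in
        2.1)   echo "plugins/oracle.sun.oss7.agent.plugin_12.1.0.6.0/scripts/emx/sun_storage_7000" ;;
        2.1.3) echo "plugins/oracle.sun.oss7.agent.plugin_12.1.0.7.0/scripts/emx/sun_storage_7000" ;;
        *)     echo "unsupported OEMCC plugin version: $1" >&2; return 1 ;;
    esac
}

# replace_grid_jar <agent_home> <plugin_version> <new_jar>
# Overwrites the existing Oracle_Grid.jar (same file name) and sets mode 644.
replace_grid_jar() {
    agent_home=$1; version=$2; new_jar=$3
    rel_dir=$(plugin_scripts_dir "$version") || return 1
    target="$agent_home/$rel_dir/Oracle_Grid.jar"
    [ -f "$new_jar" ] || { echo "new jar not found: $new_jar" >&2; return 1; }
    [ -f "$target" ] || { echo "existing jar not found: $target" >&2; return 1; }
    # Keep one copy of the original jar (hypothetical backup name) for a later rollback.
    [ -f "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    cp "$new_jar" "$target" || return 1
    chmod 644 "$target"
}
```

For example, replace_grid_jar /u01/app/oracle/agent 2.1.3 /tmp/Oracle_Grid.jar would update the jar for plugin version 2.1.3, where /tmp/Oracle_Grid.jar stands for wherever the downloaded jar was saved.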

  

3. Import the correct ME$HealthCheck .zip file in the Enterprise Manager browser UI.

  • Log in to Enterprise Manager.
  • Select from the top-level menu: "Enterprise" - "Monitoring" - "Metric Extensions".
  • Click "Actions" - "Import...", select "Browse", select the correct file(s) according to the agent OS (ME$HealthCheckWindows.zip or ME$HealthCheckNonWindows.zip), and click "OK".
  • Choose the newly added metric extension row and click "Actions" - "Save As Deployable Draft".
  • Choose the newly added metric extension row and click "Actions" - "Deploy To Targets...", click "+ Add", and multi-select the targets to monitor.
  • Choose the newly added metric extension row and click "Actions" - "Publish Metric Extension".

The user should be able to see the metric extension from ZFS Target - Oracle ZFS Storage Appliance - Monitoring - All Metrics.

For more details, refer to Oracle Enterprise Manager - Using Metric Extensions

Troubleshooting

Q: The status of the added appliance shows an error or unknown status, and clicking "*Health Check" shows a "Can not find ME$HealthCheck" error message.

A: This typically means that Oracle_Grid.jar does not match the OEMCC plugin version. Make sure you download the matching version of Oracle_Grid.jar from the attachment link: if your OEMCC plugin is version 2.1.3, download Oracle_Grid.jar for v2.1.3; if your OEMCC plugin is version 2.1, download Oracle_Grid.jar for v2.1.

 

Q: "*Health Check" does not show up in the metric list on some targets.

A: Go to "Enterprise" - "Monitoring" - "Metric Extensions" - "Actions" - "Manage Target Deployments" and check whether the metric extension is deployed to the target. If not, select "Actions" - "Deploy To Targets..." and deploy Health Check to the target.

 

Q: Clicking "*Health Check" shows a "ClassNotFound" error message.

A: This typically means that Oracle_Grid.jar was not replaced correctly, so the script cannot be found. Contact the administrator, then go to:

cd <agent_home>/plugins/oracle.sun.oss7.agent.plugin_12.1.0.7.0/scripts/emx/sun_storage_7000

Run the command below (<zfs_target_name> is the target hostname; <username> and <password> are the credentials used to log in to the ZFS target). Note that this command is for non-Windows platforms. On Windows, replace each : in the classpath with ;.

java -classpath ./Oracle_Grid.jar:./ws-commons-java5-1.0.1.jar:./xmlrpc-common-3.1.2.jar:./xmlrpc-client-3.1.2.jar:./ws-commons-util-1.0.2.jar:./commons-logging-1.1.jar:./commons-codec-1.7.jar:./jsch-0.1.49.jar com.sun.s7000.client.RetrieveSSHResult <zfs_target_name> "HealthCheck.aksh"

InstanceUser=<username>

InstancePassword=<password>

For Non-windows: CTRL + D (end of the text stream)

For Windows: CTRL + Z (end of the text stream)

Check if the result is printed out correctly. If not, contact the administrator. Otherwise, delete the metric extension and import the ME$HealthCheck.zip again.
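
The troubleshooting command above differs between platforms only in the classpath separator. As a minimal sketch, assuming a POSIX shell, the classpath can be assembled from the jar list with the separator passed as a parameter. The function name build_classpath and the variable HEALTH_CHECK_JARS are illustrative; the jar names are copied from the command above.

```shell
#!/bin/sh
# build_classpath <separator> <jar>...
# Joins the jar names into a classpath, prefixing each with ./ as in the command above.
build_classpath() {
    sep=$1; shift
    cp_value=""
    for jar in "$@"; do
        if [ -z "$cp_value" ]; then
            cp_value="./$jar"
        else
            cp_value="$cp_value$sep./$jar"
        fi
    done
    printf '%s\n' "$cp_value"
}

# Jar list copied from the troubleshooting command in this note.
HEALTH_CHECK_JARS="Oracle_Grid.jar ws-commons-java5-1.0.1.jar xmlrpc-common-3.1.2.jar xmlrpc-client-3.1.2.jar ws-commons-util-1.0.2.jar commons-logging-1.1.jar commons-codec-1.7.jar jsch-0.1.49.jar"
```

Calling build_classpath ":" $HEALTH_CHECK_JARS (with the variable left unquoted so that it splits into words) prints the non-Windows classpath; passing ";" instead produces the Windows form.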

 

Q: DNS Configuration check shows result: "INFO: Error running nslookup command, check failed" or "Error in getting result from appliance.Please contact the administrator. oracle.sysman.emSDK.emd.comm.MetricGetException: Error in getting result from appliance.Please contact the administrator."

A: Grant the "shell" and "audit" privileges to the user role on the ZFS target. The steps are:

  • Log in to ZFS Appliance Kit as administrator
  • Go to "Configuration" - "Users"
  • Select the role (usually "oracle_agent")
  • For AK2011:
    • Add the scope to: system
    • Select appliance name: shell
  • For OS 8.3:
    • Add the scope to: system
    • Select appliance name: shell, audit
  • For OS 8.4 or later:
    • Add the scope to: appliance
    • Select appliance name: shell, audit

Uninstallation Steps

To uninstall Health Check metric extension:

1. Undeploy all targets:

  • Go to "Enterprise" - "Monitoring" - "Metric Extensions".
  • Select the Health Check metric extension and choose "Actions" - "Manage Target Deployments".
  • Select all deployed targets and click "Undeploy".

2. Delete the metric extension:

  • Go to "Enterprise" - "Monitoring" - "Metric Extensions".
  • Select the Health Check metric extension and choose "Actions" - "Delete".

For details on deleting a metric extension, refer to Oracle Enterprise Manager - Using Metric Extensions.

Note: The user does not have to restore the original Oracle_Grid.jar. To restore it anyway, replace the new jar with the original Oracle_Grid.jar by following the same jar replacement steps used during installation.

Upgrade Steps

Health Check is the updated version of ZCheck. If you previously installed ZCheck, the best way to upgrade to Health Check is to uninstall ZCheck first and then install the Health Check metric extension by following the installation steps above from the beginning.

Health Check Monitoring Features and Customization

Collection Schedule and Customization

By default, the collection schedule is set to collect every 4 hours and to upload to the database on alert only. The user can customize this setting by changing the collection schedule.

Alert threshold and Customization

By default, "CRITICAL" result is set under the critical threshold, "UNKNOWN" or "WARNING" results are set under the warning threshold. The user can customize this setting by changing the keywords under menu "Oracle ZFS Storage Appliance" - "Metric and Collection Settings" - metric "Health Check".

For example, the user can set the result keyword "CRITICAL" to be a warning alert by using "UNKNOWN|WARNING|CRITICAL" under the column of warning threshold and delete "CRITICAL" under column critical threshold.
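
The effect of moving a keyword between the two thresholds can be illustrated with a small sketch. It assumes that each threshold is a |-separated list of keywords matched against the whole result keyword, and that the critical threshold is evaluated first; how Enterprise Manager evaluates the thresholds internally is not described in this note. The function names and the "clear" label are illustrative.

```shell
#!/bin/sh
# matches_keyword <result_keyword> <pattern>
# The pattern is a |-separated keyword list, e.g. "UNKNOWN|WARNING".
# An empty pattern matches nothing.
matches_keyword() {
    keyword=$1; pattern=$2
    [ -n "$pattern" ] || return 1
    # -x requires the whole keyword to match one of the alternatives.
    printf '%s\n' "$keyword" | grep -E -x -q -- "$pattern"
}

# classify_result <result_keyword> <warning_pattern> <critical_pattern>
# Prints critical, warning, or clear (assumed evaluation order: critical first).
classify_result() {
    if matches_keyword "$1" "$3"; then
        echo critical
    elif matches_keyword "$1" "$2"; then
        echo warning
    else
        echo clear
    fi
}
```

With the default settings, classify_result CRITICAL 'UNKNOWN|WARNING' 'CRITICAL' prints critical. With the customized settings from the example, classify_result CRITICAL 'UNKNOWN|WARNING|CRITICAL' '' prints warning.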

Stateful Alerts

Alerts are stateful: they are cleared automatically once the ZFS configuration settings meet the Health Check recommendations. For details of remediation, refer to the "Remediation For Health Check" section.

 

Remediation For Health Check

If the user encounters alerts on Enterprise Manager that report a Health Check issue, the user should contact the administrator. This section is intended for the Oracle ZFS Storage Appliance administrator.

Repair Action for Cluster Links

Check the cable setting, contact the administrator, and reconfigure the cluster configuration on the owner node.

For more information, please refer to Sun ZFS Storage 7000 System Administration Guide

Repair Action for Analytics Retention Policy

1. Log in to the ZFS Storage Appliance, set <property_type>=<property_value_in_hours>, and commit the change. For example, to use a retention of about 1 month (672 hours = 28 days):

analytics settings set retain_second_data=672
analytics settings commit

2. Verify the change:

analytics settings show
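
The retention value is expressed in hours, so the 672 in the example is 4 weeks x 7 days x 24 hours. The following is a minimal sketch, assuming a POSIX shell on an administrator workstation, that computes the value for a whole number of weeks and prints the corresponding CLI line for review. It does not connect to the appliance, and the function names weeks_to_hours and retention_command are illustrative.

```shell
#!/bin/sh
# weeks_to_hours <weeks>: number of hours in the given number of whole weeks.
weeks_to_hours() {
    echo $(( $1 * 7 * 24 ))
}

# retention_command <property> <weeks>
# Prints the appliance CLI line that sets the property to that many hours.
retention_command() {
    echo "analytics settings set $1=$(weeks_to_hours "$2")"
}
```

For example, retention_command retain_second_data 4 prints the same set command as the example above.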

Repair Action for DNS Configuration

Please consult with the DNS Network administrator for the specific domain or servers.

1. Log into ZFS Storage Appliance.

2(a). To set the domain property, run set domain=<domain_value> and commit the change. For example:

configuration services dns set domain=my.example.com
configuration services dns commit

2(b). To set the servers property, run set servers=<server_value> and commit the change. For example:

configuration services dns set servers=0.0.0.0
configuration services dns commit

3. Verify the changes:

configuration services dns show

Repair Action for Datasets Check

1. Log into ZFS Storage Appliance.

2. Show all active datasets, select the dataset to prune, and run prune <hour/minute/second> to discard the hour, minute, or second data. For example:

analytics dataset show
select dataset-000
confirm prune second

If the user now runs the show command, the activity should be reported as: pruning (x % completed)

Note: The command destroy <dataset> discards the dataset and disables it. To use the dataset again, the user needs to re-create it by running: create <dataset_name>

Repair Action for Backend Check

  • If Path Count is 1 (single path): Replace the device if both SIMs are online.
  • If Faulted: Replace the disk/SIM.
  • If Not Presented: If the device is expected to be missing, leave it. Otherwise, reinsert or replace the disk/SIM. On Exalogic, all slots should be populated with disks.
  • If Firmware Revision mismatch: If the disk is not in the process of upgrading its firmware, replace the disk or have a field engineer manually upgrade the device.

Repair Action for L2Arc Header Size Check

Ensure that the Oracle ZFS Storage Appliance containing this warning is up to date with the latest Oracle ZFS Storage Appliance software.

Repair Action for Locked Server Check

Upgrade the Oracle ZFS Storage Appliance software to the current version.

 

 

Check for Currency 06-MAR-2018

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.