Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-2019842.1
Update Date:2017-10-05
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  2019842.1 :   Oracle ZFS Storage Appliance: Potential Performance Impact using Analytics  


Related Items
  • Sun ZFS Storage 7420
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun ZFS Backup Appliance
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Sun Storage 7720 Unified Storage System
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Purpose
Details
 Analytics Overview
 Execution overhead of enabling statistics
 Storage overhead of enabling statistics
 Best Practices Guidelines for using Analytics
 Access to Oracle Support
References


Applies to:

Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Backup Appliance - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose

 

To provide technical information on the use of Analytics in Oracle/Sun ZFS Storage Appliances, specifically the potential performance impact of some of the available datasets and metrics. This document gives an overview of the different types of Analytics statistics and their execution and storage overhead, as well as best-practice guidelines for using Analytics without degrading long-term system performance.

Details

Analytics Overview

Analytics is a key feature of Oracle ZFS Storage Appliances. It aids real-time monitoring, troubleshooting and tuning, so that the appliance can deliver maximum performance in terms of both throughput and latency and so that production requirements and Service Level Agreements (SLAs) can be met. The Analytics feature collects and graphs a variety of statistics that offer broad observability across all appliance resources: disk, CPU, network, cache and memory, as well as per-protocol statistics, logical file system operations and implementation-specific components of the appliance (ARC cache, log devices, cache devices, etc.). These datasets show how the appliance is behaving under a given load and how clients are using it.

Some Analytics data is sourced from the Solaris kernel kstat facility. The kstat facility is used throughout the Solaris kernel to maintain raw event counts that user utilities can easily consume; statistics based on it are called static statistics. Capturing and displaying kstat-based data in Analytics is very lightweight, with a negligible effect on system performance, and these statistics are sufficient to obtain a general view of system behavior. Examples of static statistics are CPU percent utilization, network device bytes per second, and NFSv3/NFSv4 operations per second.

The remaining Analytics data is created dynamically, using DTrace to trace events and aggregate the data. These dynamic statistics are not maintained by the system unless they are enabled in Analytics. Examples are HTTP/WebDAV requests per second, Replication bytes per second and ZFS DMU operations per second. Apart from the raw data, most statistics also offer drilldowns, for example by client, type of operation or filename.

To determine whether a particular statistic is kstat-based or DTrace-based, enable that statistic, then hold down the Shift key and click the “Export data” icon in the worksheets view of the Browser User Interface (BUI). If an option to open a script file is displayed, it is a dynamic (DTrace-based) statistic; if the Shift-click does nothing further, it is a kstat-based statistic. For details on the Analytics interface, refer to chapter 2, “Analytics Interface”, of the “Oracle ZFS Storage Appliance Analytics Guide”*. This documentation is also available from the appliance BUI via the “Help” button.
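The cost difference between the two kinds of statistics comes down to where the counting happens. For kstat-based statistics the kernel already maintains the counters, so a collector only needs to read each counter once per second and take the difference between readings. A DTrace-based statistic, by contrast, runs probe code on every traced event. The Python sketch below illustrates the general counter-sampling technique only. It is not the appliance's implementation, the counter values are hypothetical, and counter wraparound is not handled.

```python
from typing import List, Tuple

Snapshot = Tuple[float, int]  # (time in seconds, cumulative counter value)


def per_second_rates(snapshots: List[Snapshot]) -> List[float]:
    """Turn successive snapshots of a monotonically increasing counter into rates.

    The counter itself is maintained elsewhere (by the kernel, in the kstat case);
    the sampler only reads it once per interval and divides the delta by the
    elapsed time, so its cost does not grow with the event rate.
    """
    rates: List[float] = []
    for (t0, c0), (t1, c1) in zip(snapshots, snapshots[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("snapshots must be in strictly increasing time order")
        rates.append((c1 - c0) / elapsed)
    return rates


# Hypothetical snapshots of an operations counter, read once per second.
snaps = [(0.0, 1_000), (1.0, 1_450), (2.0, 2_050), (3.0, 2_350)]
print(per_second_rates(snaps))  # [450.0, 600.0, 300.0]
```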

The most useful statistics are available by default on newly installed appliances. The others are available only when the “Advanced Analytics” feature is enabled. These advanced statistics are of narrower interest and are not typically needed for general system observability. For details on enabling Advanced Analytics, see the “Preferences” section in chapter 3, “Configuration”, of the “Oracle ZFS Storage Appliance Administration Guide”*. A group of statistics that provides broad observability across protocols with minimal data collection overhead is enabled and archived by default. For the complete list of Analytics datasets and the default statistics, refer to chapter 3, “Statistics and Datasets”, of the “Oracle ZFS Storage Appliance Analytics Guide”*.

 

Execution overhead of enabling statistics

Enabling any statistic incurs some performance cost for data collection and aggregation; this is called the probe effect. For kstat-based static datasets the execution overhead is negligible, because they are mostly sourced from operating system counters that are already maintained. Dynamic statistics, however, can seriously impact system performance, both by adding latency to operations and by reducing throughput. They use DTrace to trace events and collect data every second, so the cost of each dynamic statistic is proportional to the number of events being traced. In many cases this overhead makes no noticeable difference to system performance. On systems under high load, including benchmark loads, it can become noticeable. Drilldowns add further overhead to every traced event.

For example, tracing network data, even with drilldowns, at a network load of 200 packets per second is unlikely to cause any noticeable performance problem. Capturing and displaying the same statistics at 100,000 packets per second is much more likely to cause a noticeable performance regression. The type of drilldown also matters. Capturing “TCP packets broken down by size” or “TCP bytes broken down by client” requires the underlying DTrace script to capture and aggregate significantly more data, and so can further reduce delivered performance.

The performance of some storage applications also depends on other activities. For example, the throughput of file system operations over NFS or of NDMP data backup depends on the throughput of the underlying network. So if a dynamic network statistic such as “TCP bytes per second” is enabled under heavy load, the performance of NFS operations or NDMP data backups can degrade significantly, even if no statistics related to those protocols are enabled.
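Because the cost of a dynamic statistic is proportional to the number of traced events, a simple linear model shows why the same statistic is harmless at 200 packets per second and costly at 100,000. This Python sketch assumes a hypothetical fixed cost of 2 microseconds of probe work per packet, with a drilldown tripling that work. Both figures are invented for illustration. Real per-event costs depend on the probe, the drilldown and the hardware.

```python
def probe_overhead_fraction(events_per_sec: float,
                            cost_per_event_us: float,
                            drilldown_multiplier: float = 1.0) -> float:
    """Fraction of one CPU spent in tracing under a simple linear cost model.

    Every traced event costs cost_per_event_us microseconds, scaled by
    drilldown_multiplier to represent the extra per-event work of a drilldown.
    """
    busy_us_per_sec = events_per_sec * cost_per_event_us * drilldown_multiplier
    return busy_us_per_sec / 1_000_000  # CPU microseconds per second of wall time


# Hypothetical figures: 2 us of probe work per packet, tripled by a drilldown.
for pps in (200, 100_000):
    raw = probe_overhead_fraction(pps, 2.0)
    drilled = probe_overhead_fraction(pps, 2.0, drilldown_multiplier=3.0)
    print(f"{pps:>7} packets/s: {raw:.2%} of one CPU, {drilled:.2%} with drilldown")
```

With these made-up figures, the 200 packets/s case costs 0.04% of a CPU, while 100,000 packets/s costs 20%, rising to 60% with the drilldown. Only the 500-fold ratio between the two loads follows from the model itself. The model also ignores the per-operation latency that tracing adds.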

The impact of a dynamic statistic can be determined by enabling and disabling it while the system runs under a steady load, and observing the overall performance difference through static statistics.
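The comparison comes down to measuring the same static statistic under the same steady load, first with the dynamic statistic suspended and then with it enabled. A minimal Python sketch, assuming the per-second samples have already been read from the static statistic. The sample values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_change(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Relative change in mean throughput; a negative value indicates a regression."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return (mean(candidate) - base) / base


# Hypothetical per-second samples of a static statistic (for example NFS operations
# per second), taken under the same steady load with the dynamic statistic first
# suspended and then enabled.
suspended = [10_000, 10_200, 9_900, 10_100, 9_800]
enabled = [9_100, 9_300, 9_000, 9_200, 9_400]
print(f"change with dynamic statistic enabled: {relative_change(suspended, enabled):+.1%}")
```

Averaging over a window of samples, rather than comparing single seconds, reduces the influence of normal second-to-second variation. A longer window gives a more reliable comparison.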

 

Storage overhead of enabling statistics

By default, the appliance retains all Analytics data for all active datasets indefinitely, at per-second granularity. Leaving the dataset retention policy at its default can consume large amounts of disk space and create large datasets that are slow to manipulate in the BUI. The amount of data retained depends on the type of statistic and the activity rate. Raw statistics do not require much space. Drilldown data consumes more, depending on the number of breakdowns, the length of the drilldown names and the type of drilldowns enabled. For example, when saving data from the statistic “NFS bytes broken down by filename”, each entry is saved with its full pathname, so a long character string may be stored for every entry. For by-file and by-hostname drilldowns, the number of drilldown entries per second can reach into the hundreds, depending on how many different files or hosts were active in a given second.
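The factors above (entries per second and name length) combine multiplicatively. The back-of-the-envelope Python model below makes that concrete. It assumes a fixed 16-byte overhead per stored entry and no compression. Both are illustrative assumptions, not the appliance's actual on-disk format.

```python
SECONDS_PER_DAY = 86_400


def drilldown_bytes_per_day(entries_per_sec: int, avg_name_len: int,
                            per_entry_overhead: int) -> int:
    """Rough uncompressed bytes per day written for a per-second statistic.

    Each second stores entries_per_sec entries; each entry costs its name
    (for example a full pathname) plus a fixed per-entry overhead.
    """
    return entries_per_sec * (avg_name_len + per_entry_overhead) * SECONDS_PER_DAY


# Hypothetical: a raw statistic stores one unnamed value per second, while a
# by-filename drilldown stores 300 entries per second with 80-byte pathnames.
raw = drilldown_bytes_per_day(1, 0, 16)
by_file = drilldown_bytes_per_day(300, 80, 16)
print(f"raw: {raw / 1e6:.1f} MB/day, by filename: {by_file / 1e9:.2f} GB/day")
```

Even with these rough numbers, the by-filename drilldown writes 1,800 times as much data per day as the raw statistic.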

The data retention policy can be changed from its defaults to reduce the space required. A retention policy limits how long data is kept at each granularity (per-second, per-minute or per-hour), and data is discarded once its retention period expires. Analytics data is always collected and stored on a per-second basis. If a retention policy is set, each dataset is trimmed after collection so that only data within the configured interval is retained, and older historical data is deleted. Per-second data is the finest granularity and requires more disk space than per-minute or per-hour data. A longer retention period likewise means more stored data. Dataset sizes can be monitored from the Analytics->Datasets view in the BUI. See the “System Disks” section in chapter 3 of the “Oracle ZFS Storage Appliance Customer Service Manual”* for system disk usage and available space.
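The trimming step can be modeled in a few lines. This Python sketch assumes a simple two-tier policy: recent data is kept per second, older data is averaged per minute, and the rest is discarded. The function and parameter names are hypothetical and do not correspond to appliance settings, and the appliance may summarize data differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Sample = Tuple[int, float]  # (time in seconds, value)


def apply_retention(samples: List[Sample], now: int, second_retention_s: int,
                    minute_retention_s: int) -> Tuple[List[Sample], List[Sample]]:
    """Two-tier trimming sketch.

    Samples newer than now - second_retention_s are kept at per-second granularity.
    Older samples newer than now - minute_retention_s are averaged into per-minute
    buckets. Anything older than that is discarded.
    """
    second_cutoff = now - second_retention_s
    minute_cutoff = now - minute_retention_s
    per_second = [(t, v) for t, v in samples if t >= second_cutoff]
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in samples:
        if minute_cutoff <= t < second_cutoff:
            buckets[t - t % 60].append(v)  # bucket key = start of the minute
    per_minute = sorted((start, mean(vals)) for start, vals in buckets.items())
    return per_second, per_minute


# One hour of hypothetical per-second samples; keep 10 minutes at per-second
# granularity and the previous 20 minutes as per-minute averages.
hour = [(t, float(t)) for t in range(3600)]
secs, mins = apply_retention(hour, now=3600, second_retention_s=600, minute_retention_s=1800)
print(len(hour), "samples ->", len(secs), "per-second +", len(mins), "per-minute")
```

In this example 3,600 per-second samples shrink to 620 stored points. That reduction in stored data is the effect a retention policy is meant to achieve.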

Analytics datasets can also be archived to allow later viewing and historical analysis. Archived datasets are continuously read and saved to the system disks as one-second summaries, and archived data is not discarded automatically. Although archived data is compressed when written to disk, its growing size can still become a system disk usage problem. The retention policy setting does not apply to archived data. The factors that determine how much disk space archived data needs are the same as those described above. For additional details on the performance overhead of Analytics data collection, refer to chapter 4, “Performance Impact”, of the “Oracle ZFS Storage Appliance Analytics Guide”*.
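Because archived data is never discarded automatically, archive growth only moves in one direction and is worth projecting. This rough Python sketch assumes roughly linear growth. The 80% threshold and the disk sizes are hypothetical. Real figures would come from the Analytics->Datasets view and the system disk usage information.

```python
import math


def days_until_threshold(used_gb: float, capacity_gb: float,
                         growth_gb_per_day: float, threshold: float = 0.8) -> float:
    """Days until used space reaches threshold * capacity, assuming linear growth."""
    limit_gb = capacity_gb * threshold
    if used_gb >= limit_gb:
        return 0.0
    if growth_gb_per_day <= 0:
        return math.inf  # not growing: the threshold is never reached
    return (limit_gb - used_gb) / growth_gb_per_day


# Hypothetical: 150 GB used of 400 GB, archives growing by 2.5 GB per day.
print(f"{days_until_threshold(150, 400, 2.5):.0f} days until 80% full")
```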

 

Best Practices Guidelines for using Analytics

The default set of statistics that are enabled and archived on the appliance provides broad observability across the storage system's modules and supported protocols with minimal collection overhead. Use these datasets to monitor general system performance. Other static statistics can be enabled as required, since their performance impact is negligible.

For troubleshooting specific performance bottlenecks, use dynamic statistics and drilldowns. Enable them only for a short duration, because they can cause significant performance overhead. Once the issue has been analyzed, suspend these datasets. Dynamic statistics must never be left running long-term. To suspend a statistic, click the power icon for that dataset in the Analytics->Datasets view of the BUI.
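Any tooling that automates troubleshooting can enforce the rule that dynamic statistics must not be left running. It only needs to tie the enable and suspend steps together so that the suspend always happens. This Python sketch does that with a context manager. `enable_example`, `suspend_example` and the dataset name are hypothetical placeholders that only update a dictionary, not a real appliance interface.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def enabled_temporarily(enable: Callable[[], None],
                        suspend: Callable[[], None]) -> Iterator[None]:
    """Enable a statistic for the duration of a with-block, then always suspend it."""
    enable()
    try:
        yield
    finally:
        suspend()  # runs even if the analysis code raises an exception


# Hypothetical stand-ins for the BUI/CLI actions; they only record state here.
state: Dict[str, str] = {"example-dynamic-dataset": "suspended"}


def enable_example() -> None:
    state["example-dynamic-dataset"] = "active"


def suspend_example() -> None:
    state["example-dynamic-dataset"] = "suspended"


with enabled_temporarily(enable_example, suspend_example):
    # ... observe the statistic here for the short troubleshooting window ...
    print("during:", state["example-dynamic-dataset"])
print("after:", state["example-dynamic-dataset"])
```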

Because the appliance's default settings retain all Analytics data for all active datasets indefinitely, setting retention policies is recommended. Refer to the “Settings” section in chapter 1 of the “Oracle ZFS Storage Appliance Analytics Guide”* for details on setting retention policies. Periodically discarding the highest-fidelity data can significantly reduce the disk space Analytics requires. Retain only the minimum amount of data required by business needs, including compliance requirements. Because statistic drilldowns consume more space, enable them only as necessary for general system monitoring or for troubleshooting specific issues. To conserve storage space, suspend statistics once the requirements are met.

When archiving Analytics datasets, monitor their growing sizes from the Analytics->Datasets view in the BUI and destroy datasets that grow too large. To reduce storage overhead, statistics can be suspended once the requirements are met; suspended datasets keep their data for later viewing.
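Monitoring dataset growth is easier to act on when it is reduced to a simple rule. This Python sketch compares two readings of dataset sizes and flags any dataset that is either too large or growing too quickly. The dataset names, sizes and limits are hypothetical. Sensible limits depend on the available system disk space.

```python
from typing import Dict, List


def datasets_to_review(before_mb: Dict[str, float], after_mb: Dict[str, float],
                       days_between: float, max_size_mb: float,
                       max_growth_mb_per_day: float) -> List[str]:
    """Names of datasets whose current size or daily growth exceeds the limits."""
    if days_between <= 0:
        raise ValueError("days_between must be positive")
    flagged = []
    for name, size_mb in after_mb.items():
        growth = (size_mb - before_mb.get(name, 0.0)) / days_between
        if size_mb > max_size_mb or growth > max_growth_mb_per_day:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical sizes read from the Analytics->Datasets view one week apart.
week_ago = {"dataset-a": 100.0, "dataset-b": 500.0, "dataset-c": 2_000.0}
today = {"dataset-a": 107.0, "dataset-b": 1_900.0, "dataset-c": 2_010.0}
print(datasets_to_review(week_ago, today, days_between=7,
                         max_size_mb=1_500, max_growth_mb_per_day=100))
```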

Access to Oracle Support

Contact Oracle Support for any technical issues with Oracle/Sun ZFS Storage Appliance products. Oracle customers have access to electronic support through My Oracle Support. For information, visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=info or visit http://www.oracle.com/pls/topic/lookup?ctx=acc&id=trs if you are hearing impaired.

 

 *These documents are available for different software releases and products. The appropriate version can be found in the Oracle ZFS Storage Appliance documentation library at http://www.oracle.com/goto/ZFSStorage/docs.

 

References

http://docs.oracle.com/cd/E51475_01/html/E52874/index.html
<BUG:20275711> - ANALYTICS CAN HAVE A PROFOUND EFFECT ON IMPORT TIMES IN BOTH TAKEOVER/FAILBACK
<NOTE:1401595.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to 'excessive' analytics collected
<NOTE:1589345.1> - Sun Storage 7000 Unified Storage System: ip.bytes[hostname] Analytics Dataset enabled can inhibit Network Performance
<NOTE:1572205.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hangs when accessing the 'status' or 'analytics' page
<BUG:18793297> - PACKET LOSS ON IPMP3 INTERFACE
<BUG:17440592> - DROPOUTS IN IXGBE0 INTERFACE
<BUG:18701686> - BUI/CLI SLOW TO RESPOND AND FC CLIENTS LOST CONNECTIONS
<BUG:15676439> - SUNBT6993881-AK-2011.04.24 ACCRUED ANALYTICS DATA CAN COMPLETELY WEDGE AKD
http://docs.oracle.com/cd/E51475_01/html/E52873/index.html
http://docs.oracle.com/cd/E51475_01/html/E52872/index.html
<BUG:17538159> - POOR PERFORMANCE OVER 10 GBE
<NOTE:2103699.1> - Oracle ZFS Storage Appliance: Analytics Best Practices
<NOTE:1461959.1> - Sun Storage 7000 Unified Storage System: How to configure Analytics for dataset retention policies

  Copyright © 2018 Oracle, Inc.  All rights reserved.