Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution
1401282.1: Sun Storage 7000 Unified Storage System: How to Troubleshoot Unresponsive Administrative Interface (BUI/CLI hang)
To assist a user in resolving management BUI/CLI connectivity/responsiveness issues.
Applies to:
Sun Storage 7410 Unified Storage System - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-BA - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose
The purpose of this document is to assist a user in resolving management BUI/CLI connectivity/responsiveness issues.

If ssh to the appliance drops the user into the emergency shell, the end user must open a support session to allow the Oracle System Support team remote access to the system to troubleshoot and fix this issue.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance Community.
Customers are not permitted to run commands at the emergency shell.
Troubleshooting Steps
Please validate that each troubleshooting step below is true for the affected environment. The steps provide instructions, or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Symptoms:
The usual symptoms of an Unresponsive Administrative Interface issue are:
See the following Internal-Only Documents for collecting useful data.
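To confirm the symptom from a client workstation (never from the appliance emergency shell), it can help to tell "the management address cannot be reached at all" apart from "the address answers but the BUI does not respond". The following is a minimal sketch under stated assumptions: it assumes the BUI is served over HTTPS on port 215 (the port used in the Appliance Wiki URL) with a self-signed certificate, so certificate verification is disabled for this probe only. The function name `probe_bui` and its result labels are hypothetical. An "unresponsive" result is consistent with, but does not prove, a hung management daemon (akd).

```python
import socket
import ssl


def probe_bui(host: str, port: int = 215, timeout: float = 10.0) -> str:
    """Classify the management interface as 'unreachable', 'unresponsive',
    'error' or 'responsive'. Port 215 and HTTPS are ASSUMPTIONS."""
    try:
        # Any failure to open the TCP connection (refused, no route,
        # timed out) is treated as a network-level problem.
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "unreachable"

    # Hypothetical probe settings: accept the appliance's self-signed
    # certificate, because we only care whether the interface answers.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        # The TLS handshake inherits the socket timeout set above.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))
            status_line = tls.recv(64)
    except socket.timeout:
        # TCP connected, but no TLS/HTTP answer within the timeout.
        return "unresponsive"
    except OSError:
        # TLS or protocol failure other than a timeout.
        return "error"
    finally:
        raw.close()  # no-op if wrap_socket already took over the socket
    return "responsive" if status_line.startswith(b"HTTP/") else "error"
```

For example, `probe_bui("<appliance-ip-address>")` returning "unreachable" points toward the basic network checks in the next step, while "unresponsive" points toward the causes listed under Causes and Resolutions.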
Causes and Resolutions:
The initial step is to check the basic configuration and operation of the appliance management connectivity; please see:
To be added to ... 'non-exhaustive' list of all possible causes of 'BUI/CLI hang' conditions:
... to provide further diagnostic/context data to assist in isolating the cause of the issue.
Excessive kernel virtual memory (exceeding the 32-bit VM limit)
For any system running Appliance Release versions earlier than 2010.Q3.4 or 2011.1, running (many) aksh scripts can exhaust the 32-bit virtual memory of the Appliance management daemon (akd). See Document 1334777.1 - Sun Storage 7000 Unified Storage System: System hang - aksh scripts can exhaust memory

If attempting to log in to the CLI generates a 'fatal error: no memory' message, see Document 1325025.1 - Sun Storage 7000 Unified Storage System: aksh fatal error: no memory

For any system running Appliance Release versions earlier than 2011.1, creating and deleting a large number of VDI LUNs can cause a BUI/CLI hang condition. See Document 1408593.1 - Sun Storage 7000 Unified Storage System: Creation/deletion of large amount of VDI LUNs can cause BUI/CLI hang

A workflow can be used to monitor the memory used by akd. See Document 1391232.1 - Sun Storage 7000 Unified Storage System: The workflow to check memory usage of the akd.

Excessive amount of 'old' analytics
Because analytics provide a great deal of detailed information, and the default set of analytics is collected in an 'always on' mode, collecting an 'excessive' amount of analytics data can eventually cause a 'hang' condition for the Appliance management interfaces (BUI/CLI). See Document 1401595.1 - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to 'excessive' analytics collected

A 'hang' condition for the BUI/CLI may also result from the 'total' amount of analytics currently being collected. See Document 1572205.1 - Sun Storage 7000 Unified Storage System: BUI/CLI hangs when accessing the 'status' or 'analytics' page

A 'hang' condition for the Appliance management interfaces (BUI/CLI) may also result from a known analytics compilation bug.
See Document 1468128.1 - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to analytics compilation (CCP) bug

Excessive amount of 'old' log files
For any system running Appliance Release versions earlier than 2010.Q1.0, system libraries used by akd can exceed a 256 file descriptor limit if many (old) log files are present. This can cause a 'hang' condition for the Appliance management interfaces (BUI/CLI). See Document 1408493.1 - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to 'excessive' amount of 'old' log files

Excessive use of contracts
Whenever a workflow terminates abnormally, it leaves an unused 'contract id'. Eventually the contract limit may be exceeded, and processes are then unable to start. Error messages may include "Resource temporarily unavailable". See Document 1410873.1 - Sun Storage 7000 Unified Storage System: SMF unable to spawn processes due to contract exhaustion

Excessive amount of AKD process (memory) heap fragmentation
For any system running Appliance Release version 2011.1.6.0 and earlier, the akd process controlling the management interface can run out of memory because of heap fragmentation caused by a large number of oversize allocations. See Document 1494369.1 - Sun Storage 7000 Unified Storage System: BUI unavailable and seeing errors like "failed to update kstat chain: Not enough space"

System pool is 'full'
The system may experience 'hang' conditions when the system zpool nears 100% capacity. In this situation, try to reduce the system pool usage to below 80% of capacity. See Document 1392082.1 - Sun Storage 7000 Unified Storage System: How to free some space on system pool

Faulty (?) hardware issue
For example, a 'flaky' disk that isn't getting faulted can cause akd to be mostly, but not completely, unresponsive. Check the FMA events (in the BUI: Maintenance > Problems > Active Problems) for a bad disk that is not getting faulted.
If such a disk is preventing zfs operations from completing within a normal timeframe, the nas lock may be held up together with the zfs lock, causing akd to be unresponsive. See Document 2055701.1 - Oracle ZFS Storage Appliance: Identifying Bad Disk Drives Causing Performance and Other Problems

Issue with supportbundle upload or 'phone home'
For any system running Appliance Release versions earlier than 2011.1.6.0, the scrk/curl thread within the akd daemon can hang. See Document 1553935.1 - Sun Storage 7000 Unified Storage System: BUI/CLI hang when attempting to 'phone home' or upload supportbundle

Changing the replication target IP address
AKD can hang when changing the replication target IP address if the 'old' IP address is unavailable. See bug 18827266 (Updating target IP can hang up the NAS class if the old IP is unavailable), fixed in Appliance Firmware Release 2013.1.6.0.
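On releases earlier than 2013.1.6.0, the replication target hang described above is triggered when the 'old' target address is unavailable, so a quick reachability check of the old address from a client workstation, before the change is made, can show whether the system is exposed. This is a minimal sketch, not a workaround for bug 18827266: it assumes replication uses TCP port 216 (confirm this for your release and network), and treats a successful TCP connection as a proxy for the old address being "available". The function name is hypothetical.

```python
import socket


def replication_target_reachable(address: str, port: int = 216,
                                 timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to address:port opens within timeout.

    Port 216 is an ASSUMPTION for the replication service; a successful
    connection only shows the address answers, not that replication works.
    """
    try:
        # create_connection raises OSError on refusal, no route or timeout.
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the old address does not answer on an affected release, upgrading to 2013.1.6.0 or later before changing the target address avoids the known hang.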
Issues with Replication updates
Replication Issue: 22120225 - BUI/CLI Inaccessible (Cloud Infrastructure)
This appears to be a replication issue. The following bugs are potentially the underlying cause:
22259667 - akd is slow waiting for zfs property update due to large number of datasets (Closed/Could Not Reproduce), and in turn
21116328 - nas_cache needs more scalable locking for property handling code (work in progress)
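Several of the causes above are identified by a specific error message. As a triage aid, the messages quoted in this document can be kept in a small lookup table that maps observed text to the document to read first. This is a hypothetical sketch: the table contains only messages quoted in this document and its references, and a generic message such as "Resource temporarily unavailable" can also have other causes, so a match is a starting point rather than a diagnosis.

```python
# Hypothetical triage table: error text quoted in this document ->
# My Oracle Support document that discusses it.
KNOWN_MESSAGES = {
    "fatal error: no memory": "1325025.1",
    "Resource temporarily unavailable": "1410873.1",
    "failed to update kstat chain: Not enough space": "1494369.1",
    "aksh-wrapper: No Such file or directory": "1642216.1",
}


def suggest_documents(observed_text: str) -> list[str]:
    """Return the document IDs whose known message appears in observed_text
    (case-insensitive substring match), in table order."""
    lowered = observed_text.lower()
    return [doc for msg, doc in KNOWN_MESSAGES.items() if msg.lower() in lowered]
```

For example, pasting the text shown at a failed CLI login into `suggest_documents` returns "1325025.1" when it contains 'fatal error: no memory'.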
====================================================================
Further Assistance Required:
At this point, if you have validated that each troubleshooting step above is true for your environment and the problem still exists, further troubleshooting is required.
It may be necessary for the Oracle Support Engineer to run some 'emergency shell' commands remotely. To do this, the Oracle Support Engineer may ask you to initiate an Oracle Shared Shell session. It is useful to be familiar with this remote access tool in advance; please see:
https://www.oracle.com/us/support/systems/premier/shared-shell-sun-systems-163755.html
Other useful information:
1. The online Appliance Wiki documentation can be found at: https://<appliance-ip-address>:215/wiki/index.php
2. To upgrade to the latest Appliance Firmware Release: later Appliance Firmware releases include many improvements, so check the current Appliance Firmware revision and, if required, upgrade to the latest release:
3. If the BUI and CLI are completely hung and you are unable to access the console via the Service Processor, you can still reset the system and gather useful diagnostic information by issuing an NMI reset to the system. This causes the system to generate a kernel crash dump. The procedure is documented in: Document 1173064.1 - Sun Storage 7000 Unified Storage System: How to generate NMI to collect a system core dump
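Several causes listed under Causes and Resolutions apply only to releases earlier than a given Appliance Firmware Release. When checking the current firmware revision, the thresholds can be compared mechanically. This is a minimal sketch under stated assumptions: release names are parsed only in the form used in this document (for example 2010.Q3.4 or 2011.1.6.0); other version formats are not handled; and the "earlier than 2010.Q3.4 or 2011.1" condition for aksh scripts is simplified to a single 2010.Q3.4 threshold. Function and table names are hypothetical. Causes without a version bound, such as excessive analytics or a full system pool, can still occur on any release.

```python
# Thresholds transcribed from the causes in this document.
# Each entry: (cause, threshold release, is the threshold itself affected?)
KNOWN_CAUSES = [
    ("aksh scripts exhaust akd memory (Doc 1334777.1)", "2010.Q3.4", False),
    ("VDI LUN creation/deletion hang (Doc 1408593.1)", "2011.1", False),
    ("old log files exceed 256 file descriptors (Doc 1408493.1)", "2010.Q1.0", False),
    ("akd heap fragmentation (Doc 1494369.1)", "2011.1.6.0", True),
    ("phone home / supportbundle upload hang (Doc 1553935.1)", "2011.1.6.0", False),
    ("replication target IP change hang (bug 18827266)", "2013.1.6.0", False),
]


def parse_release(release: str) -> tuple[int, ...]:
    """Turn '2010.Q3.4' or '2011.1.6.0' into a tuple of ints that compares
    in release order. Trailing zeros are dropped so '2011.1.6' == '2011.1.6.0'."""
    parts = [int(p.upper().lstrip("Q")) for p in release.strip().split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def affected_causes(current_release: str) -> list[str]:
    """List the causes whose fix is not included in current_release."""
    current = parse_release(current_release)
    hits = []
    for cause, threshold, inclusive in KNOWN_CAUSES:
        limit = parse_release(threshold)
        if current < limit or (inclusive and current == limit):
            hits.append(cause)
    return hits
```

For example, a system on 2011.1.6.0 is still exposed to the heap fragmentation issue ("2011.1.6.0 and earlier") and to the replication target IP hang, but not to the phone-home hang ("earlier than 2011.1.6.0").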
***Checked for relevance on 24-MAY-2018***

References
<NOTE:1391232.1> - Sun Storage 7000 Unified Storage System: The work flow to check memory usage of the akd.
<NOTE:1416406.1> - Sun ZFS Storage Appliances Troubleshooting Resource Center
<NOTE:1392082.1> - Sun Storage 7000 Unified Storage System: How to free some space in the 'system' pool
<NOTE:1494369.1> - Sun Storage 7000 Unified Storage System: BUI unavailable and seeing errors like "failed to update kstat chain: Not enough space"
<NOTE:1334777.1> - Sun Storage 7000 Unified Storage System: System hang - aksh scripts can exhaust memory
<NOTE:1392845.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Loss of Network Connection to the Management Interface
<NOTE:1325025.1> - Sun Storage 7000 Unified Storage System: aksh fatal error: no memory
<NOTE:1408593.1> - Sun Storage 7000 Unified Storage System: Creation/deletion of large amount of VDI LUNs can cause BUI/CLI hang
<NOTE:1401595.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to 'excessive' analytics collected
<NOTE:1468128.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to analytics compilation (CCP) bug
<NOTE:1410873.1> - Sun Storage 7000 Unified Storage System: SMF unable to spawn processes due to contract exhaustion
<NOTE:1572205.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hangs when accessing the 'status' or 'analytics' page
<NOTE:1642216.1> - Exalogic: SSH fails to connect to ZFS node with message "aksh-wrapper: No Such file or directory"
<NOTE:1553935.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hang when attempting to 'phone home' or upload supportbundle
<NOTE:1019887.1> - Sun Storage 7000 Unified Storage System: How to Collect a Support Bundle using the BUI or CLI
<NOTE:1345655.1> - How to Identify the Serial Number of a ZFS Storage Appliance or 7000 Series Unified Storage System
<NOTE:1173064.1> - Oracle ZFS Storage Appliance: How to generate a system core dump in case of system hang (BUI and CLI fails to respond) using NMI when directed to do so by an Oracle Support Engineer
<NOTE:1401288.1> - Sun Storage 7000 Unified Storage System: Data collection for akd hang issues
<NOTE:1408493.1> - Sun Storage 7000 Unified Storage System: BUI/CLI hang due to 'excessive' amount of 'old' log files
<NOTE:1543359.1> - Sun Storage 7000 Unified Storage System: Restarting the Appliance Kit Management Daemon (AKD) may impact production data services

Attachments
This solution has no attachment.