Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 2043383.1 : Oracle ZFS Storage Appliance: Some systems with LOW Memory configurations might not run reliably/stable with 2013.1.x releases
It has been reported in some situations that ZFS Storage Appliances configured with 24GB of RAM display some instability after they have been updated from 2011 code to 2013.1.x (OS8) code.

Applies to:
Sun ZFS Storage 7120 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms
Systems configured with 24GB of RAM are also referred to as 'low memory configurations'. After the appliance kit software is updated from a 2011 release to a 2013.1.x (OS8) release, the new features in that major release require more memory. This can lead to a situation where the appliance stops providing service to clients and appears hung due to kernel memory exhaustion.

The affected system might run stably for days or even weeks and then stop. There is no sign of broken hardware or other defects on the system, and a reboot clears the situation. After about the same interval, the appliance stops working again, and a reboot again clears the immediate situation. Before the software update, the ZFS Storage Appliance had been working reliably and had been stable for a reasonable time; since the update, it stops working in a recurring pattern.
Cause
The new software release is a major release that brings new features to the system. These features might consume more memory and, together with the memory demand of ZFS, the system can reach a state in which all memory has been allocated and new allocations required for system functions cannot be satisfied. At this point, the appliance kernel appears to stop working, while in fact it is struggling with memory allocation.

You may be impacted by Bug 19591405 (akd running in RT class is blocking critical memory management threads); please see Document ID 2018298.1.
Bug 20648579 - comment:
> This increase is specially significant between major releases, as this is the case: 2011.1.9 => 2013.1.2
> (while in 2011.1 there was 2%-3% free memory or 200M-300M).
> I would suggest tuning ARC (capping) to 14848MB, leaving 1GB cushion.
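The cap in the bug comment is quoted in MB, while memory tunables are often expressed in bytes. The following is a minimal arithmetic sketch of that conversion only. It assumes "MB" means binary megabytes (1024 * 1024 bytes), and the helper name `arc_cap_bytes` is hypothetical. It does not show how a cap is applied on the appliance; that should be done with the assistance of support.

```python
# Hypothetical helper, for illustration only: converts an ARC cap given in MB
# to bytes. Assumption: 1 MB here means 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def arc_cap_bytes(cap_mb: int) -> int:
    """Return the ARC cap in bytes for a cap given in MB."""
    if cap_mb <= 0:
        raise ValueError("cap_mb must be positive")
    return cap_mb * BYTES_PER_MB


if __name__ == "__main__":
    cap = arc_cap_bytes(14848)  # value quoted in the Bug 20648579 comment
    print(cap)                  # 15569256448
    print(hex(cap))             # 0x3a0000000
    print(cap / 1024**3)        # 14.5 (the same cap expressed in GB)
```

The value 14848MB therefore corresponds to 14.5GB, or 0x3a0000000 bytes.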
Solution
Determining the exact situation requires a kernel crash dump, which will most likely be collected by issuing an NMI. Please engage Storage-TSC NAS Support for assistance.
Resolution options include:

Change the akd threads to use the Time Sharing scheduling class instead of the Real Time scheduling class
To ease memory reorganization and let the pageout and kmem_cache_reap threads do their job, make the non-cluster-related threads use the Time Sharing scheduling class.

Limit the ZFS ARC (Adaptive Replacement Cache)
Once the situation has been confirmed as a low memory configuration with the kernel running out of memory, it may help to limit the ARC.

Roll back to the previous code release
Another solution might be to roll back to the previous code release, if it is still on the system.

Install additional RAM
A further solution would be to install additional RAM. This helps in situations where the system is running low on memory and the kernel has difficulty allocating memory.
As the code is constantly being improved, the handling of memory allocation and consumption may be optimized in later releases, so that no workaround or hardware upgrade is required.
***Checked for relevance on 30-MAY-2018***

References
<BUG:20648579> - 7420 - AFTER UPGRADE TO 2013.06.05.2.12,1-2.1.6.2 AKD MEMORY UTILIZATION 2.66G
<BUG:20922619> - ROOT CAUSE FOR MEMORY PRESSURE ISSUE
<BUG:19591405> - AKD RUNNING IN RT CLASS IS BLOCKING CRITICAL SYSTEM MANAGEMENT THREADS
<BUG:17709054> - 7120 LARGER FSYNC OPERATION SHOULD BE BLOCKED IN LOW MEMORY CONDITIONS
<BUG:19949855> - OPTIMIZE THE NUMBER OF CROSS-CALLS DURING HAT_UNLOAD

Attachments
This solution has no attachment