Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1610490.1 : Sun Storage 7000 Unified Storage System: How to troubleshoot a (kernel) memory shortage
Created from <SR 3-8078739601>

Applies to:
Sun Storage 7110 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms
The following symptoms help determine whether the system is in a kernel memory shortage. When a memory shortage occurs, the system typically collapses this way:
- Data services become unavailable.
- The BUI and CLI are also unavailable.
- Connecting to the ZFSSA through SSH is impossible.
- Connecting to the ILOM of the ZFSSA through SSH and then starting the console shows no prompt at all.

Changes
This can happen even when no change has recently been made to the system.

Cause
The following troubleshooting steps can be carried out before calling the support center:
1/ In the CLI, under maintenance hardware show, confirm that the system still reports the full amount of memory it normally has. Error and alert messages in the BUI can be checked for this. However, a DIMM can either fail with an error message or be silently blacklisted by Solaris. This silent failure is called "Operating System DIMM blacklisting" and does not show any error message in the BUI. The related messages may be logged nowhere other than on the console: Solaris prints them on the console at the first reboot, then prints nothing at any later reboot and logs no error in the log messages. Those DIMMs are simply treated as out of the configuration. To troubleshoot such silent or cleared DIMM failures, customers have to check that the total amount of memory configured at installation is still present.
2/ If there are Readzillas on the ZFSSA, confirm that the L2ARC headers are not using too much memory.
A system can run out of memory because L2ARC header buffers fill up the ARC and prevent it from growing. This can happen when new trays are added to an existing configuration, which then has to deal with bigger pools. It can also happen when 'old' 100G SSDs are replaced with larger 500G SSDs, or when small block sizes are configured for shares or LUNs.
Note 1573028.1 : Sun Storage 7000 Unified Storage System: How to check how much memory the L2ARC headers are occupying in the ARC. This document helps check both whether we are hitting this configuration issue now and whether we might hit it one day, given the current configuration of the ZFSSA.
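The two checks above can be sketched as simple arithmetic. This is only an illustration under stated assumptions, not the procedure from Note 1573028.1: the function names are hypothetical, the DIMM sizes and reported total are invented inputs, and the 200-byte header size is a placeholder rather than a documented value for any appliance release.

```python
GIB = 1024 ** 3


def missing_memory_gib(installed_dimms_gib, reported_total_gib):
    """Step 1: GiB missing compared with the memory configured at installation."""
    return max(0, sum(installed_dimms_gib) - reported_total_gib)


def l2arc_header_bytes(l2arc_bytes, avg_block_bytes, header_bytes_per_block):
    """Step 2: rough ARC memory consumed by headers when the L2ARC is full."""
    cached_blocks = l2arc_bytes // avg_block_bytes
    return cached_blocks * header_bytes_per_block


if __name__ == "__main__":
    # Hypothetical system: 16 x 8 GiB DIMMs installed, 112 GiB currently reported.
    print("missing GiB:", missing_memory_gib([8] * 16, 112))

    # ASSUMPTION: 200 bytes per header is a placeholder, not a documented value.
    header = 200
    for ssd_gb, block_kib in [(100, 128), (500, 128), (500, 8)]:
        used = l2arc_header_bytes(ssd_gb * 10**9, block_kib * 1024, header)
        print(f"{ssd_gb} GB SSD, {block_kib} KiB blocks: {used / GIB:.2f} GiB of headers")
```

Whatever the real header size is, the estimate scales linearly with cache device size and inversely with block size, which is why larger SSDs and small block sizes both increase the header footprint in the ARC.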
Solution
Once the symptoms above have been checked, call your support representative and ask for help with this memory shortage. Do not reboot the appliance.
Your support representative will collect a crash dump using NMI, and may engage the next level of support to have the crash dump analyzed. See Note 1173064.1 : Sun Storage 7000 Unified Storage System: How to generate NMI to collect a system core dump.
If a remote connection is possible, TSC will check how memory behaves. TSC may need to engage the next level of support to connect remotely to the appliance to do so.
As a first step, the next level of support will check whether there is evidence of a memory leak. As a second step, the focus should be on the ARC: the Adaptive Replacement Cache is the first level of cache in ZFS, and it grows under heavy use. Under memory pressure from the operating system, the ARC tries to react by shrinking. In some specific circumstances, the ARC may fail to react to that memory pressure and may exceed its target. In that case, we must figure out what caused the memory pressure and check whether a memory leak exists in any buffer. Once all pathological behaviour has been ruled out, limiting the ARC to leave more memory to the kernel may be required.
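The "ARC exceeds its target" condition can be illustrated with a minimal sketch. It assumes the ARC counters have already been captured as `kstat -p` style lines (`module:instance:name:statistic`, then whitespace, then the value) and that they follow the Solaris arcstats naming, with `size` for the current ARC size and `c` for its target. How the counters are collected on the appliance is not covered here, and the 5% tolerance and both function names are arbitrary choices for illustration.

```python
def parse_kstat(text):
    """Parse 'module:instance:name:stat<whitespace>value' lines into {stat: int}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if value.isdigit():
            stats[key.rsplit(":", 1)[-1]] = int(value)
    return stats


def arc_over_target(stats, tolerance=0.05):
    """True when the ARC size exceeds its target by more than the tolerance."""
    return stats["size"] > stats["c"] * (1 + tolerance)


if __name__ == "__main__":
    # Invented sample values, for illustration only.
    sample = "zfs:0:arcstats:c\t1000\nzfs:0:arcstats:size\t1200\n"
    print("ARC over target:", arc_over_target(parse_kstat(sample)))
```

A single reading above the target is not proof of a problem. The check is only meaningful when repeated over time while the system is under memory pressure.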
TSC should check this document: Note 1602108.1 : Sun Storage 7000 Unified Storage System: Tuning the ARC in case of memory pressure

References
<NOTE:1194226.1> - Oracle Shared Shell

Attachments
This solution has no attachment