Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1008390.1: How to Verify whether a System Reboot is Caused by a Fatal Reset or a Red State Exception
Previously Published As: 211473

Applies to:
Sun Fire V445 Server - Version Not Applicable and later
Sun Fire V480 Server - Version All Versions and later
Sun Fire V490 Server - Version All Versions and later
Sun Fire V880 Server - Version All Versions and later
Sun Fire V890 Server - Version All Versions and later
All Platforms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community, Oracle Entry level Servers.

Goal

This document helps identify whether an unexpected or unexplained system reboot was caused by a Fatal Reset error or a Red State Exception (RSE) condition.

Solution

Steps to Follow

When errors like these occur, the OS is abruptly interrupted and cannot continue to log error messages in /var/adm/messages or generate a core file. As a result, the system reboots, but the error messages and all related output appear only on the system console (and therefore only in the console logs). For further troubleshooting, it is essential to gather the complete console logs from the time of the error (reboot).

NOTE: If there is no console log, no useful further analysis leading to a root cause can be performed.

1. The system reboot could be due to fatal reset errors. Fatal errors are most often caused by hardware (bad CPU, MB switches, I/O bridge, etc.) and are the result of an 'illegal' hardware state that is detected by the system. The Fatal Reset error and all output are logged only to the system console (ttya or RSC). Here are examples of fatal errors caused by CPU and motherboard switch ASICs (the full fatal reset output is too long and is not included):
For systems using the ALOM serial console, the fatal error is reported as:
When your system reboots after a fatal error, you may also see a notice in the /var/adm/messages file like this one:
Also, prtconf -vp may show a Fatal Sys Hardware message under "reset-reason:".
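Because these errors reach only the console, a saved console log is usually the first thing to search. The following is a minimal Python sketch of such a search, not an Oracle tool. The search phrases are assumptions: the exact console wording varies by platform and firmware, so adjust the patterns to match your own logs. The function name and labels are hypothetical.

```python
import re

# Assumed, case-insensitive search phrases. The exact console wording is not
# reproduced in this document, so treat these patterns as a starting point.
PATTERNS = {
    "fatal-reset": re.compile(r"fatal", re.IGNORECASE),
    "red-state-exception": re.compile(r"red[\s_-]*state", re.IGNORECASE),
}


def scan_console_log(lines):
    """Return (line_number, label, text) for each line matching a pattern."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
    return hits
```

To use it, pass any iterable of lines, for example an open file object for the captured console log, and review the surrounding lines of each hit by hand. A hit is only a pointer to where to look; it does not replace analysis of the full console output.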
If the console logs show fatal errors, please contact a qualified engineer at My Oracle Support (MOS) for assistance.

1.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and UltraSPARC IIIi platforms (V210/V240, V440), a trained MOS Engineer has access to important information along with an AFAR decoder and will carefully guide you through the steps to resolution. My Oracle Support can also assist you if you are experiencing V480 Fatal Resets with specific network and I/O configurations.

2. The unexpected reboot could also be due to Red State Exception (RSE) errors. Verify whether the console output contains any RSE errors. An RSE can be triggered by software or hardware, but it is most commonly due to a hardware fault (bad DIMM or bad CPU/L2 SRAM). The RSE error and all output are logged only to the system console (ttya or RSC) and are usually reported by one of the CPUs:
If your system reboots after an RSE, you may see only a notice like this one in the /var/adm/messages file:
prtconf -vp may show a RED CPU RED-State message under "reset-reason:":

    # prtconf -vp
    System Configuration: Sun Microsystems sun4u
    Memory size: 32768 Megabytes
    System Peripherals (PROM Nodes):
    banner-name: 'Sun Fire 880'
    watchdog-enable:
    reset-reason: 'RED CPU RED-State' <--- reset-reason

If the console logs show RSE errors, this is again a critical issue that requires a qualified MOS Support Engineer, so please contact MOS for assistance.

2.a) For the UltraSPARC III/IV platforms (280R, V480/V880, V490/V890) and UltraSPARC IIIi platforms (V210/V240, V440), please contact MOS for assistance.
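The reset-reason check can also be scripted against saved `prtconf -vp` output. The sketch below is illustrative only. It assumes the property is printed as `reset-reason: '<value>'`, as in the sample output, and it matches the two values named in this document by case-insensitive substring. The function names and returned labels are hypothetical.

```python
import re

# Matches the reset-reason property as printed in the sample output,
# e.g.  reset-reason: 'RED CPU RED-State'
RESET_REASON_RE = re.compile(r"reset-reason:\s*'([^']*)'")


def extract_reset_reason(prtconf_text):
    """Return the reset-reason string from `prtconf -vp` output, or None."""
    match = RESET_REASON_RE.search(prtconf_text)
    return match.group(1) if match else None


def classify_reset_reason(reason):
    """Map a reset-reason string to a coarse, hypothetical label."""
    if reason is None:
        return "unknown"
    lowered = reason.lower()
    # Substring checks are an assumption based on the two values this
    # document mentions: 'RED CPU RED-State' and 'Fatal Sys Hardware'.
    if "red-state" in lowered:
        return "red-state-exception"
    if "fatal" in lowered:
        return "fatal-reset"
    return "other"
```

For example, after saving the command output to a file, read the file into a string and call `classify_reset_reason(extract_reset_reason(text))`. The result only indicates which kind of reset the PROM recorded; the console log is still required for root-cause analysis.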
References

<NOTE:1000380.1> - Sun Systems Equipped With Schizo ASICs Version 2.3 or Higher May Experience Either Domain Stop (Dstop), Domain Pause or FATAL RESET Under Heavy I/O
<NOTE:1003588.1> - V480 Fatal Resets with specific network and I/O configurations
<NOTE:1004903.1> - Event Messages for UltraSPARC-III[R], UltraSPARC-III+[R], UltraSPARC-IIIi[R], UltraSPARC-IV[R] and UltraSPARC-IV+[R] CPU Modules
<NOTE:1006524.1> - Troubleshooting Sun Fire[TM] V880/V890 FATAL Resets
<NOTE:1006530.1> - Troubleshooting Sun Fire[TM] V880 RED STATE EXCEPTION
<NOTE:1012214.1> - Troubleshooting Red State Exception Memory Errors

Attachments

This solution has no attachment