Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction
Sure Solution 1007746.1: SunFire[TM] 12K/15K/E20K/E25K: Expected behavior of domains in different scenarios when the SCs are powered down or rebooted
Previously Published As: 210728

Applies to:
Sun Fire 15K Server - Version All Versions and later
Sun Fire 12K Server - Version All Versions and later
Sun Fire E20K Server - Version All Versions and later
Sun Fire E25K Server - Version All Versions and later
All Platforms

***Checked for relevance on 17-Jan-2014***

Goal
What happens to my running domains when both System Controllers (SCs) are powered off or at the ok prompt?

Solution
To begin, let's take a look at three important services that the System Controller (SC) provides to the domains.
First, each SC provides a 75 MHz clock source to the entire platform. The clock is generated by hardware on the SC and is present when the SC has power. The two clock sources (one from each SC) are synchronized so that if one fails, the domains can continue to run off the other clock source. If an SC is at the ok prompt, it will still provide a clock source to the domains.
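To make the clock redundancy concrete, here is a minimal Python sketch that models which SC the platform clock comes from. It is an illustrative model only, not SMS or firmware code: the function `clock_source`, the SC names `sc0`/`sc1`, and the idea of a single "current" source are assumptions made for illustration. It encodes the two points above: an SC supplies a clock whenever it has power (including at the ok prompt), and losing one clock leaves the domains running off the other.

```python
from typing import Dict, Optional


def clock_source(sc_has_power: Dict[str, bool], current: Optional[str]) -> Optional[str]:
    """Pick the SC whose clock the platform runs from (hypothetical model).

    An SC supplies its 75 MHz clock whenever it has power, even at the ok prompt.
    """
    # Stay on the current source while its SC still has power.
    if current is not None and sc_has_power.get(current, False):
        return current
    # Otherwise fall back to another powered SC; the clocks are synchronized,
    # so the domains keep running across the switch.
    for sc in sorted(sc_has_power):
        if sc_has_power[sc]:
            return sc
    # No SC has power, so there is no clock at all.
    return None


# sc1 sitting at the ok prompt still counts as powered.
power = {"sc0": True, "sc1": True}
source = clock_source(power, None)
print(source)  # sc0
power["sc0"] = False  # sc0 loses power
source = clock_source(power, source)
print(source)  # sc1
power["sc1"] = False  # both SCs are now without power
print(clock_source(power, source))  # None
```

In this model, only when no SC has power does the function return `None`, which matches the "global stop" outcome when the MAIN SC loses power while the SPARE SC is already powered off.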
Second, the MAIN SC monitors the environmental status of each component in the platform. This is accomplished by the esmd daemon over the I2C bus. The esmd daemon monitors for high or low temperatures, voltages, and current levels. If esmd detects a dangerous value, it can signal SMS to take the appropriate action to protect the hardware, for example increasing fan speed or powering off components. Because esmd is part of the SMS software, it does not run when SMS is stopped.

Third, SMS monitors the Operating System on each domain to ensure that it remains up and running. The SC periodically sends a heartbeat signal to the domain. If it does not receive a timely response, it sends a reset to the domain to recover it. The SC also restores the domain to its previous state if the domain panics or crashes due to a hardware stop. If SMS is not running and you lose a domain, the domain will not come back up until SMS is restarted.

SMS is designed to protect the hardware from getting into a state that can cause permanent damage to the components, such as overheating or shorting out. Therefore, if you try to run the platform without any SC monitoring it, SMS may take down the domains to prevent possible damage. For example, when SMS is stopped in an orderly way on the only SC that is present and powered on, it shuts all domains down gracefully and powers off all boards (examples 3 to 5 in the chart below).

The following chart describes several different scenarios and what happens to the domains in each:

    | Example # | MAIN SC State   | SPARE SC State | Domain State      |
    |-----------+-----------------+----------------+-------------------|
    | 1         | ok prompt       | ok prompt      | stay up           |
    | 2         | init 6          | ok prompt      | stay up           |
    | 3         | sms stop        | powered off    | graceful shutdown |
    | 4         | init 0/shutdown | powered off    | graceful shutdown |
    | 5         | init 6          | powered off    | graceful shutdown |
    | 6         | halt            | powered off    | stay up           |
    | 7         | send break      | powered off    | stay up           |
    | 8         | lose power      | powered off    | global stop       |
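The chart can be read as three rules, sketched below in Python. This is a hypothetical model inferred from the eight examples, not SMS logic: the function name `domain_outcome`, the `Outcome` enum, and the grouping of states into sets are assumptions made for illustration. The rules are: with no SC powered there is no clock, so every domain stops; when SMS shuts down in an orderly way (sms stop, init 0/shutdown, init 6) on the only powered-on SC, it takes the domains down gracefully; in every other listed case a clock is still present and SMS never ran its shutdown path, so the domains stay up.

```python
from enum import Enum


class Outcome(Enum):
    STAY_UP = "stay up"
    GRACEFUL_SHUTDOWN = "graceful shutdown"
    GLOBAL_STOP = "global stop"


# MAIN SC actions in the chart that stop SMS in an orderly way (examples 3-5).
ORDERLY_SMS_STOP = {"sms stop", "init 0/shutdown", "init 6"}

# SC states in which the SC has no power and therefore supplies no clock.
NO_POWER = {"powered off", "lose power"}


def domain_outcome(main_sc: str, spare_sc: str) -> Outcome:
    """Hypothetical model of the scenario chart; not taken from SMS itself."""
    # Rule 1: neither SC has power, so no clock reaches the domains.
    if main_sc in NO_POWER and spare_sc in NO_POWER:
        return Outcome.GLOBAL_STOP
    # Rule 2: SMS stops cleanly on the only powered-on SC and takes the
    # domains down rather than leave them unmonitored.
    if main_sc in ORDERLY_SMS_STOP and spare_sc in NO_POWER:
        return Outcome.GRACEFUL_SHUTDOWN
    # Rule 3: a clock is still present and SMS never ran its shutdown path.
    return Outcome.STAY_UP


# The eight examples from the chart: (example #, MAIN SC, SPARE SC, expected).
CHART = [
    (1, "ok prompt", "ok prompt", Outcome.STAY_UP),
    (2, "init 6", "ok prompt", Outcome.STAY_UP),
    (3, "sms stop", "powered off", Outcome.GRACEFUL_SHUTDOWN),
    (4, "init 0/shutdown", "powered off", Outcome.GRACEFUL_SHUTDOWN),
    (5, "init 6", "powered off", Outcome.GRACEFUL_SHUTDOWN),
    (6, "halt", "powered off", Outcome.STAY_UP),
    (7, "send break", "powered off", Outcome.STAY_UP),
    (8, "lose power", "powered off", Outcome.GLOBAL_STOP),
]

for number, main_sc, spare_sc, expected in CHART:
    result = domain_outcome(main_sc, spare_sc)
    # Check that the model reproduces every row of the chart.
    assert result is expected, (number, result)
    print(f"{number}: {main_sc:<15} | {spare_sc:<11} | {result.value}")
```

A halt or a break stops the MAIN SC without going through the normal shutdown scripts, which is presumably why examples 6 and 7 leave the domains running even though the SPARE SC is off. Example 2 stays up because the SPARE SC, although idle at the ok prompt, is still powered on.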
More details:
NOTE: In all instances where the domains are taken down, SMS will automatically restore them to their previous state when it is brought back up.
Here is a sample of the output produced when the domains are shut down:

    # ./sms stop
    sms: SMS is being shutdown on the only present and powered on SC.
    Sep 15 12:20:16 sc0 sms-svc: sms: SMS is being shutdown on the only present and powered on SC.
    All domains are being shutdown gracefully and all boards are being powered off. . .
    Sep 15 12:20:16 sc0 sms-svc: All domains are being shutdown gracefully and all boards are being powered off. . .

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in an appropriate My Oracle Support Community - Oracle Sun Technologies Community.