Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1004712.1 : Sun Fire[TM] 12K/15K/E20K/E25K: Domain reboot hangs at "resetting..." and does not run HPOST
Previously Published As 206542

Applies to:
Sun Fire 12K Server - Version Not Applicable and later
Sun Fire E25K Server - Version Not Applicable and later
Sun Fire 15K Server - Version Not Applicable and later
Sun Fire E20K Server - Version Not Applicable and later
All Platforms

Symptoms
May 19 22:06:38 2003 # init 0
May 19 22:06:43 2003
May 19 22:06:44 2003 INIT: New run level: 0
May 19 22:06:44 2003 The system is coming down. Please wait.
May 19 22:06:44 2003 System services are now being stopped.
May 19 22:06:54 2003 Print services already stopped.
May 19 22:07:26 2003 The system is down.
May 19 22:07:56 2003 syncing file systems... done
May 19 22:08:03 2003 Program terminated
May 19 22:08:43 2003 {2} ok boot
May 19 22:08:43 2003 Resetting...
May 19 22:53:12 2003
May 19 22:53:12 2003 @(#)OBP 4.5.20 2003/02/13 18:08 Sun Fire 15000
May 19 22:53:12 2003 IOSRAM based Console initialized
May 19 22:53:12 2003 Probing Pseudo NVRAM device
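How long a domain sat at "Resetting..." can be read off the console timestamps. The following is a minimal Python sketch, assuming the console log has been saved with the `Mon DD HH:MM:SS YYYY` prefix shown above; the function name `resetting_delay` is hypothetical and is not an SMS command.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Console timestamp prefix as captured above, e.g. "May 19 22:08:43 2003".
# Assumption: the log was saved in this format; this is not an SMS tool.
TS = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*(.*)$")


def resetting_delay(lines: Iterable[str]) -> Optional[timedelta]:
    """Hypothetical helper: time from the first "Resetting..." line to the
    next timestamped line that carries output, or None if not found."""
    reset_at = None
    for line in lines:
        m = TS.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), "%b %d %H:%M:%S %Y")
        text = m.group(2)
        if reset_at is None:
            if text.startswith("Resetting..."):
                reset_at = stamp
        elif text:  # first non-empty console output after the reset
            return stamp - reset_at
    return None


if __name__ == "__main__":
    log = [
        "May 19 22:08:43 2003 {2} ok boot",
        "May 19 22:08:43 2003 Resetting...",
        "May 19 22:53:12 2003",
        "May 19 22:53:12 2003 @(#)OBP 4.5.20 2003/02/13 18:08 Sun Fire 15000",
    ]
    print(resetting_delay(log))  # 0:44:29
```

Empty timestamped lines are skipped so that the gap is measured up to the first real OBP output after the reset.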
Cause
Below SMS 1.3
# Cmdline: /opt/SUNWSMS/SMS1.3/bin/hpost -d B -Q
Unable to open .postrc file /etc/opt/SUNWSMS/config/B/.postrc Permission denied
Errors in .postrc file. Bailing out!
As that message clearly indicates, hpost cannot read the .postrc file in question, so the domain remains at "Resetting..." while HPOST is being attempted on the domain. Ultimately, a setkeyswitch off and on is executed, the domain posts successfully, and it then boots back up.

Solution
Resolution
May 8 15:40:47 2003 rebooting...
May 8 15:40:47 2003 Resetting...
May 8 15:47:49 2003
May 8 15:48:02 2003
May 8 15:48:02 2003
May 8 15:48:02 2003 Sun Fire 15000, using IOSRAM based Console
May 8 15:48:03 2003 Copyright 1998-2002 Sun Microsystems, Inc. All rights reserved.
May 8 15:48:03 2003 OpenBoot 4.5, 94208 MB memory installed, Serial #44593284.
May 8 15:48:03 2003 Ethernet address 0:0:be:a8:70:84, Host ID: 82a87084.
May 8 15:48:03 2003
May 8 15:48:03 2003
May 8 15:48:03 2003
May 8 15:48:04 2003 Rebooting with command: boot
May 8 15:48:04 2003
May 8 15:48:05 2003 Boot device: /pci@1c,600000/pci@1/scsi@2/disk@0,0:a File and args:

From dsmd_tuning.txt:
*
* The default monitoring controls are on.
* To turn off all domains state monitoring, change domain_mon to 0.
* To turn off all domains recovery actions, change domain_asr to 0.
*
domain_mon = 1
domain_asr = 1

NOTE: Each domain can also have its own dsmd_tuning.txt file, which controls how dsmd behaves for that specific domain only. The domain-specific dsmd_tuning.txt file would be in the domain configuration directory, /etc/opt/SUNWSMS/config/. Make sure domain_asr is not disabled there either.
Domain ASR should be re-enabled by changing "domain_asr = 0" to "domain_asr = 1" in the appropriate dsmd_tuning.txt files and then restarting dsmd so that it re-reads its configuration file. Dsmd is best restarted by stopping and starting SMS, but first make sure that failover is off and that no platform configuration changes are occurring while you stop and start SMS. Make the same changes on both SCs so that dsmd is configured identically regardless of which SC is the MAIN.
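The edit described above can be sketched in Python. This is a minimal illustration, assuming dsmd_tuning.txt uses the `name = value` settings shown in the excerpt and that comment lines begin with `*`; the helper names `asr_values` and `enable_asr` are hypothetical, and the sketch works on file text only, so it makes no assumption about where the files live on the SC.

```python
import re
from typing import List

# "domain_asr = <n>" setting line; comment lines starting with "*" never match.
# Assumption: settings use the "name = value" form shown in the excerpt above.
ASR_LINE = re.compile(r"^(\s*domain_asr\s*=\s*)(\d+)\s*$")


def asr_values(text: str) -> List[int]:
    """Hypothetical helper: every domain_asr value set in the file text."""
    values = []
    for line in text.splitlines():
        m = ASR_LINE.match(line)
        if m:
            values.append(int(m.group(2)))
    return values


def enable_asr(text: str) -> str:
    """Hypothetical helper: change each "domain_asr = 0" to "domain_asr = 1"."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        m = ASR_LINE.match(body)
        if m and int(m.group(2)) == 0:
            line = m.group(1) + "1" + line[len(body):]  # keep the line ending
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    excerpt = "domain_mon = 1\ndomain_asr = 0\n"
    print(asr_values(excerpt), asr_values(enable_asr(excerpt)))  # [0] [1]
```

To use it, one would run each dsmd_tuning.txt (the platform-wide file and any domain-specific copy) through `enable_asr`, write the result back on both SCs, and then restart SMS as described above; dsmd does not pick up the change until it re-reads the file.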
-------------------
# Cmdline: /opt/SUNWSMS/SMS1.3/bin/hpost -d B -Q
Unable to open .postrc file /etc/opt/SUNWSMS/config/B/.postrc Permission denied
Errors in .postrc file. Bailing out!
As that message clearly indicates, hpost cannot read the .postrc file in question, so the domain remains at "Resetting..." while HPOST is being attempted on the domain. Ultimately, a setkeyswitch off and on is executed, the domain posts successfully, and it then boots back up.

When a domain is rebooted, the sms-dsmd user is responsible for executing HPOST on the domain. When a domain is keyswitched off and on, it is the sms-svc user (or a domain-specific user if Access Control Lists (ACLs) are in use). Both users must have access to the HPOST configuration files in order to recover a domain properly when necessary. The .postrc and blacklist files used by HPOST therefore need to be world readable (644), regardless of who owns them. If they are world readable, both sms-svc and sms-dsmd can read them and configure the domain properly at this "Resetting..." stage of OBP.
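Auditing these permissions across all domains can be scripted. A minimal Python sketch, assuming the HPOST configuration files of interest are named `.postrc` and `blacklist` and sit somewhere under a configuration root such as /etc/opt/SUNWSMS/config (the directory in the error message); the function names are hypothetical, and changing modes requires sufficient privileges on the SC.

```python
import os
import stat
from typing import Iterable, List

# Assumption: the HPOST configuration files to audit are named ".postrc" and
# "blacklist"; adjust this set to match the files on your SCs.
HPOST_FILES = {".postrc", "blacklist"}


def unreadable_hpost_files(config_root: str) -> List[str]:
    """Hypothetical helper: HPOST config files under config_root that lack
    the world-read (other) bit."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(config_root):
        for name in filenames:
            if name in HPOST_FILES:
                path = os.path.join(dirpath, name)
                if not os.stat(path).st_mode & stat.S_IROTH:
                    bad.append(path)
    return sorted(bad)


def make_world_readable(paths: Iterable[str]) -> None:
    """Set each file to mode 644 (rw-r--r--), as the text recommends."""
    for path in paths:
        os.chmod(path, 0o644)


if __name__ == "__main__":
    # Example root based on the error message; only meaningful on an SC.
    for path in unreadable_hpost_files("/etc/opt/SUNWSMS/config"):
        print("not world readable:", path)
```

The check looks only at the world-read bit, since that is what lets both sms-svc and sms-dsmd read a file regardless of its owner; the fix sets exactly 644 to match the text.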
If the reboot that started this issue is the result of a cron job, or of a panic overnight or on a weekend when nobody is around, the hang at "Resetting..." may last a long time, until someone can intervene manually and bring the domain back up. The basic warning here: if you are running a version of SMS earlier than 1.3, disable ASR only when instructed to do so by Sun support, and understand the risks of doing so. This issue also stresses the importance of making sure the HPOST configuration files have the correct permissions, regardless of SMS version, to avoid such lengthy downtime. Seemingly trivial configuration changes like these can leave a domain down for an extended period after something as routine as a reboot.
Bug ID 4521655 was filed for the domain_asr behavior. Note, however, that this "Resetting..." hang is not a bug; it is how ASR worked prior to SMS 1.3. See Problem Resolution <Document 1004778.1> for details on why domain_asr might be disabled. Bug ID 4658538 allowed ASR to be disabled while still allowing domain recovery through a reboot.
Previously Published As 70064
Attachments
This solution has no attachment