Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1001778.1: Sun Fire[TM] 3800, 4800/4810, 6800, E2900, E4900, E6900, V1280 or Netra[TM] 1280, 1290 server: How to Gather Data from a Hung Domain [Video]
PreviouslyPublishedAs 202431

Applies to:
Sun Fire E6900 Server - Version Not Applicable and later
Sun Fire 3800 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
Sun Fire 4810 Server - Version Not Applicable and later
Sun Netra 1280 Server - Version Not Applicable and later
All Platforms

Goal
Instructions on how to gather data from a hung domain on Sun Fire[TM] SF3800/SF4800/SF4810/SF6800/E4900/E6900/E2900/V1280.

A brief how-to video tutorial is available for this topic. It provides step-by-step instructions answering Sun's most frequently asked questions. View the video and/or follow the detailed instructions below.
Solution
1. Ensure that the domain is actually hung:
   - Can you ping the domain?
   - Can you telnet to the domain?

2. Ensure that the SC (System Controller) is not hung. If you can access the System Controller, log in to the SC and obtain a platform shell.

   A. If you get to the platform shell, run the following commands:

      SCname:SC> showlogs
      SCname:SC> showplatform

   B. If the SC is hung, see Document 1002033.1 for details on how to recover from a hung system controller. Then go back to step 2A.

3. Once in the platform shell, attempt to get a domain shell:

      SCname:SC> console -d

   - If the command appears to hang, send a break signal to the domain:
     - If you are using telnet: press CTRL ] and, at the telnet prompt, type: send break
     - If you are connected to the SC via tip: use ~#

   At this point you should have a domain shell prompt. If you do, continue with the following commands; otherwise continue to step 4.

   - If you get the domain shell, run the following commands:

      SCname:A> showdomain -p status
      SCname:A> showlogs

     Then type break to get to the OBP. If this takes you to the ok prompt, type sync to force a core file.

4. If you were not able to get to the ok prompt, then the system is really hung and an XIR (externally initiated reset) must be sent to the domain. From the domain shell, type:

      reset

   The behavior of this command depends on the setting of the OBP variable error-reset-recovery:
   - sync: the system attempts to take a core file.
   - boot: the system just reboots, as if the boot command had been issued at the ok prompt.
   - none: the system should drop to the ok prompt, where you can run the following commands. The '#' sign represents the CPU that took the XIR; use that number in the cbuf command. If possible, run these commands on each of the CPUs (some depend on the firmware level of the SC):

      {#} ok dump-sigblock
      {#} ok # cbuf
      {#} ok .xir-state-all

   - If you were not able to return to the ok prompt but have a domain prompt, type the following command:

      SCname:A> showresetstate

5. If none of these tactics work, you may be forced to power off the domain. In that case, run setkeyswitch off for the domain.
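The telnet check in step 1 can be partly scripted from an administration host. The following is a minimal Python sketch, not part of the original procedure. It assumes the domain normally accepts telnet on TCP port 23; the function name tcp_reachable is hypothetical. It does not replace the ping test, and a refused or timed-out connection only suggests a hang, since the telnet service may simply be disabled.

```python
import socket


def tcp_reachable(host: str, port: int = 23, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 23 (telnet) is an assumption; use whichever service the domain
    normally offers.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, tcp_reachable("domain-a.example.com") (a placeholder hostname) returns False when the connection cannot be established, in which case continue with step 2 to examine the domain from the System Controller.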
Note: loghost setup for domain and platform may help in troubleshooting hang issues; please check reference section below for detailed information.
Instructions for Platforms employing Lights Out Management (LOM)

1. Ensure that the domain is actually hung:

2. Log in to the LOM prompt via telnet/ssh or tip.

   A. Once you get the lom prompt, run the following commands:

      lom>showsc -v
      lom>showlogs -v

3. Try to connect to the domain and see what state it is in:

   A. Use the console command to connect to the domain:

      lom> console

   B. If there is no response from the console, use the escape sequence to break out. The default escape sequence is "#."

      lom>console
      #.
      lom>

   C. Once the domain is confirmed to be unreachable, go to the next step.

4. Use the 'break' or 'reset' command to recover.

   A. Try to break into the OBP with 'break'. If you get to the OBP, do a sync to collect a core file.

      lom>break
      This will suspend Solaris.
      Do you want to continue? [no] yes
      Type 'go' to resume
      debugger entered.
      {3} ok sync

   B. If 'break' does not work, use 'reset' and also collect 'showresetstate'. The behaviour of reset also depends on the OBP setting for error-reset-recovery, which should preferably be set to 'sync'.

      lom>reset
      This will abruptly terminate Solaris.
      Do you want to continue? [no] yes
      lom>showresetstate

5. If none of the procedures above work, a poweroff/poweron needs to be issued.

   Power off the platform:
      lom> poweroff

   Power on the platform, but do not start the domain:
      lom> poweron all

   Power on the platform and start the domain:
      lom> poweron
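Because the outcome of 'reset' depends on error-reset-recovery, it can help to record the expected behaviour before issuing the command. The following Python sketch is only an illustrative lookup of the three settings described in this document (sync, boot, none). The names RESET_BEHAVIOR and expected_reset_behavior are hypothetical, and the sketch does not query the OBP itself.

```python
# Expected result of an XIR ('reset') for each error-reset-recovery value,
# as described in this document. Illustrative only; not read from the OBP.
RESET_BEHAVIOR = {
    "sync": "attempt to take a core file",
    "boot": "reboot as if 'boot' had been issued at the ok prompt",
    "none": "drop to the ok prompt for manual data collection",
}


def expected_reset_behavior(setting: str) -> str:
    """Return the documented reset behaviour for an error-reset-recovery value."""
    key = setting.strip().lower()
    if key not in RESET_BEHAVIOR:
        raise ValueError(f"unrecognised error-reset-recovery value: {setting!r}")
    return RESET_BEHAVIOR[key]
```

A value outside these three raises ValueError rather than guessing, since this document describes no other settings.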
References
<NOTE:1002033.1> - Sun Fire[TM] v1280, E2900, 3800, 4800, 4810, 6800, E4900, E6900, and Netra 1280, 1290 Server: How to Recover from a Hung System Controller
<NOTE:1008702.1> - Console Logging Options to capture Fatal Reset output for Sun systems
<NOTE:1018813.1> - Sun Fire[TM] SF3800/SF4800/SF4810/SF6800 - E4900/E6900 Server: Domains running firmware 5.15.x or later with hang-policy set to "notify" may lose critical troubleshooting data
<NOTE:778.1> - Troubleshooting Video Issues in MOS Attachments

This solution has no attachment