Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution

1009309.1: Proactive setup/troubleshooting of a Sun Fire[TM] 280R
Previously Published As: 212887

Applies to:
Sun Fire 280R Server - Version Not Applicable and later
All Platforms

Purpose
This document describes how to set up a Sun Fire[TM] 280R system so that, if trouble arises, Sun support can troubleshoot it as effectively and efficiently as possible.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - SPARC Legacy Servers.
Troubleshooting Steps

1) Patches
See: How to find the Oracle Solaris Critical Patch Update (CPU) Patchsets, Recommended OS Patchsets for Oracle Solaris and Oracle Solaris Update Patch Bundles (Doc ID 1272947.1)
2) OpenBoot PROM (OBP)
diag-switch = true
diag-level = min
diag-script = normal
auto-boot = true
diag-device =
error-reset-recovery = sync
* With diag-switch set to true, booting can take a long time, especially if the system contains a lot of memory. If this is not acceptable, set it to false.
* With diag-script set to normal, obdiag tests all devices expected to be present in the baseline configuration, so PCI cards are not tested.
* With error-reset-recovery set to sync, OBP invokes a sync, which creates a crash dump, after an XIR or a Red state.
* diag-level should be the factory default, min. See: Guidance for POST Diagnostic Level Setting on Sun Fire[TM] 280R, V480, V490, V880, V890 and V880z servers (Doc ID 1582330.1).

3) Configure the Remote System Controller (RSC)
SUNWrsc    On the host system - RSC
SUNWrscd   On the host system - RSC user guide
SUNWrscj   On a client - RSC GUI
- Configure the RSC:
  # /usr/platform/`uname -i`/rsc/rsc-config
input-device = rsc-console
output-device = rsc-console
diag-out-console = true
4) Enable the watchdog reset mechanism
watchdog_enable=1
* A reboot is necessary to activate the setting.

5) Configure Solaris to save a crash dump to disk after a panic
Crash dumps vary in size based on the memory configuration of the system and how much of that memory was in use. On systems with relatively small amounts of RAM (up to 5 GB), a guideline is to allow 35% of the amount of RAM per crash dump. For larger amounts of RAM, 2 GB is usually sufficient.
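The sizing guideline above can be sketched as a small shell function. This is only an illustration of the rule of thumb, not a Sun-supplied tool: the function name `estimate_dump_mb` and the choice of megabytes as the unit are assumptions made for this example, and `expr` is used so the arithmetic also works in older Bourne shells.

```shell
#!/bin/sh
# Rough space estimate per crash dump, following the guideline in the text.
# estimate_dump_mb is a hypothetical helper name, not a Solaris command.
# Argument: installed RAM in MB. Prints the suggested space per dump in MB.
estimate_dump_mb() {
    ram_mb=$1
    if [ "$ram_mb" -le 5120 ]; then
        # Up to 5 GB of RAM: allow about 35% of RAM per crash dump.
        expr "$ram_mb" \* 35 / 100
    else
        # Larger memory configurations: 2 GB is usually sufficient.
        echo 2048
    fi
}

estimate_dump_mb 2048   # prints 716
```

The result is a planning figure only; the actual dump size depends on how much memory was in use at the time of the panic.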
Crash dumps are enabled by default, and unless the dumpadm command was used to change it, the dump device is the primary swap partition (the first one listed by the swap -l command). If the dump device is a regular partition (begins with /dev/dsk), and is of sufficient size, no further configuration is necessary.
If the swap partitions are encapsulated by DiskSuite, you must use the name of the encapsulated partition, not one of the raw partitions it is made from. The output from dumpadm should look something like this:
Dump device: /dev/md/dsk/d1 (swap)
If you are using the primary swap partition, use this dumpadm command to configure it:
# dumpadm -d swap
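As a quick check of which case applies, the "Dump device:" line can be classified by its path prefix. The sketch below is an assumption-laden illustration: `classify_dump_device` is a hypothetical helper, and it only inspects a line of text in the format shown above rather than calling dumpadm itself.

```shell
#!/bin/sh
# Classify the device named on a "Dump device:" line of dumpadm output.
# classify_dump_device is a hypothetical helper, not part of Solaris.
# Argument: one line such as "Dump device: /dev/md/dsk/d1 (swap)".
classify_dump_device() {
    # Extract the device path that follows the "Dump device:" label.
    dev=`echo "$1" | sed -n 's/^ *Dump device: *\([^ ]*\).*/\1/p'`
    case "$dev" in
        /dev/dsk/*)    echo "regular partition: $dev" ;;
        /dev/md/dsk/*) echo "DiskSuite metadevice: $dev" ;;
        "")            echo "no dump device found" ;;
        *)             echo "other device: $dev" ;;
    esac
}

classify_dump_device "Dump device: /dev/md/dsk/d1 (swap)"
```

On a live system the line could be fed in from the dumpadm output, for example with grep; that pipeline is left out here so the example stays self-contained.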
If there is a spare partition with sufficient space, use the "dumpadm -d" command to configure it as the dump device. If the only available space for the dump device is an encapsulated Veritas partition, you must provide the path of the original disk device name, rather than the Veritas encapsulated path name, for a dedicated device. For example:
Dump device: /dev/dsk/c6t0d0s1 (dedicated)
vs
Dump device: /dev/vx/dsk/swapvol (dedicated)
* For more info on setting up a dump device see:
  Technical Instruction Document 1004803.1 - Collecting System Crash Dump Images on Solaris[TM] 7 and later
  Technical Instruction Document 1017485.1 - Determining Approximate Crash Dump File Size

6) Configure an external loghost for the message files
* For information on the syslog mechanism see the following documents on sunsolve:
  Technical Instruction Document 1007237.1 - Setting up and debugging logging to remote hosts
  Technical Instruction Document 1004455.1 - Working with the Solaris[TM] Operating Environment messaging and logging daemon

7) When we do not have a stable system
diag-switch = true
diag-script = all
test-args = verbose, subtests
* With diag-script set to all, obdiag tests all devices expected to be present in the baseline configuration, including PCI cards.
set snooping=1
set snoop_interval=9000000
* A reboot is necessary to activate the setting.
* Enabling the deadman kernel costs performance, so do not leave it on by default.
* Technical Instruction Document 1004530.1 - KERNEL: How to enable deadman kernel code.

8) Configure a console loghost
* For more info on setting up console logging: Technical Instruction 1008702.1 - Console Logging Options to capture Fatal Reset output for Sun systems.

9) What to do when the system hangs
- What exactly is hanging (system, RSC, network)

a) The main system with Solaris
b) The RSC card
consolehistory
showenvironment
loghistory
version
- log in to the console (from the RSC)
- when Solaris is up and running
- when console found on ok-prompt
- when no output from the console
c) The network
* When there is no response from the system at all
10) What to do when the system has panicked and automatically rebooted
11) What to do when the system has panicked and sits at the ok-prompt
12) What information will Sun usually ask for in these situations
* Run explorer as follows: /opt/SUNWexplo/bin/explorer -w fru,default
  The output file of the explorer will be located in /opt/SUNWexplo/output.
* The crash dump consists of two files, unix.(nr) and vmcore.(nr), located in /var/crash/`uname -n`.
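Before sending a crash dump, it can help to confirm that both files of a dump are present. The following is a minimal sketch under stated assumptions: `list_complete_dumps` is a hypothetical helper name, and it relies only on the unix.(nr) / vmcore.(nr) naming described above.

```shell
#!/bin/sh
# Print the number of every crash dump for which both unix.N and vmcore.N exist.
# list_complete_dumps is a hypothetical helper, not part of Solaris.
# Argument: crash directory; defaults to /var/crash/`uname -n` as in the text.
list_complete_dumps() {
    dir=${1:-/var/crash/`uname -n`}
    for u in "$dir"/unix.*; do
        # Skip the unexpanded pattern when no unix.N file exists.
        [ -f "$u" ] || continue
        # Keep only the dump number after the "unix." prefix.
        n=`echo "$u" | sed 's/.*unix\.//'`
        if [ -f "$dir/vmcore.$n" ]; then
            echo "$n"
        fi
    done
}
```

Calling `list_complete_dumps` with no argument checks the default crash directory; a dump number that is not printed is missing one of its two files.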
Attachments
This solution has no attachment