Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
1356790.1: How to Perform On Site Diagnosis for a Down Starcat System:ATR:1356790.1:4
Oracle Confidential (INTERNAL). Do not distribute to customers
Applies to:
Sun Fire E20K Server - Version: Not Applicable
Sun Fire E25K Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire 12K Server - Version: Not Applicable and later [Release: N/A and later]
Sun Fire 15K Server - Version: Not Applicable and later [Release: N/A and later]
Information in this document applies to any platform.

Goal
To aid Field Engineers in on-site diagnosis of Down Hard systems.

********************************************************************************
To report errors or request improvements on this procedure, please go to http://support.us.oracle.com and put a comment on Doc ID: 1356790.1
********************************************************************************

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED: (IS A SITE ENGINEER AVAILABLE?)
System Management Services (SMS), Intermediate Solaris Skills

Time Estimate: 120 minutes
TASK COMPLEXITY: 4

FIELD ENGINEER INSTRUCTIONS

PROBLEM OVERVIEW: Down System

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?: Down Hard, unknown reason.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

1. Validate whether the system is powered on (and whether board power issues are present).
# Are the LEDs lit? Are the fans spinning? If nothing is powered on, the issue is external to the server.
# Confirm power to all the AC PSUs in the cabinet.
# Investigate the system's power source, power cords, etc. for a potential issue.
2. Validate that the customer can log in to the system controllers.
# Check the status of all the domain's System and I/O Boards. Make sure they can all power on; otherwise the domain may fail to boot. (showboards, poweron)
3. Validate that the domain in question is not currently executing POST.
POST must complete before the domain reaches OBP (the "ok" prompt); only then can the domain be booted. To display the state of the domain from the system controller, run:
# If POST is running, wait for it to complete, then check the most recent POST log file ( $SMSVAR/SMS/adm/<domain letter>/post/ ) to ensure that it ran to completion and that the hardware passed POST.
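Locating the most recent POST log can be scripted. The following is a minimal Python sketch, not part of SMS: the function name latest_post_log is hypothetical, and it assumes that the newest log is simply the file with the latest modification time in the post directory named above. It only finds the file; deciding whether POST ran to completion and the hardware passed still requires reading the log.

```python
from pathlib import Path
from typing import Optional


def latest_post_log(post_dir: str) -> Optional[Path]:
    """Return the most recently modified regular file in post_dir.

    Returns None if the directory does not exist or holds no files.
    Assumption: "most recent" means latest modification time.
    """
    directory = Path(post_dir)
    if not directory.is_dir():
        return None
    # Ignore subdirectories; only plain files are candidate logs.
    files = [p for p in directory.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)
```

Pass the fully expanded path of the domain's post directory (with $SMSVAR and the domain letter already substituted) wherever that directory is readable and Python 3 is available.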
4. Validate the customer can connect to the domain console.
5. If the domain reaches OBP, it may or may not auto-boot, depending on configuration. If it stops at the ok prompt, type boot and observe the result. Auto-boot can be configured in two places.
# Setting auto-boot? at the ok prompt:
Other settings noted above may affect booting behavior as well. diag-switch? should be set to its default, which is false. If it is true, the system will attempt to boot from the diag-device, which is usually the network. boot-device settings may vary. See Step 6 for a more complete discussion of boot-device.
# Setting auto-boot from the system controller:
6. Boot device issues are a common cause of failure to boot. Verify that the boot device is valid. If the device being booted is an alias defined in devalias at the OBP, the device that the alias references must appear in the probe-scsi-all output.
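The alias check described above can be expressed as a small piece of logic. The following is a minimal Python sketch under stated assumptions, not an OBP or SMS tool: the function names are hypothetical, the alias table and the list of probed controller paths are assumed to have been copied by hand from the devalias and probe-scsi-all output, and a device counts as "present" when its path sits under one of the probed controller paths. It handles a single boot-device entry only.

```python
from typing import Dict, List


def resolve_boot_device(boot_device: str, aliases: Dict[str, str]) -> str:
    """Expand boot_device through the alias table if it is an alias."""
    return aliases.get(boot_device, boot_device)


def boot_device_is_probed(boot_device: str,
                          aliases: Dict[str, str],
                          probed_paths: List[str]) -> bool:
    """Return True if the resolved device path lies under a probed path.

    Assumption: probed_paths holds controller paths read off the
    probe-scsi-all output; matching is by exact path or path prefix
    followed by "/", so scsi@2 does not match scsi@20.
    """
    target = resolve_boot_device(boot_device, aliases)
    return any(target == path or target.startswith(path + "/")
               for path in probed_paths)
```

For example, with the hypothetical alias table {"disk": "/pci@1c,600000/scsi@2/disk@0,0"} and the probed path "/pci@1c,600000/scsi@2", the alias disk passes the check, while an alias pointing at a controller that was not probed does not.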
# Alternate boot device
It is often useful to boot from an alternate boot device to test whether the OS on the primary device is corrupt. It is also common to boot from the OS mirror disk when the primary mirror is experiencing hardware issues. An alternate device might be the network, a root mirror, or an alternate disk image. Alternate boot devices are usually listed in the output of devalias. Because alias names can be user-defined, there is no way to list every known alias, but vx-rootdisk and vx-rootmirror are common in Veritas Volume Manager environments. Any alias containing the word mirror should also be investigated as a possible alternate.
7. Other aids in troubleshooting boot issues.
# Verify devices in POST. See the information on POST logs above. During a POST run, components that have been CHS'ed are also listed at the beginning of the POST log. This list can be compared to the showchs -b output.
# Verbose booting options for boot hangs. When booting hangs after SunOS starts loading, it is often helpful to gather additional data. In such cases, put Solaris into a verbose boot with boot -v at the ok prompt. auto-boot? must be set to false so that the domain does not boot normally and manual boot commands can be entered. See Step 5 for information on setting auto-boot? to false. If the boot operation appears to hang in the middle of disk probing, the verbose output could give additional insight into the cause of the boot failure.
Note: disablecomponent takes effect on the next boot.
sb9: will be disabled at the next post execution.
Remember to set the parameters back to their original values after testing.

Final word on boot issues: If unsure how to proceed, or unable to perform the above process, collect as much information pertaining to the boot failure as possible (console logs, error messages, etc.), then call back in and request the next available engineer.

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
After booting, the customer will need to verify that the system meets production requirements.

PARTS NOTE:
Parts may end up being required, but they are not part of this Action Plan. Another Action Plan may be necessary.

REFERENCE INFORMATION:
Service Manuals, Admin Manuals, and SMS Command reference manuals: http://download.oracle.com/docs/cd/E19065-01/index.html

KEYWORDS:
ERRORS:

Attachments
This solution has no attachment