Asset ID: 1-71-1361150.1
Update Date: 2017-11-30
Keywords:
Solution Type: Technical Instruction Sure
Solution 1361150.1: How to Perform On Site Diagnosis for a Down System for Sun Fire T1000/T2000, SPARC Enterprise T1000/T2000, Sun Blade T6300 and Netra T2000
Related Items
- Sun Fire T1000 Server
- Sun Blade T6300 Server Module
- Sun Fire T2000 Server
- Sun Netra T2000 Server
- Sun SPARC Enterprise T1000 Server
- Sun SPARC Enterprise T2000 Server
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP
Applies to:
Sun Fire T1000 Server - Version Not Applicable and later
Sun SPARC Enterprise T2000 Server - Version Not Applicable and later
Sun Fire T2000 Server - Version Not Applicable and later
Sun SPARC Enterprise T1000 Server - Version Not Applicable and later
Sun Netra T2000 Server - Version Not Applicable and later
Information in this document applies to any platform.
Goal
To aid Field Engineers in the on-site diagnosis of systems that are down hard.
********************************************************************************
To report errors or request improvements on this procedure,
please go to http://support.us.oracle.com and put a comment on Doc ID: 1361150.1
********************************************************************************
Solution
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED:(IS A SITE ENGINEER AVAILABLE?)
System Controller, ALOM Application, Intermediate Solaris Skills
TIME ESTIMATE: 120 Minutes
TASK COMPLEXITY: 2
FIELD ENGINEER INSTRUCTIONS
PROBLEM OVERVIEW: Down System
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :
Down Hard, unknown reason.
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
1. Validate whether the system is powered on or has shut down (or if board power issues are present).
- Are the LEDs lit, are the fans spinning? Let the customer investigate the system's power source, power cords, power supplies, etc. for a potential issue.
- Refer to Doc ID 1018104.1 for help on diagnosing power-on issues on SPARC platforms.
- Visually inspect the top cover interlock switch, aka "Chassis Intrusion Switch". The system will not power-on if the Chassis Intrusion Switch is misplaced or damaged.
- The system may experience a shutdown with SC alert: "Chassis cover removed"
- If this is reported on Sun Fire or SPARC Enterprise T2000 and the top cover is NOT removed, the problem is due to a known bug, which is fixed in System Firmware 6.7.3. It is strongly recommended to upgrade the firmware to 6.7.3 or later. Refer to Doc ID 1020218.1 for more details.
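The firmware check in the last bullet above can be sketched as a simple version comparison. This is a minimal illustration in Python, assuming the System Firmware version has already been noted as a dotted string such as "6.7.3" (for example, read by hand from the service processor output). The names parse_version and has_cover_fix are hypothetical and are not part of any Oracle tool.

```python
# Hypothetical helper: decide whether a System Firmware version already
# contains the "Chassis cover removed" fix named in the step above.

FIXED_IN = (6, 7, 3)  # System Firmware release that carries the fix


def parse_version(text):
    """Turn a dotted version string such as '6.7.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_cover_fix(version_text):
    """Return True if the firmware version is 6.7.3 or later."""
    return parse_version(version_text) >= FIXED_IN


if __name__ == "__main__":
    # Sample version strings chosen only for illustration.
    for version in ("6.3.10", "6.7.3", "6.7.12"):
        verdict = "has the fix" if has_cover_fix(version) else "upgrade recommended"
        print(f"{version}: {verdict}")
```

Comparing tuples of integers rather than raw strings avoids the trap where "6.7.12" sorts before "6.7.3" as text.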
2. Validate that there are no Fault LEDs (Wrench Lights) on System Components.
- In many cases, you will replace a component if its Wrench Light is lit. Refer to the Service Manual to interpret the server LEDs and to check the status of individual LEDs.
- When a Service LED is on, the ILOM command show /SP/faultmgmt or the ALOM command showfaults provides details about any faults that can cause the indicator to be lit.
- Also, if a Fault/Service LED(s) is on, it may be a good idea to set the virtual keyswitch to Diag (see step 3) and monitor the POST execution to see if any faults are reported.
3. Verify if the system's keyswitch is "On".
- The system keyswitch on T1000/T2000 is virtual, not physical. To change the position of the virtual keyswitch, use the ALOM command sc> setkeyswitch. You can check the position of the virtual keyswitch via sc> showkeyswitch. The virtual keyswitch should be set to 'Normal' when the system is in normal operation.
- Setting the keyswitch to 'Standby' disables the poweron command and the power button.
- Setting the virtual keyswitch to 'Diag' forces the system to run service-mode diagnostics. Watch the console or terminal output to confirm that POST is executing. The system should boot after POST completes.
- The 'Locked' position of the keyswitch disables flashupdate and break commands, but it doesn't affect the poweron command or button.
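The keyswitch behavior described in step 3 can be summarized as a small lookup table. The following Python sketch is for note-taking only and encodes nothing beyond the bullets above. The position names are taken from this document, and the exact argument spelling accepted by setkeyswitch may differ, so treat the dictionary keys as an assumption. The function names are hypothetical.

```python
# Hypothetical lookup of the virtual keyswitch positions described in step 3.
# Keys follow the position names used in this document; the literal argument
# spelling expected by "setkeyswitch" is not confirmed here.

KEYSWITCH_NOTES = {
    "normal": "Expected position for normal operation.",
    "standby": "The poweron command and the power button are disabled.",
    "diag": "Forces service-mode diagnostics; watch the console for POST output.",
    "locked": "flashupdate and break are disabled; poweron still works.",
}


def describe_keyswitch(position):
    """Return the note for a keyswitch position, ignoring case and padding."""
    key = position.strip().lower()
    if key not in KEYSWITCH_NOTES:
        raise ValueError(f"unknown keyswitch position: {position!r}")
    return KEYSWITCH_NOTES[key]


def blocks_poweron(position):
    """Only 'Standby' is described above as preventing power-on."""
    describe_keyswitch(position)  # validates the position name
    return position.strip().lower() == "standby"


if __name__ == "__main__":
    for name in ("Normal", "Standby", "Diag", "Locked"):
        print(f"{name}: {describe_keyswitch(name)}")
```

If a system will not power on and showkeyswitch reports the Standby position, blocks_poweron("Standby") returning True is a reminder to change the keyswitch before suspecting hardware.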
4. Once power is confirmed, connect to the SC (system) console and capture the messages that are showing up on the console.
- Each customer site is different: the SC serial management port may be attached to a dumb terminal or a terminal concentrator. Connect a terminal to the SC serial management port, or log in remotely if the SC network management port is configured.
- Doc ID 1004222.1 explains how to setup console logging and gather diagnostic information.
- If able to connect to the console, ask what messages are being displayed and collect the console output.
Try to validate if the system is executing Power On Self Test (POST).
- If it is executing POST, the testing MUST complete before the system can be booted. Interrupting POST (to reach the ok prompt faster) will cause the system to go to an undefined state.
- If nothing is displayed, press Return (carriage return) a few times to see whether the "ok" prompt is displayed.
5. If you are able to get to OBP, type "boot" at the ok prompt and monitor the boot process.
- In case the system fails to "boot", refer to Analyzing boot failures or hangs on a Sun Fire[TM] T1000/T2000 server (Doc ID 1008345.1) for guidelines and troubleshooting steps.
- Some errors reported during boot may be caused by missing patches, /etc/system file settings or down-rev firmware:
- T1000 may report SUNW-MSG-ID SUNOS-8000-1L.
- T2000 may report SUNW-MSG-ID SUN4-8000-ER, SUN4-8000-0Y, SUN4-8000-75, SUN4-8000-D4, or errors in the PCI-Express subsystem.
- Refer to Doc ID 1000044.1 for more details.
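When a long console capture is available, the message IDs listed in step 5 can be searched for mechanically. The sketch below is one possible way to do that in Python, assuming the capture has been saved as plain text and that each event is introduced by the string "SUNW-MSG-ID" followed by the ID. The table contains only the IDs named in this document, and the function name find_known_ids is hypothetical.

```python
import re

# Message IDs named in step 5, mapped to the platform this document
# associates them with. No other IDs are assumed.
KNOWN_IDS = {
    "SUNOS-8000-1L": "T1000",
    "SUN4-8000-ER": "T2000",
    "SUN4-8000-0Y": "T2000",
    "SUN4-8000-75": "T2000",
    "SUN4-8000-D4": "T2000",
}

# Assumption: the ID follows "SUNW-MSG-ID", with an optional colon.
MSG_ID_PATTERN = re.compile(r"SUNW-MSG-ID:?\s*([A-Z0-9]+-[A-Z0-9]+-[A-Z0-9]+)")


def find_known_ids(console_text):
    """Return (message ID, platform) pairs found in the text, in first-seen order."""
    seen = []
    for match in MSG_ID_PATTERN.finditer(console_text):
        msg_id = match.group(1)
        if msg_id in KNOWN_IDS and msg_id not in seen:
            seen.append(msg_id)
    return [(msg_id, KNOWN_IDS[msg_id]) for msg_id in seen]


if __name__ == "__main__":
    # Made-up sample line for illustration, not real system output.
    sample = "SUNW-MSG-ID: SUN4-8000-ER, TYPE: Fault\n"
    for msg_id, platform in find_known_ids(sample):
        print(f"{msg_id} (listed for {platform}): see Doc ID 1000044.1")
```

A hit from this search only indicates that Doc ID 1000044.1 is worth consulting; it does not by itself identify the root cause.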
6. Obtain ALOM information from the following commands which may isolate the problem:
showfru, showhost, showenvironment (run just after the poweron command is issued), showlogs -v, consolehistory -v, showfaults -v, showplatform
NOTE: For guidance on collecting data, refer to Troubleshooting data needed for TX000 servers (Doc ID 1518205.1).
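To avoid missing a command from step 6 while on site, the list can be kept as a small checklist. The Python sketch below simply prints the commands from this document with an sc> prompt and reports which ones do not yet appear in a saved session log. It does not connect to the system controller, and the function names and the idea of a saved log file are assumptions made for illustration.

```python
# ALOM commands listed in step 6, in the order given in this document.
ALOM_COMMANDS = [
    "showfru",
    "showhost",
    "showenvironment",  # run just after the poweron command is issued
    "showlogs -v",
    "consolehistory -v",
    "showfaults -v",
    "showplatform",
]


def build_checklist(prompt="sc> "):
    """Return the commands prefixed with the ALOM prompt, ready to read off."""
    return [f"{prompt}{command}" for command in ALOM_COMMANDS]


def missing_commands(captured_log):
    """Return the commands whose text does not appear in the captured session."""
    return [command for command in ALOM_COMMANDS if command not in captured_log]


if __name__ == "__main__":
    for line in build_checklist():
        print(line)
    # Hypothetical partial capture, for illustration only.
    partial_log = "sc> showfru\nsc> showhost\n"
    print("Still to collect:", ", ".join(missing_commands(partial_log)))
```

The check is a plain substring match, so it only confirms that a command was typed, not that its output was captured completely.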
7. If unable to perform the above process, collect as much information pertaining to the boot failure as possible (console logs, error messages, etc.), then call back in and request the next available engineer.
CAUTION: If the Field Engineer cannot solve the problem while on site or requires additional assistance, update this SR to request a transfer to the next available Engineer; otherwise the SR may auto-close when the FE completes the site visit.
OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Customer should verify system is stable for return to production.
PARTS NOTE:
No parts are required for this Action Plan. Parts may end up being required, but they are not part of this Action Plan. Another Action Plan may be necessary.
REFERENCE INFORMATION:
Product Documentation: Service Manuals, Admin Manuals, Product Notes:
Sun Fire T1000
Sun Fire T2000
SPARC Enterprise T1000
SPARC Enterprise T2000
Netra T2000
Sun Blade T6300
References
Doc ID 1518205.1 - Troubleshooting data needed for TX000 servers