Asset ID: 1-71-1362871.1
Update Date: 2017-11-30
Solution Type: Technical Instruction Sure
Solution 1362871.1: How to Perform On Site Diagnosis for a Down System for Sun Fire V125, V210, V215, V240, V245, V250, V440, V445, Sun Enterprise 250, Netra[TM] 210, 240, 440 Servers and Ultra 1/5/10/2x/30/45/60/80, Sun Blade 1x00/2x00 Workstations
Related Items
- Sun Ultra 80 Workstation
- Sun Fire V215 Server
- Sun Enterprise 250 Server
- Sun Fire V245 Server
- Sun Ultra 45 Workstation
- Sun Fire V240 Server
- Sun Ultra 10 Workstation
- Sun Blade 2000 Workstation
- Sun Fire V445 Server
- Sun Netra 210 Server
- Sun Fire V250 Server
- Sun Fire V125 Server
- Sun Netra 440 Server
- Sun Fire V440 Server
- Sun Blade 1000 Workstation
- Sun Fire V210 Server
- Sun Ultra 25 Workstation
- Sun Ultra 24 Workstation
- Sun Blade 2500 Silver (1.6 Ghz) Workstation
- Sun Blade 2500 Workstation
- Sun Ultra 1 Workstation
- Sun Ultra 30 Workstation
- Sun Blade 1500 Workstation
- Sun Ultra 60 Workstation
- Sun Netra 240 (AC) Server
- Sun Ultra 5 Workstation
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP
Applies to:
Sun Netra 440 Server - Version Not Applicable and later
Sun Netra 240 (AC) Server - Version Not Applicable and later
Sun Fire V215 Server - Version Not Applicable and later
Sun Enterprise 250 Server - Version Not Applicable and later
Sun Ultra 45 Workstation - Version Not Applicable and later
Information in this document applies to any platform.
Goal
To aid Field Engineers in on-site diagnosis of Down Hard systems.
Common symptoms of a system that is "down":
- A "down" system is unable to boot.
- It may fail to power on.
- It may be at the OBP prompt ("ok").
- It is not reachable or "pingable".
- The system may experience a random power off with no ALOM/Solaris event logs.
Solution
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED: (IS A SITE ENGINEER AVAILABLE?)
System Controller, ALOM Application, Intermediate Solaris Skills
TIME ESTIMATE: 120 Minutes
TASK COMPLEXITY: 2
FIELD ENGINEER INSTRUCTIONS
PROBLEM OVERVIEW:
Down System
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :
Down Hard, unknown reason.
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
1. Validate whether the system is powered on or has shut down.
- Are the LEDs on? Are the fans spinning? If nothing is powered on, then the issue is external to the server.
- Let the customer investigate the system's power source, power cords, power supplies, etc. for a potential issue.
- Refer to Doc ID 1018104.1 for help on diagnosing power-on issues on SPARC platforms.
- The Sun Fire V215/V245 system may experience an erroneous overtemp alarm resulting in an abrupt power off of the system. Messages similar to the following will be seen in the ALOM event log ("showlogs -v" command):
Feb 02 10:46:16 : 00040029: "Host system has shut down."
Feb 02 10:46:56 : 00070002: "Indicator PS0.DC_OK is now OFF"
Feb 02 10:46:56 : 00070002: "Indicator PS1.DC_OK is now OFF"
- In addition, the "OVERTEMP LED" on the front of the system chassis will be lit (amber). If this is reported on V215/V245, the problem is due to a known bug, which is fixed with patch ID# 139735-01. Refer to Doc ID 1019824.1 for more troubleshooting details.
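The log check in step 1 can be sketched in Python for cases where the "showlogs -v" output has been saved to a text file. This is a minimal sketch, not an Oracle tool: the function names are hypothetical, and the line format is assumed from the three sample lines above only. A match shows that the log contains a host shutdown followed by a power supply DC_OK indicator going OFF; it does not confirm the V215/V245 overtemp bug, which also requires the OVERTEMP LED observation.

```python
import re

# Assumption: every event line looks like the samples above:
#   <Mon> <DD> <HH:MM:SS> : <8 hex digits>: "<message>"
# The closing quote is optional so that truncated lines still parse.
EVENT_RE = re.compile(
    r'^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) : '
    r'(?P<code>[0-9A-Fa-f]{8}): "(?P<msg>[^"]*)"?\s*$'
)

def parse_events(log_text):
    """Return (timestamp, code, message) tuples for lines that look like ALOM events."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match["stamp"], match["code"], match["msg"]))
    return events

def has_power_off_signature(log_text):
    """True if a host shutdown event is later followed by a PSn.DC_OK OFF event."""
    seen_shutdown = False
    for _stamp, _code, msg in parse_events(log_text):
        if msg == "Host system has shut down.":
            seen_shutdown = True
        elif seen_shutdown and re.fullmatch(r"Indicator PS\d+\.DC_OK is now OFF", msg):
            return True
    return False

sample = '''Feb 02 10:46:16 : 00040029: "Host system has shut down."
Feb 02 10:46:56 : 00070002: "Indicator PS0.DC_OK is now OFF"
Feb 02 10:46:56 : 00070002: "Indicator PS1.DC_OK is now OFF"'''
print(has_power_off_signature(sample))
```

If the function reports a match on a V215/V245, the next action is still the one given above: check the OVERTEMP LED and refer to Doc ID 1019824.1.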
2. Validate that there are no Fault LEDs (Service Lights) on System Components.
- In many cases, you will replace a component if its Wrench Light is lit.
- If only a single board is showing a Fault light, replacing that board is a reasonable first action.
- If multiple boards show Wrench Lights, switch the keyswitch to Standby, then back to Diagnostics, and monitor the POST execution to see what faults are produced.
3. Verify that the system's keyswitch is set to "On".
- The keyswitch should be set to On (marked with the "pipe" symbol, |, on the keyswitch) when the system is in normal operation.
- If the keyswitch is set to Standby, turn it to "On", which initiates POST and then boots the system.
- If the keyswitch is set to Diagnostics, look at the console or terminal output to confirm whether POST is executing. The system should boot after POST completes.
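The keyswitch checks in step 3 amount to a small lookup from position to follow-up action. The sketch below writes that lookup out in Python as a memory aid; the dictionary and function names are hypothetical, and only the three positions named in this step are covered. Platforms with other positions, or with no keyswitch at all, are outside this sketch.

```python
# Keyswitch positions and follow-up actions, transcribed from step 3.
# Names here are illustrative only; this is not a Sun/Oracle API.
KEYSWITCH_ACTIONS = {
    "on": "Expected position for normal operation; no keyswitch change needed.",
    "standby": "Turn the keyswitch to On; this initiates POST and then boot.",
    "diagnostics": "Check the console or terminal for POST output; "
                   "the system should boot after POST completes.",
}

def keyswitch_action(position):
    """Return the follow-up action for a keyswitch position named in step 3."""
    key = position.strip().lower()
    if key == "|":  # the On position is marked with the pipe symbol
        key = "on"
    if key not in KEYSWITCH_ACTIONS:
        raise ValueError(f"position not covered by this action plan: {position!r}")
    return KEYSWITCH_ACTIONS[key]

print(keyswitch_action("Standby"))
```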
4. Once power is confirmed, connect to the SC (system) console and request the messages that are showing on the console.
- You might be able to have a customer log in remotely or attach a terminal directly to the system.
- The terminal can be connected to the system serial port (ttya).
- Each customer site is different, so they may have this port attached to a dumb terminal or a terminal concentrator.
- Doc ID 1004222.1 explains how to set up console logging and gather diagnostic information.
- If able to connect to the console, request and check what messages are being displayed.
Try to determine whether the system is executing Power On Self Test (POST) or fsck.
- If it is executing POST, the testing MUST complete before the system can be booted. Interrupting POST (to reach the "ok" prompt sooner) will leave the system in an undefined state.
- If nothing is displayed, see whether pressing Return (carriage return) a few times results in the "ok" prompt being displayed. If it does, type "boot" and monitor the boot process.
- Once the console connection is established, Solaris is often at the 'fsck' prompt asking for confirmation to fsck root or another filesystem. Answer 'y' to proceed with the fsck.
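When console output has been captured to a file, the triage in step 4 can be approximated as below. This is a rough sketch under stated assumptions: the state names and the function are hypothetical, the "ok" prompt is assumed to appear as "ok" or "{N} ok" on the last non-blank line, an fsck prompt is assumed to contain the word "fsck", and POST output is assumed to contain the word "POST". Real console output varies by platform and firmware, so treat the result as a hint and read the messages themselves.

```python
import re

# Advice strings paraphrase step 4; the state names are made up for this sketch.
ADVICE = {
    "silent": 'Press Return a few times; if "ok" appears, type "boot" and monitor.',
    "ok-prompt": 'Type "boot" and monitor the boot process.',
    "fsck": "Answer 'y' to proceed with the fsck.",
    "post": "Let POST run to completion; do not interrupt it.",
    "other": "Check what messages are being displayed.",
}

def classify_console(text):
    """Classify captured console text into one of the ADVICE keys."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "silent"
    last = lines[-1]
    # Assumption: the OBP prompt is "ok", optionally prefixed with "{N} ".
    if re.fullmatch(r"(\{\d+\} )?ok", last):
        return "ok-prompt"
    # Assumption: an fsck confirmation prompt mentions "fsck" on its last line.
    if "fsck" in last.lower():
        return "fsck"
    # Assumption: recent POST output contains the standalone word "POST".
    if any(re.search(r"\bPOST\b", line) for line in lines[-20:]):
        return "post"
    return "other"

state = classify_console("{0} ok")
print(state, "->", ADVICE[state])
```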
5. If you are able to get to OBP, type "boot" and monitor the boot process.
- If the system fails to boot, refer to Diagnosing Issues booting off disk for Sun SPARC Systems (Doc ID 1002932.1) for guidelines and troubleshooting steps.
Known boot issue impacting the following SPARC platforms:
- Sun Fire V210/V240 Server with OBP firmware 4.30.3.b (as delivered in patch ID# 142700-01)
- Netra 210/240 Server with OBP firmware 4.30.3.b (as delivered in patch ID# 142700-01)
- Systems with patches that deliver OBP firmware version 4.30.3, 4.30.3.b or 4.30.4 installed (listed below) will fail to boot if they have a PCI card with an internal pci-pci bridge installed in certain PCI slot(s). Refer to Doc ID 1022142.1 for more details.
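The firmware part of the known boot issue can be checked mechanically once the OBP version string is known. The sketch below assumes the version is reported in a banner line of the form "OBP <version> ..."; that format and both function names are assumptions for illustration. It only compares the version against the three versions named above. Whether the system is actually exposed also depends on the PCI card and slot, which Doc ID 1022142.1 covers.

```python
import re

# OBP versions named in the known boot issue above.
AFFECTED_OBP_VERSIONS = {"4.30.3", "4.30.3.b", "4.30.4"}

def obp_version(banner):
    """Extract the version token that follows 'OBP' in a banner line, or None."""
    # Assumption: the banner contains "OBP" followed by a dotted version token.
    match = re.search(r"\bOBP\s+(\d+(?:\.\w+)+)", banner)
    return match.group(1) if match else None

def is_affected_version(banner):
    """True if the banner reports one of the OBP versions named in the known issue."""
    return obp_version(banner) in AFFECTED_OBP_VERSIONS

print(is_affected_version("OBP 4.30.3.b"))
```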
6. Obtain information from the following ALOM commands, which may help to isolate the problem:
showfru, showhost, showenvironment (run just after the poweron command is given), showlogs -v, consolehistory -v, showfaults -v, showplatform
If unsure how to proceed, or unable to perform the above process, collect as much information about the boot failure as possible (console logs, error messages, etc.), then call back in and request the next available engineer.
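Before leaving the site or escalating, it can help to confirm that the captured session log contains every ALOM command listed in step 6. The sketch below scans a saved transcript for command lines and reports which of the seven commands are missing. It assumes the transcript shows commands after an "sc>" prompt; the prompt string is a parameter in case the capture differs, and the function name is hypothetical.

```python
# The seven ALOM commands listed in step 6.
REQUIRED_COMMANDS = [
    "showfru", "showhost", "showenvironment", "showlogs -v",
    "consolehistory -v", "showfaults -v", "showplatform",
]

def missing_commands(transcript, prompt="sc>"):
    """Return the step 6 commands that never appear after the prompt in the transcript."""
    issued = set()
    for line in transcript.splitlines():
        line = line.strip()
        if line.startswith(prompt):
            # Collapse repeated spaces so "showlogs  -v" still counts.
            issued.add(" ".join(line[len(prompt):].split()))
    return [cmd for cmd in REQUIRED_COMMANDS if cmd not in issued]

captured = """sc> showfru
(output omitted)
sc> showlogs -v
(output omitted)
"""
print(missing_commands(captured))
```

An empty list means all seven commands were captured. Note that the check is exact: "showlogs" without "-v" does not count as "showlogs -v".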
OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Customer should verify system is stable for return to production.
CAUTION: If the Field Engineer cannot solve the problem while on site or requires additional assistance, update this SR to request that it be transferred to the next available Engineer. Otherwise the SR may auto-close when the FE completes the site visit.
PARTS NOTE:
No parts are required for this Action Plan. Parts may end up being required, but they are not part of this Action Plan. Another Action Plan may be necessary.
REFERENCE INFORMATION:
Product Documentation: Service Manuals, Admin Manuals, Product Notes:
Sun Fire V125
Sun Fire V210
Sun Fire V215
Sun Fire V240
Sun Fire V245
Sun Fire V250
Sun Fire V440
Sun Fire V445
Netra 210
Netra 240
Netra 440
References
<NOTE:1002932.1> - Diagnosing Boot Issues on V210/V240/V215/V245/V440/V445, T1000/T2000, V480/V490/V880/V890 servers
<NOTE:1022142.1> - Patches Delivering OBP Firmware Versions 4.30.3, 4.30.3.b or 4.30.4 (WITHDRAWN) may Cause a System to Fail to Boot