Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1609483.1
Update Date:2017-11-27
Keywords:

Solution Type  Technical Instruction Sure

Solution  1609483.1 :   How to Perform On Site Diagnosis for a Down System for SPARC T5-2/T5-4/T5-8/T5-1B, Netra T5-1B Servers:ATR:1609483.1:2  


Related Items
  • Netra SPARC T5-1B Server Module
  • SPARC T5-8
  • SPARC T5-4
  • SPARC T5-2
  • SPARC T5-1B
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: SPARC-CAP VCAP




Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP

Applies to:

Netra SPARC T5-1B Server Module - Version All Versions and later
SPARC T5-4 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC T5-8 - Version All Versions and later
SPARC T5-1B - Version All Versions and later
Information in this document applies to any platform.

Goal

To aid Field Engineers in the on-site diagnosis of hard-down systems.

*****************************************************************************
To report errors or request improvements on this procedure,
please go to http://support.us.oracle.com and put a comment on Doc ID: 1609483.1
*****************************************************************************

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED:
SPARC T5-x, Service Processor, ILOM/ALOM application, and Oracle Solaris OS skills

TIME ESTIMATE: 120 Minutes

TASK COMPLEXITY: 2

FIELD ENGINEER INSTRUCTIONS

PROBLEM OVERVIEW:
System Down (unable to boot)

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :

Down Hard, unknown reason.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

1. Validate whether the system is powered on or has shut down.
# Are any LEDs lit? If nothing is powered on, the issue is most likely external to the server.
# Have the customer investigate the system's power source, power cords, power supplies, etc. for a potential issue.
# Refer to Doc ID 1018104.1 for help diagnosing power-on issues on SPARC platforms.

2. Validate that no Fault LEDs are lit on system components.
# When the Service LED is on, the ILOM commands -> show /SP/faultmgmt and -> show faulty provide details about any faults that can cause this indicator to be lit.
# Also, if any Fault LED is lit, consider setting the virtual keyswitch to Diag (see step 3) and monitoring the POST execution to check whether any faults are reported.
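Steps 1 and 2 amount to a small triage decision. The Python sketch below encodes that decision as a checklist helper, assuming the FE records three observations by hand; the ObservedLeds class and next_actions function are hypothetical names for illustration, not part of ILOM or any Oracle tool.

```python
from dataclasses import dataclass


@dataclass
class ObservedLeds:
    """What the FE sees on the chassis (hypothetical model, not an ILOM API)."""
    any_lit: bool
    service_led: bool
    fault_leds: bool


def next_actions(leds: ObservedLeds) -> list[str]:
    """Return the next checks suggested by steps 1 and 2 of this procedure."""
    if not leds.any_lit:
        # Step 1: nothing is powered on, so look outside the server first.
        return ["Customer: check power source, power cords and power supplies",
                "Refer to Doc ID 1018104.1 (power-on issues)"]
    actions = []
    if leds.service_led:
        # Step 2: the ILOM fault views explain why the Service LED is lit.
        actions += ["-> show /SP/faultmgmt", "-> show faulty"]
    if leds.fault_leds:
        # Step 2: optionally run POST via the Diag keyswitch and watch the console.
        actions.append("-> set /SYS keyswitch_state=Diag (then monitor POST)")
    return actions
```

For example, next_actions(ObservedLeds(any_lit=True, service_led=True, fault_leds=False)) returns only the two ILOM fault commands.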

3. Verify that the system's virtual keyswitch is set to "Normal".

The system keyswitch on SPARC T5-x systems is virtual, not physical. To change the position of the virtual keyswitch, use the ILOM command -> set /SYS keyswitch_state=value. You can check the position of the virtual keyswitch via -> show /SYS keyswitch_state.

# The virtual keyswitch should be set to 'Normal' during normal operation.
# When the keyswitch is set to 'Standby', the power-on command and the Power button are disabled.
# When the keyswitch is set to 'Diag', the system is forced to run service-mode diagnostics; watch the console or terminal output to confirm that POST is executing. The system should boot after POST completes.
# The 'Locked' position disables the load -source (firmware update) and set /HOST send_break_action=break commands, but it does not affect the start /SYS command or the Power button.
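The keyswitch positions above can be captured as a small lookup. This Python sketch parses the saved output of -> show /SYS keyswitch_state and reports what each position permits. It assumes the output contains a line of the form "keyswitch_state = Normal"; that layout, and every function name here, is an illustrative assumption rather than a documented interface.

```python
from enum import Enum


class Keyswitch(Enum):
    """Virtual keyswitch positions described in step 3."""
    NORMAL = "Normal"
    STANDBY = "Standby"
    DIAG = "Diag"
    LOCKED = "Locked"


def parse_keyswitch(show_output: str) -> Keyswitch:
    """Find 'keyswitch_state = <value>' in captured output (assumed layout)."""
    for line in show_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "keyswitch_state":
            wanted = value.strip().lower()
            for state in Keyswitch:
                if state.value.lower() == wanted:
                    return state
            raise ValueError(f"unknown keyswitch state: {value.strip()!r}")
    raise ValueError("keyswitch_state not found in output")


def power_on_allowed(state: Keyswitch) -> bool:
    # Standby disables the power-on command and the Power button.
    return state is not Keyswitch.STANDBY


def runs_post(state: Keyswitch) -> bool:
    # Diag forces service-mode diagnostics (POST) before the boot.
    return state is Keyswitch.DIAG


def blocked_commands(state: Keyswitch) -> list[str]:
    # Locked blocks firmware updates and the break action, but not power-on.
    if state is Keyswitch.LOCKED:
        return ["load -source", "set /HOST send_break_action=break"]
    return []
```

The comparison is case-insensitive because this article itself writes the positions both as 'Normal' and as 'diag'.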

4. Request the console messages (extended POST output when possible).
# If you can connect to the console, request the output with the messages being displayed. Doc ID 1004222.1 explains how to set up console logging and gather diagnostic information.
# Each customer site is different; the console port may be attached to a dumb terminal or a terminal concentrator.
# Try to confirm whether the system is executing Power-On Self-Test (POST). If it is, POST MUST complete before the system can be booted. Interrupting POST (to reach the ok prompt faster) will leave the system in an undefined state. From the "ok" prompt, type "boot" and monitor the boot process.
# The minimum firmware for proper OBP/POST diagnosability on SPARC T5-x is SysFW 9.1.0.b (it addresses C2C and PCIe root complex issues; see Doc ID 1582207.1).
# To check the SysFW version on the server, use the following ILOM command: -> show /HOST sysfw_version
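To check a reported version against the 9.1.0.b minimum, this Python sketch extracts the first version string from captured -> show /HOST sysfw_version output and compares it component by component. It assumes versions follow the "major.minor.micro.letter" pattern seen in 9.1.0.b, and that the letter suffix orders alphabetically; parse_sysfw and meets_minimum are hypothetical helper names.

```python
import re

# Minimum SysFW for proper OBP/POST diagnosability on SPARC T5-x (from this article).
MIN_SYSFW = "9.1.0.b"

# Assumed pattern: three numeric fields plus an optional lowercase letter.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.?([a-z]?)")


def parse_sysfw(text: str) -> tuple[int, int, int, str]:
    """Return (major, minor, micro, letter) for the first version found in text."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"no SysFW version found in {text!r}")
    major, minor, micro, letter = match.groups()
    return int(major), int(minor), int(micro), letter


def meets_minimum(text: str, minimum: str = MIN_SYSFW) -> bool:
    """True if the reported SysFW is at or above the minimum version."""
    # Numeric fields compare as integers, so 9.10.x sorts above 9.1.x.
    return parse_sysfw(text) >= parse_sysfw(minimum)
```

Comparing integers rather than raw strings matters here: a plain string comparison would rank "9.10" below "9.1.0".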

5. If you can reach the OBP ok prompt, type "boot" and monitor the boot process.

# To troubleshoot known product and boot issues, consult Doc ID 1508713.1 (T5-2), Doc ID 1513559.1 (T5-4/T5-8), or Doc ID 1513593.1 (SPARC T5-1B, Netra T5-1B).
# When the /SP/policy PARALLEL_BOOT property is enabled, the host can boot/power on in parallel with the SP if an auto-power policy (HOST_AUTO_POWER_ON or HOST_LAST_POWER_STATE) is on or a user presses the Power button while the SP is still booting. Otherwise, ILOM has to be running before the host can power on in response to the Power button or an auto-power policy.
# When this property is disabled, the SP boots first, then the host boots.
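The PARALLEL_BOOT behaviour can be summarised as a simplified decision table. This Python sketch classifies when the host powers on from four yes/no inputs; auto_power_policy stands for HOST_AUTO_POWER_ON or HOST_LAST_POWER_STATE being enabled. It is an assumption-laden model for reasoning on site: it ignores the keyswitch (a 'Standby' keyswitch would still block power-on) and any other ILOM policy.

```python
def host_power_on_timing(parallel_boot: bool, auto_power_policy: bool,
                         power_button_pressed: bool, sp_booting: bool) -> str:
    """Classify host power-on timing per the PARALLEL_BOOT description.

    Returns "none", "immediate", "parallel" or "after_sp".
    Simplified model: the keyswitch position is not considered.
    """
    if not (auto_power_policy or power_button_pressed):
        return "none"  # nothing asks the host to power on
    if not sp_booting:
        return "immediate"  # ILOM is already running
    if parallel_boot:
        return "parallel"  # host powers on while the SP is still booting
    return "after_sp"  # SP boots first, then the host
```

For example, with PARALLEL_BOOT disabled and the Power button pressed during SP boot, the model returns "after_sp", which matches the last bullet above.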

6. If the Service Processor is accessible, collect a snapshot; it contains critical information for troubleshooting the failure. If you cannot collect a snapshot, as a last resort capture the output of the following ILOM commands, run as the root user:

-> version
-> show /SP/logs/event/list
-> show faulty
-> show -l all /
-> show /HOST/console/history
-> show /HOST/console/bootlog
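When the fallback commands from step 6 are captured by hand, it is easy to miss one. This Python sketch keeps the command list in the order given there, reports which outputs are still missing, and concatenates the captured text into a single report for the SR. The dictionary layout and the helper names (missing_outputs, build_report) are hypothetical; they do not correspond to any Oracle tool.

```python
# Fallback ILOM commands from step 6, in the order listed there.
FALLBACK_COMMANDS = [
    "version",
    "show /SP/logs/event/list",
    "show faulty",
    "show -l all /",
    "show /HOST/console/history",
    "show /HOST/console/bootlog",
]


def missing_outputs(captured: dict[str, str]) -> list[str]:
    """Return the fallback commands whose captured output is absent or blank."""
    return [cmd for cmd in FALLBACK_COMMANDS if not captured.get(cmd, "").strip()]


def build_report(captured: dict[str, str]) -> str:
    """Join the captured outputs, one '-> command' section per command."""
    sections = []
    for cmd in FALLBACK_COMMANDS:
        body = captured.get(cmd, "").rstrip() or "(not collected)"
        sections.append(f"-> {cmd}\n{body}\n")
    return "\n".join(sections)
```

Keeping a "(not collected)" marker in the report makes gaps explicit to the next engineer instead of silently dropping a section.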

7. If you are unsure how to proceed, or are unable to perform the above process, collect as much information about the boot failure as possible (console logs, error messages, etc.), then call back in and request the next available engineer.

NOTE: See also document: Troubleshooting data needed for T3-x, T4-x & T5-x servers (Doc ID 1470580.1)

OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

The customer should verify that the system is stable before returning it to production.

CAUTION: If, while on site, the Field Engineer cannot solve the problem or requires additional assistance, update this SR to request that it be transferred to the next available engineer; otherwise the SR may auto-close when the FE completes the site visit.

PARTS NOTE:
No parts are required for this action plan. Parts may eventually be required, but they are not part of this action plan; another action plan may be necessary.

REFERENCE INFORMATION:

Product Documentation: Service Manuals, Admin Manuals, Product Notes:

T5-2:  http://docs.oracle.com/cd/E28853_01/index.html
T5-4:  http://docs.oracle.com/cd/E29659_01/index.html
T5-8:  http://docs.oracle.com/cd/E35078_01/index.html
T5-1B: http://docs.oracle.com/cd/E35199_01/index.html
Netra T5-1B: http://docs.oracle.com/cd/E35777_01/index.html

References

<NOTE:1470580.1> - Troubleshooting data needed for T3-x, T4-x, T5-x, T7-x, & S7-x servers

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.