Asset ID: 1-75-1321710.1
Update Date: 2015-12-02
Solution Type: Troubleshooting Sure
Solution 1321710.1: Sun Enterprise[TM] 10000: Troubleshooting Power Puck Issues
Related Items
- Sun Enterprise 10000 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
- _Old GCS Categories>Sun Microsystems>Servers>High-End Servers
Applies to:
Sun Enterprise 10000 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
Purpose
This document describes how to identify and resolve unexpected domain outages caused by failing or failed power pucks on the various boards in the E10000.
Troubleshooting Steps
Background and Manifestation of Domain Outages Caused by Power Puck Failures
A power puck is a DC-DC converter on a system board, centerplane support board, or control board. A persistent power puck failure leaves the affected board unable to power on. In other cases, the power puck continues to supply enough voltage to power on the affected board, but not enough for the CPUs. Both types of power puck failure have caused domain outages.
Power puck failures on a board have caused domain Arbstop events with wfail signatures matching the following list. NOTE: The list should not be considered all-inclusive, and power puck issues are not the only cause of the observed Arbstops. Please use the recommendations in the next section to confirm the cause is a power puck failure on a board.
wfail reports Illegal Coherent condition/access proc 0
wfail reports Port 0 UPA fatal error
wfail reports Sysboard Request Parity Error Mask
wfail reports MC Timeout: waiting for data to match address
wfail reports MC Timeout: waiting for address to match data
wfail reports Port 0 unexpected foreign PIO queue p_reply received
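As a rough aid for triage, the signatures listed above can be checked against saved wfail text with a few lines of Python. This is only a sketch: the function name `matching_signatures` is made up for illustration, the signatures are matched literally as written in the list (including the proc and port numbers), and a match does not by itself confirm a power puck failure.

```python
# wfail signatures copied from the list above (not all-inclusive).
KNOWN_SIGNATURES = [
    "Illegal Coherent condition/access proc 0",
    "Port 0 UPA fatal error",
    "Sysboard Request Parity Error Mask",
    "MC Timeout: waiting for data to match address",
    "MC Timeout: waiting for address to match data",
    "Port 0 unexpected foreign PIO queue p_reply received",
]


def matching_signatures(wfail_text):
    """Return the listed signatures found in wfail_text (case-insensitive)."""
    lowered = wfail_text.lower()
    return [sig for sig in KNOWN_SIGNATURES if sig.lower() in lowered]


# Example using one of the signature lines quoted above.
print(matching_signatures("wfail reports Port 0 UPA fatal error"))
# prints ['Port 0 UPA fatal error']
```

An empty result means only that none of the listed signatures appeared in the text; the voltage checks described in the next section are still needed to confirm or rule out a power puck.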
Identifying and Confirming Power Puck Failures
Voltage issues caused by power puck failures are logged to the platform logs on the SSP. Look for messages like the example below.
From /var/opt/SUNWssp/adm/messages:
May 2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded the thresholds on system board 4
May 2 11:17:59 ssp procesvolt: Voltage data for board 4, range trap: sysBrdStarfire3p3VDC.0 0.68 V
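If a copy of the platform log is available on a workstation with Python, a short script can pull out these entries. The sketch below assumes the messages look exactly like the two procesvolt lines quoted above; other message variants may exist and would need their own patterns. The names `find_voltage_events`, `WARNING_RE`, and `DATA_RE` are illustrative only.

```python
import re

# Patterns modeled on the two procesvolt lines quoted above (assumed format).
WARNING_RE = re.compile(
    r"procesvolt: Warning: Voltage readings have exceeded the thresholds "
    r"on (?P<kind>.+?) (?P<board>\d+)\s*$"
)
DATA_RE = re.compile(
    r"procesvolt: Voltage data for board (?P<board>\d+), "
    r"range trap: (?P<sensor>\S+) (?P<volts>[\d.]+) V"
)


def find_voltage_events(lines):
    """Yield one dict per procesvolt threshold warning or range-trap reading."""
    for line in lines:
        m = WARNING_RE.search(line)
        if m:
            yield {"type": "warning", "board": int(m["board"]), "kind": m["kind"]}
            continue
        m = DATA_RE.search(line)
        if m:
            yield {"type": "reading", "board": int(m["board"]),
                   "sensor": m["sensor"], "volts": float(m["volts"])}


# The two log lines from the example above.
SAMPLE_LINES = [
    "May 2 11:17:59 ssp procesvolt: Warning: Voltage readings have exceeded "
    "the thresholds on system board 4",
    "May 2 11:17:59 ssp procesvolt: Voltage data for board 4, range trap: "
    "sysBrdStarfire3p3VDC.0 0.68 V",
]

for event in find_voltage_events(SAMPLE_LINES):
    print(event)
```

Because the function accepts any iterable of lines, an open file handle on a copy of /var/opt/SUNWssp/adm/messages can be passed in place of `SAMPLE_LINES`.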
Running the power command with no options from the Main SSP will also confirm power puck failures on the various boards. Look for extremely low values, such as those marked with >>> and <<< in the example below.
ssp% power
Good 48V Bulk Power Supplies: 0 1 2 3 4 6 7
Number of Good 48V Bulk Power Supplies: 7 (N+1 redundancy ok)
Required 48V Power Supplies for 14 System Boards: 6
Number of Good Peripheral Cabinet Power Supplies: 0
Centerplane Support Board Average Voltages (V):
CSB#    5VDC Vcc HK    3.3VDC Vdd HK      3.3VDC Vdd Core
---- ----------------- ------------- -------------------------
   0   4.988   5.022       3.373       3.296    3.295    3.292
   1   5.017   4.998       3.502   >>> 1.079    1.079    1.080 <<<
System Board Average Voltages (V):
    3.3VDC   5VDC    3.3VDC     VDC      5VDC
SB#   Vdd   Vcc HK   Vdd HK   Vdd Core    Vcc
--- ------- ------- -------- ---------- --------
  0  3.295   4.976   3.381     1.904     5.005
  1  3.295   4.993   3.417     1.902     4.995
  2  3.300   5.000   3.407     1.903     4.998
  3  3.300   5.000   3.402     1.895     4.998
  4  3.301   5.030   3.395 >>> 0.681 <<< 5.005
  5  3.297   4.978   3.419     1.904     5.005
  6  3.300   5.015   3.409     1.908     4.998
  8  3.306   5.008   3.417     1.904     5.003
  9  3.297   5.008   3.417     1.906     5.005
 10  3.304   4.993   3.417     1.902     4.993
 11  3.297   4.978   3.418     1.906     4.995
 12  3.293   5.008   3.416     1.903     4.998
 14  3.298   4.915   3.406     1.904     4.995
 15  3.293   5.015   3.417     1.909     4.993
Control Board Average Voltages (V):
     5VDC      5VDC    3.3VDC    5VDC Vcc    5VDC
CB#   Vcc     Vcc HK   Vdd HK   Peripheral Vcc Fans
--- -------- -------- --------- ---------- ---------
  0  5.039    5.071     3.427     5.105      5.348
  1  5.089    5.049     3.425     5.125      5.348
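On a system with many boards, a low reading can be easy to miss by eye. The following Python sketch parses the System Board section of saved power output and flags any reading well below the other boards in the same column. The 90% of column median rule is an assumption chosen for illustration, not the threshold the SSP uses, and the function names are hypothetical. The sample rows are boards 3 to 5 from the example above, with the >>> and <<< annotations removed.

```python
import statistics

# Column labels taken from the System Board table header in the example above.
COLUMNS = ["3.3VDC Vdd", "5VDC Vcc HK", "3.3VDC Vdd HK", "VDC Vdd Core", "5VDC Vcc"]


def parse_system_board_rows(text):
    """Return {board_number: [five voltages]} from the System Board section."""
    rows = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("System Board Average Voltages"):
            in_section = True
            continue
        if in_section and stripped.startswith("Control Board"):
            break
        if not in_section:
            continue
        # Ignore the >>> / <<< annotations used in the example above.
        tokens = [t for t in stripped.split() if t not in (">>>", "<<<")]
        if len(tokens) == 6 and tokens[0].isdigit():
            rows[int(tokens[0])] = [float(t) for t in tokens[1:]]
    return rows


def low_readings(rows, fraction=0.9):
    """Flag readings below `fraction` of the column median (assumed rule)."""
    flagged = []
    if not rows:
        return flagged
    for col, name in enumerate(COLUMNS):
        median = statistics.median(volts[col] for volts in rows.values())
        for board, volts in sorted(rows.items()):
            if volts[col] < fraction * median:
                flagged.append((board, name, volts[col]))
    return flagged


SAMPLE = """\
System Board Average Voltages (V):
  3 3.300 5.000 3.402 1.895 4.998
  4 3.301 5.030 3.395 0.681 5.005
  5 3.297 4.978 3.419 1.904 5.005
Control Board Average Voltages (V):
"""

print(low_readings(parse_system_board_rows(SAMPLE)))
# prints [(4, 'VDC Vdd Core', 0.681)]
```

Comparing against the column median avoids hard-coding nominal voltages, but it would not catch a case where most boards read low at the same time, so the result should be treated as a hint to inspect the power output, not as a diagnosis.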
Resolution
Replace the board on which the failed or failing power puck has been confirmed.