Asset ID: 1-72-2285584.1
Update Date: 2018-05-22
Solution Type: Problem Resolution Sure Solution
2285584.1: A memory riser with SPT-8000-DH faults on a SPARC TX-2 server can cause all other memory risers and the motherboard to fault the same way
Related Items
- SPARC T4-2
- SPARC T8-2
- SPARC T7-2
- SPARC T5-2
- SPARC T3-2
Related Categories
- PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T7
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Complicated onsite diagnosis required by a field engineer
Created from <SR 3-15262781935>
Applies to:
SPARC T7-2 - Version All Versions and later
SPARC T8-2 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC T4-2 - Version All Versions and later
SPARC T3-2 - Version All Versions and later
Information in this document applies to any platform.
Symptoms
When the host was powered on, the motherboard and all 8 memory risers were faulted with SPT-8000-DH:
##### fma/@usr@local@bin@fmdump_-v.out #####
2017-07-04/05:48:10 f961b866-4100-6f0b-b3b9-b6a10e9dc4a9 SPT-8000-DH
FRU = /SYS/MB
2017-07-04/05:48:10 6fff9e8a-26e2-c37a-dbe4-8d0eb4faf54a SPT-8000-DH
FRU = /SYS/MB/CM0/CMP/MR0
2017-07-04/05:48:11 c8720c77-dd8a-6464-cc01-f615e99191c8 SPT-8000-DH
FRU = /SYS/MB/CM0/CMP/MR1
2017-07-04/05:48:11 c66216d4-5cc4-c095-ffcd-e994078ea35d SPT-8000-DH
FRU = /SYS/MB/CM0/CMP/MR2
2017-07-04/05:48:11 8fea7f45-13da-e637-f25e-ef09c37ac7bb SPT-8000-DH
FRU = /SYS/MB/CM0/CMP/MR3
2017-07-04/05:48:11 ddb26bb2-2997-6ee6-9d04-b22f1b2e11a1 SPT-8000-DH
FRU = /SYS/MB/CM1/CMP/MR0
2017-07-04/05:48:12 94ba3116-9356-4238-a9ec-ba2675fd38d9 SPT-8000-DH
FRU = /SYS/MB/CM1/CMP/MR1
2017-07-04/05:48:12 0a10561f-0b27-4037-f2be-9dfc05ede708 SPT-8000-DH
FRU = /SYS/MB/CM1/CMP/MR2
2017-07-04/05:48:13 e162e2c7-7d5d-4da1-9dbc-e589533271e9 SPT-8000-DH
FRU = /SYS/MB/CM1/CMP/MR3
2017-07-04/06:23:32 25127b96-20af-68bb-8ecf-8b3ce5eb2f8c SPT-8000-DH
FRU = /SYS/MB
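When many FRUs are faulted at once, it can help to pull the UUIDs and FRUs out of the 'fmdump -v' output programmatically, for example to collect the UUIDs that the acquit step in the Solution passes to 'fmadm acquit'. The following is a minimal Python sketch, assuming the output has been saved as text in the two-line layout shown above (a timestamp/UUID/code line followed by a 'FRU = ...' line). The names parse_fmdump_v and uuids_under are hypothetical helpers, not part of any Oracle tool.

```python
import re
from typing import List, NamedTuple


class Fault(NamedTuple):
    time: str
    uuid: str
    code: str
    fru: str


# Header line, e.g. "2017-07-04/05:48:10 f961b866-4100-6f0b-b3b9-b6a10e9dc4a9 SPT-8000-DH"
HEADER = re.compile(
    r"^(\S+)\s+([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+(\S+)\s*$"
)
# FRU line that follows each header, e.g. "FRU = /SYS/MB/CM0/CMP/MR0"
FRU_LINE = re.compile(r"^\s*FRU\s*=\s*(\S+)")


def parse_fmdump_v(text: str) -> List[Fault]:
    """Pair each fault header line with the FRU line that follows it."""
    faults: List[Fault] = []
    pending = None
    for line in text.splitlines():
        header = HEADER.match(line.strip())
        if header:
            pending = header.groups()
            continue
        fru = FRU_LINE.match(line)
        if fru and pending is not None:
            faults.append(Fault(*pending, fru.group(1)))
            pending = None
    return faults


def uuids_under(faults: List[Fault], prefix: str) -> List[str]:
    """UUIDs of faults whose FRU path starts with prefix, e.g. '/SYS/MB/CM1/'."""
    return [f.uuid for f in faults if f.fru.startswith(prefix)]


# Three entries copied from the output above.
sample = """\
2017-07-04/05:48:10 f961b866-4100-6f0b-b3b9-b6a10e9dc4a9 SPT-8000-DH
FRU = /SYS/MB
2017-07-04/05:48:10 6fff9e8a-26e2-c37a-dbe4-8d0eb4faf54a SPT-8000-DH
FRU = /SYS/MB/CM0/CMP/MR0
2017-07-04/05:48:11 ddb26bb2-2997-6ee6-9d04-b22f1b2e11a1 SPT-8000-DH
FRU = /SYS/MB/CM1/CMP/MR0
"""

faults = parse_fmdump_v(sample)
# With only the CM0 risers installed, the CM1 faults are the ones to acquit.
for uuid in uuids_under(faults, "/SYS/MB/CM1/"):
    print("fmadm acquit", uuid)
```

On this sample the loop prints a single command line, for the CM1/CMP/MR0 fault UUID.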
The output of 'fmdump -ev' shows POK (Power OK) events against the motherboard and all 8 memory risers:
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:48 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:49 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:50 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:50 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR1
2017-07-04/05:47:51 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR1
2017-07-04/05:47:52 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR1
2017-07-04/05:47:52 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR1
2017-07-04/05:47:53 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR2
2017-07-04/05:47:53 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR2
2017-07-04/05:47:54 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR2
2017-07-04/05:47:55 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR2
2017-07-04/05:47:55 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR3
2017-07-04/05:47:55 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR3
2017-07-04/05:47:56 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR3
2017-07-04/05:47:57 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR3
2017-07-04/05:47:58 ereport.chassis.pok.fail-asserted@/SYS/MB/CM1/CMP/MR0
2017-07-04/05:47:58 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR0
2017-07-04/05:47:59 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR0
2017-07-04/05:48:00 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR0
2017-07-04/05:48:00 ereport.chassis.pok.fail-asserted@/SYS/MB/CM1/CMP/MR1
2017-07-04/05:48:00 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR1
2017-07-04/05:48:01 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR1
2017-07-04/05:48:02 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR1
2017-07-04/05:48:02 ereport.chassis.pok.fail-asserted@/SYS/MB/CM1/CMP/MR2
2017-07-04/05:48:02 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR2
2017-07-04/05:48:03 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR2
2017-07-04/05:48:04 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR2
2017-07-04/05:48:04 ereport.chassis.pok.fail-asserted@/SYS/MB/CM1/CMP/MR3
2017-07-04/05:48:05 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR3
2017-07-04/05:48:06 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR3
2017-07-04/05:48:07 ereport.chassis.pok.fault-info@/SYS/MB/CM1/CMP/MR3
2017-07-04/05:48:09 ereport.chassis.pok.fail-asserted@/SYS/MB
2017-07-04/05:48:09 ereport.chassis.pok.fail-asserted@/SYS/MB
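Because the POK events land within a few seconds of each other, listing the first 'pok.fail-asserted' ereport per FRU in time order shows where the cascade started. This is a minimal Python sketch, assuming the 'fmdump -ev' summary lines are saved as text in the 'timestamp class@FRU' form shown above; first_fail_asserted is a hypothetical name. Timestamps have one-second resolution, so events in the same second (here /SYS/MB and MR0 at 05:47:48) simply keep their log order, and the ordering alone cannot separate them.

```python
import re
from typing import List, Tuple

# e.g. "2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB"
EREPORT = re.compile(r"^(\S+)\s+(ereport\.\S+?)@(\S+)$")
FAIL_ASSERTED = "ereport.chassis.pok.fail-asserted"


def first_fail_asserted(text: str) -> List[Tuple[str, str]]:
    """Return (timestamp, FRU) for the first fail-asserted ereport of each FRU,
    oldest first. The YYYY-MM-DD/HH:MM:SS format sorts correctly as a string,
    and list.sort() is stable, so same-second events keep their log order."""
    events = []
    for line in text.splitlines():
        m = EREPORT.match(line.strip())
        if m and m.group(2) == FAIL_ASSERTED:
            events.append((m.group(1), m.group(3)))
    events.sort(key=lambda event: event[0])
    seen = set()
    first = []
    for timestamp, fru in events:
        if fru not in seen:
            seen.add(fru)
            first.append((timestamp, fru))
    return first


# A few lines copied from the output above.
sample = """\
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:48 ereport.chassis.pok.fault-info@/SYS/MB/CM0/CMP/MR0
2017-07-04/05:47:50 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR1
2017-07-04/05:48:09 ereport.chassis.pok.fail-asserted@/SYS/MB
"""

for timestamp, fru in first_fail_asserted(sample):
    print(timestamp, fru)
```

The repeated /SYS/MB assertion at 05:48:09 is dropped because only the first assertion per FRU is kept.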
The first of these POK events was raised by the /SYS/MB/3V3_DC_POK_FLT detector:
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB
detector = /SYS/MB/3V3_DC_POK_FLT
hidden = true
It was followed by the POK faults from the 8 memory risers:
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR0
detector = /SYS/MB/CM0/CMP/MR0/DC_POK_FLT
hidden = true
.
.
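The detector named in each verbose ereport is what distinguishes the motherboard 3.3V rail detector from the per-riser DC_POK_FLT detectors. This is a minimal Python sketch, assuming the verbose ereports are saved as text in the layout shown above (an ereport header line followed by indented 'name = value' fields); parse_ereports is a hypothetical helper, and it captures only the fields present in the text.

```python
import re
from typing import Dict, List

# e.g. "2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB"
HEADER = re.compile(r"^(\S+)\s+(ereport\.\S+?)@(\S+)$")
# e.g. "detector = /SYS/MB/3V3_DC_POK_FLT" or "hidden = true"
FIELD = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_ereports(text: str) -> List[Dict[str, str]]:
    """Group each ereport header with the 'name = value' fields below it."""
    reports: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            time, cls, fru = header.groups()
            reports.append({"time": time, "class": cls, "fru": fru})
            continue
        field = FIELD.match(line)
        if field and reports:
            # Attach the field to the most recent ereport header.
            reports[-1][field.group(1)] = field.group(2).strip()
    return reports


sample = """\
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB
    detector = /SYS/MB/3V3_DC_POK_FLT
    hidden = true
2017-07-04/05:47:48 ereport.chassis.pok.fail-asserted@/SYS/MB/CM0/CMP/MR0
    detector = /SYS/MB/CM0/CMP/MR0/DC_POK_FLT
    hidden = true
"""

for report in parse_ereports(sample):
    print(report["fru"], "<-", report["detector"])
```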
Changes
Cause
The 3.3V rail is common to the motherboard and all 8 memory risers. If one riser develops a fault on this rail, POK faults are inevitably reported against the motherboard and all 8 memory risers. The fault list therefore does not by itself identify the root cause, but it does indicate that one or more of the memory risers is likely the problem.
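The reasoning above can be made concrete with a toy model. This Python sketch assumes, as stated above, that a riser fault on the shared 3.3V rail is reported against every FRU on that rail; it then shows that all 8 possible culprits produce the same fault list, which is why the list alone cannot name the failed riser. The model and its names are illustrative only.

```python
from typing import FrozenSet

# FRU paths as they appear in the fmdump output of this note.
RISERS = [f"/SYS/MB/CM{cm}/CMP/MR{mr}" for cm in (0, 1) for mr in range(4)]
RAIL_3V3 = frozenset(["/SYS/MB"] + RISERS)


def faults_reported(failing_riser: str) -> FrozenSet[str]:
    """Toy model: a fault on any riser disturbs the shared 3.3V rail, so every
    FRU on the rail (motherboard and all 8 risers) reports a POK fault."""
    if failing_riser not in RISERS:
        raise ValueError(f"unknown riser: {failing_riser}")
    return RAIL_3V3


# Collect the distinct fault lists over every possible failing riser.
signatures = {faults_reported(riser) for riser in RISERS}
print(len(signatures), "distinct signature(s) for", len(RISERS), "possible culprits")
```

Only one distinct signature results, so isolating the riser requires the physical procedure in the Solution rather than log analysis.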
Solution
Take the following steps to isolate a failed memory riser. Note: although it is not a supported configuration, the server can be powered on with only 4 risers installed.
Always verify first that there are no AC input or other power-related issues.
1. Power off the server and remove the AC power cords.
2. Remove the 4 CM1 risers, leaving only the CM0 riser slots populated.
3. Reconnect the power cords and log in to ILOM.
4. Acquit the faults reported against the removed risers: fmadm acquit UUID
5. Power on the server: -> start /SYS
6. If the server still fails to power on, repeat steps 1-5 using the other group of risers.
7. Repeat as needed until the failed riser or risers are identified.
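The group-swap procedure above can be sketched as follows. This is a hypothetical Python illustration, not an Oracle tool: powers_on stands for one manual pass through steps 1-5 with the given risers installed, the riser labels 'A' to 'H' are made up, and the final one-riser-at-a-time substitution into a known-good group is an assumption about how "repeat as needed" might be carried out.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for one manual cycle: power off, install exactly these
# risers, acquit stale faults, 'start /SYS', and report whether the host came up.
PowerTest = Callable[[List[str]], bool]


def isolate_failed_risers(groups: Sequence[Sequence[str]], powers_on: PowerTest) -> List[str]:
    known_good: List[str] = []
    suspect_groups: List[List[str]] = []
    for group in groups:  # e.g. the CM0 group first, then the other group
        if powers_on(list(group)):
            known_good = list(group)
        else:
            suspect_groups.append(list(group))

    if not suspect_groups:
        return []  # every group powers on; the fault did not reproduce
    if not known_good:
        # No group powers on by itself: risers in both groups, or the
        # motherboard and its shared 3.3V rail, may be at fault.
        raise RuntimeError("no riser group powers on; suspect the motherboard")

    failed: List[str] = []
    for group in suspect_groups:
        for riser in group:
            # Assumed step: replace one known-good riser with one suspect riser.
            trial = known_good[:-1] + [riser]
            if not powers_on(trial):
                failed.append(riser)
    return failed


# Simulated run in which riser "F" is the (made-up) bad part.
bad = {"F"}
result = isolate_failed_risers(
    [["A", "B", "C", "D"], ["E", "F", "G", "H"]],
    lambda installed: not bad.intersection(installed),
)
print(result)
```

Substituting suspects into a known-good group keeps the installed count at 4, matching the minimum configuration the Note above says will power on.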