Asset ID: 1-72-2348450.1
Update Date: 2018-01-15
Keywords:
Solution Type
Problem Resolution Sure
Solution 2348450.1: SPARC T4-2 server doesn't recognize its Memory Risers and DIMMs
Related Categories: PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4
A SPARC T4-2 system with four Memory Risers installed went down. Afterwards, POST could not configure any MCU because no Memory Riser or DIMM was accessible, so the system was unable to pass POST.
The FRU information from ILOM did not show any Memory Riser in the component list, either in the ILOM Snapshot or in the output of the ILOM command:
-> show -o table -level all /SYS fru_part_number
Created from <SR 3-16637006971>
Applies to:
SPARC T4-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Other Facts:
The FMA message ILOM-8000-2V (FRU ID corrupt) was found on ILOM after reseating all of the Memory DIMMs and Risers.
Symptoms
Customer reported: the system powered off without manual intervention.
From POST output:
Serial console started. To stop, type #.
2018-01-12 16:42:17 0:0:0> NOTICE: Initializing TPM with:
tpm_enable = false
tpm_activate = false
tpm_forceclear = false
2018-01-12 16:42:17 0:0:0> NOTICE: TPM found: Ver 1.2, Rev 1.2, SpecLevel 2, errataRev 0, VendorId 'IFX'
2018-01-12 16:42:19 0:0:0> NOTICE: TPM initialized successfully. Current state is: disabled
2018-01-12 16:42:19 0:0:0> NOTICE: Serial#: 000000000000002a.0159ccc07d224154
2018-01-12 16:42:19 1:0:0> NOTICE: Serial#: 000000000000002a.0159ccc07d2241a6
2018-01-12 16:42:19 0:0:0> NOTICE: Version: 003e003012030607
2018-01-12 16:42:19 1:0:0> NOTICE: Version: 003e003012030607
2018-01-12 16:42:19 0:0:0> NOTICE: T4 Revision: 1.2
2018-01-12 16:42:19 1:0:0> NOTICE: T4 Revision: 1.2
2018-01-12 16:42:19 0:0:0> ERROR: Can't read BoB device type from FRUID, disabling MCUs
2018-01-12 16:42:19 0:0:0> ERROR: Can't read BoB device type from FRUID, disabling MCUs
2018-01-12 16:42:19 0:0:0> ERROR: Can't read BoB device type from FRUID, disabling MCUs
2018-01-12 16:42:20 0:0:0> ERROR: Can't read BoB device type from FRUID, disabling MCUs
2018-01-12 16:42:23 1:0:0> NOTICE: /SYS/MB/CMP1/MCU0 is disabled
2018-01-12 16:42:23 0:0:0> NOTICE: /SYS/MB/CMP0/MCU0 is disabled
2018-01-12 16:42:23 1:0:0> NOTICE: /SYS/MB/CMP1/MCU1 is disabled
2018-01-12 16:42:23 0:0:0> NOTICE: /SYS/MB/CMP0/MCU1 is disabled
2018-01-12 16:42:23 1:0:0> ERROR: Not all MCUs enabled. Unsupported Config.
2018-01-12 16:42:23 0:0:0> ERROR: Please refer to the service documentation for supported memory configurations.
2018-01-12 16:42:23 1:0:0> NOTICE: /SYS/MB/CMP1/MCU0 is disabled
2018-01-12 16:42:23 0:0:0> ERROR: Not all MCUs enabled. Unsupported Config.
2018-01-12 16:42:23 1:0:0> NOTICE: /SYS/MB/CMP1/MCU1 is disabled
2018-01-12 16:42:23 0:0:0> NOTICE: /SYS/MB/CMP0/MCU0 is disabled
2018-01-12 16:42:23 0:0:0> NOTICE: /SYS/MB/
===================
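The POST output above can be checked mechanically for the two signatures that matter here: the "Can't read BoB device type from FRUID" error and the list of disabled MCUs. The following is a minimal sketch in Python, assuming the console log has been saved as text in the same format as shown above. The function name summarize_post and the field name "source" (for the 0:0:0 column, whose meaning is not interpreted) are illustrative and not part of any Oracle tool.

```python
import re

# Matches POST lines such as:
# 2018-01-12 16:42:19 0:0:0> ERROR: Can't read BoB device type from FRUID, disabling MCUs
POST_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<source>\d+:\d+:\d+)> (?P<level>[A-Z]+): (?P<message>.*)$"
)
# Matches messages such as "/SYS/MB/CMP1/MCU0 is disabled"
DISABLED = re.compile(r"^(?P<target>/SYS/\S+) is disabled$")


def summarize_post(log_text):
    """Collect ERROR messages and disabled targets from saved POST output."""
    errors = []
    disabled = set()
    for raw in log_text.splitlines():
        match = POST_LINE.match(raw.strip())
        if not match:
            continue  # continuation lines (e.g. "tpm_enable = false") are skipped
        message = match.group("message")
        if match.group("level") == "ERROR":
            errors.append(message)
        hit = DISABLED.match(message)
        if hit:
            disabled.add(hit.group("target"))
    return {
        "errors": errors,
        "disabled": sorted(disabled),
        # True when POST could not read the FRUID, as in this case
        "fruid_unreadable": any("FRUID" in e for e in errors),
    }
```

A result with fruid_unreadable set and every MCU in the disabled list is consistent with the symptom described in this document. It does not by itself identify which riser is at fault.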
Checking the system showed no Memory Risers and no FRU ID information for them:
-> show -o table -level all /SYS fru_part_number
Target | Property | Value
------------------------------------------+--------------------------------------------------+-------------------------------------------------------------------------
/SYS/FANBD | fru_part_number | 7051522
/SYS/MB | fru_part_number | 7049060
/SYS/MB/SP | fru_part_number | 7054434
/SYS/MB_ENV | fru_part_number | 7024515
/SYS/PS0 | fru_part_number | 7048278
/SYS/PS1 | fru_part_number | 7048278
/SYS/SASBP | fru_part_number | 511-1246-04
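When this listing has been captured to a file, the check for missing risers can be scripted. The sketch below, in Python, assumes the three-column "Target | Property | Value" layout shown above. The helper names parse_fru_table and list_risers are hypothetical, and the riser path pattern /SYS/MB/CMPn/MRn is taken from the outputs in this document.

```python
import re

# Riser targets as they appear in this document, e.g. /SYS/MB/CMP0/MR0
RISER = re.compile(r"^/SYS/MB/CMP\d+/MR\d+$")


def parse_fru_table(text):
    """Return {target: value} from saved 'show -o table' output."""
    rows = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            continue  # command echo and the dashed separator have no 3 columns
        target, _prop, value = parts
        if not target.startswith("/"):
            continue  # header row: "Target | Property | Value"
        rows[target] = value
    return rows


def list_risers(rows):
    """Return the sorted Memory Riser targets present in the parsed table."""
    return sorted(t for t in rows if RISER.match(t))
```

An empty list from list_risers on a system that has risers physically installed matches the failure described here.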
Changes
No changes were made to the system; this was an unexpected outage.
Cause
Troubleshooting identified a failing Memory Riser that prevented the remaining Memory Risers and DIMMs from appearing in the system configuration.
The following output was collected after installing all memory riser boards except the suspected one:
-> show -o table -level all /SYS fru_part_number
Target | Property | Value
------------------------------------------+--------------------------------------------------+-------------------------------------------------------------------------
/SYS/FANBD | fru_part_number | 7051522
/SYS/MB | fru_part_number | 7049060
/SYS/MB/CMP0/MR0 | fru_part_number | 7051516
/SYS/MB/CMP0/MR0/BOB0/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR0/BOB0/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR0/BOB1/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR0/BOB1/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR1 | fru_part_number | 7051516
/SYS/MB/CMP0/MR1/BOB0/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR1/BOB0/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR1/BOB1/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP0/MR1/BOB1/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP1/MR0 | fru_part_number | 7051516
/SYS/MB/CMP1/MR0/BOB0/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP1/MR0/BOB0/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP1/MR0/BOB1/CH0/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/CMP1/MR0/BOB1/CH1/D0 | fru_part_number | 7014642,M393B5273CH0-YH9
/SYS/MB/SP | fru_part_number | 7054434
/SYS/MB_ENV | fru_part_number | 7024515
/SYS/PS0 | fru_part_number | 7048278
/SYS/PS1 | fru_part_number | 7048278
/SYS/SASBP | fru_part_number | 511-1246-04
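In the output above, each visible riser reports four DIMMs. A quick per-riser count makes it easy to confirm that after every test step. The following Python sketch assumes the DIMM path layout shown in this output (/SYS/MB/CMPn/MRn/BOBn/CHn/Dn). The function name dimms_per_riser is illustrative only.

```python
import re
from collections import defaultdict

# DIMM targets as listed above, e.g. /SYS/MB/CMP0/MR0/BOB0/CH0/D0
DIMM = re.compile(r"^(/SYS/MB/CMP\d+/MR\d+)/BOB\d+/CH\d+/D\d+\b")


def dimms_per_riser(text):
    """Count the DIMM rows listed under each Memory Riser target."""
    counts = defaultdict(int)
    for line in text.splitlines():
        match = DIMM.match(line.strip())
        if match:
            counts[match.group(1)] += 1  # group 1 is the parent riser path
    return dict(counts)
```

A riser that is installed but absent from the result, or one that shows fewer DIMMs than are physically populated, is worth a closer look.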
The following output was collected after installing the suspected memory riser board in position 0 and good memory riser boards in the rest of the slots:
-> show -o table -level all /SYS fru_part_number
Target | Property | Value
------------------------------------------+--------------------------------------------------+-------------------------------------------------------------------------
/SYS/FANBD | fru_part_number | 7051522
/SYS/MB | fru_part_number | 7049060
/SYS/MB/SP | fru_part_number | 7054434
/SYS/MB_ENV | fru_part_number | 7024515
/SYS/PS0 | fru_part_number | 7048278
/SYS/PS1 | fru_part_number | 7048278
/SYS/SASBP | fru_part_number | 511-1246-04
Solution
Troubleshooting action plan:
1. Remove all power cords.
2. Install CMP0/MR0 only.
3. Reconnect the power cords.
4. Gather the following output from ILOM:
show -o table -level all /SYS fru_part_number
5. Verify that the Memory Riser and its DIMMs are on the list.
If the DIMMs are displayed, repeat the steps with:
CMP0/MR0 + CMP0/MR1
CMP0/MR0 + CMP0/MR1 + CMP1/MR0
Continue until all good risers are installed and all of their DIMMs are seen.
NOTE: If the Riser and its DIMMs do not appear, set the bad Riser aside and test the next one in the same configuration.
====================
This troubleshooting action plan identified the defective memory riser, which was then replaced.
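The logic of the action plan is an incremental isolation: add one riser at a time to the set already proven good, and set aside any riser whose presence makes the listing fail. The sketch below expresses that logic in Python. It does not talk to ILOM. The callback all_visible is a hypothetical stand-in for the manual steps (remove power, install the boards, restore power, run the show command, and check the listing), and the board labels in the usage lines are invented for illustration.

```python
def isolate_bad_risers(candidates, all_visible):
    """Split riser boards into good and bad by adding them one at a time.

    candidates:  labels of the physical riser boards to test, in order.
    all_visible: hypothetical callback; given the list of boards installed
                 for this trial, returns True if every installed riser and
                 its DIMMs appear in the ILOM FRU listing.
    """
    good, bad = [], []
    for riser in candidates:
        trial = good + [riser]       # known-good boards plus the one under test
        if all_visible(trial):
            good.append(riser)       # keep it installed for the next trial
        else:
            bad.append(riser)        # set it aside and move to the next board
    return good, bad


# Simulated run: board "B" hides every riser whenever it is installed.
def simulated_check(installed):
    return "B" not in installed


good_boards, bad_boards = isolate_bad_risers(["A", "B", "C", "D"], simulated_check)
```

Because only one untested board is added per trial, a failed trial points at that board alone, which is why the plan starts from a single riser rather than a full configuration.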
References
<NOTE:1415583.1> - How to Remove and Replace a SPARC T4-2 / Netra T4-2 Memory Risers and DIMMs