Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2273241.1
Update Date:2017-06-08
Keywords:

Solution Type  Problem Resolution Sure

Solution  2273241.1 :   SPARC T5-4 stuck on Reconfiguring System (Hostconfig) after passing POST  


Related Items
  • SPARC T3-4
  • SPARC T5-4
  • SPARC T4-4
  • SPARC T5-8
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5
  • Microlearning>Text>ML-TXT-Document


After a firmware upgrade to version 9.6.8.b, the SPARC T5-4 server hung at "Reconfiguring System" (Hostconfig) after completing max POST and the Initializing Memory stage successfully.

System firmware was then downgraded to version 9.5.1.b (the previous firmware revision), but the system still hung at the same stage and never reached the OpenBoot prompt.

Created from <SR 3-15033301862>

Applies to:

SPARC T5-4 - Version All Versions to All Versions [Release All Releases]
SPARC T5-8 - Version All Versions to All Versions [Release All Releases]
SPARC T4-4 - Version All Versions to All Versions [Release All Releases]
SPARC T3-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
SPARC

Symptoms

The SPARC T5-4 server hangs at "Reconfiguring System" after running POST at max level and completing the Initializing Memory stage successfully. After 40 minutes the system powers off.

The following example messages were found in the ILOM snapshot each time the system hung at the "Reconfiguring System" stage:

 

In /ilom/@persist@host_logs@hostconsole.log:

2017-06-01 20:29:42.163 0:0:0>Setup CPU MBISI
2017-06-01 20:29:42.507 0:0:0>POST Phase Complete
2017-06-01 20:29:42.517 0:0:0>POST Exit reason = 0
2017-06-01 20:29:42.526 0:0:0>Board Phase runtime: 00:13:22, Total CPUs 512, Total DRAM 1024 GB
2017-06-01 20:29:42.543 0:0:0>End of POST
2017-06-01 20:29:48 2:00:0> NOTICE: SPARC-T5 Revision 1.2 Speed 3600MHz
2017-06-01 20:29:48 3:00:0> NOTICE: SPARC-T5 Revision 1.2 Speed 3600MHz
2017-06-01 20:29:48 0:00:0> NOTICE: SPARC-T5 Revision 1.2 Speed 3600MHz
2017-06-01 20:29:48 1:00:0> NOTICE: SPARC-T5 Revision 1.2 Speed 3600MHz
2017-06-01 20:30:06 0:00:0> NOTICE: Initializing Memory
2017-06-01 20:30:47 0:00:0> NOTICE: Reconfiguring System <------------ System sits there
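
As a rough aid for checking a collected hostconsole.log, the sketch below scans console lines for the last NOTICE entry and reports whether the log ends at "Reconfiguring System". It is a minimal Python sketch, not an Oracle tool: the function names and the regular expression are assumptions derived only from the sample lines above, and other firmware versions may format the console log differently.

```python
import re

# Assumed line shape, taken from the sample above:
#   2017-06-01 20:30:47 0:00:0> NOTICE: Reconfiguring System
# POST lines (no "NOTICE:" tag) are deliberately not matched.
NOTICE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\S*\s+\S+>\s*NOTICE:\s*(.+?)\s*$"
)


def last_notice(lines):
    """Return (timestamp, message) of the last NOTICE line, or None."""
    found = None
    for line in lines:
        match = NOTICE_RE.match(line)
        if match:
            found = (match.group(1), match.group(2))
    return found


def stuck_at_reconfiguring(lines):
    """True if the last NOTICE in the log is 'Reconfiguring System'."""
    found = last_notice(lines)
    return found is not None and found[1].startswith("Reconfiguring System")


if __name__ == "__main__":
    sample = [
        "2017-06-01 20:29:42.543 0:0:0>End of POST",
        "2017-06-01 20:30:06 0:00:0> NOTICE: Initializing Memory",
        "2017-06-01 20:30:47 0:00:0> NOTICE: Reconfiguring System",
    ]
    print(last_notice(sample), stuck_at_reconfiguring(sample))
```

A log that ends this way only shows where the console output stopped; it does not by itself identify the failing component.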

 

Changes

The system firmware was updated to version 9.6.8.b.

Troubleshooting steps attempted without success:
===========================

1) The customer upgraded the firmware to the latest level.

2) The host did not finish booting. It hung at "Reconfiguring System"; the customer waited more than 40 minutes without success.

3) Removed all power cables, waited 1 minute, and reconnected them. Same problem.

4) Downgraded the firmware to the original level, 9.5.1.b. Same problem.

5) Removed the power cables and fiber cables, then reconnected only the power cables. Same problem.

6) Set the bootmode to factory default. Same problem.

7) Reset the SP to factory default. Same problem.
===========================

Cause

Hostconfig is responsible for presenting the current configuration from POST to the OpenBoot PROM (primary host domain).

Hostconfig and POST are loaded into memory on the first available CPU Module, so swapping the Processor Modules (PM0 and PM1) changes the physical memory location used by both Hostconfig and POST.

The swap allowed the system to initialise correctly, and subsequent testing identified a fault on CM0 (reported at /SYS/PM1/CM0 after the swap) as the probable root cause of the hang.

Solution

Action plan that worked:

1. Update the system firmware again using patch 25790122: FIRMWARE: SPARC T5-4+T5-8 SUN SYSTEM FIRMWARE 9.6.8.B

2. Swap PM0 with PM1.

3. Start the system and run POST again to verify that it gets past "Reconfiguring System":

-> start /SYS

-> start /SP/console

 ===================

 

NOTE: In a single Processor Module configuration, try swapping the DIMMs in the first DIMM slot of each BOB/CH0 (to set up a minimal configuration, see the server's Service Manual).

 

A fault was found on Processor Module 1 once the system was able to pass POST and get past the Reconfiguring System stage:

faultmgmtsp> fmadm faulty

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2017-06-02/12:18:15 c85976f2-9fd6-e725-abcb-8e300a505572 SPT-8001-XC    Critical

Problem Status    : open
Diag Engine       : fdd 1.0
System
   Manufacturer   : Oracle Corporation
   Name           : SPARC T5-4
   Part_Number    : 31930909+7+1
   Serial_Number  : AK00117917

System Component
   Firmware_Manufacturer : Oracle Corporation
   Firmware_Version      : (ILOM)3.2.8.1.a,(POST)5.3.7,(OBP)4.38.7,(HV)1.15.7
   Firmware_Release      : (ILOM)2016.12.08,(POST)2016.11.30,(OBP)2016.11.30,(HV)2016.11.30

----------------------------------------
Suspect 1 of 1
   Problem class  : fault.chassis.voltage.isolated
   Certainty      : 100%
   Affects        : /SYS/PM1
   Status         : faulted

   FRU
      Status            : faulty
      Location          : /SYS/PM1
      Manufacturer      : Oracle Corporation
      Name              : TLA,PM,T5-4,T5-8
      Part_Number       : 7056873
      Revision          : 08
      Serial_Number     : 465769T+13248803JH
      Chassis
         Manufacturer   : Oracle Corporation
         Name           : SPARC T5-4
         Part_Number    : 31930909+7+1
         Serial_Number  : AK00117917
   Resource
      Location          : /SYS/PM1/CM0

Description : A power supply has failed to maintain a good POK (Power On
              OK) condition.

Response    : The system will shutdown in a non-graceful fashion.

Impact      : The platform will restart with the affected component
              deconfigured.

Action      : Please refer to the associated reference document at
              http://support.oracle.com/msg/SPT-8001-XC for the latest
              service procedures and policies regarding this diagnosis.
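
When several snapshots need to be compared, the key fields of a fault report like the one above can be pulled out with a few lines of script. The following is a minimal Python sketch, assuming the "Key : Value" layout shown in this output; the function name and the choice of fields are hypothetical, and the layout may differ between ILOM releases.

```python
def parse_fault_fields(text, wanted=("Problem class", "Certainty", "Affects")):
    """Collect the first occurrence of each wanted 'Key : Value' field.

    Assumption: each field sits on its own line as 'Key : Value', as in the
    fmadm faulty output above. Keys that repeat later in the report (for
    example Status or Location) keep their first value only.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in wanted and key not in fields:
            fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    sample = """\
Suspect 1 of 1
Problem class : fault.chassis.voltage.isolated
Certainty : 100%
Affects : /SYS/PM1
Status : faulted
"""
    print(parse_fault_fields(sample))
```

Splitting on the first colon keeps values that themselves contain colons intact, such as a URL in the Action text.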

 

Processor Module 1 was replaced and the system came up clean, with all LDOMs working without faults.

 

 

References

<NOTE:1527635.1> - How to Replace a SPARC T5-4 or T5-8 Processor Module
<NOTE:1527507.1> - How to Remove and Replace a SPARC T5-4 Server Service Processor
<NOTE:1999520.1> - Disabled PCIe devices due to stale faultDB and deconfigDB entries w/SysFW 9.3.0.x

  Copyright © 2018 Oracle, Inc.  All rights reserved.