Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1269016.1
Update Date:2017-10-18
Keywords:

Solution Type  Problem Resolution Sure

Solution  1269016.1 :   Sun Fire V1280, E2900 and Netra 1280, 1290: DC-DC convertor voltage failure. Voltage ramp timed out after 500 msec.  


Related Items
  • Sun Netra 1290 Server
  • Sun Fire E2900 Server
  • Sun Fire V1280 Server
  • Sun Netra 1280 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange V and Netra Servers


DC-DC convertor voltage failure

In this Document
Symptoms
Cause
Solution
References


Applies to:

Sun Fire E2900 Server - Version Not Applicable and later
Sun Fire V1280 Server - Version Not Applicable and later
Sun Netra 1280 Server - Version Not Applicable and later
Sun Netra 1290 Server - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

During recovery from an AC power outage, the following errors may be seen for a particular system board:

Tue Nov 16 11:12:28 qalw8-4sc lom: /SB0: RepeaterHpu.prepare: sun.serengeti.HpuFailedException: /SB0/bbcGroup0/cpuAB: 
CpuSafariGroup.setPower: DC-DC convertor voltage failure. Voltage ramp timed  out after 500 msec.
Safari Group being powered down
Tue Nov 16 11:12:28 qalw8-4sc lom: sun.serengeti.HpuFailedException:
/SB0: unable to prepare board due to SBBC group failure.


The key signatures of this issue in the error messages are:

  • Voltage ramp timed  out after 500 msec
  • DC-DC convertor voltage failure
  • Safari Group being powered down
  • unable to prepare board due to SBBC group failure.

Note that "converter" is misspelled as "convertor" in the symptom strings, and that there are two spaces between the words "timed" and "out".
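
Because of the misspelling and the double space, a search for the correctly written phrase will miss these messages. A minimal Python sketch of a tolerant matcher for the four signatures follows. The signature names and the match_signatures() helper are hypothetical; accepting either spelling of "converter", and allowing line wraps between words (as in the wrapped excerpts in this document), are assumptions made for illustration.

```python
import re


def _phrase(*words: str) -> re.Pattern[str]:
    # Join the words with \s+ so double spaces or line wraps between words still match.
    return re.compile(r"\s+".join(words))


# Hypothetical names for the four key signatures listed above.
# "convert[eo]r" accepts both the misspelled and the corrected spelling (an assumption).
SIGNATURES = {
    "dc_dc_failure": _phrase("DC-DC", "convert[eo]r", "voltage", "failure"),
    "ramp_timeout": _phrase("Voltage", "ramp", "timed", "out", "after", "500", "msec"),
    "group_powered_down": _phrase("Safari", "Group", "being", "powered", "down"),
    "sbbc_group_failure": _phrase("unable", "to", "prepare", "board", "due", "to",
                                  "SBBC", "group", "failure"),
}


def match_signatures(log_text: str) -> set[str]:
    """Return the names of the signatures that appear anywhere in log_text."""
    return {name for name, pattern in SIGNATURES.items() if pattern.search(log_text)}
```

For example, passing the V1280 excerpt above to match_signatures() should report all four names; a log that reports only some of them may point to a different problem.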

The same issue has also been observed on Serengeti and Starcat platforms.

For Serengeti, the messages look like the following:

Jun 17 13:17:21 sf6800-sc0 Platform.SC: [ID 268649 local0.error]
/partition0/domain1/SB4: RepeaterHpu.prepare: sun.serengeti.HpuFailedException:
/partition0/domain1/SB4/bbcGroup0/cpuAB: CpuSafariGroup.setPower: 
DC-DC convertor voltage failure. Voltage ramp timed  out after 500 msec. 
Safari Group being powered down
Jun 17 13:17:21 sf6800-sc0 Platform.SC: [ID 163292 local0.error]
/partition0/domain1/SB4: RepeaterHpu.prepare: sun.serengeti.HpuFailedException:
/partition0/domain1/SB4/bbcGroup1/cpuCD: CpuSafariGroup.setPower: 
DC-DC convertor voltage failure. Voltage ramp timed  out after 500 msec. 
Safari Group being powered down

Jun 17 13:17:21 sf6800-sc0 Domain-B.SC: [ID 139020 local0.error] 
sun.serengeti.HpuFailedException: /partition0/domain1/SB4: 
unable to prepare board due to SBBC group failure.
Jun 17 13:17:21 sf6800-sc0 Domain-B.SC: [ID 610448 local0.warning] Excluded unusable,
unlicensed, failed or disabled board: /N0/SB4
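
When scanning a longer SC log, it can help to list which boards and Safari groups reported the failure. The following is a minimal sketch. It assumes the path layouts shown in the two excerpts above (/SB0/bbcGroup0/cpuAB on the V1280, /partition0/domain1/SB4/bbcGroup1/cpuCD on Serengeti). The failed_groups() and failed_boards() helpers are hypothetical, and other platforms or firmware releases may format these paths differently.

```python
import re

# Matches "/SB0/bbcGroup0/cpuAB: CpuSafariGroup.setPower" (V1280 style) and
# "/partition0/domain1/SB4/bbcGroup1/cpuCD: CpuSafariGroup.setPower" (Serengeti style).
# The layout is inferred from the log excerpts in this article only.
GROUP_FAILURE = re.compile(
    r"/(?P<board>SB\d+)/(?P<group>bbcGroup\d+/cpu[A-Z]+):\s+CpuSafariGroup\.setPower"
)


def failed_groups(log_text: str) -> list[tuple[str, str]]:
    """Return (board, Safari group) pairs in log order, without duplicates."""
    pairs: list[tuple[str, str]] = []
    for match in GROUP_FAILURE.finditer(log_text):
        pair = (match.group("board"), match.group("group"))
        if pair not in pairs:
            pairs.append(pair)
    return pairs


def failed_boards(log_text: str) -> list[str]:
    """Return the distinct boards (e.g. "SB4") that reported a group failure, sorted by name."""
    return sorted({board for board, _ in failed_groups(log_text)})
```

On the Serengeti excerpt, this would list both Safari groups of SB4, so the whole board is the one to act on.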

Cause

The system board may be defective, but the issue is quite likely caused by a power-sequencing bug that can be resolved with a specific power-cycle procedure. Attempt the following actions before replacing the system board.

First, upgrade the firmware to the latest version.

See Firmware Download and update procedure for Sun Fire[TM] v1280, 3800, 4800, 4810, 6800, E2900, E4900, E6900, and Netra 1280, 1290 systems. (Doc ID 1006281.1)

If the firmware upgrade does not resolve the issue, raise a Service Request (SR) with Oracle Support so that an engineer can guide you through the steps described in the internal section of this document.

As a final step, if the problem persists, proceed with the hardware replacement.

NOTE:  There have been multiple attempts to mitigate power-sequencing issues in firmware. Do not be surprised to find additional corner cases that the current firmware does not cover.

If this issue is encountered, it is prudent to carry out the steps described in this document before replacing the hardware.

Solution

Previous workarounds for this type of power-sequencing issue involved shutting down the system with poweroff, unplugging all four AC power cords, waiting about 30 seconds for the PSU capacitors to discharge, and then plugging everything back in.

The same result can be achieved with the following LOM command sequence. The key is to bring the +48V main bus down to zero and leave it there for a while (about 30 seconds) before powering it back on.

Follow this procedure (specific to the V1280, E2900, N1280, and N1290):

lom> poweroff
lom> poweroff all          <-- leave the +48V bus off for about 30 seconds before the next command
lom> poweron all
lom> showboards -ev
lom> showcomponent -v
lom> flashupdate ...       <-- this would be a good time to update the firmware
lom> resetsc
lom> poweron

Notes:

  • This procedure is documented only for the V1280, E2900, N1280, and N1290, where it has been tested. The issue can also occur on Serengeti and Starcat, but the procedure has not been tested on those platforms and is therefore not documented for them.
  • The LOM poweron command powers on everything that is off and then runs POST and OBP.
  • The poweron all command only powers everything on; it does not run POST or OBP.
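
The ordering and the roughly 30-second wait in the procedure above can be captured as a simple pre-flight check of a planned session. The following is a hypothetical sketch. check_plan() and its (seconds, command) plan format are illustrative only, and the code never talks to the LOM. It assumes that the +48V bus stays down from poweroff all until poweron all, and it treats the article's "30 seconds or so" as a hard minimum.

```python
# Hypothetical checker for a planned LOM session on a V1280, E2900, N1280, or N1290.
# A plan is a list of (seconds_since_start, command) pairs.
DOCUMENTED_ORDER = [
    "poweroff", "poweroff all", "poweron all", "showboards -ev",
    "showcomponent -v", "resetsc", "poweron",
]
# The article says "say 30 seconds or so"; using 30 s as a hard minimum is an assumption.
MIN_BUS_OFF_SECONDS = 30.0


def check_plan(plan: list[tuple[float, str]]) -> list[str]:
    """Return a list of problems with the plan; an empty list means none were found."""
    problems: list[str] = []
    # flashupdate is optional, so it is ignored when comparing the order.
    commands = [cmd for _, cmd in plan if not cmd.startswith("flashupdate")]
    if commands != DOCUMENTED_ORDER:
        problems.append(f"order {commands} differs from documented {DOCUMENTED_ORDER}")
    times = {cmd: t for t, cmd in plan}
    if "poweroff all" in times and "poweron all" in times:
        # Assumption: the +48V bus is down from "poweroff all" until "poweron all".
        bus_off = times["poweron all"] - times["poweroff all"]
        if bus_off < MIN_BUS_OFF_SECONDS:
            problems.append(
                f"+48V bus off for only {bus_off:.0f} s; "
                f"wait at least {MIN_BUS_OFF_SECONDS:.0f} s"
            )
    return problems
```

A plan that issues poweron all only 10 seconds after poweroff all would be flagged, while the same plan with a 40-second gap would pass.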

If the same errors recur during power-up, a more serious fault may have occurred during the power outage, and the affected board(s) will have to be replaced. Replacement boards may contain updated components that require newer firmware, so update the firmware to the latest revision if possible. Note, however, that the boards must power on successfully before their firmware can be updated.

Additional Information:
The system board voltage-sequencing code has been enhanced several times to avoid the "Voltage ramp timed  out" problem. The most recent fixes (for bug 6241193) are delivered in the firmware releases of March and April 2006: 5.18.6, 5.19.5, and 5.20.0.
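
To tell whether a given firmware release already includes the 6241193 fixes, its version can be compared with the three releases listed above. This is a minimal sketch that assumes plain major.minor.patch version strings. It also assumes, though this document does not say so, that releases in trains newer than 5.20 carry the fix forward. Confirm this against the firmware release notes before relying on it. The helper names are hypothetical.

```python
# Releases containing the 6241193 fixes, per this article, keyed by (major, minor) train.
FIXED_IN = {(5, 18): (5, 18, 6), (5, 19): (5, 19, 5), (5, 20): (5, 20, 0)}


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a "major.minor.patch" string such as "5.19.5"."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return major, minor, patch


def has_ramp_fix(version: str) -> bool:
    """True if the release is at or above the fixed release of its train."""
    v = parse_version(version)
    fixed = FIXED_IN.get(v[:2])
    if fixed is not None:
        return v >= fixed
    # Assumption: any train newer than 5.20 includes the fix; older trains do not.
    return v[:2] > (5, 20)
```

Tuple comparison handles the ordering, so 5.19.4 is reported as lacking the fix while 5.19.5 and later in that train are reported as fixed.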

See the References section of this document for a list of potentially related bugs.

References

<BUG:4931946>
<BUG:5041545> - RC: SCM09MS1: MAINTENANCE/PATCHING INTERNAL TRACKING
<BUG:4394549> - REFRESH OF PJMAINTH INSTANCE
<BUG:5062717>
<BUG:5068597> - REMOVE VALUE FROM ACTION DROP DOWN FOR DATMART PROPERTIES PAGE
<BUG:6196909> - INCOMPLETE CALL STACK FOR LINUX 32-BIT WITH 2.6.9 KERNEL
<BUG:6241193> - V6.1.0C6B45 (SYS0). ERROR MESSAGE CONNXNS WITH MISMATCHED RATE CODES OR A/Z...

  Copyright © 2018 Oracle, Inc.  All rights reserved.