Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1582330.1
Update Date:2018-02-15
Keywords:

Solution Type  Technical Instruction Sure

Solution  1582330.1 :   Guidance for POST Diagnostic Level Setting on Sun Fire[TM] 280R, V480, V490, V880, V890 and V880z servers.  


Related Items
  • Sun Fire V880z Visualization Server
  • Sun Fire V880 Server
  • Sun Fire V890 Server
  • Sun Fire 280R Server
  • Sun Fire V480 Server
  • Sun Fire V490 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-Vx80




In this Document
Goal
Solution
References


Applies to:

Sun Fire V880 Server - Version All Versions and later
Sun Fire 280R Server - Version All Versions and later
Sun Fire V890 Server - Version All Versions and later
Sun Fire V480 Server - Version All Versions and later
Sun Fire V880z Visualization Server - Version All Versions and later
Information in this document applies to any platform.



Goal

 The purpose of this document is to provide guidance on POST diag-level settings for Sun Fire[TM] 280R, V480, V490, V880, V890 and V880z servers.

Solution

 Power-On Self-Test (POST) tests the hardware before the operating system boots.  The Sun Fire[TM] 280R, V480, V490, V880, V890 and V880z servers support several levels of testing, selected with the diag-level setting:

 

POST diag-level setting    Testing performed
off
  • No POST testing occurs
min
  • Complete POST testing of the CPUs and their respective caches
  • Complete POST centerplane testing
  • Complete POST I/O loopback testing
  • Memory (DIMM) address/data path testing
max
  • All min-level testing, plus:
  • I/O Controller (IOC) testing from all CPUs
  • Complete DIMM memory cell testing (block memory test)

 

 A change request was filed to have max POST tolerate a certain number of correctable errors (CEs):

Bug 18083388 - Vx80/Vx90 Post needs to have max tolerate ce's like serengeti/lw8/starcat

 

 


min is the factory default setting.

Over the course of the product life cycle, several DIMM issues prompted the recommendation that max be used as the POST level, because it adds DIMM block testing.

These DIMM issues were seen early in the product life cycle, before Solaris could adequately react to a reasonable number of Correctable Errors (CEs).  Modern versions of Solaris react to a questionable bit in memory by retiring the memory page that contains it.  When the thresholds for the number of retired pages are reached, Solaris diagnoses the DIMM as faulty.  See the page retirement limits document in the References section for more information on those thresholds.

Page retirement is the default behavior in Solaris 10 and was implemented through Kernel Jumbo Patches (KJPs) in Solaris 8 and 9.  Because most systems now run a Kernel Jumbo Patch level that supports Memory Page Retirement, Systems TSC at Oracle now suggests that the POST level be returned to the originally intended setting of min.

For more about Kernel Jumbo Patch levels where Memory Page Retirement is enabled see:

  Solaris[TM] 8/9: When Is Page Retirement Enabled? (Doc ID 1009256.1)
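
As an illustration only, the kernel patch level reported by uname -v (for example Generic_117171-07) can be compared with the minimum level listed in Doc ID 1009256.1. The sketch below assumes that Generic_<patch>-<rev> format. The function names are hypothetical, and the required level is a placeholder that must be taken from that document. When the patch IDs differ, or either string cannot be parsed, the sketch reports unknown rather than guessing.

```shell
#!/bin/sh
# Sketch only: compare a running kernel patch level with a required one.
# Backticks are used instead of $(...) so it also runs under the legacy
# Solaris /bin/sh. Function names are hypothetical.

# Print the patch ID from a string such as "Generic_117171-07" (here: 117171).
kjp_id() {
    echo "$1" | sed -n 's/^Generic_\([0-9][0-9]*\)-[0-9][0-9]*$/\1/p'
}

# Print the patch revision from the same kind of string (here: 07).
kjp_rev() {
    echo "$1" | sed -n 's/^Generic_[0-9][0-9]*-\([0-9][0-9]*\)$/\1/p'
}

# Usage: kjp_at_least <running-level> <required-level>
# Prints "yes", "no", or "unknown" (unparsable input or different patch IDs).
kjp_at_least() {
    run_id=`kjp_id "$1"`
    run_rev=`kjp_rev "$1"`
    req_id=`kjp_id "$2"`
    req_rev=`kjp_rev "$2"`
    if [ -z "$run_id" ] || [ -z "$req_id" ] || [ "$run_id" != "$req_id" ]; then
        echo unknown
    elif [ "$run_rev" -ge "$req_rev" ]; then
        echo yes
    else
        echo no
    fi
}
```

On the server itself, the first argument would be the output of uname -v and the second the level taken from Doc ID 1009256.1. A result of unknown means the levels must be compared by hand.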


Running POST at max is causing an unnecessarily high number of DIMM replacements.  These replacements are costly in terms of downtime and expose the system to the risk that the replacement DIMM is dead on arrival (DOA) or suffers an early-life failure that causes an unplanned outage.  DIMMs replaced for CEs are frequently found to be No Trouble Found (NTF) when they are shipped back to Oracle repair facilities and tested.  Correctable Errors are often transient and do not recur when the DIMM is removed from one system and installed in another.  Often they do not even recur when the DIMM is retested in the same system.

Action:
If the system is running a KJP level that supports Memory Page Retirement, diag-level should be set to min.

Setting it through Solaris

# eeprom diag-level=min


Setting it at OBP

ok setenv diag-level min
ok reset-all
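
From a running Solaris system, the eeprom step can be wrapped in a check so that the setting is changed only when needed. This is a minimal sketch, not an Oracle-supplied tool. The function names are hypothetical, it assumes that eeprom diag-level prints a line of the form diag-level=<value>, and it must be run as root.

```shell
#!/bin/sh
# Sketch only: report the current POST diag-level and set it to min if needed.
# Backticks are used instead of $(...) so it also runs under the legacy
# Solaris /bin/sh. Function names are hypothetical.

# Read "diag-level=<value>" on stdin and print only <value>.
diag_level_value() {
    sed -n 's/^diag-level=//p'
}

# Print the current diag-level as reported by eeprom.
current_diag_level() {
    eeprom diag-level | diag_level_value
}

# Set diag-level to min only when it is not already min.
ensure_diag_level_min() {
    level=`current_diag_level`
    if [ -z "$level" ]; then
        echo "could not read diag-level from eeprom" >&2
        return 1
    elif [ "$level" = "min" ]; then
        echo "diag-level is already min; no change made"
    else
        echo "diag-level is '$level'; setting it to min"
        eeprom diag-level=min
    fi
}
```

After the functions are loaded into a root shell, calling ensure_diag_level_min performs the check and, if required, the change. The new value is used the next time POST runs.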


max POST should be used only when IOC issues are being seen, when Solaris is unable to properly diagnose an Unrecoverable Error (UE) in memory, or when Fatal Resets are occurring.

 

A power cycle may be necessary for the changes to take effect. See:

Customize executing POST and system diagnostics settings (Doc ID 1001798.1)

Note: Individual PCI IO Cards are not covered by either min or max POST.

References

<NOTE:1009426.1> - Solaris[TM] Operating System: Page retirement limits
<NOTE:1001798.1> - Customize executing POST and system diagnostics settings

  Copyright © 2018 Oracle, Inc.  All rights reserved.