Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1005247.1
Update Date:2017-02-02
Keywords:

Solution Type  Technical Instruction Sure

Solution  1005247.1 :   Proper usage of post-tolerate-ce and mpr-support-enable on Midrange Servers  


Related Items
  • Sun Fire 4810 Server
  • Sun Fire 3800 Server
  • Sun Netra 1290 Server
  • Sun Fire 6800 Server
  • Sun Fire E6900 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Fire V1280 Server
  • Sun Fire E4900 Server
  • Sun Netra 1280 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-x8x0/Ex900
  • _Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange V and Netra Servers

PreviouslyPublishedAs
207346


Applies to:

Sun Fire E4900 Server - Version Not Applicable and later
Sun Fire E6900 Server - Version Not Applicable and later
Sun Fire 3800 Server - Version Not Applicable and later
Sun Fire 4800 Server - Version Not Applicable and later
Sun Fire 4810 Server - Version Not Applicable and later
All Platforms

Goal


Both post-tolerate-ce and mpr-support-enable perform similar functions. It is important to understand what each does.

 

Solution


Proper usage of post-tolerate-ce and mpr-support-enable on Midrange Servers

With current versions of Solaris[TM], POST no longer needs to disable memory that shows a small number of correctable errors (CEs): after booting, the operating system can retire the individual pages that have correctable errors using Memory Page Retirement (MPR). MPR is useful because POST cannot disable a single page or an individual DIMM. POST must disable an entire bank, even for what may be a single correctable event.

See <Document: 1009256.1> for which versions of Solaris 8 and 9 support MPR.

To support MPR, several domain boot parameters have been added to POST.

They are set with the setupsc command on the Sun Fire[TM] E2900 and V1280, and with the setupdomain command on the Sun Fire 3800/4800/4810/6800 and E4900/E6900.

These boot parameters are tolerate_mem_ce and mpr_support_enable.

 

tolerate_mem_ce = Tolerate all CEs. Let the OS retire the pages.
    * Use when the OS supports MPR AND ScApp < 5.20.0
mpr_support_enable = Tolerate CEs up to a point defined by a threshold and block size.
    A DIMM with CEs that exceed the threshold is marked as bad by POST.
    This threshold is defined internally by POST.
    * Use when the OS supports MPR AND ScApp >= 5.20.0
    ** Default behavior for 5.20.1 and higher
Both False = Old behavior. POST will disable a complete bank for a CE.
    * Use when the OS does not support MPR.
    ** Default behavior prior to 5.20.1
Both True = Should never be set this way.
    This restriction is enforced in firmware for V1280/E2900/Netra1280/Netra1290
    in 5.20.4, and for the rest of the Midrange Servers in 5.20.3. In releases
    where setting both of these variables to true is possible, the system will
    function as if tolerate_mem_ce were true and mpr_support_enable were false.
The following messages in the logs alert the user to the dual variable setting:

lom>showlogs
...snip...
Wed Mar 09 18:55:49 e2900-sca11-a-ssc1 lom: [ID 869502 local0.warning] WARNING: Both tolerate_mem_ce and mpr_support_enable were set in nvci. Falling back to tolerate_mem_ce behavior.
Wed Mar 09 18:55:49 e2900-sca11-a-ssc1 lom: [ID 869502 local0.warning] WARNING: Both tolerate_mem_ce and mpr_support_enable were set in nvci. Falling back to tolerate_mem_ce behavior.
...snip...

/var/adm/messages
...snip...
Mar 9 19:23:49 e2900-sca11-a lw8: [ID 943475 kern.warning] 3/9/16 6:55:49 PM WARNING: Both tolerate_mem_ce and mpr_support_enable were set in nvci. Falling back to tolerate_mem_ce behavior.^M
...snip...
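
As a minimal sketch, the warning shown in the excerpts above can be searched for in a saved copy of the showlogs output or of /var/adm/messages. The script matches only the fixed part of the message, copied from the excerpts; the function name and the command-line usage are illustrative assumptions, not part of any Sun or Oracle tool.

```python
import re
import sys

# Fixed part of the warning, copied from the log excerpts above.
# The timestamp, host name and message ID prefix vary, so they are not matched.
DUAL_SETTING_WARNING = re.compile(
    r"Both tolerate_mem_ce and mpr_support_enable were set in nvci"
)


def find_dual_setting_warnings(lines):
    """Return the lines that report both variables being set to true."""
    return [
        line.rstrip("\r\n")
        for line in lines
        if DUAL_SETTING_WARNING.search(line)
    ]


if __name__ == "__main__":
    # Hypothetical usage: python find_dual_setting.py messages showlogs.txt
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_dual_setting_warnings(handle):
                print(f"{path}: {hit}")
```

Any hit means the domain is running with tolerate_mem_ce behavior, and one of the two variables should be set back to false.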

 

 

tolerate_mem_ce was introduced in ScApp 5.19.0, and 5.20.0 added mpr_support_enable as an enhancement. mpr_support_enable is preferable to tolerate_mem_ce because it applies a threshold to the number of correctable errors that POST will allow. If the threshold is met, POST will give up and fail the bank. With tolerate_mem_ce, POST will ignore CEs and continue testing, which may lengthen POST execution time.
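
The four combinations of the two variables described in this document can be summarized as a small lookup. This is only an illustrative sketch: the function name and the mode labels it returns are made up for this example and do not appear in POST or ScApp.

```python
def effective_post_mode(tolerate_mem_ce: bool, mpr_support_enable: bool) -> str:
    """Map the two domain boot parameters to the POST behavior described here.

    The returned labels are illustrative names, not firmware strings.
    """
    if tolerate_mem_ce and mpr_support_enable:
        # Only possible on releases that do not yet reject this combination.
        # POST then behaves as if only tolerate_mem_ce were true.
        return "tolerate-all-ce (fallback, both variables set)"
    if tolerate_mem_ce:
        # POST ignores CEs and leaves page retirement to the OS.
        return "tolerate-all-ce"
    if mpr_support_enable:
        # POST tolerates CEs up to its internal threshold.
        return "tolerate-ce-up-to-threshold"
    # Old behavior: a CE causes POST to disable the complete bank.
    return "disable-bank-on-ce"
```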

mpr_support_enable is the default behavior for 5.20.1 and higher. If upgrading to 5.20.1 or higher, verify that MPR is enabled in Solaris. If a version of Solaris without MPR is installed, both tolerate_mem_ce and mpr_support_enable should be set to false for that domain.
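
The selection rules in this document can be sketched as a helper that takes the ScApp version and whether the installed Solaris release supports MPR, and returns the pair of values to set. The names CeSettings, parse_scapp_version and recommended_settings are hypothetical, and the handling of ScApp releases older than 5.19.0 (where neither parameter is assumed to exist) is an assumption not stated in this document.

```python
from typing import NamedTuple, Tuple


class CeSettings(NamedTuple):
    tolerate_mem_ce: bool
    mpr_support_enable: bool


def parse_scapp_version(text: str) -> Tuple[int, ...]:
    """Turn a string such as "5.20.1" into (5, 20, 1) for comparison."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad short versions such as "5.20" so they compare as 5.20.0.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def recommended_settings(scapp_version: str, os_supports_mpr: bool) -> CeSettings:
    """Return the values to set for a domain, following the rules above."""
    version = parse_scapp_version(scapp_version)
    if not os_supports_mpr:
        # Without MPR in Solaris, keep the old behavior: both false.
        return CeSettings(False, False)
    if version >= (5, 20, 0):
        return CeSettings(False, True)
    if version >= (5, 19, 0):
        return CeSettings(True, False)
    # Assumption: before 5.19.0 neither parameter is available.
    return CeSettings(False, False)
```

For example, recommended_settings("5.20.3", False) yields both values false, which matches the advice for a domain upgraded to 5.20.1 or higher while still running a Solaris release without MPR.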

References:
<Document: 1009256.1> Solaris[TM] 8/9: When Is Page Retirement Enabled?
<Document: 1010905.1> Sun Enhanced Memory DIMM Replacement Policy

 

 

Bug 15324402 : SUNBT6411108 mpr support should be enabled by default on serengeti

Bug 15359412 : SUNBT6489696 PREVENT USER FROM SETTING BOTH TOLERATE_MEM_CE AND MPR_SUPPORT_ENAB

Bug 15311980 : SUNBT6383689 PREVENT USER FROM SETTING 'POST-TOLERATE-CE = TRUE' AND 'MPR-SUPPOR

Bug 15290057 : SUNBT6332032 Serengeti SCAPP should support Solaris Memory Page Retirement

 

The default block size and threshold can be changed, but this should not be done without an L2 escalation. They are not user tunable.

 

 

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.