Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1463634.1 : FAB: Standard: Reactive: A small population of T4-4 systems may experience C2C FMA SUN4V-8002-KQ Faults that can be repaired with updated coherency link tuning values.
Affected X-Options:
7101695 - Processor Module, 3.0GHz, T4-4

Affected Parts:
7019789 - FRU, Processor Module Assy, UltraSPARC T4, 8-Core 3.0GHz (7015550)
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:
SPARC T4-4 - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

A small population of shipped systems may experience an elevated rate of chip-to-chip (C2C) link replays that will trigger an FMA SUN4V-8002-KQ fault. The fault can be seen by executing "fmadm faulty"; the resulting fault will appear as in the sample that follows:

--------------- ------------------------------------ -------------- ---------
Host : ssccn4-m1
Action : Use 'fmadm faulty' to provide a more detailed view of this event.
------------------------------------- end of fault.cpu.generic-sparc.c2c report ---------------------------------

The above is a system-generated report and references an outdated/dead URL. For more information on this subject, refer to the following internal-only link:
https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1452064.1

NOTE: The presence of ereports for C2C replays is an expected part of normal system operation. Ereports for C2C replays will appear as follows:
ereport.cpu.generic-sparc.c2c-link
The existence of the above C2C ereports does not indicate improper or unexpected system operation. FMA will assess the rate of C2C replays and post a SUN4V-8002-KQ fault, as noted above, should the rate of replays become excessive.

Impact

The system will remain operational. C2C replays are link retries that succeed and therefore pose no risk to data integrity. The elevated level of replays that results in an FMA SUN4V-8002-KQ fault indicates only that link performance has degraded to a level not normally expected in a properly running system; it does not indicate an actual failure.
The FMA SUN4V-8002-KQ fault is triggered as an attempt at pre-emptive hardware failure detection. In systems with sub-optimal tuning, the issue is not that the hardware is actually degrading but that the link tuning is marginal. The Sun_SPARC_T4-4_PM_E0010556.pkg installs the updated, optimal tuning parameters.

Changes

Contributing Factors

This issue only impacts part number 7019789. Part numbers 7048833 and 7051795 already have the updated link tuning, so this FAB does not apply to those two part numbers.

This issue is not specific to any particular configuration. The rate of C2C replays may vary with the system configuration (2P vs 4P) and from power cycle to power cycle.

In addition, any system Processor Module (PM) replacement, done for any reason, also requires the Sun_SPARC_T4-4_PM_E0010556.pkg to be applied in order to ensure that all processor modules installed have the new tuning values. This is particularly important for 4P systems with two processor modules, to ensure that the tuning values for links that run between both processor modules are identical.

Cause

The root cause of this fault is that the link tuning values originally set did not allow the C2C link circuitry to make the needed dynamic adjustments as material characteristics changed across production lots. The originally programmed tuning parameters proved sub-optimal for handling component variation within the expected design margins; this stressed the dynamic tuning capabilities of the hardware and resulted in an increase in C2C replays. The new link tuning values offer more margin, allowing operation across the entire process environment as originally intended.

Solution

Workaround

No workaround available - see Resolution section below.
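As a rough illustration of the part-number applicability described under Contributing Factors, the rule could be sketched as a small shell helper. This is only a sketch and not part of the FAB procedure: the function name `pm_tuning_status` and its output strings are hypothetical. Note also that the part number alone is not conclusive, since this document states that a PM supplied by the RSL may report 7019789 while already carrying the updated parameters.

```shell
#!/bin/sh
# Hypothetical helper (name and output strings are illustrative only).
# Maps a T4-4 Processor Module part number to the tuning status stated
# in this FAB. Caveat from the FAB: a PM reporting 7019789 may already
# have updated parameters if it came from the RSL, so treat
# "original-tuning" as "verify further", not as a final answer.
pm_tuning_status() {
    case "$1" in
        7019789)         echo "original-tuning" ;;  # FAB applies if faults are seen
        7048833|7051795) echo "updated-tuning" ;;   # FAB does not apply
        *)               echo "unknown" ;;          # not covered by this FAB
    esac
}
```

For example, `pm_tuning_status 7051795` reports that the module already has the updated link tuning.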
Resolution

Installation of the Sun_SPARC_T4-4_PM_E0010556.pkg will resolve the C2C FMA SUN4V-8002-KQ fault. Oracle recommends installing the updated link tuning package only on systems that experience SUN4V-8002-KQ faults. Proactively updating link tuning settings on systems that have not experienced SUN4V-8002-KQ faults brings no benefit and is not necessary.

If the customer is willing to apply the link tuning package on their own for affected systems without an Oracle FE onsite, the SR owner can work with the customer and provide appropriate guidance.

Below are STEP-BY-STEP instructions for applying the patch, which is available in Reference DocID 1452064.1 via the URL below:
https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1452064.1

Note: Check ILOM snapshot for:
$ grep E0010556 ilom/@usr@local@bin@featurecheck_-show_features.out

STEP #1: (Applying Sun_SPARC_T4-4_PM_E0010556.pkg)

The Sun_SPARC_T4-4_PM_E0010556.pkg is applied as follows:

a) Transfer patch to a local FTP or HTTP server. The package can be downloaded from
b) Log in to the Service Processor via the ILOM CLI.
c) The host must be powered off to apply the patch. (From ILOM cli: stop /SYS)
d) Load the patch using the ILOM CLI "load" command. From ILOM cli:
load -source tftp://localFTPserver/Sun_SPARC_T4-4_PM_E0010556.pkg
- or -
e) The load command will automatically restart/reboot the service processor (ILOM) with

Once a PM module has been updated, the presence of the patch can be checked as follows:
-> show /SP/logs/event/list

NOTE: The above is an example of the ILOM event log output where 2 PMs were updated with

STEP #3: (Clear all prior FMA SUN4V-8002-KQ Faults)

Once the Sun_SPARC_T4-4_PM_E0010556.pkg has been successfully loaded and verified, clear any existing SUN4V-8002-KQ faults from the OS level. Use fmadm to obtain the uuid of any SUN4V-8002-KQ faults.
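The OS-level part of this step (obtaining the uuid of each SUN4V-8002-KQ fault) could be sketched with a small shell filter. This is an illustration only, not part of the FAB procedure: the function name `kq_fault_uuids` is hypothetical, and it assumes that the "fmadm faulty" summary line for a fault carries the event UUID and the message ID SUN4V-8002-KQ on the same line. Check it against real output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: reads `fmadm faulty` output on stdin and prints
# the UUID found on every line that mentions SUN4V-8002-KQ.
# Assumption: the UUID and the message ID share one summary line.
kq_fault_uuids() {
    awk '/SUN4V-8002-KQ/ {
        for (i = 1; i <= NF; i++)
            # A UUID is five lowercase-hex groups joined by hyphens.
            if ($i ~ /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/)
                print $i
    }'
}
```

On the affected host this would be used as `fmadm faulty | kq_fault_uuids`. Each UUID listed should be reviewed before it is cleared with the appropriate fmadm subcommand for the Solaris release in use (for example `fmadm repair <uuid>`).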
Using ILOM, clear any faults on PM0 and PM1:
-> set /SYS/PM1 clear_fault_action=true

NOTE 1: Applying the Sun_SPARC_T4-4_PM_E0010556.pkg will resolve C2C FMA SUN4V-8002-KQ faults. However, it is possible that C2C FMA SUN4V-8002-KQ faults are due to degraded hardware, in which case they will not be remedied by Sun_SPARC_T4-4_PM_E0010556.pkg.

In a situation where either the tuning package has already been applied, or the Processor Module(s) is a new part number with correct tuning applied (see PNs above), the following actions should be taken:

2) If problem persists, carefully inspect and reseat Processor Modules

If the SUN4V-8002-KQ fault persists after performing the above steps, the normal hardware debug process should be followed to correctly identify and replace the faulted PM. Mention in the SR that the above actions were taken and that HW replacement is now planned.
Identification of Affected Parts (how to)

All T4-4 Processor Modules with the following Part Numbers that are experiencing C2C FMA SUN4V-8002-KQ faults require the Sun_SPARC_T4-4_PM_E0010556.pkg to be applied:
A Processor Module part number can be identified by typing the following at the ILOM prompt:
-> show /SYS/PM0 fru_part_number
/SYS/PM0

Note: PM modules with later production part numbers will NOT have Sun_SPARC_T4-4_PM_E0010556.pkg entries in the ILOM event log output, as they were shipped from the factory with the new link training tuning settings and hence do not require updating.

Processor Modules with new link training tuning settings:
7051795

NOTE: All PMs provided by the RSL will have the updated parameters and do not require the tuning package to be installed, including part number 7019789. Please note that PMs may show PN 7019789 in fruid even though they were repaired to PN 7051795 to apply updated parameters. In that case, check that the physical label on the PM shows PN 7051795, or use the method described above to verify the updated parameters.

References
BugID: 7110931: SSC RQT Fault: fault.cpu.generic-sparc.c2c
MOS DocID: 1452064.1

Contacts
Contributor: joe.carr@oracle.com, nitin.malhotra@oracle.ocm

Attachments
This solution has no attachment