Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2289763.1
Update Date: 2017-11-15

Solution Type  Problem Resolution Sure

Solution  2289763.1 :   Upgrade System Firmware to v9.7.6.b (or later) to Avoid some SPARC T7-4 CLINK Issues on 1 PM/1 PFM Configurations  


Related Items
  • SPARC T7-4
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T7




In this Document
Symptoms
Cause
Solution


Applies to:

SPARC T7-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Important: System Firmware 9.7.6.b will not fix "FATAL: T7-4 PFM Retimer Init failed" failures. For example:

In ereports.log:
2017-05-15/10:30:39 ereport.hc.abort@/SYS/PM0/CM0/CMP
reason = T7-4 PFM Retimer Init failed.

system_component_firmware_manufacturer = Oracle Corporation
system_component_firmware_versions = xxx
system_component_firmware_releases = xxx

In Hostconsole.log:
2017-05-15 10:45:01 0:00:0> NOTICE: Pretraining Coherency Links (1 of 3)
2017-05-15 10:45:06 0:00:0> NOTICE: Pretraining Coherency Links (2 of 3)
2017-05-15 10:48:08 0:00:0> FATAL: T7-4 PFM Retimer Init failed.

2017-05-15 10:48:08 0:00:0> NOTICE: Waiting for poweroff or powercycle from the SP
2017-05-15 10:49:16 SP> NOTICE: Host is off
2017-05-15 10:49:19 SP> NOTICE: Start Host in progress: Step 2 of 7
2017-05-15 10:49:20 SP> NOTICE: Start Host in progress: Step 3 of 7
2017-05-15 10:49:23 SP> NOTICE: Start Host in progress: Step 4 of 7

These are hardware errors and can only be fixed by replacing the PFM (Processor Filler Module).
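
As a rough triage aid, the host console log can be scanned for this FATAL message before a firmware upgrade is planned, since the upgrade will not resolve it. The following is a minimal sketch that assumes the console log has been saved as plain text in the format shown above; the function name `find_retimer_init_failures` is hypothetical and is not part of any Oracle tool.

```python
import re

# Matches host console lines such as:
#   2017-05-15 10:48:08 0:00:0> FATAL: T7-4 PFM Retimer Init failed.
RETIMER_FATAL = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ "
    r"FATAL: T7-4 PFM Retimer Init failed"
)


def find_retimer_init_failures(console_text):
    """Return the timestamps of 'PFM Retimer Init failed' FATAL lines."""
    hits = []
    for line in console_text.splitlines():
        match = RETIMER_FATAL.match(line.strip())
        if match:
            hits.append(match.group("ts"))
    return hits


sample = """\
2017-05-15 10:45:06 0:00:0> NOTICE: Pretraining Coherency Links (2 of 3)
2017-05-15 10:48:08 0:00:0> FATAL: T7-4 PFM Retimer Init failed.
2017-05-15 10:48:08 0:00:0> NOTICE: Waiting for poweroff or powercycle from the SP
"""
print(find_retimer_init_failures(sample))
```

A non-empty result points to the hardware failure described above (replace the PFM) rather than to the firmware-tunable symptoms that follow.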

The symptoms below occur only on SPARC T7-4 configurations with a single Processor Module (PM) and a Processor Filler Module (PFM) installed. The issue does not affect SPARC T7-4 configurations with two Processor Modules (2 PMs).

1. fault.cpu.generic-sparc.c2c faults diagnosed to PM0 or the PFM for one or more of the CLINKs listed below. These are the CLINKs that go through the PFM retimer.

/SYS/PM0/CM0/CMP/CLX0/CLINK0
/SYS/PM0/CM0/CMP/CLX0/CLINK1
/SYS/PM0/CM0/CMP/CLX1/CLINK2
/SYS/PM0/CM0/CMP/CLX1/CLINK3
/SYS/PM0/CM1/CMP/CLX0/CLINK2
/SYS/PM0/CM1/CMP/CLX0/CLINK3
/SYS/PM0/CM1/CMP/CLX1/CLINK0
/SYS/PM0/CM1/CMP/CLX1/CLINK1

For example,

-> show faulty

Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/PFM1
/SP/faultmgmt/0/ | class | fault.cpu.generic-sparc.c2c
faults/0 | |
/SP/faultmgmt/0/ | sunw-msg-id | SPSUN4V-8000-2S
faults/0 | |
/SP/faultmgmt/0/ | component | /SYS/PFM1
faults/0 | |
.
.
.
/SP/faultmgmt/1 | fru | /SYS/PM0
/SP/faultmgmt/1/ | class | fault.cpu.generic-sparc.c2c
faults/0 | |
/SP/faultmgmt/1/ | sunw-msg-id | SPSUN4V-8000-2S
faults/0 | |
/SP/faultmgmt/1/ | component | /SYS/PM0/CM1/CMP/CLX1/CLINK1
faults/0 | |

.
.
.
/SP/faultmgmt/1/ | component | /SYS/PM0/CM0/CMP/CLX1/CLINK3
faults/1 | |
/SP/faultmgmt/1/ | uuid | 62c420ce-c773-62a7-f593-d03f2f42db
faults/1 | | aa
/SP/faultmgmt/1/ | timestamp | 2017-02-03/05:22:15
faults/1 | |

 

2. CLINK training errors while the system is powering on, or link training fault ereports.

For example,

2017-02-27 10:14:53 0:00:0> NOTICE: Pretraining Coherency Links (3 of 3)
2017-02-27 10:15:51 0:00:0> NOTICE: Training Coherency Links
2017-02-27 10:15:52 0:00:0> ERROR: /SYS/PM0/CM0/CMP/CLX0/CLINK1/LANE1: SB link failed to train

2017-02-27/10:15:53 ereport.hc.dev_fault@/SYS/PM0/CM0/CMP/CLX0/CLINK1/LANE1
reason = SB link failed to train
link-status = 0x4000340000000200
SB-lanes = 0xb5e20
system_component_firmware_manufacturer = Oracle Corporation
system_component_firmware_versions = (ILOM)3.2.7.1.c,(POST)5.5.3.a,(OBP)4.40.3,(HV)1.17.3.a
system_component_firmware_releases = (ILOM)2016.09.16,(POST)2016.08.26,(OBP)2016.08.17,(HV)2016.09.16
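
When many ereports are being reviewed, the firmware fields can be split into per-component values for easier comparison. The following is a minimal sketch that assumes the field always uses the "(NAME)version" comma-separated layout shown above; `parse_firmware_field` is a hypothetical helper name. This document does not state how these component versions map to a system firmware release such as 9.7.6.b, so the sketch makes no attempt to derive one.

```python
import re


def parse_firmware_field(value):
    """Split '(ILOM)3.2.7.1.c,(POST)5.5.3.a' into {'ILOM': '3.2.7.1.c', ...}."""
    # Each entry is a parenthesized component name followed by its value,
    # which runs up to the next comma.
    return dict(re.findall(r"\((\w+)\)([^,]+)", value))


versions = parse_firmware_field(
    "(ILOM)3.2.7.1.c,(POST)5.5.3.a,(OBP)4.40.3,(HV)1.17.3.a"
)
print(versions["ILOM"])
print(versions["HV"])
```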

 

3. Excessive c2c-link and c2c-link-reinit ereports logged on the SP.

For example,

faultmgmtsp> fmdump -e
TIMESTAMP EREPORT
2017-02-01/15:47:21 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:48:53 ereport.cpu.generic-sparc.c2c-link-reinit@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:49:46 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:52:04 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:54:30 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:57:04 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:58:14 ereport.cpu.generic-sparc.c2c-link-reinit@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:59:42 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:01:49 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:03:06 ereport.cpu.generic-sparc.c2c-link-reinit@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:06:06 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:20:01 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:32:55 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/16:54:48 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3

.
.
.
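
To judge how often a given CLINK is reporting, the ereports can be tallied per link. The following is a minimal sketch that assumes the `fmdump -e` output has been captured as text in the two-column layout shown above; `count_c2c_ereports` is a hypothetical helper name. This document does not define a numeric threshold for "excessive", so none is applied here.

```python
from collections import Counter


def count_c2c_ereports(fmdump_text):
    """Count c2c-link ereports per (target, ereport class) pair."""
    counts = Counter()
    for line in fmdump_text.splitlines():
        parts = line.split()
        # Expect exactly "<timestamp> <class>@<target>"; skip the header,
        # blank lines, and anything else that does not fit.
        if len(parts) != 2 or "@" not in parts[1]:
            continue
        ereport_class, _, target = parts[1].partition("@")
        if ereport_class.startswith("ereport.cpu.generic-sparc.c2c-link"):
            counts[(target, ereport_class)] += 1
    return counts


sample = """\
TIMESTAMP EREPORT
2017-02-01/15:47:21 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:48:53 ereport.cpu.generic-sparc.c2c-link-reinit@/SYS/PM0/CM0/CMP/CLX1/CLINK3
2017-02-01/15:49:46 ereport.cpu.generic-sparc.c2c-link@/SYS/PM0/CM0/CMP/CLX1/CLINK3
"""
for (target, ereport_class), n in sorted(count_c2c_ereports(sample).items()):
    print(target, ereport_class, n)
```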

 

Cause

The working theory is as follows. The clock synthesizer on the main module distributes the coherency link clock to all the CMs/CPUs, so in a 2 PM configuration all four CMs run at the same clock frequency. In a 1 PM configuration, however, the repeater on the PFM takes the reference clock only during initialization; after that, the retimer uses the clock recovered from the data. If the PFM retimer PLL is slow to lock or cannot handle SSC (spread-spectrum clocking) well, CLINK issues occur in the 1 PM configuration.

Solution

Upgrade to System Firmware 9.7.6.b (or later), which includes parameter tuning to improve coherency link clock quality in SPARC T7-4 PM/PFM configurations.

After the firmware upgrade, if Symptoms 1, 2, and/or 3 continue to occur on the PFM retimer CLINKs listed in Symptom 1, replace the PFM hardware first. For any other CLINKs, replace the PM0 hardware first.
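
The replacement rule above can be sketched as a simple lookup. This is an illustrative sketch only: it assumes the firmware has already been upgraded to 9.7.6.b or later, that the faulted component is a CLINK path (optionally with a trailing /LANEn, as in the Symptom 2 example), and the function name `first_fru_to_replace` is hypothetical.

```python
import re

# CLINKs that go through the PFM retimer, as listed under Symptom 1.
PFM_RETIMER_CLINKS = frozenset({
    "/SYS/PM0/CM0/CMP/CLX0/CLINK0",
    "/SYS/PM0/CM0/CMP/CLX0/CLINK1",
    "/SYS/PM0/CM0/CMP/CLX1/CLINK2",
    "/SYS/PM0/CM0/CMP/CLX1/CLINK3",
    "/SYS/PM0/CM1/CMP/CLX0/CLINK2",
    "/SYS/PM0/CM1/CMP/CLX0/CLINK3",
    "/SYS/PM0/CM1/CMP/CLX1/CLINK0",
    "/SYS/PM0/CM1/CMP/CLX1/CLINK1",
})


def first_fru_to_replace(component):
    """Return 'PFM' or 'PM0' for a faulted CLINK path (post-upgrade only)."""
    # Drop a trailing lane suffix so LANE-level ereports map to their CLINK.
    clink = re.sub(r"/LANE\d+$", "", component)
    if not re.search(r"/CLINK\d+$", clink):
        raise ValueError(f"not a CLINK path: {component}")
    return "PFM" if clink in PFM_RETIMER_CLINKS else "PM0"


# Path taken from the Symptom 2 example: a PFM retimer CLINK.
print(first_fru_to_replace("/SYS/PM0/CM0/CMP/CLX0/CLINK1/LANE1"))
# Hypothetical path that is not in the Symptom 1 list.
print(first_fru_to_replace("/SYS/PM0/CM0/CMP/CLX0/CLINK2"))
```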

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.