Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1948360.1
Update Date: 2017-02-22
Keywords:

Solution Type  Sun Alert Sure

Solution  1948360.1 :   On Rare Occasions, SPARC T4 Series Servers Running Firmware Versions 8.4.0.a through 8.5.1.b may Experience an Outage  


Related Items
  • Netra SPARC T4-1 Server
  • Sun Software - Generic
  • SPARC T4-2
  • SPARC SuperCluster T4-4
  • StorageTek Virtual Storage Manager System 6 (VSM6)
  • SPARC T4-1
  • SPARC T4-1B
  • Netra SPARC T4-2 Server
  • Sun Hardware - Generic
  • SPARC T4-4
  • Netra SPARC T4-1B
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

Netra SPARC T4-1 Server
Netra SPARC T4-1B Server Module
Netra SPARC T4-2 Server
SPARC T4-1
SPARC T4-4 Server
SPARC Supercluster T4-4
StorageTek Virtual Storage Manager System 6 (VSM6)
Sun Hardware - Generic
Sun Software - Generic
SPARC
Information in this document applies to any platform.
_______________________________________



Date of Resolved Release: 25-Nov-2014
_______________________________________

Description

T4 series servers installed with Sun System Firmware 8.4.0.a through 8.5.1.b may, in rare cases, exhibit the following symptoms:

    - Excessive correctable DIMM events leading to excessive page retires and DIMMs being incorrectly faulted
    - A 'send_mondo' or unrecoverable hardware error system panic (system outage)
    - A system redstate (system outage)
    - A system Hypervisor abort (system outage) 

Occurrence

The issue has been observed in very rare instances after T4 system firmware is upgraded to versions 8.4.0.a through 8.5.1.b. To date, all confirmed occurrences have been limited to the SPARC T4-4 Server series and the SPARC Supercluster T4-4, but the other products listed in this alert are also potentially affected.

Systems that hit this issue usually experience memory errors within two days of upgrading to an affected firmware version (8.4.0.a through 8.5.1.b), but this is not always the case. If a system has been stable for a period of time on an affected firmware version, the likelihood of hitting this problem is quite low. However, the resolution referenced below is still recommended as a risk avoidance measure.

The full list of affected patches is as follows:

SPARC T4-1 Server

  • Patch 150676-01 through 150676-06: SPARC T4-1 Sun System Firmware 8.4.0.a, 8.4.0.b, 8.4.0.c, 8.4.1.a, 8.4.2.c, 8.4.2.d
  • Patch 151295-01 through 151295-02: SPARC T4-1 Sun System Firmware 8.5.0.a, 8.5.1.b

SPARC T4-1B Server Module

  • Patch 150679-01 through 150679-04: SPARC T4-1B Sun System Firmware 8.4.0.a, 8.4.0.c, 8.4.1.a, 8.4.2.c
  • Patch 151298-01 through 151298-02: SPARC T4-1B Sun System Firmware 8.5.0.a, 8.5.1.b

SPARC T4-2 Server

  • Patch 150677-01 through 150677-05: SPARC T4-2 Sun System Firmware 8.4.0.a, 8.4.0.b, 8.4.0.c, 8.4.1.a, 8.4.2.c
  • Patch 151296-01 through 151296-02: SPARC T4-2 Sun System Firmware 8.5.0.a, 8.5.1.b

SPARC T4-4 Server

  • Patch 150678-01 through 150678-05: SPARC T4-4 Sun System Firmware 8.4.0.a, 8.4.0.c, 8.4.1.a, 8.4.2.c, 8.4.2.d
  • Patch 151297-01 through 151297-02: SPARC T4-4 Sun System Firmware 8.5.0.a, 8.5.1.b

SPARC Supercluster T4-4

  • Patch 18163942: QUARTERLY FULL STACK DOWNLOAD PATCH FOR SUPERCLUSTER (JAN 2014 - 11.2 AND 12.1)
  • Patch 18517092: QUARTERLY FULL STACK DOWNLOAD PATCH FOR SUPERCLUSTER (APR 2014 - 11.2 AND 12.1)
  • Patch 18965131: QUARTERLY FULL STACK DOWNLOAD PATCH FOR SUPERCLUSTER (JUL 2014 - 11.2 AND 12.1)
  • Patch 19621160: QUARTERLY FULL STACK DOWNLOAD PATCH FOR SUPERCLUSTER (OCT 2014 - 11.2 and 12.1)

StorageTek VSM 6

  • Contact Oracle technical support

Netra SPARC T4-1 Server

  • Patch 150680-01 through 150680-06: Netra SPARC T4-1 Sun System Firmware 8.4.0.a, 8.4.0.b, 8.4.0.c, 8.4.1.a, 8.4.2.c, 8.4.2.d
  • Patch 151299-01 through 151299-02: Netra SPARC T4-1 Sun System Firmware 8.5.0.a, 8.5.1.b

Netra SPARC T4-1B Server Module

  • Patch 150682-01 through 150682-04: Netra SPARC T4-1B Sun System Firmware 8.4.0.a, 8.4.0.c, 8.4.1.a, 8.4.2.c  
  • Patch 151301-01 through 151301-02: Netra SPARC T4-1B Sun System Firmware 8.5.0.a, 8.5.1.b

Netra SPARC T4-2 Server

  • Patch 150681-01 through 150681-05: Netra SPARC T4-2 Sun System Firmware 8.4.0.a, 8.4.0.b, 8.4.0.c, 8.4.1.a, 8.4.2.c
  • Patch 151300-01 through 151300-02: Netra SPARC T4-2 Sun System Firmware 8.5.0.a, 8.5.1.b
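
The affected range above (8.4.0.a through 8.5.1.b, inclusive) can be checked mechanically. The following is a minimal sketch, not an Oracle-supplied tool. The function name fw_status is hypothetical, and it assumes the version is supplied as a full four-part string such as 8.4.2.d; how that string is obtained from the system is outside the scope of the sketch.

```shell
#!/bin/sh
# Hypothetical helper: classify a Sun System Firmware version string against
# the affected range named in this alert (8.4.0.a through 8.5.1.b, inclusive).
# Assumption: the version has the form MAJOR.MINOR.MICRO.LETTER, e.g. 8.4.2.d.
fw_status() {
    echo "$1" | awk -F. '
        {
            # Build a fixed-width key so plain string comparison orders
            # versions correctly, e.g. 8.4.2.d becomes 008004002d.
            k  = sprintf("%03d%03d%03d%s", $1, $2, $3, $4)
            lo = "008004000a"    # 8.4.0.a
            hi = "008005001b"    # 8.5.1.b
            if (k >= lo && k <= hi)
                print "affected"
            else
                print "not affected"
        }'
}
```

For example, `fw_status 8.4.2.d` reports the version as affected, while `fw_status 8.6.0.b` (the fixed release) does not. A version given without the trailing letter falls outside the assumed format and should not be relied on.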

Symptoms

A system may experience symptoms similar to the following:

1. Unrecoverable Hardware Error system panic preceded by L2$ memory errors

      FMA events:

      2014-08-21/23:31:36  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-08-21/23:31:36  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-08-21/23:31:36  ereport.cpu.generic-sparc.l2data-uc@/HOST

Solaris Unrecoverable Hardware Error (UHE) panic reported on the host console:

      panic[cpu147]/thread=1007f032c000: Unrecoverable hardware error

      000002a11878ac00 unix:process_nonresumable_error+2ec (2a11878ae50, 0, 2, 2a11878ad10, 2a11878ad68, 100000000)
      %l0-3: 0000000003000000 0000000000000040 0000000000000100 000003000012c790
      %l4-7: 0000000000000000 00000000000000ff 0000000000000000 ffffffffffffff7f
      000002a11878ada0 unix:ktl0+64 (30485cfc920, 0, 1000, 15f35e20, 10398000, af9af1)
      %l0-3: 000003000012c000 0000000000000498 0000000800001604 000000000102a23c
      %l4-7: 00001006b610ef30 00000000102a6400 0000000000000000 000002a11878ae50
      000002a11878aef0 unix:hat_unload_callback+26c (1063e400, ffffffff63d00000, 0, 1, 1004d3506540, 1004d3506540)
      %l0-3: 0000000000000000 0000000000000004 0000000000000004 000000000fffffff
      %l4-7: 0000000000000019 0000000000000000 000003000ffb2928 0000030485cfc920
      000002a11878b3f0 genunix:anon_private+190 (2a11878b5c8, 100777678000, ffffffff63d00000, 301728b6900, 301728bd000, 100776e97590)
      %l0-3: 000003000012c000 000000000000000b 0000000000000000 000010076a5060c0
      %l4-7: 0000000000000000 000003000012c000 00000200eedd2000 0000000000000002
      000002a11878b4f0 genunix:segvn_faultpage+6e8 (1004d3506540, 100777678000, ffffffff63d00000, 10076a5060c0, 0, 0)
      %l0-3: 0000000000000000 0000000000000001 0000000000000002 00001007fce87c48
      %l4-7: 00001006fe360480 0000000000000000 000000000000000b 0000000000000001
      000002a11878b600 genunix:segvn_fault+b24 (ffffffff63d00000, 100777678000, ffffffff63d00000, 0, 0, 1)
      %l0-3: ffffffff63d02000 0000000000000000 0000000000000000 000002a11878b7a0
      %l4-7: 0000000000000002 00001007fce87c48 00001006fe360480 0000000000000001
      000002a11878b800 genunix:as_fault+3f0 (1004d3506540, 100777678000, 1, 100789061080, 2, 1006b531b058)
      %l0-3: 0000000000000001 ffffffff63d00000 0000000000002000 ffffffff63d02000
      %l4-7: ffffffff63d00000 0000000000000001 0000100777678000 0000000000002000
      000002a11878b8f0 unix:pagefault+8c (fff8000100000000, 1006ab680008, 5, 0, 1, 0)
      %l0-3: 0000000000000000 00001006b531b008 000003000012c000 0000000000000000
      %l4-7: 0000000000000002 0000000000000000 ffffffff63d00000 0007ffff00000000
      000002a11878b9b0 unix:trap+e20 (2a11878bb80, 0, 100789061080, 10000, ffffffff7ee19044, 0)
      %l0-3: 000002a11878bad0 0000000000010033 00001006ab680008 0000000000000001
      %l4-7: 0000000000000002 0000000000010000 0000000000001c00 0000000000010080

2. HV Abort preceded by L2$ memory errors

      FMA events:

      2014-05-06/16:34:38  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-05-06/16:34:38  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-05-06/16:34:38  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-05-06/16:34:38  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-05-06/16:34:43  ereport.cpu.generic-sparc.hv-abort@/HOST

Hypervisor Abort reported on the host console:

      ABORT: ../../../greatlakes/src/mmu.s, line 0x41e: DMMU error in hypervisor PC = 8a159a0

3. Redstate triggered by L2$ event

      FMA events:

      2014-07-23/19:25:44  ereport.hc.unspecified.redstate@/SYS/PM1/CMP1/CORE5/P0
      2014-07-23/19:25:44  ereport.hc.unspecified.redstate@/SYS/PM0/CMP1/CORE2/P5

Redstate reported on the host console:

      2014-07-24 02:26:10  3:5:0> NOTICE:     nesr            : 700000
      2014-07-24 02:26:10  1:2:5> NOTICE:     nesr            : 400000
      2014-07-24 02:26:10  3:5:0> NOTICE:     near            : 0
      2014-07-24 02:26:10  1:2:5> NOTICE:     near            : 0
      2014-07-24 02:26:10  3:5:0> NOTICE:     desr            : 0
      2014-07-24 02:26:10  1:2:5> NOTICE:     desr            : 0
      2014-07-24 02:26:10  3:5:0> NOTICE:     dfesr           : 0
      2014-07-24 02:26:10  1:2:5> NOTICE:     dfesr           : 0
      2014-07-24 02:26:10  3:5:0> NOTICE:     pesr            : 200
      2014-07-24 02:26:10  1:2:5> NOTICE:     pesr            : 100
      2014-07-24 02:26:10  3:5:0> NOTICE:     dsfsr           : 80
      2014-07-24 02:26:10  1:2:5> NOTICE:     dsfsr           : 0
      2014-07-24 02:26:10  3:5:0> NOTICE:     dsfar           : f301346800
      2014-07-24 02:26:10  1:2:5> NOTICE:     dsfar           : 158baf10
      2014-07-24 02:26:10  3:5:0> NOTICE:  tl      tpc             tnpc             tstate tt     htstate
      2014-07-24 02:26:10  1:2:5> NOTICE:  tl      tpc             tnpc             tstate tt     htstate
      2014-07-24 02:26:10  3:5:0> NOTICE:  1 000000010782cf60 000000010782cf64 0000004482001203 00a 0000000000000400
      2014-07-24 02:26:10  1:2:5> NOTICE:  1 000000000100f77c 000000000100f780 0000004480001406 180 0000000000000400
      2014-07-24 02:26:10  3:5:0> NOTICE:  2 0000000008a4c840 0000000008a4c844 000001994f001003 00a 0000000000000004
      2014-07-24 02:26:10  1:2:5> NOTICE:  2 0000000008a213e4 0000000008a213e8 0000014480001006 032 0000000000000004
      2014-07-24 02:26:10  3:5:0> NOTICE:  3 0000000008a4c840 0000000008a4c844 000002444f001003 00a 0000000000000004
      2014-07-24 02:26:10  1:2:5> NOTICE:  3 0000000008a4c070 0000000008a4c074 0000024480001006 00a 0000000000000004
      2014-07-24 02:26:10  3:5:0> NOTICE:  4 0000000008a4c840 0000000008a4c844 000003444f001003 00a 0000000000000004
      2014-07-24 02:26:10  1:2:5> NOTICE:  4 0000000008a4c774 0000000008a4c778 000003994f001006 032 0000000000000004
      2014-07-24 02:26:10  3:5:0> NOTICE:  5 0000000008a4c840 0000000008a4c844 000003444f001003 00a 0000000000000004
      2014-07-24 02:26:10  1:2:5> NOTICE:  5 0000000008a4c070 0000000008a4c074 000003994f001006 00a 0000000000000004
      2014-07-24 02:26:10  3:5:0> NOTICE:  6 0000000008a1b2b8 0000000008a1b2bc 000003444f001003 00a 0000000000000004
      2014-07-24 02:26:11  1:2:5> NOTICE:  6 0000000008a4c774 0000000008a4c778 000003444f001006 032 0000000000000004
      2014-07-24 02:26:11  3:5:0> NOTICE:  
      2014-07-24 02:26:11  1:2:5> NOTICE:  
      2014-07-24 02:26:11  3:5:0> ERROR:   Redstate trap occurred on node 3 strand 40
      2014-07-24 02:26:11  1:2:5> ERROR:   Redstate trap occurred on node 1 strand 21
      2014-07-24 02:26:15  3:5:0> ERROR:   Powering down due to Redstate

4. A 'send_mondo' system panic preceded by L2$ memory errors

      FMA events:

      2014-08-02/00:41:20  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-08-02/00:41:20  ereport.cpu.generic-sparc.l2data-uc@/HOST
      2014-08-02/00:41:21  ereport.cpu.generic-sparc.l2data-uc@/HOST

Solaris Panic reported on the host console:

      panic[cpu54]/thread=1005ea1b38a0: send_mondo_set: timeout

      000002a118aecc70 unix:send_mondo_set+560 (1, bec53, 3e, c3097b50968a, c3097b509310, 3000006e790)
      %l0-3: 00000000299e2800 0000000000000001 000000001057e1b8 0000c3097b50968a
      %l4-7: 000000000ab21dae 00000000010c8000 00000000000001f8 00000000010c8000
      000002a118aecd40 unix:xt_some+1a8 (2a118aed028, 102741c, 2a119a0c000, 40001015800, 2a118aecdf0, 0)
      %l0-3: 00000000104512c8 0000000000000178 0000000000000000 0000000000000036
      %l4-7: ffffffffffffffff 0000000000000000 0000000000000001 0000000000000000
      000002a118aecf70 unix:sfmmu_flush_pages+444 (30, 2a119a0c000, 1, 2a118aed028, 1, 2a118aed5d8)
      %l0-3: 0000040001015800 0000000000000000 0000000000000000 0000040001013c00
      %l4-7: 0000000000000036 0000000001027400 0000000000000000 0000000000000030
      000002a118aed1b0 unix:sfmmu_tlb_range_demap+ec (2a118aed5a0, 2a119a0c000, 0, 0, 2a119a0e000, 0)
      %l0-3: 0000000000000000 0000040001015800 0000000000000001 000000000000000d
      %l4-7: 0000000000000000 0000040092660018 0000000000002000 0000000000002000
      000002a118aed260 unix:hat_unload_callback+82c (1, 2a119a0e000, 0, 0, 40001015800, 40001015800)
      %l0-3: 0000000000000001 0000000000000001 0000000000000001 000000000fffffff
      %l4-7: 0000000000000000 000002a119a0e000 00000400a523b058 000003011b4fd100
      000002a118aed760 genunix:segkp_release_internal+90 (1002057d6d58, ffffffffffffffff, 2a119a0c000, d, 10396ff8, 106682a8)
      %l0-3: 00000000010d2f68 00000000010d2f68 000010023e10a388 0000000000000001
      %l4-7: 0000000000000002 0000000000000001 0000000000001fff 00000000010d2f70
      000002a118aed810 genunix:schedctl_freepage+18 (1005a148a938, 2a119a0c000, f2e1c000, 4, f2e1c000, 100d7800)
      %l0-3: 000010023e10a3b8 000010023e10a392 00000000f2e1e000 0000000000000000
      %l4-7: 000010028db92008 0000000000000001 000000001064c9f0 000000001064d0f0
      000002a118aed8c0 genunix:schedctl_proc_cleanup+3c (1006043a1bf0, 10d2c00, 106670f8, 10667000, 1005ce5d4008, 10020ba45b90)
      %l0-3: 00001006043a1bf0 00000000db5fffff 00000000db5ffc00 0000000000000050
      %l4-7: 0000000010658f98 0000000010658c00 00000000010d2f60 0000000000000000
      000002a118aed970 genunix:proc_exit+20c (ffff0000, 0, 0, 5a006002, 0, 1)
      %l0-3: 0000000000000000 0000000000000000 00001005ea10e108 0000000000000000
      %l4-7: 00001005ea1b38a0 00001006043a1bf0 0000000000000000 00001005ce5d4008
      000002a118aeda20 genunix:exit+8 (1, 0, 2c400, 60000, ffffffff, 60000)
      %l0-3: 000000003e270000 0000000000003e27 0000000000000001 000003000006e000
      %l4-7: 000003b77bb8fef8 0000000000000000 0000000000000000 0000000000000000
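
All four symptom patterns above share a small set of FMA ereport classes. As a rough triage aid, the sketch below counts those classes in any text supplied on standard input. The function name count_alert_ereports is hypothetical, and the sketch assumes only that the input contains event lines in the form shown in the examples above. It does not prescribe how the events are collected, and a non-zero count is not by itself proof of this issue.

```shell
#!/bin/sh
# Hypothetical helper: count the ereport classes that appear in the symptom
# examples of this alert. Reads event text from standard input.
count_alert_ereports() {
    awk '
        /ereport\.cpu\.generic-sparc\.l2data-uc/ { l2++ }   # L2$ uncorrectable data
        /ereport\.cpu\.generic-sparc\.hv-abort/  { hv++ }   # Hypervisor abort
        /ereport\.hc\.unspecified\.redstate/     { rs++ }   # Redstate
        END { printf "l2data-uc=%d hv-abort=%d redstate=%d\n", l2, hv, rs }'
}
```

The function is meant to be fed a saved event listing, for example `count_alert_ereports < events.txt`, where events.txt is a placeholder file name.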

Workaround

This issue is addressed in the following releases:

SPARC T4 servers and Netra SPARC T4 servers:

SPARC T4-1 Server

  • Patch 151682-01: SPARC T4-1 Sun System Firmware 8.6.0.b

SPARC T4-1B Server Module

  • Patch 151685-01: SPARC T4-1B Sun System Firmware 8.6.0.b

SPARC T4-2 Server

  • Patch 151683-01: SPARC T4-2 Sun System Firmware 8.6.0.b

SPARC T4-4 Server

  • Patch 151684-01: SPARC T4-4 Sun System Firmware 8.6.0.b

Netra SPARC T4-1 Server

  • Patch 151686-01: Netra SPARC T4-1 Sun System Firmware 8.6.0.b

Netra SPARC T4-1B Server Module

  • Patch 151688-01: Netra SPARC T4-1B Sun System Firmware 8.6.0.b

Netra SPARC T4-2 Server

  • Patch 151687-01: Netra SPARC T4-2 Sun System Firmware 8.6.0.b

SPARC Supercluster T4-4 and StorageTek VSM 6:

  • Contact Oracle technical support

Patches

<SUNPATCH:151682-01>, <SUNPATCH:151683-01>
<SUNPATCH:151684-01>, <SUNPATCH:151685-01>
<SUNPATCH:151686-01>, <SUNPATCH:151687-01>
<SUNPATCH:151688-01>

History

25-Nov-2014: Document released, status is Resolved

This issue was initially reported under Bug 18895455, which has been
closed as a duplicate of Bug 19721476.

The resolution that applies to SPARC Supercluster T4-4 currently says
"contact technical support." This will be updated to reference
the software bundle containing the fix when the bundle is released.

So far the issue has only been seen on systems that contain DIMM
part number M393B2K70CM0-YF8.

To confirm the part number from snapshot data, run the following command from the {snapshot}/ilom directory:

/usr/gnu/bin/egrep -A 8 "type = DIMM" @usr@local@bin@collect_properties.out | egrep "fru_part_number|fru_serial_number" | sed 'N;s/\n/ /' | awk '{ print $3" : "$6 }'
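
The command above prints one "part number : serial number" line per DIMM. As a minimal sketch, its output can be filtered for the part number named in this alert. The function name list_suspect_dimms is hypothetical, and the sketch assumes the "PART : SERIAL" line layout produced by the awk step above. It matches the part number as a substring in case the fru_part_number value carries additional text.

```shell
#!/bin/sh
# Part number named in this alert.
SUSPECT_PART="M393B2K70CM0-YF8"

# Hypothetical helper: read "PART : SERIAL" lines on standard input and print
# the serial number of every DIMM whose part field contains SUSPECT_PART.
# Exit status is 0 if at least one match was found, 1 otherwise.
list_suspect_dimms() {
    awk -v part="$SUSPECT_PART" '
        index($1, part) > 0 { print $3; n++ }
        END { exit (n > 0 ? 0 : 1) }'
}
```

The function is intended to be appended to the pipeline above, that is, `... | awk '{ print $3" : "$6 }' | list_suspect_dimms`.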

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
Internal Contributors/Submitters listed below.

Internal Contributor/Submitter: Alex Aftandilian, Matt Finch, Justin Hatch, Marcel Widjaja
Internal Eng Responsible Engineer: Alex Aftandilian
Oracle Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems Group - SYS
Internal Escalation ID:
Internal Resolution Patches: 151682-01, 151685-01, 151683-01, 151684-01, 151688-01, 151687-01

References



<BUG:19721476> - COMPUTE_PCHG_POWER_DOWN ERRONEOUSLY CALLED ON YF AFTER REMOVING RF/T3 SUPPORT






  Copyright © 2018 Oracle, Inc.  All rights reserved.