Date of Preliminary Release: 05-Aug-2016
Date of Resolved Release: 17-Aug-2016
__________________________________________________
Description
A regression in the handling of certain types of errors on SPARC T4 servers with System Firmware version 8.9.5 will cause the system to panic. This regression also affects SPARC T5 and SPARC M5-32/M6-32 servers running System Firmware version 9.6.5.
The affected firmware patches 8.9.5 and 9.6.5 are WITHDRAWN and have been replaced by 8.9.5.a and 9.6.5.a (see "Resolution").
Occurrence
This issue can occur on the following platforms:
SPARC Platform
- SPARC T4-1 with Firmware version 8.9.5 (patch 152475-01)
- SPARC T4-2 with Firmware version 8.9.5 (patch 152476-01)
- SPARC T4-4 with Firmware version 8.9.5 (patch 152477-01)
- SPARC T4-1B with Firmware version 8.9.5 (patch 152478-01)
- Netra SPARC T4-1 with Firmware version 8.9.5 (patch 152479-01)
- Netra SPARC T4-2 with Firmware version 8.9.5 (patch 152480-01)
- Netra SPARC T4-1B with Firmware version 8.9.5 (patch 152481-01)
- SPARC T5-2 with Firmware version 9.6.5 (patch 23763456)
- SPARC T5-4 with Firmware version 9.6.5 (patch 23763457)
- SPARC T5-8 with Firmware version 9.6.5 (patch 23763457)
- SPARC T5-1B with Firmware version 9.6.5 (patch 23763458)
- Netra SPARC T5-1B with Firmware version 9.6.5 (patch 23763459)
- SPARC M5-32 with Firmware version 9.6.5 (patch 23763460)
- SPARC M6-32 with Firmware version 9.6.5 (patch 23763460)
Note: No other platforms or systems are affected by this issue.
To determine the firmware version installed on the system, use the following ILOM command:
-> show /HOST sysfw_version
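As a rough illustration, the comparison against the affected versions could be scripted along the following lines. This is a sketch and not an Oracle-supplied tool: the function names are hypothetical, the build date in the usage example is made up, and the regular expression assumes the command prints a line of the form "sysfw_version = Sun System Firmware <version> <build date>", which should be confirmed against real ILOM output before relying on it.

```python
import re

# Firmware versions withdrawn by this alert (taken from the Occurrence list).
AFFECTED_VERSIONS = {"8.9.5", "9.6.5"}

# ASSUMPTION: the property line looks like
#   sysfw_version = Sun System Firmware 8.9.5 <build date>
VERSION_RE = re.compile(r"sysfw_version\s*=\s*.*?Firmware\s+(\d+(?:\.\w+)+)")


def parse_sysfw_version(ilom_output):
    """Return the version token (for example '8.9.5.a'), or None if absent."""
    match = VERSION_RE.search(ilom_output)
    return match.group(1) if match else None


def is_affected(version):
    """True only for the exact withdrawn versions; '8.9.5.a' is not affected."""
    return version in AFFECTED_VERSIONS


if __name__ == "__main__":
    # Hypothetical captured output; the date is illustrative only.
    sample = (
        "-> show /HOST sysfw_version\n\n"
        "  /HOST\n"
        "    Properties:\n"
        "        sysfw_version = Sun System Firmware 8.9.5 2016/06/20 12:00\n"
    )
    version = parse_sysfw_version(sample)
    print(version, "affected" if is_affected(version) else "not affected")
```

The match is deliberately exact, so the replacement versions 8.9.5.a and 9.6.5.a are not reported as affected.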
Symptoms
Should the described issue occur, the system will experience a panic. Occasionally, the system may freeze and need to be power cycled to recover. The panic string on the host console will show output similar to the following:
send mondo timeout [retries: 0xd42ff] cpuids: 0x16
timeout: attempted to reset cpu22
panic: failed to stop cpu22: attempting reset
panic: failed to reset cpu22
...
panic[cpu114]/thread=2a100013b80: send_mondo_set: timeout
...
000002a1000124f0 unix:send_mondo_set+5a8 (10113c00, d42ff, 7e, 19b15f65250,
19b15f6502f, 300000e6800)
%l0-3: 000004000b17b400 0000000000000001 00000000205539f8 0000019b15f65250
%l4-7: 00000000000000b0 0000000000000016 0000000000000001 0000000000000001
000002a1000125c0 unix:xc_all+670 (10094b34, 0, 0, 7f, 86f, 208a2628)
%l0-3: 000002a100012a78 0000000000000003 0000000020613e80 00000000203c1178
%l4-7: 0000000000001000 000000000000007f 0000000000000001 000002a100012a80
000002a100013880 genunix:thread_reaper+128 (208a2c00, 10137800, 101bf000,
208b3d28, 208a2c10, 20875c7c)
%l0-3: 0000000020122c00 00030400d6bf4f00 00030400d6d7bbc0 00000000208b3c00
%l4-7: 000000002084b800 000000002084b800 00000000208b4000 0000000020875c00
...
syncing file systems... done
Deferred dump not available.
dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel sections:
zfs
0:05 98% done (kernel)
0:05 100% done (zfs)
100% done: 693745 (kernel) + 11961 (zfs) pages dumped, dump succeeded
rebooting...
Resetting...
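Saved console output can be searched for the panic signature shown above. The following is a minimal sketch under stated assumptions: the function and key names are hypothetical, and the patterns are derived only from the sample output in this document. A match shows that the message text is present; it does not by itself prove that this regression is the cause.

```python
import re

# Patterns derived from the sample console output in this document.
PANIC_RE = re.compile(
    r"panic\[cpu(\d+)\]/thread=[0-9a-f]+: send_mondo_set: timeout"
)
RESET_FAIL_RE = re.compile(r"panic: failed to reset cpu(\d+)")


def find_mondo_timeout(console_text):
    """Return details of a send_mondo_set timeout panic, or None if not found."""
    panic = PANIC_RE.search(console_text)
    if panic is None:
        return None
    failed = {int(cpu) for cpu in RESET_FAIL_RE.findall(console_text)}
    return {
        "panic_cpu": int(panic.group(1)),        # CPU that raised the panic
        "unresponsive_cpus": sorted(failed),     # CPUs that could not be reset
    }


if __name__ == "__main__":
    log = (
        "send mondo timeout [retries: 0xd42ff] cpuids: 0x16\n"
        "timeout: attempted to reset cpu22\n"
        "panic: failed to stop cpu22: attempting reset\n"
        "panic: failed to reset cpu22\n"
        "panic[cpu114]/thread=2a100013b80: send_mondo_set: timeout\n"
    )
    print(find_mondo_timeout(log))
```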
Workaround
There is no workaround for this issue.
Resolution
This issue is addressed on the following platforms:
SPARC Platform
- SPARC T4-1 with Firmware version 8.9.5.a patch 152475-02 or later
- SPARC T4-2 with Firmware version 8.9.5.a patch 152476-02 or later
- SPARC T4-4 with Firmware version 8.9.5.a patch 152477-02 or later
- SPARC T4-1B with Firmware version 8.9.5.a patch 152478-02 or later
- Netra SPARC T4-1 with Firmware version 8.9.5.a patch 152479-02 or later
- Netra SPARC T4-2 with Firmware version 8.9.5.a patch 152480-02 or later
- Netra SPARC T4-1B with Firmware version 8.9.5.a patch 152481-02 or later
- SPARC T5-2 with Firmware version 9.6.5.a patch 24441083 or later
- SPARC T5-4 with Firmware version 9.6.5.a patch 24441085 or later
- SPARC T5-8 with Firmware version 9.6.5.a patch 24441085 or later
- SPARC T5-1B with Firmware version 9.6.5.a patch 24441087 or later
- Netra SPARC T5-1B with Firmware version 9.6.5.a patch 24441088 or later
- SPARC M5-32 with Firmware version 9.6.5.a patch 24441089 or later
- SPARC M6-32 with Firmware version 9.6.5.a patch 24441089 or later
Consult the following link for available system firmware download links:
https://www.oracle.com/technetwork/systems/patches/firmware/release-history-jsp-138416.html
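For scripted inventory checks, the Resolution list can be held as a simple lookup table. The sketch below copies the platform names, versions, and patch IDs from the list in this document; the table and function names are hypothetical, and the download page remains the authoritative source for current patch levels.

```python
# Minimum fixed firmware per platform, copied from the Resolution list.
# Values are (firmware version, patch ID); later patch revisions also apply.
FIXED_FIRMWARE = {
    "SPARC T4-1": ("8.9.5.a", "152475-02"),
    "SPARC T4-2": ("8.9.5.a", "152476-02"),
    "SPARC T4-4": ("8.9.5.a", "152477-02"),
    "SPARC T4-1B": ("8.9.5.a", "152478-02"),
    "Netra SPARC T4-1": ("8.9.5.a", "152479-02"),
    "Netra SPARC T4-2": ("8.9.5.a", "152480-02"),
    "Netra SPARC T4-1B": ("8.9.5.a", "152481-02"),
    "SPARC T5-2": ("9.6.5.a", "24441083"),
    "SPARC T5-4": ("9.6.5.a", "24441085"),
    "SPARC T5-8": ("9.6.5.a", "24441085"),
    "SPARC T5-1B": ("9.6.5.a", "24441087"),
    "Netra SPARC T5-1B": ("9.6.5.a", "24441088"),
    "SPARC M5-32": ("9.6.5.a", "24441089"),
    "SPARC M6-32": ("9.6.5.a", "24441089"),
}


def fixed_firmware_for(platform):
    """Return (version, patch ID) for a platform name, or None if not listed."""
    return FIXED_FIRMWARE.get(platform)


if __name__ == "__main__":
    print(fixed_firmware_for("SPARC T5-4"))
```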
Patches
<Patch:152475-02> <Patch:152476-02> <Patch:152477-02> <Patch:152478-02>
<Patch:152479-02> <Patch:152480-02> <Patch:152481-02> <Patch:24441083>
<Patch:24441085> <Patch:24441087> <Patch:24441088>
<Patch:24441089>
History
05-Aug-2016: Document released, status is Preliminary
17-Aug-2016: All FW patches are released, issue is Resolved
23-Aug-2016: Correction to patch ID in Resolution for SPARC T5-1B
This regression was caused by a mis-merge of the backport for 20806606.
The mis-merge occurred only in the code branch for the minor 1.15.x gate, from
which the 1.15.5-micro branch consumed by 9.6.5/8.9.5 was later created.
As such, the earlier release 9.5.4/8.8.4, which also included the change for
20806606, is unaffected by this issue.
Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
submitter/responsible engineer listed below.
Internal Contributor/Submitter: marcel.widjaja@oracle.com
Internal Eng Responsible Engineer: josline.nyagahima@oracle.com
Internal Eng Business Unit Group: SPARC
References
<BUG:24402060> - MIS-MERGE BROKE HYPERVISOR ERROR HANDLING FOR RESUMABLE/NON-RES ERROR REPORTS
Attachments
This solution has no attachment