Date of Workaround Release: 05-Jul-2013
Date of Resolved Release: 21-Aug-2013
__________________________________________
Description
When guest domains running Solaris 11.1.7.5.0 (or greater) on T5 and M5 systems are live migrated, those guest domains may experience data corruption or hangs, or the guest domain's operating system may panic.
This is due to an issue in the Hypervisor firmware which makes older system firmware incompatible with the newer Solaris releases 11.1.7.5.0 and greater.
Note: This issue is also being tracked in Problem <Document:1567072.1>.
Occurrence
This issue can occur on the following systems with Solaris 11.1.7.5.0 or greater:
SPARC Platform
- Guest domains on Sun SPARC T5 Servers with System Firmware versions 9.0.0.d, 9.0.0.h and 9.0.0.i
- Guest domains on Sun SPARC M5-32 Server with System Firmware versions 9.0.1.f and 9.0.1.g
Notes:
1. This issue only impacts SPARC T5 and M5 platforms running Oracle VM Server for SPARC versions 3.0.0.3 or earlier.
2. Primary domains are not impacted by this issue.
To determine the Solaris version of the guest domain, use the following command on the guest domain:
$ pkg info entire
Name: entire
Summary: entire incorporation including Support Repository Update (Oracle Solaris 11.1.7.5.0).
Description: This package constrains system package versions to the same
build. WARNING: Proper system update and correct package
selection depend on the presence of this incorporation.
Removing this package will result in an unsupported system. For
more information see https://support.oracle.com/CSP/main/article
?cmd=show&type=NOT&doctype=REFERENCE&id=1501435.1.
Category: Meta Packages/Incorporations
State: Installed
Publisher: solaris
Version: 0.5.11 (Oracle Solaris 11.1.7.5.0)
Build Release: 5.11
Branch: 0.175.1.7.0.5.0
Packaging Date: Sat May 04 02:41:45 2013
Size: 5.46 kB
FMRI: pkg://solaris/entire@0.5.11,5.11-0.175.1.7.0.5.0:20130504T024145Z
In the output above, the "Version" text shows that Solaris 11.1.7.5.0 is in use:
Version: 0.5.11 (Oracle Solaris 11.1.7.5.0)
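The version check above can also be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function names 'solaris_version_from_pkg_info' and 'version_at_least' are hypothetical, and the parsing assumes the "Version" line has the same format as in the sample output above.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Solaris).

# Reads `pkg info entire` output on stdin and prints the dotted Solaris
# version found in parentheses on the "Version" line, e.g. 11.1.7.5.0.
solaris_version_from_pkg_info() {
    sed -n 's/.*Version:.*(Oracle Solaris \([0-9][0-9.]*\)).*/\1/p' | head -n 1
}

# usage: version_at_least HAVE WANT
# Succeeds (exit status 0) when dotted version HAVE is >= WANT.
# Components are compared numerically, left to right; missing
# components are treated as 0.
version_at_least() {
    have=$1
    want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        a=${have%%.*}
        b=${want%%.*}
        [ -z "$a" ] && a=0
        [ -z "$b" ] && b=0
        [ "$a" -gt "$b" ] && return 0
        [ "$a" -lt "$b" ] && return 1
        # Drop the component just compared.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}
```

On a guest domain this might be used as: v=$(pkg info entire | solaris_version_from_pkg_info); version_at_least "$v" 11.1.7.5.0 && echo "Solaris $v is at an affected level". The result covers only the Solaris level; the firmware and Oracle VM Server for SPARC versions must still be checked on the primary domain.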
To determine the version of Oracle VM Server for SPARC and the version of Sun System Firmware, use the following command on the primary domain:
# ldm -V
Logical Domains Manager (v 3.0.0.3)
Hypervisor control protocol v 1.11
Using Hypervisor MD v 1.4
System PROM:
Hostconfig v. 1.3.0.h @(#)Hostconfig 1.3.0.h 2013/05/16 16:58
Hypervisor v. 1.12.0.g @(#)Hypervisor 1.12.0.g 2013/05/16 16:40
OpenBoot v. 4.35.0.a @(#)OpenBoot 4.35.0.a 2013/03/01 14:53
Sun System Firmware versions 9.0.0.d, 9.0.0.h and 9.0.0.i for T5 contain hypervisor versions 1.12.0.d, 1.12.0.f and 1.12.0.g respectively. Versions 9.0.1.f and 9.0.1.g for M5 contain hypervisor versions 1.12.1.c and 1.12.1.d respectively. All of these versions are affected by this issue.
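The hypervisor version lookup above can be sketched as a small script. This is an illustration only: the function names 'hypervisor_version_from_ldm' and 'hypervisor_is_listed_affected' are hypothetical, the parsing assumes the "Hypervisor v." line has the same format as in the sample 'ldm -V' output, and the list contains only the five hypervisor versions named in this document.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of ldm).

# Reads `ldm -V` output on stdin and prints the hypervisor version from
# the "Hypervisor v. X" line under "System PROM:", e.g. 1.12.0.g.
hypervisor_version_from_ldm() {
    sed -n 's/.*Hypervisor v\. \([^ ]*\).*/\1/p' | head -n 1
}

# usage: hypervisor_is_listed_affected VERSION
# Succeeds when VERSION is one of the hypervisor versions this document
# lists as affected. A failure only means "not in that list"; it is not
# by itself a statement that the firmware contains the fix.
hypervisor_is_listed_affected() {
    case "$1" in
        1.12.0.d|1.12.0.f|1.12.0.g) return 0 ;;  # T5: firmware 9.0.0.d, 9.0.0.h, 9.0.0.i
        1.12.1.c|1.12.1.d)          return 0 ;;  # M5: firmware 9.0.1.f, 9.0.1.g
        *)                          return 1 ;;
    esac
}
```

On the primary domain this might be used as: hv=$(ldm -V | hypervisor_version_from_ldm); hypervisor_is_listed_affected "$hv" && echo "Hypervisor $hv is affected".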
Symptoms
If the issue results in data corruption, abnormal application behavior (such as application crashes) may be seen on the guest domain.
If the issue results in a guest domain hang, the guest domain and its applications will be unresponsive.
If the issue results in a guest domain panic, a panic message may be seen on the guest domain console after the guest is migrated. Because the problem is non-deterministic, the panic output is not reliably reproducible and therefore cannot be documented here.
Workaround
Two workaround options are available to avoid this issue. Workaround A is preferred, since it affects the guest domain only temporarily and requires no additional steps to remove after migration. Workaround B may also be used safely, but may impact performance until it is explicitly removed by the user.
Workaround A:
Workaround A must be performed every time a guest domain is to be live migrated. DRM policies for the guest domain must be temporarily disabled, and the 'disable_mmu_group_demap' script from Problem <Document:1567072.1> must then be executed on the guest domain before initiating the live migration. The script temporarily disables a performance feature of the hardware, which in turn allows the guest domain to be live migrated successfully. Once live migration has completed, the DRM policies must be re-enabled.
Please see Problem <Document:1567072.1> for the 'disable_mmu_group_demap' script and complete details for Workaround A.
Workaround B:
Workaround B requires that the guest domain's '/etc/system' file be edited and the guest domain be rebooted before the guest domain is live migrated. Unlike Workaround A, this workaround leaves a performance feature of the hardware disabled until the guest domain's '/etc/system' file is restored to its original state and the guest domain is rebooted again.
1. On the guest domain, append the following line to the '/etc/system' file:
set sfmmu_demap_xcall_optimization=2
2. Reboot the guest domain.
3. Migrate the guest domain.
Note: This workaround remains in effect until it is removed. If performance is acceptable, and the guest domain may need to be migrated in the future, it is suggested that this workaround be left in place pending the final resolution.
Steps for removing Workaround B:
1. On the guest domain, remove the following line from the '/etc/system' file:
set sfmmu_demap_xcall_optimization=2
2. Reboot the guest domain.
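The '/etc/system' edits for applying and removing Workaround B can be sketched as follows. This is a minimal illustration under stated assumptions, not an Oracle-supplied script: the function names and the '.bak' backup file name are hypothetical, the file is assumed to end with a newline, and the functions must be run with sufficient privileges on the guest domain. The reboot and migration steps are left to the administrator.

```shell
#!/bin/sh
# Hypothetical helpers (names and backup suffix are illustrative).

TUNABLE_LINE='set sfmmu_demap_xcall_optimization=2'

# usage: system_file_has_line FILE LINE
# Succeeds when FILE contains a line exactly equal to LINE.
system_file_has_line() {
    while IFS= read -r l || [ -n "$l" ]; do
        [ "$l" = "$2" ] && return 0
    done < "$1"
    return 1
}

# usage: apply_workaround_b [FILE]   (FILE defaults to /etc/system)
# Appends the tunable once; does nothing if it is already present.
apply_workaround_b() {
    sysfile=${1:-/etc/system}
    system_file_has_line "$sysfile" "$TUNABLE_LINE" && return 0
    cp -p "$sysfile" "$sysfile.bak" || return 1   # keep a copy of the original
    printf '%s\n' "$TUNABLE_LINE" >> "$sysfile"
}

# usage: remove_workaround_b [FILE]  (FILE defaults to /etc/system)
# Rewrites FILE without the tunable line, leaving all other lines intact.
remove_workaround_b() {
    sysfile=${1:-/etc/system}
    tmp="$sysfile.new.$$"
    while IFS= read -r l || [ -n "$l" ]; do
        [ "$l" = "$TUNABLE_LINE" ] || printf '%s\n' "$l"
    done < "$sysfile" > "$tmp" || return 1
    # Copy the contents back so the ownership and mode of FILE are kept.
    cat "$tmp" > "$sysfile" && rm -f "$tmp"
}
```

In both directions the change takes effect only after the guest domain is rebooted, as described in the steps above.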
Resolution
This issue is addressed in the following firmware patches:
- 17019067 for NETRA SPARC T5-1B SUN SYSTEM FIRMWARE 9.0.2.G
- 17019069 for SPARC T5-4+T5-8 SUN SYSTEM FIRMWARE 9.0.2.G
- 17019075 for SPARC T5-1B SUN SYSTEM FIRMWARE 9.0.2.G
- 17019079 for SPARC T5-2 SUN SYSTEM FIRMWARE 9.0.2.G
- 17019082 for SPARC M5-32 SUN SYSTEM FIRMWARE 9.0.2.E
Patches
<SUNPATCH:17019067>
<SUNPATCH:17019069>
<SUNPATCH:17019075>
<SUNPATCH:17019079>
<SUNPATCH:17019082>
History
05-Jul-2013: Document released; state Workaround
21-Aug-2013: Firmware patches available, issue is Resolved
This regression was triggered by the putback for Solaris bug 15765451, first released
in Solaris 11.1.7.5.0. Prior to this, the Hypervisor code that causes the issue was not exercised.
A mitigation is pending completion for Oracle VM Server for SPARC version 3.0.0.4.
With this mitigation, migration will revert to warm migration if Sun System Firmware 9.0.2 (or later) is not present.
The mitigation is expected to be released near the end of July 2013.
Live migration is commonly used to evacuate platforms before performing system maintenance,
including firmware upgrades. Therefore, when upgrading the firmware, either one of the workarounds
listed above or the pending mitigation in Oracle VM Server for SPARC 3.0.0.4 should be applied
before live migrating guest domains.
Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
responsible Engineer listed below.
Internal Contributor/Submitter: Justin.Frank@oracle.com
Internal Eng Responsible Engineer: Justin.Frank@oracle.com
Internal Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems RPE