Symptoms
When guest domains running Solaris 11.1.7.5.0 or later on SPARC T5 and M5 systems are live migrated, those guest domains may experience data corruption or hangs, or their operating system may panic.
Note: Please see Sun Alert <Document:1567076.1>, which also tracks this issue. The script used in the Workaround A section is attached to this document.
Changes
Guest domains running Solaris 11.1.7.5.0 (or greater) on T5 and M5 systems are being live migrated.
Cause
This is caused by an issue in the Hypervisor firmware that makes older System Firmware versions incompatible with Solaris 11.1.7.5.0 and later releases.
This issue can occur on the following systems with Solaris 11.1.7.5.0 or greater:
- Guest domains on Sun SPARC T5 Servers with System Firmware versions 9.0.0.d, 9.0.0.h and 9.0.0.i
- Guest domains on Sun SPARC M5-32 Server with System Firmware versions 9.0.1.f and 9.0.1.g
Notes:
1. This issue only impacts SPARC T5 and M5 platforms running Oracle VM Server for SPARC (previously called Sun Logical Domains, or LDoms) version 3.0.0.3 or earlier.
2. Primary domains are not impacted by this issue.
3. Sun System Firmware versions 9.0.0.d, 9.0.0.h and 9.0.0.i for T5 have a hypervisor version of 1.12.0.d, 1.12.0.f and 1.12.0.g respectively. Versions 9.0.1.f and 9.0.1.g for M5 have a hypervisor version of 1.12.1.c and 1.12.1.d respectively. All of these versions are vulnerable to this issue.
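As an illustration only, the affected combinations described above can be expressed as a small POSIX shell check. This is a sketch, not an Oracle tool: the function names (ver_ge, sysfw_status, at_risk) are hypothetical, and the Solaris, Oracle VM Server for SPARC and System Firmware version strings must be supplied by the administrator. A System Firmware version that does not appear in the lists above is reported as "not-listed", which does not by itself mean that it is safe.

```shell
#!/bin/sh
# Illustrative sketch only; all function names here are hypothetical.

# ver_ge A B: succeed if dotted numeric version A is >= version B.
# Missing components are treated as 0 (so 3.0 equals 3.0.0.0).
ver_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "[.]"); nb = split(b, y, "[.]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# sysfw_status PLATFORM VERSION: PLATFORM is T5 or M5; prints "affected"
# if VERSION is one of the System Firmware versions listed in this document.
sysfw_status() {
    case "$1:$2" in
        T5:9.0.0.d|T5:9.0.0.h|T5:9.0.0.i) echo affected ;;
        M5:9.0.1.f|M5:9.0.1.g)            echo affected ;;
        *)                                echo not-listed ;;
    esac
}

# at_risk SOLARIS_VER OVM_VER PLATFORM SYSFW_VER: succeed only if all of the
# conditions stated in this document hold for a guest domain:
#   Solaris 11.1.7.5.0 or later, Oracle VM Server for SPARC 3.0.0.3 or
#   earlier, and an affected System Firmware version.
at_risk() {
    ver_ge "$1" 11.1.7.5.0 || return 1
    ver_ge 3.0.0.3 "$2" || return 1
    [ "$(sysfw_status "$3" "$4")" = affected ]
}
```

For example, at_risk 11.1.7.5.0 3.0.0.3 T5 9.0.0.h succeeds (exit status 0), while the same call with Oracle VM Server for SPARC 3.0.0.4 fails.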
Solution
Two separate workarounds are available to avoid this issue until a final resolution is available. The first workaround is preferred because it affects the guest domain only temporarily and requires no additional steps to remove it after migration. The second workaround may also be used safely, but it may reduce performance until the user explicitly removes it.
Steps for applying Workaround A (uses the attached script):
The disable_mmu_group_demap script, also referenced in Sun Alert <Document:1567076.1>, is attached below. Download and save this script so that it can be executed from the command line. As stated in Sun Alert <Document:1567076.1>, the script must be executed as the 'root' user.
The first workaround must be performed every time a guest domain is to be live migrated. It requires that the dynamic resource management (DRM) policies for the guest domain be temporarily disabled, and that the attached disable_mmu_group_demap script then be executed on the guest domain before the live migration is started. The script temporarily disables a performance feature of the hardware, which allows the guest domain to be live migrated successfully. Once the live migration has completed, the DRM policies must be re-enabled.
1. If DRM policies are present, they must all be disabled on the primary domain using the 'ldm set-policy' command. For example:
# ldm set-policy enable=no name=<POLICY-NAME> <GUEST-NAME>
where <POLICY-NAME> is replaced by an actual policy name, and <GUEST-NAME> is replaced by the actual guest domain name. Although only one example is shown, this command must be run for all policies for the guest domain that is to be migrated. A list of all policies may be obtained by using the following command:
# ldm ls -o resmgmt <GUEST-NAME>
2. On the guest domain, as the 'root' user, run the attached disable_mmu_group_demap script and note the output:
# disable_mmu_group_demap
MMU group demap disabled successfully.
It is safe to do Live Migration.
3. Migrate the guest domain.
4. Once the guest is migrated, re-enable all of the policies that were disabled in step 1. For example:
# ldm set-policy enable=yes name=<POLICY-NAME> <GUEST-NAME>
Note: Once the guest domain has been successfully migrated and the policies re-enabled, no other steps are required in order to remove this workaround or to re-enable the performance feature of the hardware.
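The policy handling in steps 1 and 4 is easy to get wrong when a guest domain has several DRM policies. The following POSIX shell sketch, intended to be run as root on the primary domain, disables or re-enables a list of named policies with 'ldm set-policy'. It is a hypothetical helper, not part of the attached script: the policy names must be taken by hand from 'ldm ls -o resmgmt <GUEST-NAME>' (this sketch does not parse that output), and the LDM variable exists only so that the commands can be previewed with LDM=echo.

```shell
#!/bin/sh
# Hypothetical helper for Workaround A, steps 1 and 4 (run as root on the
# primary domain). Set LDM=echo to print the commands instead of running them.
LDM=${LDM:-ldm}

# wa_set_policies yes|no GUEST [POLICY...]: enable or disable each named DRM
# policy of GUEST, stopping at the first ldm failure.
wa_set_policies() {
    if [ $# -lt 2 ] || { [ "$1" != yes ] && [ "$1" != no ]; }; then
        echo "usage: wa_set_policies yes|no GUEST [POLICY...]" >&2
        return 2
    fi
    state=$1
    guest=$2
    shift 2
    for policy in "$@"; do
        $LDM set-policy enable="$state" name="$policy" "$guest" || return 1
    done
}
```

A possible sequence, with hypothetical domain, policy and host names: run 'wa_set_policies no ldg1 high-usage med-usage' on the primary domain, run disable_mmu_group_demap as root on ldg1 and confirm that it reports that live migration is safe, migrate ldg1 (for example with 'ldm migrate-domain ldg1 target-host'), and then run 'wa_set_policies yes ldg1 high-usage med-usage'.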
Steps for applying Workaround B:
The second workaround requires that the guest domain's /etc/system file be edited and the guest domain rebooted before the guest domain is live migrated. Unlike the first workaround, this workaround disables a performance feature of the hardware persistently, across reboots, until the guest domain's /etc/system file is restored to its original state.
1. On the guest domain, append the following line to the /etc/system file:
set sfmmu_demap_xcall_optimization=2
2. Reboot the guest domain.
3. Migrate the guest domain.
Note: This workaround remains in effect until it is removed. If performance is acceptable and the guest domain may need to be migrated in the future, it is suggested that this workaround be left in place pending the final resolution.
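Step 1 can be scripted so that it is safe to repeat. This is a minimal POSIX shell sketch for the guest domain, run as root; the function name wa_b_apply and the backup file name (/etc/system.pre-workaround-b) are hypothetical. It assumes a grep that supports the POSIX -q, -x and -F options (on Solaris, /usr/xpg4/bin/grep provides them if the default grep does not).

```shell
#!/bin/sh
# Hypothetical helper for Workaround B, step 1 (run as root on the guest).
WA_LINE='set sfmmu_demap_xcall_optimization=2'

# wa_b_apply [FILE]: append WA_LINE to FILE (default /etc/system) unless an
# identical line is already present. A backup copy is made first.
wa_b_apply() {
    file=${1:-/etc/system}
    if grep -qxF "$WA_LINE" "$file"; then
        echo "workaround already present in $file"
        return 0
    fi
    cp -p "$file" "$file.pre-workaround-b" || return 1
    # If the file does not end in a newline, add one so that the new
    # setting starts on a line of its own.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        echo >> "$file"
    fi
    printf '%s\n' "$WA_LINE" >> "$file"
}
```

Running wa_b_apply with no argument edits /etc/system; running it a second time reports that the line is already present and changes nothing. The guest domain must still be rebooted (step 2) before the setting takes effect.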
Steps for removing Workaround B:
1. On the guest domain, remove the following line from the /etc/system file:
set sfmmu_demap_xcall_optimization=2
2. Reboot the guest domain.
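Removing the line can be sketched the same way. This hypothetical POSIX shell helper deletes every line that exactly matches the workaround setting and writes the result back with 'cat >' rather than 'mv', so that the ownership and permissions of /etc/system are left unchanged. The temporary file name is an assumption of this example.

```shell
#!/bin/sh
# Hypothetical helper for removing Workaround B (run as root on the guest).

# wa_b_remove [FILE]: delete lines exactly matching the workaround setting
# from FILE (default /etc/system). All other lines are left untouched.
wa_b_remove() {
    file=${1:-/etc/system}
    line='set sfmmu_demap_xcall_optimization=2'
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    tmp="$file.tmp.$$"
    status=0
    grep -vxF "$line" "$file" > "$tmp" || status=$?
    # grep exits 1 when no lines remain (the file held only the setting);
    # only an exit status above 1 indicates a real error.
    if [ "$status" -gt 1 ]; then
        rm -f "$tmp"
        return 1
    fi
    # Copy back in place so that the original file's owner and mode are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

After wa_b_remove has been run, the guest domain still has to be rebooted (step 2) for the change to take effect.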
This issue is addressed in the following firmware patches:
- 17019067 for NETRA SPARC T5-1B SUN SYSTEM FIRMWARE 9.0.2.G
- 17019069 for SPARC T5-4+T5-8 SUN SYSTEM FIRMWARE 9.0.2.G
- 17019075 for SPARC T5-1B SUN SYSTEM FIRMWARE 9.0.2.G
- 17019079 for SPARC T5-2 SUN SYSTEM FIRMWARE 9.0.2.G
- 17019082 for SPARC M5-32 SUN SYSTEM FIRMWARE 9.0.2.E
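For scripted checks across several systems, the list above can be turned into a lookup. In this POSIX shell sketch the platform keys (netra-t5-1b, t5-1b, t5-2, t5-4, t5-8, m5-32) and the function name are hypothetical choices for the example; the patch numbers and firmware versions are taken from the list above.

```shell
#!/bin/sh
# fix_for_platform KEY: print "<patch number> <fixed System Firmware version>"
# for the platform, or fail for a platform not covered by the list above.
fix_for_platform() {
    case "$1" in
        netra-t5-1b) echo "17019067 9.0.2.G" ;;
        t5-4|t5-8)   echo "17019069 9.0.2.G" ;;
        t5-1b)       echo "17019075 9.0.2.G" ;;
        t5-2)        echo "17019079 9.0.2.G" ;;
        m5-32)       echo "17019082 9.0.2.E" ;;
        *)           echo "no patch listed for platform: $1" >&2; return 1 ;;
    esac
}
```

For example, fix_for_platform t5-8 prints "17019069 9.0.2.G", since the SPARC T5-4 and T5-8 share one firmware patch.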
Modification History:
05-Jul-2013: Document released
21-Aug-2013: Firmware patches available; issue is resolved
This regression was triggered by the putback for Solaris bug 15765451, first released in Solaris 11.1.7.5.0. Before this change, the Hypervisor code that causes the issue was not exercised.
A mitigation is pending completion in Oracle VM Server for SPARC version 3.0.0.4: it will fall back to warm migration if Sun System Firmware 9.0.2 (or later) is not present. This fix is expected to be released near the end of July.
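The warm-migration fallback planned for Oracle VM Server for SPARC 3.0.0.4 amounts to a simple version test. The following POSIX shell function only illustrates that rule as stated here; it is not the Oracle VM Server for SPARC implementation, and its name is hypothetical. It compares the numeric major, minor and micro fields of a System Firmware version (for example 9.0.2 in 9.0.2.g) with 9.0.2.

```shell
#!/bin/sh
# Illustration of the fallback rule described above; not the actual
# Oracle VM Server for SPARC code. migration_mode is a hypothetical name.

# migration_mode SYSFW_VERSION: print "live" if the System Firmware version
# is 9.0.2 or later, otherwise "warm". Only the first three (numeric)
# fields are compared, so 9.0.2.g is treated as 9.0.2.
migration_mode() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split e.g. 9.0.2.g into 9 0 2 g
    IFS=$old_ifs
    major=${1:-0}
    minor=${2:-0}
    micro=${3:-0}
    if [ "$major" -gt 9 ] ||
       { [ "$major" -eq 9 ] && [ "$minor" -gt 0 ]; } ||
       { [ "$major" -eq 9 ] && [ "$minor" -eq 0 ] && [ "$micro" -ge 2 ]; }; then
        echo live
    else
        echo warm
    fi
}
```

With this rule, every affected version listed in the Cause section (9.0.0.d through 9.0.1.g) maps to "warm", while the patched 9.0.2.G and 9.0.2.E firmware maps to "live".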
It is common practice to use live migration to evacuate platforms before performing system maintenance, including firmware upgrades. Therefore, either the workarounds listed above or the pending mitigation in Oracle VM Server for SPARC 3.0.0.4 should be used before live migrating guest domains when upgrading the firmware.
Questions regarding this document should be emailed to the Contributors and Engineers below:
Internal Contributor/Submitter: madhavan.venkataraman@oracle.com, Justin.Frank@oracle.com
Internal Eng Responsible Engineer: madhavan.venkataraman@oracle.com, Justin.Frank@oracle.com
Internal Eng Business Unit Group: Systems RPE
References
Sun Alert <Document:1567076.1>
Attachments
This solution has no attachment