Asset ID: 1-72-1581135.1
Update Date: 2017-10-05
Solution Type
Problem Resolution Sure
Solution 1581135.1: Oracle VM guest bind failure due to Direct I/O device in an Unknown (UNK) state.
Related Items
- Netra SPARC T5-1B Server Module
- SPARC T5-8
- SPARC T5-4
- SPARC T5-2
- SPARC T5-1B
Related Categories
- PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5
Created from <SR 3-7566729561>
Applies to:
Netra SPARC T5-1B Server Module - Version All Versions and later
SPARC T5-4 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
SPARC T5-1B - Version All Versions and later
SPARC T5-8 - Version All Versions and later
Information in this document applies to any platform.
Symptoms
Binding an Oracle VM guest domain fails because a Direct I/O device is in an Unknown (UNK) state:
# ldm bind-dom guest_domain
Direct I/O operations are not permitted on a slot (/SYS/RCSA/PCIE2)
which has status EMP (empty) or unknown (UNK).
Cause
One or more PCIe devices are in a faulted or retired state, so the affected Direct I/O slot is reported as UNK.
Solution
Confirm the status of the Direct I/O PCIe device:
# ldm ls-io | grep UNK
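Where several slots are assigned through Direct I/O, it can help to reduce the listing to slot names only. The following is a minimal sketch, not an Oracle-supplied tool: the function name list_unk_slots is hypothetical, and it assumes the slot name is the first whitespace-separated field of each 'ldm ls-io' line and that the status appears as a standalone field. Check this against the output of your Logical Domains Manager version before relying on it.

```shell
# Hypothetical helper: read 'ldm ls-io' output on stdin and print the
# first field (assumed to be the slot name) of every line that carries
# UNK as a standalone field.
list_unk_slots() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "UNK") { print $1; break }
        }
    }'
}

# Usage on the control domain (not executed here):
#   ldm ls-io | list_unk_slots
```

Matching on a whole field rather than a substring avoids false hits where UNK happens to appear inside a device or domain name.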
Check for, investigate and resolve any PCIe fault events involving the impacted device.
Solaris:
# fmadm faulty
ILOM CLI:
-> start /SP/faultmgmt/shell
faultmgmtsp> fmadm faulty
Once all faults are resolved, confirm the card status via the ILOM CLI:
-> show /System/PCI_Devices/Add-on/Device_{impacted-device}
For example, where PCIe2 is shown as UNK:
-> show /System/PCI_Devices/Add-on/Device_2
Properties:
part_number = Not Recognized
description = Not Recognized
location = PCIE2 (PCIE 2)
pci_vendor_id = 0x0000
pci_device_id = 0x0000
pci_subvendor_id = 0x0000
pci_subdevice_id = 0x0000
In this case we can confirm the card is present (otherwise Device_2 would not exist), but the SEEPROM on the card cannot be read.
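If the ILOM output has been captured to a file, the same check can be scripted. This is only a sketch under stated assumptions: the function name card_unrecognized is hypothetical, and it treats a part_number of "Not Recognized" or a pci_vendor_id of 0x0000, as seen in the example above, as the sign of an unreadable card.

```shell
# Hypothetical helper: read captured ILOM 'show' output on stdin.
# Returns 0 if the card identity looks unreadable (as in the example
# above), 1 otherwise.
card_unrecognized() {
    awk '
        /part_number *= *Not Recognized/ { bad = 1 }
        /pci_vendor_id *= *0x0000/       { bad = 1 }
        END { if (bad) exit 0; exit 1 }
    '
}

# Usage with a saved capture (file name is illustrative only):
#   card_unrecognized < device_2.txt && echo "card identity not readable"
```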
To resolve this, once all fault events have been addressed, shut down all guest domains and the control domain, then reset the platform so that all hardware is re-POSTed and the PCIe links and devices are reinitialised and probed:
-> reset /SYS
Once POST is complete, confirm the card is now reported correctly, for example:
-> show /System/PCI_Devices/Add-on/Device_2
Properties:
part_number = SG-XPCIE2FC-QF8-Z
description = Sun StorageTek Dual 8 Gb Fibre Channel PCIe HBA
location = PCIE2 (PCIE 2)
pci_vendor_id = 0x1077
pci_device_id = 0x2532
pci_subvendor_id = 0x1077
pci_subdevice_id = 0x0171
Boot Solaris and check whether the following is reported:
NOTICE: One or more I/O devices have been retired
If any DIO endpoint PCIe device continues to be reported as UNK and you see the above message, it is likely that the io-retire service still considers these devices unusable.
Verify by checking 'prtconf -vp' output and confirm which devices are reported as '(retired)'.
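The check for retired nodes can be sketched as a small filter. The function name check_no_retired is hypothetical, and the only assumption about the output format is the one stated above: that retired devices carry the literal tag '(retired)' on their line.

```shell
# Hypothetical helper: read 'prtconf -vp' output on stdin, print every
# line tagged "(retired)", and return 1 if any were found, 0 if none.
check_no_retired() {
    awk '
        index($0, "(retired)") { print; n++ }
        END { if (n > 0) exit 1; exit 0 }
    '
}

# Usage on the host (not executed here):
#   prtconf -vp | check_no_retired || echo "retired devices still present"
```

Returning a non-zero status when retired nodes remain lets the same filter be reused after the reboot as a pass/fail check.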
If you are confident the fault events have been cleared and the underlying hardware is functioning correctly again, use the following steps to clear any stale retirement logs:
# rm /etc/devices/retire_store
# fmadm reset io-retire
# bootadm update-archive
Then reboot the host and check that all devices are reported normally (ldm ls-io, prtconf -vp, fmadm faulty) before bringing the guest domains back online.
Note: Repairing the fault event reported against the impacted PCIe device (fmadm repair) typically updates retire_store and io-retire, so the steps above are not normally required. If devices remain disabled or in an unknown state after following these steps, please escalate for more detailed diagnosis.
References
<BUG:16887730> - IMPROVE T5-8 AND T5-4 IFC FAN CONTROL ALGORITHM FOR BETTER FAN SPEED CONTROL