Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution
2156884.1 : How to Troubleshoot Event SPX86-8003-RR for Exadata with Unpublished Bug 22727539 fault_state(0x0d04)!
Applies to:
Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X4-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-8b Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal
An Exadata storage cell, either X3-2L or X4-2L, resets and reports the event SPX86-8003-RR IIO PCIE Fatal Error in the ILOM. The aim of this document is to help identify whether this event was caused by unpublished bug 22727539 and to explain how to resolve the problem.

Solution
Collect an ILOM snapshot (Doc ID 1062544.1) and a sundiag collection (Doc ID 761868.1).
1. Begin with the ILOM snapshot.
i) Look in the file fma/@persist@faultdiags@faults.log and check whether the following type of event is reported against a PCI slot containing a flash card.
    2016-06-19/03:36:54 e01dbe58-56a3-e4d6-d00d-d4634492c113 SPX86-8003-RR
    timestamp ereports fault = fault.io.intel.iio.pcie-fatal@/SYS/MB/PCIE4
This confirms the SPX86-8003-RR event. Now check whether it is due to unpublished bug 22727539.
ii) Look in the file ilom/@persist@hostconsole.log and check for the following message being reported before the host resets.
    mpt2sas2: fault_state(0x0d04)!
It may be necessary to look in ilom/@persist@hostconsole.log.1 for this message.
If both the SPX86-8003-RR event in faults.log and the message "mpt2sas2: fault_state(0x0d04)!" in hostconsole.log are present, then the storage cell may have encountered unpublished bug 22727539.
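The two snapshot checks in step 1 can be sketched as a small shell function. This is only an illustration: `check_snapshot` is a hypothetical helper name, the snapshot is assumed to be already unpacked, and matching the message without the `mpt2sas2:` prefix is an assumption made here so that other controller numbers are not missed. It does not verify that the fault is against a flash card slot or that the message precedes the reset, so the files should still be reviewed by hand.

```shell
#!/usr/bin/env bash
# check_snapshot (hypothetical helper): look for both signatures described in
# step 1 inside an unpacked ILOM snapshot directory passed as $1.
check_snapshot() {
    local snap="$1"
    local faults="$snap/fma/@persist@faultdiags@faults.log"
    local found_event=no found_msg=no log

    # Signature 1: the SPX86-8003-RR event in faults.log.
    if [ -f "$faults" ] && grep -q 'SPX86-8003-RR' "$faults"; then
        found_event=yes
    fi

    # Signature 2: the fault_state message in hostconsole.log or its rotated copy.
    # Assumption: match on the message text only, not on the mpt2sas2 prefix.
    for log in "$snap/ilom/@persist@hostconsole.log" \
               "$snap/ilom/@persist@hostconsole.log.1"; do
        if [ -f "$log" ] && grep -qF 'fault_state(0x0d04)!' "$log"; then
            found_msg=yes
        fi
    done

    echo "SPX86-8003-RR event: $found_event"
    echo "fault_state(0x0d04) message: $found_msg"

    # Return success only when both signatures are present.
    [ "$found_event" = yes ] && [ "$found_msg" = yes ]
}
```

For example, `check_snapshot /path/to/unpacked/snapshot && echo "possible match for bug 22727539"` prints the note only when both signatures were found; the path shown is a placeholder.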
2. Confirm the problem by checking the sundiag collection.
i) Check and make a note of the image version by viewing imageinfo-all.out
Example:
    Active image version: 12.1.2.1.3.151021
ii) Check the firmware version on the F40 or F80 flash card.
cd to the cell directory within the sundiag and unpack the file lsidiag-xxxxx-xxxxx-xxxxx-min.tz2
For example:
    tar -xf lsidiag-exastoracel02-20160627-045657-min.tz2
cd to the unpacked directory and view the file ddcli-listall.txt
The bug occurs when the previous checks in the snapshot are matched and the F40 or F80 Flash Accelerator module contains the following firmware.
F40 will report firmware 09.05.42.00
F80 will report firmware 09.05.43.00
Example shows F40
    ID WarpDrive Package Version PCI Address
Example shows F80
    ID WarpDrive Package Version PCI Address
The firmware can also be checked on the cell with the following command:
    /opt/oracle.SupportTools/CheckHWnFWProfile -action list -component Flash | grep -i 'cardfw' | uniq
    <CardFw FIRMWARE_ID="1" VALUE="09.05.42.00"/>
This example shows F40
    /opt/oracle.SupportTools/CheckHWnFWProfile -action list -component Flash | grep -i 'cardfw' | uniq
    <CardFw FIRMWARE_ID="1" VALUE="09.05.43.00"/>
This example shows F80
The FMOD will show firmware UI03
    # cellcli -e list physicaldisk attributes makeModel, physicalFirmware where diskType = FlashDisk
    "Sun Flash Accelerator F40 PCIe Card" UIO3
or
    "Sun Flash Accelerator F80 PCIe Card" UIO3
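As a rough aid for the firmware check in step 2, the card firmware values can be labelled automatically. The sketch below assumes the `<CardFw ... VALUE="..."/>` output format shown above; `classify_cardfw` is a hypothetical helper name, and the labels simply restate the versions given in this document (09.05.42.00 and 09.05.43.00 as affected, 13.05.xx.xx as corrected). Any other version is reported as unknown rather than guessed.

```shell
#!/usr/bin/env bash
# classify_cardfw (hypothetical helper): read CheckHWnFWProfile output on stdin,
# pull out each distinct CardFw VALUE and label it using this document's versions.
classify_cardfw() {
    # Keep only CardFw lines and print the text inside VALUE="...".
    sed -n '/[Cc]ard[Ff]w/s/.*VALUE="\([^"]*\)".*/\1/p' | sort -u |
    while read -r fw; do
        case "$fw" in
            09.05.42.00|09.05.43.00) echo "$fw affected" ;;  # F40 / F80 firmware named above
            13.05.*)                 echo "$fw fixed" ;;     # firmware delivered by the fix
            *)                       echo "$fw unknown" ;;   # not covered by this document
        esac
    done
}
```

On a cell it could be used as `/opt/oracle.SupportTools/CheckHWnFWProfile -action list -component Flash | classify_cardfw`.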
Resolution to the problem
The patches which fix this problem will not show unpublished bug 22727539 in the README. This is because the firmware fix is part of the resolution to critical issue EX28.
The problem is resolved by applying either a patch or an image update as follows:
i) From any earlier image, update to full image 12.1.2.3.1 or higher.
If an update to image 12.1.2.3.1 is not possible, then one of the following options must be applied to resolve the problem.
ii) If running image 12.1.2.3.0, apply the interim fix from 12.1.2.3.0 unpublished Patch # 21749993. The patch is in two parts. Part one contains a set of rpm files; this is described as the interim fix. Part two contains a full ISO image and is ...
iii) If running image 12.1.2.2.2 and the system needs to stay at this version, then apply the interim fix for 12.1.2.2.2, unpublished Patch # 24306258, or update the image to a higher release using the full release component of the patch.
iv) If running image 12.1.2.2.1 and the system needs to stay at this version, then apply the interim fix for 12.1.2.2.1, unpublished Patch # 22106928, or update the image to a higher release using the full release component of the patch.
v) If running image 12.1.2.2.0 and the system needs to stay at this version, then apply the interim fix for 12.1.2.2.0, unpublished Patch # 22086811.
vi) If running image 12.1.2.1.3 and the system needs to stay at this version, then apply unpublished Patch # 23263418.
vii) If running image 12.1.2.1.2 and the system needs to stay at this release, there is currently no patch available (no MLR patch available yet); see unpublished bug 23257267.
viii) If running image 12.1.2.1.1 and the system needs to stay at this release, then apply unpublished Patch # 23193769.
For any earlier image versions, please contact the software support specialist (EEST) for guidance.
When applying any update, please check that the storage cells are not currently exposed to Exadata critical issue EX17 by reviewing MOS Doc 1968234.1.
Internal Patch details
12.1.2.3.2 - unpublished Patch 23200959
12.1.2.3.1 - unpublished Patch 24306177
12.1.2.3.0 - unpublished Patch 21749993
12.1.2.2.3 - unpublished Patch 23217781
12.1.2.2.2 - unpublished Patch 24306258
12.1.2.2.1 - unpublished Patch 22106928
12.1.2.2.0 - unpublished Patch 22086811
12.1.2.1.3 - unpublished Patch 23263418
Images 12.1.2.2.3, 12.1.2.3.1 and 12.1.2.3.2 already contain the correct firmware to avoid this bug; the patches above are for reference if updating from a lower image.
Images 12.1.2.1.3, 12.1.2.2.0, 12.1.2.2.1, 12.1.2.2.2 and 12.1.2.3.0 require the above respective patches to resolve the bug.
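The image-to-patch mapping above lends itself to a simple lookup. The following sketch is illustrative only: `active_image_version` and `patch_for_image` are hypothetical helper names, the patch numbers are copied from this document, and it is assumed that the build suffix in a string such as 12.1.2.1.3.151021 can be dropped by keeping the first five dot-separated fields. It returns the listed patch number even for images that already contain the corrected firmware, so the notes above still apply.

```shell
#!/usr/bin/env bash
# active_image_version (hypothetical helper): print the first
# "Active image version" value found in the file given as $1.
active_image_version() {
    awk -F': *' '/Active image version/ { print $2; exit }' "$1"
}

# patch_for_image (hypothetical helper): map an image version to the
# unpublished patch number listed in this document.
patch_for_image() {
    local ver
    # Assumption: keep the first five fields, dropping any build suffix.
    ver=$(printf '%s\n' "$1" | cut -d. -f1-5)
    case "$ver" in
        12.1.2.3.2) echo 23200959 ;;
        12.1.2.3.1) echo 24306177 ;;
        12.1.2.3.0) echo 21749993 ;;
        12.1.2.2.3) echo 23217781 ;;
        12.1.2.2.2) echo 24306258 ;;
        12.1.2.2.1) echo 22106928 ;;
        12.1.2.2.0) echo 22086811 ;;
        12.1.2.1.3) echo 23263418 ;;
        12.1.2.1.2) echo "no patch available, see unpublished bug 23257267"; return 1 ;;
        12.1.2.1.1) echo 23193769 ;;
        *)          echo "not listed, contact EEST"; return 1 ;;
    esac
}
```

For example, `patch_for_image "$(active_image_version imageinfo-all.out)"` prints the patch number for the image recorded in the sundiag collection, or a short note with a non-zero exit status when no patch is listed.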
Once the patches have been applied, the firmware will show as 13.05.xx.xx, which confirms the correct firmware to resolve this problem.
The FMOD firmware will change to UI06.
F40 will report firmware 13.05.10.00 or 13.05.10.01, depending on the patch applied.
References:
Diagnostic Information for ILOM, ILO, LO100 Issues (Doc ID 1062544.1)
Oracle Exadata Diagnostic Information required for Disk Failures and some other Hardware issues (Doc ID 761868.1)
Following software upgrade on X3 hardware with Exadata Smart Flash Cache Compression enabled, multiple flash drives may fail