Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FABs available to Internals and Partners only
Information in this document applies to any platform.
Escalation ID: 41143985
_________
Affected Parts: (FRU/CRU Part Number / Description)
540-7695 - 16-Port Virtualized Multi-Fabric Network Express Module (X4238)
Symptoms
SPARC blades in the Sun Blade 6000 will panic, and x86 blades will lose communication with the affected NEM.
Example panic string:
panic[cpu100]/thread=2a104663ca0:
Fatal error has occured in: PCIe fabric.(0x0)(0x41)
Check the FMA errors after the blade reboots. Look for the following signature to determine whether the NEM went through a surprise down event because it processed a KILLALL signal. The FMA event will be reported against one of the NEM modules:
grep pcie_ue_status */fma/*fmdump*
pcie_ue_status = 0x20 = surprise down
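When many explorer or fmdump captures have to be checked, the signature above can be tested programmatically. The following is a minimal sketch, not an Oracle tool: it assumes the captured text contains lines of the form "pcie_ue_status = 0x..." as shown above, and the function name surprise_down_events is hypothetical. It relies on bit 5 (0x20) of the PCIe Uncorrectable Error Status register being the Surprise Down bit, so it also catches values where other bits are set alongside it.

```python
import re

# Bit 5 of the PCIe Uncorrectable Error Status register is Surprise Down.
SURPRISE_DOWN = 0x20

# Assumed line format, as shown in the grep output above: "pcie_ue_status = 0x20"
_UE_STATUS = re.compile(r"pcie_ue_status\s*=\s*(0x[0-9a-fA-F]+)")


def surprise_down_events(text):
    """Return every pcie_ue_status value in text that has the Surprise Down bit set."""
    values = [int(m.group(1), 16) for m in _UE_STATUS.finditer(text)]
    return [v for v in values if v & SURPRISE_DOWN]


if __name__ == "__main__":
    # Illustrative input only; real input would be the saved fmdump output.
    sample = "pcie_ue_status = 0x20\npcie_ue_status = 0x4000\n"
    print([hex(v) for v in surprise_down_events(sample)])
```

An empty result means no surprise down signature was found in the text supplied; it does not rule out other PCIe fabric errors.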
Impact
The main power is turned off, causing the NemHydra to power off. This causes the blades' OS to react to the loss of a network device.
Changes
Contributing Factors
Sun Blade 6000 Virtualized Multi-Fabric 10GbE Network Express Module.
Increased I2C activity could cause a corrupted read or write to the ADM1066.
The ADM1066 is a standalone power sequencer and monitoring device. It monitors multiple voltage rails and is also in charge of initiating power down by asserting the KILLALL signal to the NEM. The NEM contains two of these ADM1066 devices.
Cause
Root Cause
Because this failure cannot be reproduced consistently, we do not know which device asserts KILLALL on the NEM's ADM1066. The SAS expander is a possible suspect because, by design, it asserts KILLALL when the I2C temperature readings exceed 75C (ambient) or 120C (junction). Although these temperatures were never observed in a failing environment, a corrupted temperature read could have the same effect.
With the KILLALL signal blocked on the ADM1066, the SAS expander can no longer shut down the NEM because of a false overtemp reading. However, the SAS expander will still turn on the LED when the warning threshold (65C ambient, 100C junction) is actually reached. Also, when a real NEM overtemp occurs, the voltage would increase and the main power sequencer will detect it and shut down the NEM.
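The suspected mechanism can be illustrated with a small model of the threshold behavior described above, as it was before KILLALL was blocked. This is a sketch for illustration only and is not the SAS expander firmware: the function name classify, the returned labels, and the corrupted value of 255 are assumptions. Only the threshold values come from this document.

```python
# Thresholds in degrees C, taken from the Root Cause text above.
SHUTDOWN = {"ambient": 75, "junction": 120}
WARNING = {"ambient": 65, "junction": 100}


def classify(sensor, temp_c):
    """Hypothetical helper: map one temperature reading to the described action."""
    if temp_c > SHUTDOWN[sensor]:
        return "killall"  # reading exceeds the shutdown threshold
    if temp_c >= WARNING[sensor]:
        return "led"      # warning threshold reached, LED turned on
    return "ok"


if __name__ == "__main__":
    print(classify("ambient", 40))   # a normal reading
    print(classify("ambient", 255))  # assumed corrupted read, e.g. 0xFF off the bus
```

In this model a single bad read is enough to return "killall" even though the real temperature is normal, which is why the resolution blocks KILLALL from this path and leaves real overtemp handling to the main power sequencer.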
Solution
Workaround
No workaround available - see Resolution section.
Resolution
Patch 11884187: SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 TOOLS AND DRIVERS contains the Power Sequencer code update and the SAS firmware update.
Patch 11883817: SUN BLADE 6000 10GBE VMF NEM SW 2.2.1 FIRMWARE contains the firmware for the NEM/SP.
This ILOM package contains fixes for the following reported issues:
1. 7017229 - Blades fail to attach NEM Hydra on reboot or after crash
2. 7010225 - Onbox legal notices need to be updated to 2011
Refer to the attached document for the Power Sequencer firmware update instructions. The update requires ILOM "escalation mode" and an FE on-site to perform the procedure.
References
Please review <Document:1486997.1> for more details on the SunBlade 6000 Hydra NEM Upgrade Procedure.
For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
https://my.oracle.com/site/cs/17516/17518/17988/24503/index.html
In addition to the above you may email:
FAB-Manager_US@oracle.com
Contacts
Contributor: daniel.p.lord@oracle.com
Responsible Engineer: richard.j.li@oracle.com
Responsible Manager: david.mullenex@oracle.com
Business Unit Group: Systems Group-x64 (X4100-X4600 (and M2), V20z/V40z/V60z/V65z, @Ultra20/40 (and M2) Workstations), Systems Group-SVS (SPARC Volume Systems, Horizontal @Systems,(includes T2000/Ontario)
References
PATCH:11884187
PATCH:11883817
Attachments
This solution has no attachment