Asset ID: 1-73-1021661.1
Update Date: 2016-03-07
Solution Type: FAB (standard)
Sure Solution 1021661.1: J4400 SIM cards randomly failing due to heartbeat timeout.
Related Items
- Sun Storage 7410 Unified Storage System
- Sun Storage J4400 Array
- Sun Storage 7310 Unified Storage System
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun FAB
Previously Published As: 273189
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FABs available to Partners and Internals only.
Applies to:
Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage J4400 Array - Version All Versions to All Versions [Release All Releases]
All Platforms
__________
BUG 15541083
Affected Parts:
375-3584 - J4400 SAS Interface Module (SIM)
Symptoms
This SIM failure is indicated by a blue LED on the failed SIM (visible from the rear of the chassis). The failure is also visible in the number of paths associated with a particular JBOD in the BUI "Maintenance->Hardware" view: a JBOD with a failed SIM reports only 1 path instead of the usual 2. The combination of a lit blue LED on the SIM and the missing path in the "Maintenance->Hardware" view is the definitive symptom of this condition. Additionally, the Back view of the JBOD chassis shows the failed SIM as missing. At the time of failure, the appliance logs an alert such as the following:
The component 'SIM (0|1)' has been removed from chassis 'XYZ'
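Because unnoticed failures tend to accumulate, it can help to scan collected alert text for this message. The sketch below is a hypothetical helper, not an appliance feature: it assumes the alert wording matches the example exactly and that the placeholder 'SIM (0|1)' stands for either 'SIM 0' or 'SIM 1'.

```python
import re

# Assumed alert format, copied from the example above. 'SIM (0|1)' in the
# example is taken to mean the literal text 'SIM 0' or 'SIM 1'.
SIM_REMOVED = re.compile(
    r"The component '(?P<sim>SIM [01])' has been removed "
    r"from chassis '(?P<chassis>[^']+)'"
)


def parse_sim_removed(line: str):
    """Return (sim, chassis) if the line is a SIM-removed alert, else None."""
    match = SIM_REMOVED.search(line)
    if match is None:
        return None
    return match.group("sim"), match.group("chassis")


if __name__ == "__main__":
    alerts = [
        "The component 'SIM 1' has been removed from chassis 'XYZ'",
        "Some unrelated alert text",
    ]
    for alert in alerts:
        print(parse_sim_removed(alert))
```

Any alert matched this way still needs to be confirmed against the blue LED and the path count before the SIM is re-seated.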
Impact
A heartbeat timeout causes one of the two SIM modules in a JBOD to go offline, indicated by a blue LED on the failed SIM. Once a SIM has failed, the JBOD has only one path connecting the appliance head to its disks. Re-seating the failed SIM clears the condition.
Changes
Contributing Factors
The products listed above that run SIM firmware earlier than 3R24 are subject to this issue.
The SIM failure condition is sporadic. Customers with larger configurations tend to see this issue more often than those with smaller configurations, because each additional JBOD adds exposure. Even among large configurations, some customers see this problem more often than others. Because clearing the failure requires manual intervention (re-seating the SIM module), customers who don't notice a failure tend to accumulate failures on multiple JBODs over time.
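Deciding whether a SIM is exposed comes down to comparing its firmware revision against 3R24. The following is a minimal sketch, assuming revisions always take the form `<major>R<minor>` with integer parts (e.g. "3R24"); the function names are hypothetical and no appliance interface is used.

```python
import re

# Assumed revision format: <major>R<minor>, both integers, e.g. "3R24".
FW_PATTERN = re.compile(r"^(\d+)R(\d+)$")
FIXED_REVISION = (3, 24)  # 3R24, the revision named in this FAB


def parse_revision(rev: str) -> tuple:
    """Split a revision like '3R24' into (3, 24); reject anything else."""
    match = FW_PATTERN.match(rev.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware revision: {rev!r}")
    return int(match.group(1)), int(match.group(2))


def is_exposed(rev: str) -> bool:
    """True if the revision is earlier than 3R24, i.e. subject to this issue."""
    return parse_revision(rev) < FIXED_REVISION


if __name__ == "__main__":
    for rev in ("3R21", "3R24"):
        print(rev, is_exposed(rev))
```

The parts are compared as integers rather than as strings because a plain string comparison would place a hypothetical "3R9" after "3R24".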
Cause
Root Cause
The SIM failure is caused by a missed heartbeat signal. The SIM that detects the heartbeat timeout disables its peer, on the assumption that the peer is hung or otherwise non-functional. See CR 6803801 for more details. Sun engineering has very strong evidence that upgrading the SIM firmware to 3R24 resolves this issue.
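The peer-disable behaviour described above can be illustrated with a small simulation. This is a conceptual sketch only, not the SIM firmware logic: the `Sim` class, the `check_peer` and `reseat` functions, and the 5-second timeout are all hypothetical. It shows why a single missed heartbeat takes a healthy SIM offline and why the SIM stays offline until it is re-seated.

```python
from dataclasses import dataclass


@dataclass
class Sim:
    name: str
    last_heartbeat: float = 0.0  # time (s) this SIM last sent a heartbeat
    enabled: bool = True


def check_peer(observer: Sim, peer: Sim, now: float, timeout: float) -> None:
    """An enabled observer disables its peer if the peer's heartbeat is stale."""
    if observer.enabled and peer.enabled and now - peer.last_heartbeat > timeout:
        peer.enabled = False  # peer is assumed hung; nothing re-enables it


def reseat(sim: Sim, now: float) -> None:
    """Model of the manual workaround: a re-seated SIM comes back enabled."""
    sim.enabled = True
    sim.last_heartbeat = now


if __name__ == "__main__":
    TIMEOUT = 5.0  # hypothetical value, not the real firmware setting
    sim0 = Sim("SIM 0", last_heartbeat=10.0)
    sim1 = Sim("SIM 1", last_heartbeat=3.0)  # missed its recent heartbeats
    check_peer(sim0, sim1, now=10.0, timeout=TIMEOUT)
    check_peer(sim1, sim0, now=10.0, timeout=TIMEOUT)
    print(sim0.enabled, sim1.enabled)  # True False
    reseat(sim1, now=20.0)
    print(sim1.enabled)  # True
```

In this model, SIM 1 is disabled at t=10 because its last heartbeat is 7 s old, even if it is otherwise healthy; a disabled SIM no longer checks its peer, so SIM 0 remains in service.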
Solution
Implementation: Reactive
Workaround
Manually re-seat the failed SIM card. This may be done while the system is running, but care should be taken not to disturb the cabling to the remaining SIM or to other JBODs in the chain.
Resolution
Firmware 3R24 must be loaded on every SIM card in each attached JBOD to resolve the "Blue Light Special" issue. Firmware 3R24 is bundled with Appliance SW 2010.Q1 or later and is applied to the SIMs automatically once that software is installed.
For instructions on installing Sun Storage 7000 Software Update 2010.Q1.1.0 or later, refer to Doc ID 2021771.1.
Identification of Affected Parts (how to)
As noted in the "Symptoms" section, SIM status is indicated by the number of paths associated with a JBOD chassis: a healthy JBOD reports 2 paths, while a JBOD with a failed SIM reports only 1. A blue LED on the rear of a SIM module indicates a failure.
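When many JBODs are attached, checking the path count for each one by hand is tedious. The sketch below assumes the path counts have already been collected into a dictionary, for example by reading the "Maintenance->Hardware" view; the chassis names and the helper function are hypothetical.

```python
# Hypothetical input: JBOD chassis name -> number of paths shown in the
# BUI "Maintenance->Hardware" view. No appliance API is called here.
EXPECTED_PATHS = 2


def jbods_with_failed_sim(path_counts: dict) -> list:
    """Return the chassis names reporting fewer than 2 paths, sorted."""
    return sorted(
        name for name, paths in path_counts.items() if paths < EXPECTED_PATHS
    )


if __name__ == "__main__":
    observed = {"jbod-a": 2, "jbod-b": 1, "jbod-c": 2, "jbod-d": 1}
    print(jbods_with_failed_sim(observed))  # ['jbod-b', 'jbod-d']
```

A flagged JBOD should still be checked for a lit blue LED on the SIM, since only the combination of the LED and the missing path is the definitive symptom.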
References
Bug Id: 6803801
Contributor/submitter: cliff.thomas@oracle.com
Responsible Engineer: zuheir.totari@oracle.com
Responsible Manager: Renee.Bennett@oracle.com
Services Knowledge Engineer: Joe.Davis@oracle.com
Internal Eng Business Unit Group
NWS (Network Storage)
Internal Sun Alert & FAB Admin Info
20-Nov-2009: Completed draft and sent to Extended Review.
24-Nov-2009: No feedback from Ext Rvw - sending to Publish.
23-Jun-2010: Major rewrite of the Solution section.