Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2237466.1
Update Date:2017-05-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  2237466.1 :   X5-2/X6-2 Node is Accessible but Freezes Intermittently with "Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c" Raised  


Related Items
  • Big Data Appliance X5-2 Starter Rack
  • Big Data Appliance X6-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
 Upgrade the UEK2 kernel to 2.6.39-400.280.1
References


Created from <SR 3-14051033798>

Applies to:

Big Data Appliance X5-2 Starter Rack - Version All Versions and later
Big Data Appliance X6-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

The initial symptom is that the node is accessible but freezes intermittently, requiring a reboot. This can happen on both X5-2 and X6-2 nodes.

1. The console reports errors like:

sd 6:2:0:0: [sda] Unhandled error code
sd 6:2:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 6:2:0:0: [sda] CDB: Read(10): ** ** ** ** ** ** ** ** ** **

2. The RAID controller event log, collected with:

# MegaCli64 -AdpEventLog -IncludeDeleted -f <filename> -a0 -nolog

Searching the saved file shows:

# grep "Fatal firmware" <filename>
  
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c Sun Jul 24 2016
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Event Description: Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c Sun Jan 15 2017

3. The firmware term log collected by "bdadiag snapshot" (megacli64-FwTermLog.out) shows:

T0: C0:EVT#11047-01/15/17 12:15:20: 15=Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
T0: C0:EVT#11048-01/15/17 12:15:20: 15=Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c

4. The filesystem may become read-only.

5. Errors like the following may also be seen:

Aspen Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c
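The grep in item 2 only lists matching lines. A minimal sketch that counts the fatal firmware errors per 2108vI2o.c source line (1218 vs. 1259) could look like the following. It assumes the log has already been saved with the MegaCli64 -AdpEventLog command above, or taken from megacli64-FwTermLog.out; the function name `summarize_fw_errors` is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: count "Fatal firmware error" entries per source line
# in a saved MegaCli64 event log or a bdadiag megacli64-FwTermLog.out file.
# Prints "<count> <line number>" for each 2108vI2o.c line number found.
summarize_fw_errors() {
    logfile="$1"
    # -o keeps only the matched text, so prefixes such as "15=" or
    # "Aspen " and any trailing date are dropped before counting.
    grep -o 'Fatal firmware error: Line [0-9]* in \.\./\.\./raid/2108vI2o\.c' "$logfile" |
        sed 's/.*Line \([0-9]*\) in.*/\1/' |
        sort | uniq -c |
        awk '{ print $1, $2 }'
}
```

Usage: `summarize_fw_errors <filename>`. Seeing line 1259 in the output points to the "Aspen" variant described under Cause.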

  

To help troubleshoot the issue, collect more data by making the firmware term log persist across reboots. To do so:

1. As 'root', check whether battery backup is enabled for the fwtermlog (TTY history):

# MegaCli64 -fwtermlog -bbuget -a0 

If the battery mode is off for the fwtermlog, the output looks like:

# MegaCli64 -fwtermlog -bbuget -a0
  
Battery is OFF for TTY history on Adapter 0
Exit Code: 0x00 

2. If the battery mode is off for the fwtermlog, turn on the battery for maintaining the fwtermlog across node reboots and power cycles:

# MegaCli64 -fwtermlog -bbuon -a0 

The output looks like:

# MegaCli64 -fwtermlog -bbuon -a0
  
Battery is set to ON for TTY history on Adapter 0
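Steps 1 and 2 can be combined so that the battery is enabled only when -bbuget reports it as off. This is a sketch, not an Oracle-supplied script: `ensure_fwtermlog_bbu` and the `MEGACLI` override variable are hypothetical, and the only output text it relies on is the "Battery is OFF for TTY history" string shown in step 1.

```shell
# Hypothetical override for the MegaCli64 path; defaults to MegaCli64 on PATH.
MEGACLI="${MEGACLI:-MegaCli64}"

# Sketch: enable battery backup for the firmware term log on one adapter,
# but only if -bbuget reports it as OFF. Run as root.
ensure_fwtermlog_bbu() {
    adapter="${1:-0}"
    status=$("$MEGACLI" -fwtermlog -bbuget -a"$adapter")
    case "$status" in
        *"Battery is OFF for TTY history"*)
            # Same command as step 2 above.
            "$MEGACLI" -fwtermlog -bbuon -a"$adapter"
            ;;
        *)
            # Not reported as OFF (or unrecognized output): change nothing,
            # just show what the controller returned.
            printf '%s\n' "$status"
            ;;
    esac
}
```

Usage (as root): `ensure_fwtermlog_bbu 0`. Leaving unrecognized output untouched is deliberate, so the script never toggles a setting it could not read.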

Cause

Output such as:

11352 -- 29 Jan 2017 14:23:32 - - Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c 

or

Aspen Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c 

indicates a bug that affects systems running an older megasas driver. UEK2 2.6.39-400.280.1 contains the driver fixes for this issue.

Bug 25872346 - server hangs with 15=Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c
Bug 23102145 - Aspen Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c
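As a triage aid, the two messages can be mapped to the bug numbers listed above. The function `bug_for_fw_error` is hypothetical; it matches only the two message texts quoted in this note and reports anything else as unknown.

```shell
# Hypothetical lookup: map a 2108vI2o.c fatal firmware message to the bug
# number cited in this note. Returns 1 for messages it does not recognize.
bug_for_fw_error() {
    case "$1" in
        *"Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c"*)
            echo "Bug 25872346" ;;
        *"Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c"*)
            echo "Bug 23102145" ;;
        *)
            echo "unknown"
            return 1 ;;
    esac
}
```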

Note: Fatal firmware errors such as "Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c" may be seen more often on Node 5 of the BDA when Big Data Discovery (BDD) runs there. However, there is no direct correlation between the issue and BDD. On the BDD node, all data disks are combined into a single RAID partition, which produces a disk usage pattern very different from that of a normal BDA node (possibly even using a different set of disk/controller commands). This different pattern could trigger a firmware bug that does not appear in normal BDA use, or appears much less frequently.

Solution

The fix for the internal bugs behind "Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c" and/or "Aspen Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c" is to upgrade the kernel to UEK2 2.6.39-400.280.1, which contains the driver fixes for this issue. On the BDA, UEK upgrades are supported as long as the major UEK version stays the same, i.e. UEK2 to UEK2 and UEK4 to UEK4. The UEK2 upgrade to 2.6.39-400.280.1 therefore applies to BDA V4.6 and lower BDA versions, which support a UEK2 kernel.
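Before and after upgrading, it can help to confirm the kernel level of each node. The sketch below assumes the usual UEK2 release-string form printed by `uname -r` (for example 2.6.39-400.280.1.el6uek.x86_64) and GNU coreutils `sort -V`; the function `check_uek2_kernel` and the variable `REQUIRED` are hypothetical. It rejects non-UEK2 kernels, since this fix is limited to UEK2.

```shell
# Minimum UEK2 level cited in this note as containing the driver fixes.
REQUIRED="2.6.39-400.280.1"

# Sketch: compare a kernel release (default: the running kernel) with REQUIRED.
# Returns 0 if at or above it, 1 if an upgrade is needed, 2 if not UEK2.
check_uek2_kernel() {
    release="${1:-$(uname -r)}"
    case "$release" in
        2.6.39-*) ;;
        *) echo "not UEK2: $release"; return 2 ;;
    esac
    # Strip the distribution suffix (e.g. .el6uek.x86_64) before comparing.
    version="${release%%.el*}"
    # Version-sort both strings; the smaller one comes first.
    lowest=$(printf '%s\n%s\n' "$REQUIRED" "$version" | sort -V | head -n 1)
    if [ "$lowest" = "$REQUIRED" ]; then
        echo "ok: $version"
    else
        echo "upgrade needed: $version"
        return 1
    fi
}
```

Usage: `check_uek2_kernel` on each node, or pass a release string explicitly, e.g. `check_uek2_kernel 2.6.39-400.264.1.el6uek.x86_64`.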

Upgrade the UEK2 kernel to 2.6.39-400.280.1

To upgrade the UEK2 kernel to 2.6.39-400.280.1, follow the steps in How To Upgrade a Kernel on BDA V4.2 and Higher/V4.1 (Doc ID 2033797.1).

The bugs addressed by UEK2 kernel upgrade are:

For X5-2: Bug 23102145 - Aspen Fatal firmware error: Line 1259 in ../../raid/2108vI2o.c cause r/o filesys

Bug 23536267 - megaraid_sas for UEK2: Update threshold based reply post host index register.

Bug 25872346 - server hangs with 15=Fatal firmware error: Line 1218 in ../../raid/2108vI2o.c.

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.