Asset ID: 1-72-1362174.1
Update Date: 2015-09-01
Keywords:
Solution Type: Problem Resolution Sure
Solution 1362174.1: Exadata Compute Node / Exalogic RAID Controller Failed
Related Items
- Linux OS
- Exalogic Elastic Cloud X3-2 Hardware
- Exadata Database Machine V2
Related Categories
- PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST
- _Old GCS Categories>ST>Server>Engineered Systems>Exadata>Hardware
Created from <SR 3-4598608111>
Applies to:
Linux OS - Version Oracle Linux 5.0 to Oracle Linux 5.0 [Release OL5]
Exadata Database Machine V2 - Version All Versions and later
Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Information in this document applies to any platform.
Symptoms
This problem can be seen on any Exadata or Exalogic system.
- On the affected compute node, the filesystems become read-only.
- It is not possible to remount them read/write:
# mount -o remount,rw /
mount: block device /dev/sda1 is write-protected, mounting read-only
- MegaCli64 commands do not work correctly:
# /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0
User specified controller is not present.
Failed to get CpController object.
Exit Code: 0x01
- The console log reports messages such as:
ADP_RESET_GEN2: retry time=3e8, hostdiag=a4
megaraid_sas: FW was restarted successfully, initiating next stage...
megaraid_sas: HBA recovery state machine, state 2 starting...
printk: 9 messages suppressed.
printk: 9 messages suppressed.
megaraid_sas: out: controller is not in ready state
megasas: waiting_for_outstanding: after issue OCR.
megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
megaraid_sas: pending commands remain even state = f0000000
megaraid_sas: pending commands remain even after reset handling.
megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
megasas[0]: Total OS Pending cmds : 0
megasas[0]: 64 bit SGLs were sent to FW
megasas[0]: Pending OS cmds in FW :
megasas[0]: Frame addr :0x37f22800 : <3>megasas[0]: frame count : 0x1, Cmd : 0x2, Tgt id : 0x0, lba lo : 0x167727f, lba_hi : 0x0, sense_buf addr : 0x37f20500,sge count : 0x1
.....
0x7f77f400 : <3>megasas[0]: Dumping Done.
megasas: failed to do reset
sd 0:2:0:0: megasas: RESET -1140663 cmd=2a retries=0
megasas: cannot recover from previous reset failures
sd 0:2:0:0: megasas: RESET -1140663 cmd=2a retries=0
megasas: cannot recover from previous reset failures
sd 0:2:0:0: timing out command, waited 360s
end_request: I/O error, dev sda, sector 23119751
printk: 8 messages suppressed.
Buffer I/O error on device sda1, logical block 2889961
lost page write due to I/O error on sda1
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
...
__journal_remove_journal_head: freeing b_committed_data
__journal_remove_journal_head: freeing b_committed_data
journal commit I/O error
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
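The failure signatures in the console log above can be checked mechanically. The following is a minimal sketch, assuming the console output has been saved to a text file. The function name count_raid_failure_lines and the choice of these four message patterns are illustrative assumptions, not part of the original note.

```shell
# Count the lines in a saved console log that match failure messages
# quoted in the Symptoms section. The function name and the selection
# of patterns are assumptions made for illustration.
# Usage: count_raid_failure_lines <saved-console-log>
count_raid_failure_lines() {
    grep -c -E \
        -e 'megasas: failed to do reset' \
        -e 'megasas: cannot recover from previous reset failures' \
        -e 'rejecting I/O to offline device' \
        -e 'Remounting filesystem read-only' \
        "$1"
}
```

A non-zero count only suggests that the log matches this pattern. It does not by itself confirm a controller failure.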
Cause
The most likely cause is a failure of the LSI RAID controller on the affected compute node.
Solution
Open a hardware SR to replace the LSI controller on the affected compute node (6GIGABIT SAS RAID PCI EXPRESS HBA, B4 ASIC), then restart the compute node.
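After the controller has been replaced and the compute node restarted, one way to confirm that no filesystem is still read-only is to inspect the mount options. The following is a minimal sketch, assuming the standard /proc/mounts layout (device, mount point, type, options). The function name list_readonly_mounts is hypothetical and this check is not part of the original note.

```shell
# Print the mount point of every filesystem whose option list contains
# the exact option "ro". Reads /proc/mounts unless another file in the
# same format is given as the first argument.
# The function name is an assumption made for illustration.
list_readonly_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] == "ro") { print $2; break }
        }
    }' "${1:-/proc/mounts}"
}
```

Empty output means that no filesystem in the list is mounted read-only. An option such as errors=remount-ro is not counted, because only an exact "ro" entry matches.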
Attachments
This solution has no attachment