Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1514687.1 : Linux kernel faulty on unmap_single on Exadata compute nodes
Created from <SR 3-6517363331>

Applies to:
Exadata Database Machine X2-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine X2-2 Half Rack - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine X2-2 Qtr Rack - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine V2 - Version All Versions to All Versions [Release All Releases]
Oracle Exadata Hardware - Version 11.2.0.2 to 11.2.0.3 [Release 11.2]
x86 64 bit

Symptoms
Buffer I/O error on device sdb, logical block 0
scsi 13:0:0:0: rejecting I/O to dead device
Buffer I/O error on device sdb, logical block 0
unable to read partition table
RDS/IB: connected to 172.16.3.16 version 3.1
RPC: bad TCP reclen 0x00000000 (non-terminal)
RPC: bad TCP reclen 0x00000000 (non-terminal)
RPC: bad TCP reclen 0x00000000 (non-terminal)
RPC: bad TCP reclen 0x00000000 (non-terminal)
Unable to handle kernel paging request at ffff80fd41180640 RIP:
 [<ffffffff8007e582>] unmap_single+0x24/0xc6
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 4
Modules linked in: sr_mod cdrom oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) nfs lockd fscache nfs_acl krg_8_5_0_3005(PFU) ipmi_poweroff ipmi_watchdog ipmi_devintf ipmi_si(U) ipmi_msghandler sunrpc bonding(U) iscsi_tcp libiscsi scsi_transport_iscsi rds(U) ib_ipoib(U) ipoib_helper(U) ipv6 xfrm_nalgo crypto_api rdma_ucm(U) rdma_cm(U) ib_ucm(U) ib_uverbs(U) ib_umad(U) ib_cm(U) iw_cm(U) ib_addr(U) ib_sa(U) dm_mirror dm_log dm_multipath scsi_dh dm_mod video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac fuse(U) parport_pc lp parport mlx4_ib(U) ib_mad(U) ib_core(U) sg joydev shpchp ahci mlx4_core(U) igb i2c_i801 i2c_core pcspkr usb_storage ata_piix libata cciss(U) megaraid_sas(U) sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 11092, comm: oracle Tainted: PF 2.6.18-128.1.16.0.1.el5 #1
RIP: 0010:[<ffffffff8007e582>] [<ffffffff8007e582>] unmap_single+0x24/0xc6
RSP: 0018:ffff81048fb23cd8 EFLAGS: 00010202
RAX: ffff8100190c1000 RBX: 00000000ceba0300 RCX: 0000000000000000
RDX: ffffffffa5017ec8 RSI: 41e9f62821005000 RDI: ffff81127ff18870
RBP: 00083d4ea5017ec8 R08: ffff810009000000 R09: 00007f0000000000
R10: 0000000000000000 R11: 0000000000000046 R12: 0000000000000000
R13: 000000000000003d R14: ffff81127ff18870 R15: 0000000000000000
FS: 00002b9ac7839e60(0000) GS:ffff810140de5d40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff80fd41180640 CR3: 0000000fd9ac0000 CR4: 00000000000006a0
Process oracle (pid: 11092, threadinfo ffff81048fb22000, task ffff810a66e4a820)
Stack: 0000000000000286 ffff8103561da000 0000000000000000 ffffffff80150a56
 ffff8112422b7840 ffff81126dcc3cc0 ffff8112422b7858 ffff81125642d1c0
 000000000000014a ffffffff8848da6f ffff8112422b7840 ffff81126dcc3cc0
Call Trace:
 [<ffffffff80150a56>] swiotlb_unmap_sg+0xba/0x126
 [<ffffffff8848da6f>] :rds:__rds_ib_teardown_mr+0x3d/0xa3
 [<ffffffff8848dca1>] :rds:rds_ib_flush_mr_pool+0x1cc/0x2c7
 [<ffffffff8848ddb6>] :rds:rds_ib_flush_mrs+0x1a/0x2e
 [<ffffffff8848475f>] :rds:rds_release+0x70/0xe5
 [<ffffffff80055562>] sock_release+0x19/0x9a
 [<ffffffff8005575d>] sock_close+0x2c/0x30
 [<ffffffff80012e22>] __fput+0xae/0x198
 [<ffffffff80023de6>] filp_close+0x5c/0x64
 [<ffffffff8001e333>] sys_close+0x88/0xbd
 [<ffffffff885e7b66>] :krg_8_5_0_3005:_close_origcode+0x78/0x1e2
 [<ffffffff885e3b6a>] :krg_8_5_0_3005:_close_postcode+0x0/0x229
 [<ffffffff885df7a1>] :krg_8_5_0_3005:syscall_wrappers_generic_flow+0x1f6/0x514
 [<ffffffff885e7aee>] :krg_8_5_0_3005:_close_origcode+0x0/0x1e2
 [<ffffffff885dfd8c>] :krg_8_5_0_3005:SYS_close_common_wrap+0x46/0xed
 [<ffffffff885e1588>] :krg_8_5_0_3005:SYS_close_wrap64+0x25/0x41
 [<ffffffff8005e28d>] tracesys+0xd5/0xe0
# imageinfo
Kernel version: 2.6.18-128.1.16.0.1.el5 #1 SMP Tue Jun 30 16:48:30 EDT 2009 x86_64
Image version: 11.2.2.4.2.111221
Image activated: 2012-04-15 13:12:42 -0500
Image status: success
System partition on device: /dev/sda1

# rpm -qa|grep ofa
ofa-2.6.18-128.1.16.0.1.el5-1.4.2-14
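When triaging a new crash, it can help to check whether the console or messages log carries the same markers as the oops shown in this note. The following is a minimal sketch, not an official Oracle diagnostic: the function name `matches_unmap_single_oops` is hypothetical, and the choice of markers (a faulting RIP in `unmap_single` together with `swiotlb_unmap_sg` and `rds_ib_teardown_mr` frames) is an assumption drawn only from the trace in this note.

```python
import re

# Assumption: the faulting RIP in unmap_single, as printed in the oops above.
# Matches both "RIP: [<addr>] unmap_single+0x..." and
# "RIP: 0010:[<addr>] [<addr>] unmap_single+0x...".
FAULT_RE = re.compile(
    r"RIP:\s*(?:[0-9a-f]{4}:)?(?:\[<[0-9a-f]+>\]\s*)+unmap_single\+0x"
)

# Assumption: call-trace frames from this note that place the fault in the
# RDS memory-region teardown path.
TRACE_MARKERS = ("swiotlb_unmap_sg", "rds_ib_teardown_mr")


def matches_unmap_single_oops(log_text):
    """Return True if log_text shows the RIP and call-trace markers from this note."""
    if not FAULT_RE.search(log_text):
        return False
    return all(marker in log_text for marker in TRACE_MARKERS)
```

A match only suggests that the crash resembles the one described here; the kernel and ofa versions still need to be compared against the Solution section.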
Cause

Two bugs were opened for this kernel crash. <Bug 13034913> was confirmed as a duplicate of <Bug 11847244>.
Solution

The fix was delivered in OEL 5.5 and cannot be backported to OEL 5.3. Upgrade the kernel and ofa to this version or higher:

kernel 2.6.18-194.3.1.0.3.el5

In some cases the storage software was upgraded from an older version, which updated the kernel on the storage cells but left the old kernel on the compute nodes. The compute nodes can obtain the kernel and ofa updates from a storage cell. Please refer to <note 1284070.1>.
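The version check implied above can be sketched in a few lines. This is an illustrative helper under stated assumptions, not Oracle tooling: the names `kernel_key` and `needs_upgrade` are hypothetical, and the comparison is a simplified numeric ordering of the dotted fields rather than the full RPM version-comparison algorithm. It checks the kernel only; the ofa package must be verified separately.

```python
# Minimum kernel release named in the Solution section of this note.
MIN_KERNEL = "2.6.18-194.3.1.0.3.el5"


def _leading_ints(text):
    """Collect the leading numeric dot-separated fields, stopping at a tag such as 'el5'."""
    fields = []
    for part in text.split("."):
        if not part.isdigit():
            break
        fields.append(int(part))
    return tuple(fields)


def kernel_key(release):
    """Turn '2.6.18-128.1.16.0.1.el5' into ((2, 6, 18), (128, 1, 16, 0, 1))."""
    version, _, build = release.partition("-")
    return (_leading_ints(version), _leading_ints(build))


def needs_upgrade(running, minimum=MIN_KERNEL):
    """Return True if the running kernel release sorts below the minimum."""
    return kernel_key(running) < kernel_key(minimum)
```

For example, `needs_upgrade("2.6.18-128.1.16.0.1.el5")` evaluates the kernel reported by `imageinfo` in the Symptoms section; the running release can be taken from `uname -r` on the compute node.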
References

<NOTE:1284070.1> - Updating key software components on database hosts to match those on the cells
<BUG:13034913> - EXADATA LONDVOP0101 SYSTEM PANIC
<BUG:11847244> - NODE APPEARS HUNG PRIOR TO REBOOT

Attachments

This solution has no attachment