Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2093437.1
Update Date:2016-01-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  2093437.1 :   EXADATA : CRS Does Not Come Up in Active-Active IB Mode  


Related Items
  • Exadata X4-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Created from <SR 3-11913768041>

Applies to:

Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Linux x86-64

Symptoms

In an active-active InfiniBand bonding setup, if one of the IB links is down, CRS on the node may not come up.

Changes

In the default deployment of Exadata X4-2 and later Database Machines, InfiniBand bonding is set up in active-active mode on all database servers and all storage cells.

Cause

The node is running with active-active bonding on the IB ports, as recorded in the following file:

/opt/oracle.cellos/ORACLE_CELL_OS_IS_SETUP

  active-bond-ib=yes

In this mode, if one of the ports on the HCA is not working, cluster communication can be disrupted.
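To confirm the bonding mode from a script, a minimal Python sketch such as the following could be used. It assumes the setup file holds one key=value entry per line, as the excerpt above suggests; the function name is_active_bond_ib is hypothetical and not part of any Oracle tool.

```python
from pathlib import Path

# Path quoted in this note; one key=value entry per line is an assumption.
SETUP_FILE = Path("/opt/oracle.cellos/ORACLE_CELL_OS_IS_SETUP")


def is_active_bond_ib(text: str) -> bool:
    """Return True if the text contains an active-bond-ib=yes entry."""
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key.strip() == "active-bond-ib":
            return value.strip().lower() == "yes"
    return False


if __name__ == "__main__":
    try:
        contents = SETUP_FILE.read_text()
    except OSError as exc:
        # Missing file or insufficient privileges.
        print(f"could not read {SETUP_FILE}: {exc}")
    else:
        mode = "active-active" if is_active_bond_ib(contents) else "not active-active"
        print(f"IB bonding: {mode}")
```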

When this issue is seen, use ibstat to check that both ports are Active. In the example below, port 2 is Down:

[root@xxxx ~]# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.11.1280
    Hardware version: 0
    Node GUID: 0x0010e00001432320
    System image GUID: 0x0010e00001432323
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 3
        LMC: 0
        SM lid: 2
        Capability mask: 0x02514868
        Port GUID: 0x0010e00001432321
        Link layer: IB
    Port 2:
        State: Down                   <<<<<<<<<<<<<<<<
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0x0010e00001432322
        Link layer: IB
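The same check can be scripted. This sketch parses ibstat text output into per-port fields and lists any port whose State is not Active. It assumes the "Port N:" / "Key: value" layout shown above and a single HCA in the output; the helper names port_states and inactive_ports are hypothetical, and ibstat is only invoked if it is found on the PATH.

```python
import re
import shutil
import subprocess
from typing import Dict, List, Optional


def port_states(ibstat_output: str) -> Dict[int, Dict[str, str]]:
    """Map each port number to its 'Key: value' fields from ibstat text output.

    Assumes a single CA (HCA) in the output; port numbers would collide otherwise.
    """
    ports: Dict[int, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in ibstat_output.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"Port (\d+):", line)
        if header:
            # Start a new port section, e.g. "Port 2:".
            current = ports.setdefault(int(header.group(1)), {})
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            # Drop trailing annotations such as the "<<<<" marker used in this note.
            current[key.strip()] = value.split("<")[0].strip()
    return ports


def inactive_ports(ports: Dict[int, Dict[str, str]]) -> List[int]:
    """Return port numbers whose State is anything other than Active."""
    return sorted(n for n, fields in ports.items() if fields.get("State") != "Active")


if __name__ == "__main__":
    if shutil.which("ibstat") is None:
        print("ibstat not found on PATH")
    else:
        out = subprocess.run(["ibstat"], capture_output=True, text=True, check=True).stdout
        bad = inactive_ports(port_states(out))
        print("all ports Active" if not bad else f"ports not Active: {bad}")
```

Running the same script again after reseating or replacing the cable is one way to confirm that both ports have returned to Active.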

 

/var/log/messages will contain entries such as the following:

Dec 30 08:45:02 xxxxxxx kernel: CMA: ffff881e29eb3800: cma_query_handler: bad status -22 from path query
Dec 30 08:45:03 xxxxxxx kernel: CMA: ffff883efbbb4c00: cma_query_handler: bad status -22 from path query
Dec 30 08:45:04 xxxxxxx kernel: CMA: ffff881e21c6e800: cma_query_handler: bad status -22 from path query
Dec 30 08:45:05 xxxxxxx kernel: CMA: ffff881f1bc9c000: cma_query_handler: bad status -22 from path query
Dec 30 08:45:05 xxxxxxx kernel: CMA: ffff881f1bc9c000: cma_query_handler: bad status -22 from path query
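To gauge how often the path query fails, these kernel messages can be counted per status code. A minimal sketch, assuming the message text matches the excerpt above; the function name count_cma_path_query_errors is hypothetical. On Linux, errno 22 is EINVAL, so these entries report an EINVAL status for the path query.

```python
import re
from collections import Counter
from typing import Iterable

# Message text copied from the /var/log/messages excerpt in this note.
CMA_PATTERN = re.compile(r"cma_query_handler: bad status (-?\d+) from path query")


def count_cma_path_query_errors(lines: Iterable[str]) -> Counter:
    """Count 'bad status ... from path query' kernel messages per status code."""
    counts: Counter = Counter()
    for line in lines:
        match = CMA_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as fh:
            counts = count_cma_path_query_errors(fh)
    except OSError as exc:
        # Usually a permissions problem when not run as root.
        print(f"could not read /var/log/messages: {exc}")
    else:
        for status, n in sorted(counts.items()):
            print(f"bad status {status}: {n} occurrence(s)")
```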

 

The Clusterware alert log reports that no voting files could be discovered:

[cssd(54701)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1.0.1/grid/log/xxxxxxx/cssd/ocssd.log
2015-12-30 08:45:46.134:
[cssd(54701)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/12.1.0.1/grid/log/xxxxxxx/cssd/ocssd.log
2015-12-30 08:46:00.838:
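The CRS-1714 retries can also be listed with their timestamps. In this excerpt each timestamp line precedes the message it belongs to, so the sketch below pairs each CRS-1714 line with the timestamp line immediately before it (the first entry in the excerpt has none). It assumes that layout and takes the alert log path as a command-line argument; crs_1714_events is a hypothetical helper.

```python
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Layout assumed from the alert log excerpt: a "YYYY-MM-DD hh:mm:ss.fff:" line
# followed by a "[cssd(<pid>)]CRS-1714:..." message line.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}:")
CRS_1714 = re.compile(r"\[cssd\((\d+)\)\]CRS-1714:")


def crs_1714_events(lines: Iterable[str]) -> List[Tuple[Optional[str], int]]:
    """Return (timestamp or None, cssd pid) for each CRS-1714 message."""
    events: List[Tuple[Optional[str], int]] = []
    last_ts: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.fullmatch(line):
            last_ts = line[:-1]  # drop the trailing colon
            continue
        match = CRS_1714.match(line)
        if match:
            events.append((last_ts, int(match.group(1))))
        last_ts = None  # a timestamp only applies to the line right after it
    return events


if __name__ == "__main__":
    # The alert log path is supplied by the caller; no default location is assumed.
    if len(sys.argv) != 2:
        print(f"usage: {sys.argv[0]} <path-to-clusterware-alert-log>")
    else:
        with open(sys.argv[1], errors="replace") as fh:
            for ts, pid in crs_1714_events(fh):
                print(f"{ts or 'unknown time'}: cssd pid {pid} could not discover voting files")
```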

 

 

Solution

Check the ibstat output to see whether both links are Active. The issue is most likely caused by a faulty or loose IB cable. To resolve it:

* Reseat the IB cable

* Replace the IB cable if it is faulty


  Copyright © 2018 Oracle, Inc.  All rights reserved.