Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1910008.1
Update Date:2014-07-22
Keywords:

Solution Type  Problem Resolution Sure

Solution  1910008.1 :   Exadata CELLSRV process failing to start due to bad Infiniband Mellanox Card  


Related Items
  • Exadata Database Machine V2
  • Exadata X4-2 Hardware
  • Exadata X4-8 Hardware
  • Exadata X3-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




In this Document
Symptoms
Cause
Solution


Applies to:

Exadata Database Machine V2 - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X4-8 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

  • The remote console loops on the message "mlx4_cmd_wait command 0x24 wait_for_completion_timeout".
  • The cellsrv process fails to start and reports the error "CELL-01533 Unable to validate IP addresses from cellinit.ora".
  • ifconfig bondib0 reports the IP address, but RDS does not work on the storage cell. For example:
    • rds-info does not report the HW address for local and remote devices.
    • rds-ping fails against every other host. Example:  # rds-ping -c 5 <IB IP address of other device>
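
The first two symptoms are plain-text signatures, so a saved console capture or alert log can be scanned for them. The following is a minimal sketch, not an Oracle tool: the function name count_signatures and the idea of working from a saved log file are assumptions, and the patterns are matched loosely because the exact formatting of the messages may vary between images.

```python
import re

# Signatures taken from the symptoms listed above. The patterns are kept
# loose on purpose; exact message formatting is an assumption.
SIGNATURES = {
    "mlx4_timeout": re.compile(r"mlx4_cmd_wait.*wait_for_completion_timeout"),
    "cell_01533": re.compile(r"CELL-01533"),
}


def count_signatures(log_text):
    """Return the number of lines in log_text matching each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A non-zero count for both signatures on the same cell suggests checking the InfiniBand card as described under Cause; it is not proof of a hardware fault on its own.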

Cause

 The problem was caused by a failed Mellanox InfiniBand card. This was confirmed by running the lspci -vvv command: in the section for the Mellanox card, the Vital Product Data capability was reported as "Not readable", so none of the product attributes (Product Name, Product version, etc.) were present:

   

0d:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
    Physical Slot: 1
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 256 bytes
    Interrupt: pin A routed to IRQ 24
    Region 0: Memory at ddd00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at dc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Not readable

  

For comparison, this is the output from a system where the card is present and both the RDS services and the CELLSRV service are working:

 

   

0d:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)
    Subsystem: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
    Physical Slot: 1
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 256 bytes
    Interrupt: pin A routed to IRQ 24
    Region 0: Memory at ddd00000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at dc800000 (64-bit, prefetchable) [size=8M]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [48] Vital Product Data
        Product Name: qFalcon QDR
        Read-only fields:
            [PN] Part number: 375-3696-01          
            [EC] Engineering changes: 50
            [SN] Serial number: 1388FMH-1032500720      
            [V0] Vendor specific: PCIe Gen2 x8    
            [RV] Reserved: checksum good, 0 byte(s) reserved
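
The comparison above comes down to whether the Vital Product Data capability is readable for the Mellanox device. The sketch below shows one way to check that from captured lspci -vvv text. It is illustrative only: the function name vpd_status and the status strings are assumptions, and it expects output captured with enough privileges for lspci to list capabilities.

```python
def vpd_status(lspci_text, device_match="Mellanox"):
    """Map the PCI address of each matching device to its VPD status.

    lspci_text is the captured output of `lspci -vvv`. A device block
    starts on an unindented line and ends at a blank line.
    """
    results = {}
    current = None   # PCI address of the matching device being parsed
    in_vpd = False   # True while inside the Vital Product Data capability
    for line in lspci_text.splitlines():
        if not line.strip():
            current, in_vpd = None, False
            continue
        if not line[0].isspace():
            # Unindented line: header of a new device block.
            in_vpd = False
            if device_match in line:
                current = line.split()[0]
                results[current] = "no VPD capability"
            else:
                current = None
            continue
        if current is None:
            continue
        stripped = line.strip()
        if stripped.startswith("Capabilities:"):
            in_vpd = "Vital Product Data" in stripped
            if in_vpd:
                results[current] = "VPD present, contents unknown"
        elif in_vpd and stripped == "Not readable":
            results[current] = "not readable"
        elif in_vpd and stripped.startswith("Product Name:"):
            results[current] = "readable"
    return results
```

With the two outputs shown in this document, the failed card would be reported as "not readable" and the working card as "readable" for device 0d:00.0.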

  


Solution

 Replace the Mellanox InfiniBand card.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.