Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2195659.1
Update Date:2018-02-04
Keywords:

Solution Type  Problem Resolution Sure

Solution  2195659.1 :   Oracle ZFS Storage Appliance: Alert "The cable between the Ethernet ports of each controller is down"  


Related Items
  • Exalogic Elastic Cloud X3-2 Eighth Rack
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Exalogic Elastic Cloud X4-2 Quarter Rack
  • Oracle ZFS Storage ZS3-2
  • Sun ZFS Backup Appliance
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS5-4
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-13472509061>

Applies to:

Exalogic Elastic Cloud X4-2 Quarter Rack - Version X4 to X4 [Release X4]
Exalogic Elastic Cloud X3-2 Eighth Rack - Version X5 to X5 [Release X5]
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Backup Appliance - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Both cluster nodes generate the alert "The cable between the Ethernet ports of each controller is down." with severity Critical.

The clustron card in each node has an LED lit amber, and each clustron card is marked as 'faulted'.

Problems/alerts are present on both nodes.

No takeover or failback was initiated.

 

Node01:> maintenance problems select problem-000 ls
Properties:
                          uuid = a6854718-bdc7-xxx-xxxx-xxxxxxx
                          code = AK-8002-9M
                     diagnosed = 2016-8-4 15:47:18
                   phoned_home = ----
                      severity = Critical
                          type = Fault
                           url = http://support.oracle.com/msg/AK-8002-9M
                   description = The cable between the Ethernet ports of each
                                 controller is down.
                        impact = Communication with the cluster peer via the
                                 Ethernet port is lost.
                      response = None.
                        action = Ensure the cable between the Ethernet ports of
                                 each controller is properly seated and
                                 undamaged. If the problem persists, replace
                                 the cable and/or contact your vendor for
                                 support.
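When the same alert has to be checked on more than one node, the captured CLI output above can be screened programmatically. The following is a minimal Python sketch, assuming the 'ls' output has been copied into a string; the names parse_problem and is_cluster_cable_alert are hypothetical and are not part of any Oracle tool. It joins the wrapped continuation lines of multi-line properties such as description and action.

```python
import re

# Matches "key = value" property lines printed by the appliance CLI "ls".
PROP_RE = re.compile(r"^\s*([A-Za-z_]+)\s*=\s*(.*)$")


def parse_problem(text: str) -> dict:
    """Parse captured 'maintenance problems select problem-NNN ls' output.

    Indented lines without '=' are treated as wrapped continuations of the
    previous property and joined with a single space. A blank line ends
    the current property. Prompt and header lines are ignored.
    """
    props = {}
    key = None
    for line in text.splitlines():
        m = PROP_RE.match(line)
        if m:
            key = m.group(1)
            props[key] = m.group(2).strip()
        elif not line.strip():
            key = None
        elif key is not None:
            props[key] += " " + line.strip()
    return props


def is_cluster_cable_alert(props: dict) -> bool:
    # AK-8002-9M is the message code this note reports for the alert.
    return props.get("code") == "AK-8002-9M"


# Hypothetical usage with a trimmed copy of the output above.
sample = """Node01:> maintenance problems select problem-000 ls
Properties:
                          code = AK-8002-9M
                      severity = Critical
                   description = The cable between the Ethernet ports of each
                                 controller is down.
"""
props = parse_problem(sample)
print(props["description"])
print(is_cluster_cable_alert(props))
```

Because wrapped lines are joined, the description can be compared as a single string rather than line by line.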

 

Node01:maintenance chassis-000 slot> ls
Slots:

         LABEL         STATE    MANUFACTURER            MODEL                    SERIAL

...
slot-002 PCIe 3        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2  unknown
slot-003 PCIe 4        ok       Oracle                  2x10Gb Optical Ethernet  unknown
slot-004 Cluster Card  faulted  Oracle                  Fishworks CLUSTRON 200   unknown  <<<<<<<<<<<<<<

Node02:maintenance chassis-000 slot> ls
Slots:

         LABEL         STATE    MANUFACTURER            MODEL                    SERIAL
...
slot-002 PCIe 3        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2  unknown
slot-003 PCIe 4        ok       Oracle                  2x10Gb Optical Ethernet  unknown
slot-004 Cluster Card  faulted  Oracle                  Fishworks CLUSTRON 200   unknown  <<<<<<<<<<<<<<
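The symptom here is the same slot (the cluster card) showing 'faulted' on both nodes. A minimal Python sketch for spotting that pattern in captured 'slot> ls' listings follows; it assumes the listings have been copied into strings, and faulted_slots and faulted_on_both are hypothetical helper names. As a simplification it does not split columns (LABEL and MODEL contain spaces); it only looks for 'faulted' as a whole word on rows that start with a slot name.

```python
import re

# A slot row begins with the slot name, e.g. "slot-004".
ROW_RE = re.compile(r"^(slot-\d+)\s+(.*)$")


def faulted_slots(listing: str) -> list:
    """Return slot names whose row contains the word 'faulted'."""
    found = []
    for line in listing.splitlines():
        m = ROW_RE.match(line.strip())
        if m and re.search(r"\bfaulted\b", m.group(2)):
            found.append(m.group(1))
    return found


def faulted_on_both(node_a: str, node_b: str) -> list:
    """Slots reported as faulted on both cluster nodes."""
    return sorted(set(faulted_slots(node_a)) & set(faulted_slots(node_b)))


# Hypothetical usage with rows copied from the listings above.
node01 = """slot-003 PCIe 4 ok Oracle 2x10Gb Optical Ethernet unknown
slot-004 Cluster Card faulted Oracle Fishworks CLUSTRON 200 unknown"""
node02 = node01  # Node02 reports the same rows in this note
print(faulted_on_both(node01, node02))  # ['slot-004']
```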

 

FMA
----

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Oct 12 10:33:18 2bffa4d3-f228-67f3-d240-ca498bfd354e AK-8002-9M     Critical

Problem Status : open
Diag Engine : ak-diagnosis / 1.0
System
Manufacturer : unknown
Name : unknown
Part_Number : unknown
Serial_Number : unknown

System Component
Manufacturer : Oracle-Corporation
Name : SUN-FIRE-X4470-M2-SERVER
Part_Number : 7050917
Serial_Number : xxxxxxxx
Host_ID : 00000000
Server_Name : xxxxxxx

----------------------------------------
Suspect 1 of 1 :
Fault class : fault.ak.xmlrpc.cluster.link.dlpi.down
Certainty : 100%
Affects : dev:////pci@0,0/pci8086,3a48@1c,4/pci111d,8039@0/pci111d,8039@3/pci108e,7b07@0
Status : faulted but still in service

FRU
Location : "PCIE_CC"

  

Changes

No recent changes.

 

Cause

This issue is caused by Bug 23092294 (clustron component fault shows up in problems while links are still active). It is widely seen on nodes running OS8.6.x (2013.1.6.x) firmware.

 

Solution

Recommended action plan (workaround):

Physically inspect both ZFSSA nodes.

1) Check the LEDs on the clustron card ports of both nodes.

2) In most cases, the middle port is the one lit amber.

3) Reseat that cable at both ends.

     IMPORTANT: Never disconnect or reseat two clustron cables at the same time.

4) Verify that the LED turns green.

5) Log in to the BUI, navigate to 'Maintenance > Problems', select the problem, and click 'markrepaired'.

 

If the problem persists, open a Service Request with Oracle Support.

 

The clustron link status reported by 'akd' shows that one of the links has failed or timed out:

In general, this is 'dlpi:0', the link between the middle ports of the clustron cards.

> ::ak_cluster -v
CLUSTER

address: 8172808
state: CLUSTERED
CIO links: 3
CIO channels: 3
CIO flags: ENABLED
clusterable: TRUE

LOCK

owner: -
flags: 0
readers: 0

LINKS

address: a031ac8
name: clustron_uart:0
state: ACTIVE
capabilities: AUTOHEARTBEAT
flags: COMMITTED | CONF | HBVALID
remote ASN: eb5b157c-db73-4ade-b27d-d481aa381251

address: d367208
name: clustron_uart:1
state: ACTIVE
capabilities: AUTOHEARTBEAT
flags: COMMITTED | CONF | HBVALID
remote ASN: eb5b157c-db73-4ade-b27d-d481aa381251

address: d367088
name: dlpi:0
state: TIMEDOUT <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
capabilities: -
flags: COMMITTED | CONF
remote ASN: eb5b157c-db73-4ade-b27d-d481aa381251

CHANNELS

CIO channel: 0
address: 13db47b0
state: ESTABLISHED
connection: 0 <==> 0

CIO channel: 1
address: 1461e170
state: ESTABLISHED
connection: 1 <==> 1

CIO channel: 2
address: 146098e0
state: CLOSED
connection: 2 <==> -1
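To pick out the failed link from a long '::ak_cluster -v' capture, only the LINKS section needs to be examined; the CLUSTER section also has a 'state:' field ('CLUSTERED') that must not be confused with a link state. The following is a minimal Python sketch, assuming the output above has been saved as text; link_states and failed_links are hypothetical names, and the section headings are taken from the sample output in this note.

```python
def link_states(text: str) -> dict:
    """Map each link name in the LINKS section to its state."""
    states = {}
    section = None
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        # Section headings as they appear in the sample output.
        if line in ("CLUSTER", "LOCK", "LINKS", "CHANNELS"):
            section, name = line, None
            continue
        if section != "LINKS" or ":" not in line:
            continue
        # Split on the first ':' only, so "name: dlpi:0" keeps "dlpi:0".
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "name":
            name = value
        elif key == "state" and name is not None:
            states[name] = value
    return states


def failed_links(text: str) -> dict:
    """Links whose state is anything other than ACTIVE."""
    return {n: s for n, s in link_states(text).items() if s != "ACTIVE"}


# Hypothetical usage with a trimmed copy of the output above.
SAMPLE = """> ::ak_cluster -v
CLUSTER

address: 8172808
state: CLUSTERED

LINKS

address: a031ac8
name: clustron_uart:0
state: ACTIVE

address: d367208
name: clustron_uart:1
state: ACTIVE

address: d367088
name: dlpi:0
state: TIMEDOUT

CHANNELS

CIO channel: 2
state: CLOSED
"""
print(failed_links(SAMPLE))  # {'dlpi:0': 'TIMEDOUT'}
```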

------

There is no need to replace the PCIe clustron card.

 

 

References

<NOTE:1542550.1> - Sun Storage 7000 Unified Storage System: Communication with the cluster peer via a cluster interconnect link has been lost
<NOTE:2081179.1> - Oracle ZFS Storage Appliance : How to Configure Cluster Cabling Correctly
<BUG:23092294> - CLUSTRON COMPONENT FAULT SHOWS UP IN PROBLEMS WHILE LINKS ARE STILL ACTIVE
<NOTE:2200950.1> - How to replace a cluster interconnect link In A Oracle ZFS Storage ZS3, ZS4, ZS5 & Sun Storage 7000 Series

  Copyright © 2018 Oracle, Inc.  All rights reserved.