Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2387540.1
Update Date:2018-04-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  2387540.1 :   Oracle ZFS Storage PCI Slot Faulted for IXGBE 10GB Network Card  


Related Items
  • Oracle ZFS Backup Appliance
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS




In this Document
Symptoms
Cause
Solution


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Commands required are not accessible by Customer
Created from <SR 3-16803993953>

Applies to:

Oracle ZFS Backup Appliance - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

The BUI indicates that two IXGBE ports are down. When we mark them as repaired, they return to the faulted state. We discovered that the PCI slot had been added to the retire_store file. After we removed this file and rebooted, the entry returned, and a message during boot stated that there were entries in retire_store.

 

DC-ZFSBA-H2:maintenance chassis-000> select slot show
Slots:

          LABEL         STATE    MANUFACTURER            MODEL                          SERIAL
slot-000  PCIe 0        faulted  Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-001  PCIe 1        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07WR
slot-002  PCIe 2        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC082F
slot-003  PCIe 3        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-004  PCIe 4        ok       Oracle                  2x10Gb Optical Ethernet        unknown
slot-005  Cluster Card  ok       Oracle                  Fishworks CLUSTRON 200         unknown
slot-006  PCIe 9        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-007  PCIe 5        faulted  Oracle                  2x10Gb Optical Ethernet        unknown
slot-008  PCIe 6        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-009  PCIe 7        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07Y1
slot-010  PCIe 8        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07XD
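When a saved copy of this listing has to be reviewed for several chassis, the faulted rows can be picked out with a short script. The following is a minimal sketch only, not an appliance tool: the function name `faulted_slots` is hypothetical, and the set of states it recognizes (ok, faulted) is taken solely from the listing above.

```python
import re

# Matches one row of the slot listing, for example:
#   "slot-007 PCIe 5 faulted Oracle 2x10Gb Optical Ethernet unknown"
# Assumption: only the two states seen in this listing (ok, faulted) occur.
SLOT_ROW = re.compile(r"^(slot-\d+)\s+(.+?)\s+(ok|faulted)\s+(.+)$")


def faulted_slots(listing):
    """Return (slot id, label, rest of row) for every faulted slot."""
    found = []
    for line in listing.splitlines():
        match = SLOT_ROW.match(line.strip())
        if match and match.group(3) == "faulted":
            found.append((match.group(1), match.group(2), match.group(4)))
    return found


# A few rows copied from the listing above.
SAMPLE = """\
slot-000 PCIe 0 faulted Sun Microsystems, Inc. Dual Port QDR IB HCA M2 unknown
slot-004 PCIe 4 ok Oracle 2x10Gb Optical Ethernet unknown
slot-005 Cluster Card ok Oracle Fishworks CLUSTRON 200 unknown
slot-007 PCIe 5 faulted Oracle 2x10Gb Optical Ethernet unknown
"""

for slot_id, label, rest in faulted_slots(SAMPLE):
    print(slot_id, label, rest, sep=" | ")
```

Because the pattern separates fields on any run of whitespace, it should work on both single-spaced and column-aligned copies of the output.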

 

NOTE: Because this is a dual-port card, both ixgbe2 and ixgbe3 are reported as faulted.

DC-ZFSBA-H2:maintenance> problems show
Problems:

COMPONENT    DIAGNOSED           TYPE            DESCRIPTION
problem-000  2018-2-5 18:33:54   Major Defect    Service
                                                 svc:/appliance/kit/network/datalink:ixgbe2
                                                 failed - a start, stop or
                                                 refresh method failed.

problem-001  2018-2-5 19:41:05   Major Fault     The diagnosis engine
                                                 encountered telemetry from
                                                 the listed devices for which
                                                 it was unable to perform a
                                                 diagnosis - all hypotheses
                                                 were disproved.

problem-002  2018-1-12 20:10:37  Critical Fault  A problem was detected for a
                                                 PCIEX device.

problem-003  2018-2-5 18:33:56   Critical Fault  A problem was detected for a
                                                 PCIEX device.

problem-004  2018-2-5 19:38:17   Critical Fault  A problem was detected for a
                                                 PCIEX device.

problem-005  2018-2-5 20:05:54   Major Fault     The diagnosis engine
                                                 encountered telemetry from
                                                 the listed devices for which
                                                 it was unable to perform a
                                                 diagnosis - all hypotheses
                                                 were disproved.

problem-006  2018-2-28 07:38:24  Major Fault     The diagnosis engine
                                                 encountered telemetry from
                                                 the listed devices for which
                                                 it was unable to perform a
                                                 diagnosis - all hypotheses
                                                 were disproved.

 

The following messages appear in debug.sys:

DC-ZFSBA-H2# tail debug.sys
Mar 6 09:01:46 DC-ZFSBA-H2 last message repeated 3 times
Mar 6 09:01:49 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:01:55 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:01:58 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:01 DC-ZFSBA-H2 last message repeated 1 time
Mar 6 09:02:04 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:02:10 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:02:13 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:16 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:02:19 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
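To see how often each port fails to initialize, the warnings can be tallied per interface from a saved copy of debug.sys. This is a minimal sketch under stated assumptions: the function name `count_init_failures` is hypothetical, and a "last message repeated N times" line is credited to the interface named on the line directly before it, which is how syslog normally compresses repeats.

```python
import re
from collections import Counter

# The ixgbe warning exactly as it appears in the excerpt above.
FAILURE = re.compile(r"WARNING: (ixgbe\d+): Failed to initialize adapter")
# The syslog repeat-compression line ("1 time" or "N times").
REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_init_failures(log_text):
    """Count 'Failed to initialize adapter' warnings per ixgbe instance."""
    counts = Counter()
    last_interface = None  # interface named by the most recent failure line
    for line in log_text.splitlines():
        failure = FAILURE.search(line)
        if failure:
            last_interface = failure.group(1)
            counts[last_interface] += 1
            continue
        repeat = REPEAT.search(line)
        if repeat:
            # Credit repeats only when the preceding message was a failure.
            if last_interface is not None:
                counts[last_interface] += int(repeat.group(1))
        else:
            last_interface = None  # unrelated message breaks the chain
    return counts


# The first five lines of the excerpt above.
SAMPLE = """\
Mar 6 09:01:46 DC-ZFSBA-H2 last message repeated 3 times
Mar 6 09:01:49 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:01:55 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:01:58 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:01 DC-ZFSBA-H2 last message repeated 1 time
"""

for interface, total in sorted(count_init_failures(SAMPLE).items()):
    print(interface, total)
```

The first repeat line in the sample is ignored because the message it refers to falls outside the excerpt.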

 

Checking /etc/devices/retire_store shows the retired device path:

-bash-4.4$ strings retire_store
/pci@75,0/pci8086,340a@3/pci108e,7b11@0,1
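Where a copy of retire_store has been pulled off the appliance and strings is not at hand, the same check can be approximated in a few lines. This is a rough sketch only: the on-disk format of retire_store is not described here, so the code simply extracts runs of printable ASCII the way strings does and keeps those that look like device paths. The function name `retired_device_paths` and the bytes surrounding the path in the sample are made up for illustration.

```python
import re

# Runs of at least four printable ASCII bytes, the default minimum
# length reported by strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def retired_device_paths(blob):
    """Return printable runs in blob that start with '/' (device paths)."""
    return [
        run.group().decode("ascii")
        for run in PRINTABLE_RUN.finditer(blob)
        if run.group().startswith(b"/")
    ]


# Hypothetical file contents: only the path itself comes from the
# output above; the surrounding bytes are placeholders.
SAMPLE = b"\x00\x01rt\x00/pci@75,0/pci8086,340a@3/pci108e,7b11@0,1\x00\x00"

for path in retired_device_paths(SAMPLE):
    print(path)
```

To use it on a real copy, read the file in binary mode and pass the bytes to the function.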

Cause

The ixgbe cards have two removable transceivers. A bad transceiver can cause the IXGBE PCI card to fail to initialize and appear to be bad. The system can treat this as a fault that requires the PCI slot to be added to /etc/devices/retire_store.

When a replacement card is shipped, the transceivers are not included and must be moved from the old card to the new card. To replace the transceivers themselves, order them using the part number for the appropriate transceiver.

Solution

The problem was found to be a bad transceiver on the IXGBE card.

A replacement card does not include the transceivers; they must be transferred from the old card to the new card. For this reason, sending a replacement IXGBE card will not resolve the problem. In this situation we had to ship replacement transceivers, part # 530-4440-01.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.