| Asset ID: | 1-72-2387540.1 |
| Update Date: | 2018-04-17 |
| Keywords: | |
Solution Type: Problem Resolution Sure Solution
2387540.1: Oracle ZFS Storage PCI Slot Faulted for IXGBE 10GB Network Card
| Related Items |
- Oracle ZFS Backup Appliance
|
| Related Categories |
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS
|
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Commands required are not accessible by Customer
Created from <SR 3-16803993953>
Applies to:
Oracle ZFS Backup Appliance - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)
Symptoms
The BUI indicates that two IXGBE ports are down. When we mark them as repaired, they return to the faulted state. We discovered that the PCI slot had been added to the retire_store file. After we removed this file and rebooted, the entry returned, and a message during boot stated that there were entries in retire_store.
DC-ZFSBA-H2:maintenance chassis-000> select slot show
Slots:
          LABEL         STATE    MANUFACTURER            MODEL                          SERIAL
slot-000  PCIe 0        faulted  Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-001  PCIe 1        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07WR
slot-002  PCIe 2        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC082F
slot-003  PCIe 3        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-004  PCIe 4        ok       Oracle                  2x10Gb Optical Ethernet        unknown
slot-005  Cluster Card  ok       Oracle                  Fishworks CLUSTRON 200         unknown
slot-006  PCIe 9        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-007  PCIe 5        faulted  Oracle                  2x10Gb Optical Ethernet        unknown
slot-008  PCIe 6        ok       Sun Microsystems, Inc.  Dual Port QDR IB HCA M2        unknown
slot-009  PCIe 7        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07Y1
slot-010  PCIe 8        ok       Sun Microsystems, Inc.  Dual 4x6Gb External SAS-2 HBA  465769T+1340TC07XD
NOTE: Because this is a dual-ported card, both of its ports, ixgbe2 and ixgbe3, are reported as faulted.
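When the slot listing has been saved to a text file, the faulted entries can be picked out with a short script. The following is a minimal Python sketch, not an appliance tool: the function name `faulted_slots` is hypothetical, and the line shape it assumes (slot ID, label, state, then a free-text description) is inferred from the listing above. Only the states "ok" and "faulted" appear in this case, so no other states are handled.

```python
import re

# Assumed line shape, inferred from the "select slot show" listing:
#   slot-NNN <label> <state> <manufacturer, model, serial as free text>
# Only the states "ok" and "faulted" are recognized (the only ones seen here).
SLOT_RE = re.compile(r"^(slot-\d+)\s+(.+?)\s+(ok|faulted)\s+(.*)$")


def faulted_slots(output):
    """Return (slot, label, description) for every slot reported as faulted."""
    hits = []
    for line in output.splitlines():
        match = SLOT_RE.match(line.strip())
        if match and match.group(3) == "faulted":
            hits.append((match.group(1), match.group(2), match.group(4)))
    return hits


# Three lines copied from the listing in this document.
sample = """\
slot-004 PCIe 4 ok Oracle 2x10Gb Optical Ethernet unknown
slot-005 Cluster Card ok Oracle Fishworks CLUSTRON 200 unknown
slot-007 PCIe 5 faulted Oracle 2x10Gb Optical Ethernet unknown
"""

for slot, label, description in faulted_slots(sample):
    print(slot, label, description)
```

The label is matched lazily so that multi-word labels such as "Cluster Card" are kept intact.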
DC-ZFSBA-H2:maintenance> problems show
Problems:
COMPONENT DIAGNOSED TYPE DESCRIPTION
problem-000 2018-2-5 18:33:54 Major Defect Service
svc:/appliance/kit/network/datalink:ixgbe2
failed - a start, stop or
refresh method failed.
problem-001 2018-2-5 19:41:05 Major Fault The diagnosis engine
encountered telemetry from
the listed devices for which
it was unable to perform a
diagnosis - all hypotheses
were disproved.
problem-002 2018-1-12 20:10:37 Critical Fault A problem was detected for a
PCIEX device.
problem-003 2018-2-5 18:33:56 Critical Fault A problem was detected for a
PCIEX device.
problem-004 2018-2-5 19:38:17 Critical Fault A problem was detected for a
PCIEX device.
problem-005 2018-2-5 20:05:54 Major Fault The diagnosis engine
encountered telemetry from
the listed devices for which
it was unable to perform a
diagnosis - all hypotheses
were disproved.
problem-006 2018-2-28 07:38:24 Major Fault The diagnosis engine
encountered telemetry from
the listed devices for which
it was unable to perform a
diagnosis - all hypotheses
were disproved.
We noticed the following messages in debug.sys:
DC-ZFSBA-H2# tail debug.sys
Mar 6 09:01:46 DC-ZFSBA-H2 last message repeated 3 times
Mar 6 09:01:49 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:01:55 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:01:58 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:01 DC-ZFSBA-H2 last message repeated 1 time
Mar 6 09:02:04 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:02:10 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:02:13 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:16 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:02:19 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
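To see at a glance which ports are failing and how often, the warnings can be tallied per interface. The sketch below is illustrative only: `count_init_failures` is a hypothetical helper, the message text is taken from the log excerpt above, and each "last message repeated N times" line is assumed to refer to the warning immediately before it.

```python
import re
from collections import Counter

# Message text as it appears in the debug.sys excerpt.
FAIL_RE = re.compile(r"WARNING: (ixgbe\d+): Failed to initialize adapter")
REPEAT_RE = re.compile(r"last message repeated (\d+) times?")


def count_init_failures(log_text):
    """Count adapter initialization failures per ixgbe interface."""
    counts = Counter()
    last_iface = None  # interface named by the most recent failure line
    for line in log_text.splitlines():
        fail = FAIL_RE.search(line)
        if fail:
            last_iface = fail.group(1)
            counts[last_iface] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_iface is not None:
            # Assumption: the repeat line refers to the preceding warning.
            counts[last_iface] += int(repeat.group(1))
        else:
            # Any other line breaks the association with a previous warning.
            last_iface = None
    return counts


# Four consecutive lines copied from the excerpt.
sample = """\
Mar 6 09:01:49 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe2: Failed to initialize adapter
Mar 6 09:01:55 DC-ZFSBA-H2 last message repeated 2 times
Mar 6 09:01:58 DC-ZFSBA-H2 ixgbe: [ID 611667 kern.warning] WARNING: ixgbe3: Failed to initialize adapter
Mar 6 09:02:01 DC-ZFSBA-H2 last message repeated 1 time
"""

for iface, total in sorted(count_init_failures(sample).items()):
    print(iface, total)
```

A repeat line that does not directly follow a failure warning is ignored, so unrelated repeated messages are not counted against a port.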
Checking /etc/devices/retire_store shows the retired device path:
-bash-4.4$ strings retire_store
/pci@75,0/pci8086,340a@3/pci108e,7b11@0,1
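Where the `strings` utility is not at hand, the same check can be approximated on a copy of the file. This is a rough Python sketch under stated assumptions: `retired_device_paths` is a hypothetical helper, the on-disk layout of retire_store is not described here and is treated as opaque bytes, and a device path is assumed to be any printable run that starts with "/".

```python
import re

# Runs of at least four printable ASCII bytes, similar to the default
# behaviour of strings(1).
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def retired_device_paths(blob):
    """Return printable strings in blob that look like device paths."""
    return [
        run.decode("ascii")
        for run in PRINTABLE_RUN.findall(blob)
        if run.startswith(b"/")
    ]


# Hypothetical file contents: the surrounding bytes are made up for the
# example; only the embedded device path is taken from this document.
sample = (
    b"\x00\x01\x02\xff"
    + b"/pci@75,0/pci8086,340a@3/pci108e,7b11@0,1"
    + b"\x00\x00\x03"
)

print(retired_device_paths(sample))
```

For a real file, the bytes would be read with `open(path, "rb").read()` from a copy of retire_store and passed to the same function.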
Cause
The ixgbe cards have two removable transceivers. A bad transceiver can cause the IXGBE PCI card to fail to initialize and appear to be bad. The system can then treat this as a fault and add the PCI slot to /etc/devices/retire_store.
When a replacement card is shipped, these transceivers are not included and need to be swapped from the old card to the new card. To replace the transceivers themselves, you need to order them using the part number for the appropriate transceiver.
Solution
The problem was found to be a bad transceiver on the IXGBE card.
A replacement card does not include the transceivers; they must be transferred from the old card to the new card. For this reason, sending a replacement IXGBE card will not resolve the problem. In this situation we had to ship replacement transceivers, part # 530-4440-01.