SAN Brocade FC switch - Encode Out Errors in porterrshow - How to identify bad SFP or FC Cable or FC device

Asset ID:	1-72-1016436.1
Update Date:	2017-10-20
Keywords:

Solution Type Problem Resolution Sure

Solution 1016436.1 : SAN Brocade FC switch - Encode Out Errors in porterrshow - How to identify bad SFP or FC Cable or FC device

Applies to:

Brocade 3800 Fabric Switch - Version Not Applicable to Not Applicable [Release N/A]
Brocade 12000 Fabric Switch - Version Not Applicable to Not Applicable [Release N/A]
Brocade 200E Switch - Version All Versions and later
Brocade 24000 Director - Version All Versions and later
Brocade 48000 Director - Version All Versions and later
All Platforms

Symptoms

Encode out errors in porterrshow

Loss of connectivity to a Host, Storage or another Switch can be caused by a faulty SFP or a cable.
The Switch error Log would show an error similar to the following:

2012/05/06-14:30:22, [FW-1424], 5681,, WARNING, SWITCH_1, Switch status changed from HEALTHY to MARGINAL
2012/05/06-14:30:22, [FW-1436], 5682,, WARNING, SWITCH_1, Switch status change contributing factor Marginal ports: 1 marginal ports. (Port(s) x )

Cause

Enc out errors are caused by noise outside of the frame. They usually indicate a bad GBIC (Gigabit Interface Converter) or SFP (Small Form-factor Pluggable), a bad FC cable, or a bad port on the other end. They should not exceed 1% of the rx and tx counts, and there should not be more than a couple of them a day. If there is a real problem, they will increment by more than a couple an hour.
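
The "more than a couple an hour" rule of thumb can be checked from two enc out readings taken some time apart. The following is a minimal sketch, not a Brocade tool: the function names are hypothetical, the readings are assumed to be copied by hand from porterrshow, the counters are assumed not to have been cleared between readings, and the threshold of 2 per hour is only one interpretation of "a couple".

```python
from datetime import datetime

def enc_out_rate_per_hour(count_before, time_before, count_after, time_after):
    """Return how many enc out errors accumulated per hour between two readings."""
    elapsed_hours = (time_after - time_before).total_seconds() / 3600.0
    if elapsed_hours <= 0:
        raise ValueError("time_after must be later than time_before")
    return (count_after - count_before) / elapsed_hours

def is_suspect(rate_per_hour, threshold_per_hour=2.0):
    """Assumption: 'a couple an hour' is read as 2 errors per hour."""
    return rate_per_hour > threshold_per_hour

# Illustrative readings (made-up numbers): 100 at 10:00 and 130 at 13:00.
rate = enc_out_rate_per_hour(
    100, datetime(2012, 5, 6, 10, 0),
    130, datetime(2012, 5, 6, 13, 0),
)
print(rate, is_suspect(rate))
```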

Solution

1. Plug the cable from that switch port into another open port on the switch. If the errors stop, the problem is the GBIC/SFP in the original port.
2. If the errors continue, try a new FC cable.
3. If the errors still continue, check the device on the other end.
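
The elimination sequence above can be written down as a small decision function. This is only an illustrative sketch: the function and argument names are hypothetical, and the two inputs are the operator's own observations after each swap.

```python
def suspect_component(errors_stopped_on_new_port, errors_stopped_with_new_cable=None):
    """Walk the swap sequence and name the most likely faulty part.

    errors_stopped_on_new_port: True if the errors stopped after moving the
        cable to another open switch port.
    errors_stopped_with_new_cable: True/False after trying a new FC cable,
        or None if that step has not been done yet.
    """
    if errors_stopped_on_new_port:
        return "GBIC/SFP"
    if errors_stopped_with_new_cable is None:
        return "undetermined: try a new FC cable"
    if errors_stopped_with_new_cable:
        return "FC cable"
    return "device on the other end"

print(suspect_component(False, True))
```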

There are other values that can be analyzed more in depth before taking any action.

You can also search in Brocade Community for the following threads: PORTSHOW - tips and tricks and/or Address Error on E-port and/or CRC Errors on E_Port

portstatsclear portnumber --> can be used to clear the port statistics so that the counters can be observed again from zero.

Identifying whether the SFP or the cable is the cause of loss of link:

- "enc out" errors alone primarily imply a cable problem.

- A combination of "enc out" and "crc err" errors primarily implies a GBIC/SFP problem.

- To find out whether the source or the destination SFP is causing the error, check the output of "portshow x", where x is the port number.

- If the values in the pair "Lr_in" and "Ols_out", as well as in the pair "Lr_out" and "Ols_in", are roughly equal, this is the normal case.

- If one counter is significantly higher than the other, the link problems either reached the switch ("in" > "out") or are caused by the switch ("out" > "in").

- Note: If the "Ols_in" value is higher than the "Lr_out" value, the source of the problem is, in most cases, the attached device, which sends those offline sequences; the switch responds to them with a link reset.
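
The first two rules of thumb in this list can be sketched as a short check on how much each counter has grown since the statistics were last cleared. The function name and return strings are hypothetical, and the result is a starting hypothesis only, not a diagnosis.

```python
def likely_media_fault(enc_out_delta, crc_err_delta):
    """Apply the enc out / crc err rule of thumb to counter increases.

    Both arguments are increases observed since the counters were cleared.
    Returns a hint string, or None when enc out is not rising.
    """
    if enc_out_delta > 0 and crc_err_delta > 0:
        return "GBIC/SFP"   # enc out together with crc err
    if enc_out_delta > 0:
        return "cable"      # enc out alone
    return None             # rule of thumb does not apply

print(likely_media_fault(12, 0))
print(likely_media_fault(12, 3))
```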

enc_out   - Encoding error outside of frames
crc err   - Frames with CRC errors
Lr_in     - Link reset In (primitive sequence), does not apply to FL_Port
Lr_out    - Link reset Out (primitive sequence), does not apply to FL_Port
Ols_in    - Offline reset in (primitive sequence), does not apply to FL_Port
Ols_out   - Offline reset out (primitive sequence), does not apply to FL_Port

Example:

Based on the following portshow output of an E-port (ISL link), the SFP on the other end of this ISL should be replaced first.

Free_buffer:       0          Address_err: 0
Overrun:           0          Lr_in:        66
Suspended:         0          Lr_out:       14
Parity_err:        0          Ols_in:       1
2_parity_err:      0          Ols_out:      7

Since the "in" counters (Lr_in 66, Ols_in 1) are higher than the "out" counters (Lr_out 14, Ols_out 7), the errors are suspected to be coming in from the other end.
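
One way to make the comparison in this example repeatable is sketched below. It is an interpretation rather than a documented Brocade rule: it assumes the "in" and "out" counters are summed before comparing, and it treats "significantly higher" as a factor of 2. The function name, the factor, and the return strings are all hypothetical.

```python
def link_reset_direction(lr_in, lr_out, ols_in, ols_out, factor=2.0):
    """Compare inbound and outbound link reset / offline sequence counters.

    Assumption: one side is 'significantly higher' when its total exceeds
    the other side's total by more than `factor` times.
    """
    total_in = lr_in + ols_in
    total_out = lr_out + ols_out
    if total_in > factor * total_out:
        return "remote end"   # problems reached the switch ("in" > "out")
    if total_out > factor * total_in:
        return "switch"       # problems caused by the switch ("out" > "in")
    return "balanced"         # roughly equal, the normal case

# Values from the portshow example above.
print(link_reset_direction(lr_in=66, lr_out=14, ols_in=1, ols_out=7))
```

With the example values, the inbound total is 67 and the outbound total is 21, so the sketch points to the remote end, in line with the conclusion above.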

The output of porterrshow can be divided into 2 areas:

1. Physical layer issues: these originate at the source and can propagate through fabrics.

enc_in: This counter increments when 8b/10b encoding errors are detected within a frame. enc_in errors are always detected on the ingress port.

crc_err: Indicates corruption within the frame. Always seen on ingress port but will be passed by the switch unaltered through the fabric.

enc_in and/or crc_err = Possible bad media (SFP, cable, patch panel)

bad_eof: After a loss of synchronization error, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while the receiver is operational. If such a re-alignment occurs, detection of the resulting error condition is dependent upon higher-level functions (e.g. invalid CRC, missing EOF). If both bad_eof and crc_err are incrementing, replace the SFP as soon as possible.

too_long or too_short errors indicate an unreliable link

enc_out: 8b/10b encoding errors NOT associated with frames (IDLE, R_RDY, and various other primitives). This counter increments during speed negotiation prior to login. Locking a port to a speed supported by the end device can be used to isolate issues.
– Possible bad media (SFP, cable, patch panel)
– Can cause a performance problem due to buffer recovery

disc_c3: Class 3 frame has been discarded because it is not routable to a destination address
– Corrupted or not-online Destination ID (DID)
– Timeout exceeded (Condor ASIC hold time exceeded)
– Counter may increment when FC nodes and/or switches rapidly transition between online and offline; look at fabriclog -s output

2. Link errors: these are point to point and do not traverse the fabric.

Link failures - error conditions that cause a port to drop out of an active state
– Requires the reconnecting device to FLOGI back into fabric (No speed negotiation required, since the device does not lose synchronization)

Loss of sync - occurs when bit and word synchronization on the link is lost

Loss of signal - occurs when light or an electrical signal is lost on a link
– Requires the connected device to renegotiate speed and FLOGI back into the fabric
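
The two areas described above can be used to sort rising counters before deciding where to look. The sketch below is illustrative only: the dictionary keys are assumed shorthand for the porterrshow columns (link_fail, loss_sync and loss_sig in particular are assumed names), and the input is expected to be counter increases collected by the operator.

```python
# Counter names grouped by the two areas described in the text.
PHYSICAL_LAYER = {"enc_in", "crc_err", "bad_eof", "too_long", "too_short",
                  "enc_out", "disc_c3"}
LINK_ERRORS = {"link_fail", "loss_sync", "loss_sig"}  # assumed key names

def group_rising_counters(deltas):
    """Group counters whose value increased into the two porterrshow areas."""
    grouped = {"physical_layer": [], "link": [], "other": []}
    for name in sorted(deltas):
        if deltas[name] <= 0:
            continue  # counter is not rising, ignore it
        if name in PHYSICAL_LAYER:
            grouped["physical_layer"].append(name)
        elif name in LINK_ERRORS:
            grouped["link"].append(name)
        else:
            grouped["other"].append(name)
    return grouped

print(group_rising_counters({"enc_out": 5, "crc_err": 0, "loss_sync": 2}))
```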

If you experience device connectivity and/or performance issues together with rising link counters, look for:
– bad cables/SFPs/patch-panel connections
– repeating cycles of online/offline states in fabriclog -s output

Once you have identified the suspects, use portstats64show and portstatsshow to zero in on the culprit.

See also: Doc ID 1020239.1 Brocade SAN Switches: Investigating switch outputs for hardware errors

Previously Published As
STKKB6296
