Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1016436.1
Update Date:2017-10-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  1016436.1 :   SAN Brocade FC switch - Encode Out Errors in porterrshow - How to identify bad SFP or FC Cable or FC device  


Related Items
  • Brocade 48000 Director
  • Brocade 5100 Switch
  • Brocade 24000 Director
  • Brocade 200E Switch
  • Brocade 3200 Fabric Switch
  • Brocade 3800 Fabric Switch
  • Brocade 12000 Fabric Switch
  • Brocade 4100 Switch
  • Brocade SAN Switch Hardware
  • Brocade 2800 Switch
Related Categories
  • PLA-Support>Sun Systems>DISK>Switch>SN-DK: Brocade Switch
  • _Old GCS Categories>Sun Microsystems>Switches>Brocade

PreviouslyPublishedAs
226142


Applies to:

Brocade 3800 Fabric Switch - Version Not Applicable to Not Applicable [Release N/A]
Brocade 12000 Fabric Switch - Version Not Applicable to Not Applicable [Release N/A]
Brocade 200E Switch - Version All Versions and later
Brocade 24000 Director - Version All Versions and later
Brocade 48000 Director - Version All Versions and later
All Platforms

Symptoms

Encode out errors in porterrshow

Loss of connectivity to a host, storage device or another switch can be caused by a faulty SFP or cable.
The switch error log shows an error similar to the following:

2012/05/06-14:30:22, [FW-1424], 5681,, WARNING, SWITCH_1, Switch status changed from HEALTHY to MARGINAL
2012/05/06-14:30:22, [FW-1436], 5682,, WARNING, SWITCH_1, Switch status change contributing factor Marginal ports: 1 marginal ports. (Port(s) x )

Cause

Enc out errors are caused by noise outside of the frame. They usually indicate a bad GBIC (Gigabit Interface Converter) or SFP (Small Form-factor Pluggable) module, a bad FC cable, or a bad port on the other end. They should not be more than 1% of the rx and tx counts, and there should be no more than a couple of them per day. If there is a real problem, the counter increments by more than a couple per hour.
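The "more than a couple an hour" rule can be checked by sampling the enc out counter twice and computing the hourly rate. The following Python sketch is only an illustration: the function names, the sample values, and the threshold of 2 per hour (one reading of "a couple") are assumptions, not part of any Brocade tool.

```python
from datetime import datetime


def enc_out_rate_per_hour(t0, count0, t1, count1):
    """Return enc out increments per hour between two counter samples."""
    hours = (t1 - t0).total_seconds() / 3600.0
    if hours <= 0:
        raise ValueError("second sample must be taken after the first")
    if count1 < count0:
        raise ValueError("counter went backwards (statistics were probably cleared)")
    return (count1 - count0) / hours


def looks_like_real_problem(rate_per_hour, threshold_per_hour=2.0):
    # Assumption: "a couple an hour" is interpreted as 2 per hour.
    return rate_per_hour > threshold_per_hour


# Hypothetical samples taken two hours apart.
t0 = datetime(2012, 5, 6, 14, 0, 0)
t1 = datetime(2012, 5, 6, 16, 0, 0)
rate = enc_out_rate_per_hour(t0, 100, t1, 130)
print(rate, looks_like_real_problem(rate))
```

Clearing the statistics before the first sample makes the comparison easier, because the counter then starts from zero.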

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk/Tape Storage Area Networks 

Solution

Plug the cable from the affected switch port into another open port on the switch. If the errors stop, the problem is the GBIC/SFP. If the errors continue:

  1. Try a new FC cable. If the errors still continue:
  2. Check the device on the other end.

 

There are other values that can be analyzed more in depth before taking any action.

You can also search the Brocade Community for the following threads: "PORTSHOW - tips and tricks", "Address Error on E-port" and "CRC Errors on E_Port".

Use "portstatsclear portnumber" to clear the statistics, then observe the counters again.

Identifying whether the SFP or the cable is the cause of loss of link:

- "enc out" errors alone point primarily to a cable problem.

- "enc out" and "crc err" errors in combination point primarily to a GBIC/SFP problem.

- To find out whether the source or the destination SFP is causing the error, check the output of "portshow x", where x is the port number.

- If the values in the pair "Lr_in" and "Ols_out", as well as in the pair "Lr_out" and "Ols_in", are roughly equal, this is the normal case.

- If one counter is significantly higher than the other, the link problems either "reached" the switch ("in" > "out") or are caused by the switch ("out" > "in").

- Note: If the "Ols_in" value is higher than the "Lr_out" value, the source of the problem is in most cases the attached device, which sends those offline sequences; the switch responds to them with a link reset.

enc_out   - Encoding error outside of frames    
crc err   - Frames with CRC errors    
Lr_in     - Link reset In (primitive sequence), does not apply to FL_Port  
Lr_out    - Link reset Out (primitive sequence), does not apply to FL_Port  
Ols_in    - Offline reset in (primitive sequence), does not apply to FL_Port  
Ols_out   - Offline reset out (primitive sequence), does not apply to FL_Port
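
The first two rules of thumb above (enc out alone versus enc out together with crc err) can be written as a small decision function. This is a minimal Python sketch, assuming the inputs are counter increments observed since the statistics were last cleared; the function name and return strings are hypothetical.

```python
def likely_cause(enc_out_delta, crc_err_delta):
    """Apply the enc out / crc err rule of thumb to counter increments."""
    if enc_out_delta > 0 and crc_err_delta > 0:
        return "GBIC/SFP"   # both counters rising: primarily an SFP problem
    if enc_out_delta > 0:
        return "cable"      # enc out alone: primarily a cable problem
    return "no indication from enc out"


print(likely_cause(25, 0))
print(likely_cause(25, 4))
```

The result is only a first hint for where to start; the swap tests described earlier are still needed to confirm the faulty part.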

 

Example:

In the following portshow output from an E-port (ISL), the SFP on the other end of the ISL should be replaced first.

Free_buffer:       0          Address_err:  0
Overrun:           0          Lr_in:        66
Suspended:         0          Lr_out:       14
Parity_err:        0          Ols_in:       1
2_parity_err:      0          Ols_out:      7


Because the "in" counts are higher than the "out" counts, the errors are most likely coming in from the other end.
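
The same comparison can be scripted against captured portshow output. The Python sketch below parses the counter lines from the example and compares the summed "in" counters with the summed "out" counters. Summing the counters is one possible reading of the reasoning above, and the factor of 2 used for "significantly higher" is an assumption; the function names are hypothetical.

```python
import re

# Counter lines copied from the portshow example above.
SAMPLE = """\
Free_buffer:       0          Address_err:  0
Overrun:           0          Lr_in:        66
Suspended:         0          Lr_out:       14
Parity_err:        0          Ols_in:       1
2_parity_err:      0          Ols_out:      7
"""


def parse_counters(text):
    """Extract 'name: value' pairs from portshow counter lines."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s+(\d+)", text)}


def compare_in_out(counters, factor=2.0):
    """Return 'in', 'out' or 'balanced' for the Lr/Ols counters."""
    total_in = counters["Lr_in"] + counters["Ols_in"]
    total_out = counters["Lr_out"] + counters["Ols_out"]
    if total_in > factor * total_out:
        return "in"        # problems reached the switch from the far end
    if total_out > factor * total_in:
        return "out"       # problems are caused by the switch side
    return "balanced"      # roughly equal: the normal case


counters = parse_counters(SAMPLE)
print(compare_in_out(counters))
```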

 


The output of porterrshow can be divided into 2 areas:

1. Physical layer issues. These originate at the source and can propagate through fabrics.

enc_in: This counter increments when 8b/10b encoding errors are detected within a frame. enc_in errors are always detected on the ingress port.

crc_err: Indicates corruption within the frame. Always seen on ingress port but will be passed by the switch unaltered through the fabric.

enc_in and/or crc_err = Possible bad media (SFP, cable, patch panel)

bad_eof: After a loss of synchronization error, continuous-mode alignment allows the receiver to re-establish word alignment at any point in the incoming bit stream while the receiver is operational. If such a re-alignment occurs, detection of the resulting error condition depends on higher-level functions (e.g. invalid CRC, missing EOF).
If bad_eof and crc_err are both incrementing, replace the SFP as soon as possible.


too_long or too_short errors indicate an unreliable link

enc_out: 8b/10b encoding errors NOT associated with frames (IDLE, R_RDY, and various other primitives). This counter increments during speed negotiation prior to login. Locking a port to a speed supported by the end device can be used to isolate issues.
– Possible bad media (SFP, cable, patch panel)
– Can cause a performance problem due to buffer recovery

disc_c3: Class 3 frame has been discarded because it is not routable to a destination address
– Corrupted or not-online Destination ID (DID)
– Timeout exceeded (Condor ASIC hold time exceeded)
– Counter may increment when FC nodes and/or switches rapidly transition between online and offline; look at fabriclog -s output


2. Link errors. These are point-to-point and do not traverse the fabric.

Link failures - error conditions that cause a port to drop out of an active state
– Requires the reconnecting device to FLOGI back into fabric (No speed negotiation required, since the device does not lose synchronization)

Loss of sync - occurs when bit and word synchronization on the link is lost

Loss of signal – occurs when light or an electrical signal is lost on a link
– Require connected device to renegotiate speed and FLOGI back into fabric

If you experience device connectivity and/or performance issues together with rising link counters, look for
– bad cables/SFPs/patch-panel connections
– repeating cycles of online/offline states in fabriclog -s output

Once you have identified the suspects, use portstats64show and portstatsshow to zero in on the culprit.
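
When many ports are involved, it can help to sort the rising counters into the two areas described above. The Python sketch below assumes the counter increments for one port have already been collected into a dictionary; the key names are hypothetical labels chosen to resemble the counters discussed here, and no porterrshow parsing is attempted.

```python
# Hypothetical labels for the two areas described above.
PHYSICAL = ("enc_in", "crc_err", "bad_eof", "too_long", "too_short", "enc_out", "disc_c3")
LINK = ("link_fail", "loss_sync", "loss_sig")


def summarize(deltas):
    """Split non-zero counter increments into physical-layer and link errors."""
    physical = {k: deltas[k] for k in PHYSICAL if deltas.get(k, 0) > 0}
    link = {k: deltas[k] for k in LINK if deltas.get(k, 0) > 0}
    return physical, link


# Hypothetical increments for one port since the statistics were cleared.
physical, link = summarize({"enc_in": 0, "crc_err": 3, "bad_eof": 1, "loss_sync": 2})
print("physical layer:", physical)
print("link errors:", link)
```

A port that shows entries in the physical-layer group is a candidate for a media check (SFP, cable, patch panel), as described above.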

 

See also:  Doc ID 1020239.1 Brocade SAN Switches: Investigating switch outputs for hardware errors

@ Previously Published As
STKKB6296

See also the HP document "HP StorageWorks B-Series Switches - Identifying if SFP or the Cable is the Cause for Loss of Link".


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.