Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1012557.1
Update Date:2016-10-12
Keywords:

Solution Type  Technical Instruction Sure

Solution  1012557.1 :   Sun StorEdge[TM] 3510/3511 Arrays: Understanding Link Error Status Block (LESB) counters.  


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 6120 Array
  • Sun Storage 3510 FC Array
  • Sun Storage T3 Array
  • Sun Storage T3+ Array
  • Sun Storage A5200 Array
  • Sun Storage A5000 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx

PreviouslyPublishedAs
217290


Applies to:

Sun Storage A5000 Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 6120 Array - Version Not Applicable and later
Sun Storage A5200 Array - Version Not Applicable and later
All Platforms

Goal


This document is primarily intended as a reference for Sun StorEdge[TM] 3510 and 3511 arrays, but it can be used as a reference for all Fibre Channel (hereafter referred to as FC) arrays.

Fix

 

The RAID controller, the Fibre Channel (FC) disk drives, and the SCSI Enclosure Services (SES) device can all return FC error information that is contained in the Link Error Status Block (LESB). The information in the LESB is very useful for troubleshooting intermittent FC problems.
This document explains the LESB counters and how to use the information they provide to troubleshoot FC loop problems.



Steps to Follow
The LESB is maintained internally by all FC devices and contains accumulated values for errors and other FC statistics.

The LESB is obtained from a device by issuing a Read Link Status Extended Link Service (RLS ELS) to the specified loop ID.
In the case of the SE3510/3511, the sccli command "diag error" causes an RLS ELS to be issued to one or more devices in the loop to obtain the LESB.
The following is a description of the information that is returned in the LESB.
1. Total number of LIPs:-
This counter increments every time there is a loop initialization (LIP). Note that LIPs occur normally when an FC device joins or leaves the loop. If there is a problem in the loop, or the loop is unstable, additional LIPs can occur and this counter increments accordingly.
2. Total number of instances of link failures:-
Link failure indicates that the receiving FC device has detected a link-down condition.
3. Total number of instances of loss of Synchronization:-
Loss of synchronization indicates that the receiving FC device has detected an interruption in the protocol sequence that maintains synchronization of information on the loop.
4. Total number of instances of loss of signal:-
Loss of signal indicates that the receiving FC device has detected a complete loss of signal on the loop.
5. Total number of instances of primitive sequence protocol errors:-
A primitive sequence protocol error indicates that the receiving FC device has detected an error in a primitive sequence. These types of errors typically occur outside of FC frames.
6. Total number of instances of invalid transmission words:-
An invalid transmission word indicates that the receiving FC device has detected a transmission word that is not part of an ordered set, or that it has detected an incorrect running disparity. Invalid transmission word errors can occur inside or outside of FC frames.
7. Total number of instances of invalid CRC:-
A CRC error indicates that the receiving FC device has detected an error within an FC frame. The CRC calculated by the FC receiver for the frame does not match the CRC that was sent by the transmitting FC device.
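The seven counters described above can be held in a small record so that readings from different devices are easy to compare. The following Python sketch is illustrative only: the class name, field names, and the nonzero() helper are hypothetical and are not part of sccli or any other Sun utility.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LesbCounters:
    """One device's LESB counters, in the order listed above (illustrative names)."""
    lip: int              # 1. loop initializations
    link_fail: int        # 2. link failures
    loss_of_sync: int     # 3. loss of synchronization
    loss_of_signal: int   # 4. loss of signal
    prim_seq_err: int     # 5. primitive sequence protocol errors
    invalid_tx_word: int  # 6. invalid transmission words
    invalid_crc: int      # 7. invalid CRC

    def nonzero(self):
        """Return only the counters that have incremented."""
        return {name: value for name, value in asdict(self).items() if value}


# Example values chosen for illustration only.
example = LesbCounters(17, 0, 41, 0, 0, 2573, 0)
print(example.nonzero())
```

Keeping the counters in one fixed order makes it harder to mix up columns when readings are copied from different tools.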
Having understood this, you can now interpret output such as the following from an SE3510/3511.
The diag data for channel 2 shows:-
sccli: selected device /dev/rdsk/c5t600C0FF00000000007FA9366227CA300d0s2 [SUN StorEdge 3510 SN#07FA93]
CH  ID  TYPE  LIP   LinkFail LossOfSy LossOfSi PrimErr  InvalTxW InvalCRC
-------------------------------------------------------------------------
2   0  DISK  17    0        0        0        0        159      0
2   1  DISK  17    0        41       0        0        2573     0
2   2  DISK  17    0        2        0        0        450      0
2   3  DISK  17    0        0        0        0        279      0
2   4  DISK  17    0        2        0        0        159      0
2   5  DISK  17    0        1        0        0        280      0
2   6  DISK  17    0        0        0        0        38       0
2   7  DISK  17    0        2        0        0        158      0
2   8  DISK  17    0        1        0        0        158      0
2   9  DISK  17    0        2        0        0        159      0
2  10  DISK  17    0        43       0        0        159      0
2  11  DISK  17    0        0        0        0        279      0
2  12  SES   17    0        0        0        0        0        0
2  32  DISK  17    0        4        0        0        400      0
2  33  DISK  17    0        0        0        0        413      0
2  34  DISK  17    0        0        0        0        277      0
2  35  DISK  17    0        69       0        0        322      0
2  36  DISK  17    0        2        0        0        280      0
2  37  DISK  17    0        79       0        0        83221    0
2  38  DISK  17    0        2        0        0        399      0
2  39  DISK  17    0        2        0        0        280      0
2  40  DISK  17    0        5        0        0        481      0
2  41  DISK  17    0        0        0        0        277      0
2  42  DISK  17    0        2        0        0        37       0
2  43  DISK  17    0        0        0        0        156      0
2  44  SES   17    0        0        0        0        0        0
2 112  DISK  17    0        46       0        0        3175     0
2 113  DISK  17    0        6        0        0        415      0
2 114  DISK  17    0        3        0        0        404      0
2 115  DISK  17    0        0        0        0        256      0
2 116  DISK  17    0        0        0        0        280      0
2 117  DISK  17    0        70       0        0        83299    0
2 118  DISK  17    0        6        0        0        763      0
2 119  DISK  17    0        0        0        0        39       0
2 120  DISK  17    0        2        0        0        158      0
2 121  DISK  17    0        2        0        0        401      0
2 122  DISK  17    0        2        0        0        281      0
2 123  DISK  17    0        42       0        0        3416     0
2 124  SES   17    0        0        0        0        0        0
2  14  RAID  17    0        0        0        0        0        0
2  15  RAID  17    0        0        0        0        0        0
and diag data for channel 3 shows:-
sccli: selected device /dev/rdsk/c5t600C0FF00000000007FA9366227CA300d0s2 [SUN StorEdge 3510 SN#07FA93]
CH  ID  TYPE  LIP   LinkFail LossOfSy LossOfSi PrimErr  InvalTxW InvalCRC
-------------------------------------------------------------------------
3   0  DISK  180   0        7        0        0        113872   4
3   1  DISK  180   0        13       0        0        116989   17
3   2  DISK  180   0        11       0        0        113524   0
3   3  DISK  180   0        7        0        0        114687   5
3   4  DISK  180   0        8        0        0        114500   12
3   5  DISK  180   0        9        0        0        114240   2
3   6  DISK  180   0        4        0        0        111251   0
3   7  DISK  180   0        6        0        0        109227   35
3   8  DISK  180   0        10       0        0        115010   31
3   9  DISK  180   0        8        0        0        113054   49
3  10  DISK  180   0        6        0        0        111574   59
3  11  DISK  180   0        74       0        0        113619   0
3  12  SES   180   0        0        0        4        0        0
3  32  DISK  180   0        5        0        0        110749   141
3  33  DISK  180   0        5        0        0        108484   0
3  34  DISK  180   0        241      0        0        124826   213
3  35  DISK  180   0        75       0        0        1848807  359
3  36  DISK  180   0        6        0        0        106782   0
3  37  DISK  180   0        76       0        0        1848374  0
3  38  DISK  180   0        4        0        0        108279   0
3  39  DISK  180   0        4        0        0        106821   0
3  40  DISK  180   0        7        0        0        109237   1
3  41  DISK  180   49       104989   0        0        610943   3
3  42  DISK  180   0        4        0        0        106803   0
3  43  DISK  180   0        2        0        0        106611   0
3  44  SES   180   0        0        0        0        0        0
3 112  DISK  180   0        45       0        0        109901   0
3 113  DISK  180   0        76       0        0        110926   0
3 114  DISK  180   0        6        0        0        108043   0
3 115  DISK  180   0        10       0        0        106042   0
3 116  DISK  180   0        3        0        0        108666   0
3 117  DISK  180   0        82       0        0        1845672  0
3 118  DISK  180   0        14       0        0        113513   3
3 119  DISK  180   0        6        0        0        113617   3
3 120  DISK  180   0        6        0        0        107927   0
3 121  DISK  180   0        5        0        0        109396   0
3 122  DISK  180   0        9        0        0        111031   7
3 123  DISK  180   37       296508   0        0        5456006  0
3 124  SES   180   0        0        0        0        0        0
3  14  RAID  180   0        0        0        0        3        0
3  15  RAID  180   0        0        0        0        3        0
From the above, we can say that channel 3 (loop B) is highly unstable: it shows a very high number of InvalTxW errors and also CRC errors in the frames.
In this case, it is highly recommended to schedule downtime and isolate the bad component on the loop. Isolating the bad component is beyond the scope of this document.
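With this many rows, it can help to scan the tables programmatically rather than by eye. The following Python sketch parses rows laid out like the tables above and lists the disk IDs whose counter is far above the median for the loop. The function names and the factor of 5 are assumptions made for illustration; they do not come from sccli or from this document, and the result only shows where errors were counted, not which component is faulty.

```python
from statistics import median

COLUMNS = ["CH", "ID", "TYPE", "LIP", "LinkFail", "LossOfSy",
           "LossOfSi", "PrimErr", "InvalTxW", "InvalCRC"]

# A few rows copied from the channel 3 table above.
SAMPLE = """\
CH  ID  TYPE  LIP   LinkFail LossOfSy LossOfSi PrimErr  InvalTxW InvalCRC
3  35  DISK  180   0        75       0        0        1848807  359
3  40  DISK  180   0        7        0        0        109237   1
3  41  DISK  180   49       104989   0        0        610943   3
3  42  DISK  180   0        4        0        0        106803   0
3  43  DISK  180   0        2        0        0        106611   0
3  44  SES   180   0        0        0        0        0        0
"""


def parse_rows(text):
    """Parse data rows; header, separator, and banner lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has ten fields and starts with a numeric channel.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        row = dict(zip(COLUMNS, fields))
        for name in COLUMNS:
            if name != "TYPE":
                row[name] = int(row[name])
        rows.append(row)
    return rows


def outliers(rows, column="InvalTxW", factor=5):
    """Return IDs of DISK rows whose counter exceeds factor x the median.

    The factor is an arbitrary illustrative threshold, not a documented limit.
    """
    disks = [r for r in rows if r["TYPE"] == "DISK"]
    if not disks:
        return []
    mid = median(r[column] for r in disks)
    return [r["ID"] for r in disks if r[column] > factor * max(mid, 1)]


print(outliers(parse_rows(SAMPLE)))  # prints [35, 41]
```

Comparing against the median rather than a fixed number is a deliberate choice here, because on an unstable loop every device can show a large count and only the relative differences stand out.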
Notes:-
1. The statistics are stored on the controller, drives, and SES devices, and they can only be reset by a power cycle. Resetting the array alone does not reset the counters.
2. InvalTxW, that is, the 'Invalid word' field, counts invalid transmission words and is 4 bytes in size on the FC interface. This counter is updated only after the synchronization state has been acquired. The receiving FC node (which could be an HBA or a controller) checks each received transmission word to determine whether the word is valid. If it is not valid, the node updates the counter and ignores the sequence. Keep in mind that this is a receiver function that is maintained by each end.
When we see a very high number in InvalTxW (invalid words), we also need to look at the other counters, such as LinkFail, LossOfSy, and LossOfSi. If they too are high, we may have a flaky link (as in the example above). If, however, InvalTxW is high but the other counters are not, then something is wrong with the transmitting node. Check the HBA or the controller.
3. All FC devices (arrays) maintain the LESB information, but the commands and utilities used to retrieve it differ from array to array. This document is not about how to obtain the LESB information but about how to interpret it, and the interpretation is the same for all arrays.
For example, on the T3/T3+/6120, the LESB data can be accessed using the
.disk linkstat   command. The following is sample output:
.disk linkstat u1d1-9 path 1
DISK LINKFAIL LOSSSYNC LOSSSIG PROTOERR INVTXWORD INVCRC
--------------------------------------------------------
u1d1  1        6        0       0        37        0
u1d2  1        4        0       0        37        0
u1d3  1        15       0       0        8         0
u1d4  1        3        0       0        37        0
u1d5  1        5        0       0        37        0
u1d6  1        17       0       0        27        0
u1d7  1        5        0       0        37        0
u1d8  1        4        0       0        37        0
u1d9  1        7        0       0        37        0
As you can see, the error columns correspond to those shown for the SE3510/3511 (the LIP count is not displayed), so this document can be used to interpret the LESB data for all FC storage arrays.
Another example is the Sun Storage A5x00 array, which is a JBOD (just a bunch of disks) of FC disks. To get the LESB data for this JBOD, use the following command from the host:
luxadm -e rdls pathname
Example:-
luxadm -e rdls /devices/sbus@2,0-SUNW,socal@2,0-sf@0,0:devctl.out
This gives the following output for all the FC disks in the JBOD array:
al_pa   lnk fail    sync loss   signal loss   sequence err   invalid word   CRC
1        720896      0           0             0              0              0
d2       0           2           11            0              0              0
ef       0           79          0             0              2              0
e8       0           0           0             0              5              0
e2       0           0           0             0              4              0
e0       0           0           0             0              3              0
dc       0           1           0             0              117            0
d5       0           0           0             0              3              0
2        720896      1           0             0              117            0
b5       0           2           13            0              0              0
cd       0           95          0             0              3              0
ca       0           0           0             0              6              0
c7       0           0           0             0              6              0
c6       0           0           0             0              6              0
ba       0           0           0             0              6              0
This again gives the same data, which can be interpreted using this document.
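Because the three tools label the same counters differently, a small lookup table makes it possible to apply the rule from note 2 to any of the outputs shown in this document. The following Python sketch is a rough illustration only: the common key names, the function names, and the threshold of 1000 are assumptions (this document does not define what counts as "high"), and the mapping of PROTOERR and "sequence err" to the primitive sequence protocol error counter is assumed from column position.

```python
# Column names as printed in this document, mapped to illustrative common keys.
ALIASES = {
    "LinkFail": "link_fail", "LINKFAIL": "link_fail", "lnk fail": "link_fail",
    "LossOfSy": "loss_of_sync", "LOSSSYNC": "loss_of_sync",
    "sync loss": "loss_of_sync",
    "LossOfSi": "loss_of_signal", "LOSSSIG": "loss_of_signal",
    "signal loss": "loss_of_signal",
    "PrimErr": "prim_seq_err", "PROTOERR": "prim_seq_err",
    "sequence err": "prim_seq_err",
    "InvalTxW": "invalid_tx_word", "INVTXWORD": "invalid_tx_word",
    "invalid word": "invalid_tx_word",
    "InvalCRC": "invalid_crc", "INVCRC": "invalid_crc", "CRC": "invalid_crc",
}


def normalize(row):
    """Rename known counter columns to the common keys; drop anything else."""
    return {ALIASES[name]: value for name, value in row.items() if name in ALIASES}


def interpret(counters, high=1000):
    """Apply the rule from note 2; `high` is an arbitrary illustrative threshold."""
    if counters.get("invalid_tx_word", 0) < high:
        return "no high invalid-word count"
    link_side = ("link_fail", "loss_of_sync", "loss_of_signal")
    if any(counters.get(name, 0) >= high for name in link_side):
        return "suspect a flaky link"
    return "suspect the transmitting node (HBA or controller)"


# Channel 3, ID 41 from the SE3510 output above.
sccli_row = {"LinkFail": 49, "LossOfSy": 104989, "LossOfSi": 0,
             "PrimErr": 0, "InvalTxW": 610943, "InvalCRC": 3}
print(interpret(normalize(sccli_row)))  # prints: suspect a flaky link

# u1d1 from the T3 output above.
t3_row = {"LINKFAIL": 1, "LOSSSYNC": 6, "LOSSSIG": 0,
          "PROTOERR": 0, "INVTXWORD": 37, "INVCRC": 0}
print(interpret(normalize(t3_row)))  # prints: no high invalid-word count
```

The result is only a starting point for investigation. As stated above, isolating the bad component is outside the scope of this document.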



Product
Sun StorageTek 3511 SATA Array
Sun StorageTek 3510 FC Array
Sun StorageTek T3 Array
Sun StorageTek A5200 Array
Sun StorageTek A5000 Array
Sun StorageTek A5100 Array
Sun StorageTek T3+/6X20 Controller Firmware 3.1
Sun StorageTek T3+ Array Controller FW 2.1
Sun StorageTek T3 Multi-Platform 1.1
Sun StorageTek T3+ Array
Sun StorageTek 6120 Array

Internal Comments
Sun StorEdge[TM] 3510/3511 Arrays: Understanding Link Error Status Block (LESB) counters.

ELS, LESB, CRC, 3510, 3511, InvalTxW, LIP, LossOfSy, LossOfSi, PrimiErr, T3, T3+, 6120, A5200, A5100
Previously Published As
81767

Change History
Date: 2010
User Name: Vickie Williams
Comment: *** Restored Published Content *** SSH AUDIT
Version: 0

User Name: 7058
Action: Update Started
Comment: SSH AUDIT
Version: 0
Date: 2005-06-03
User Name: 25440
Action: Approved
Comment: Thanks for the clarification. Publishing.
Version: 4


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.