Asset ID: 1-79-2195792.1
Update Date: 2017-07-11
Solution Type: Predictive Self-Healing Sure Solution
2195792.1: Infiniband Fabric errors and counters
Related Items
- Sun Network QDR InfiniBand Gateway Switch
- Sun Datacenter InfiniBand Switch 36
Applies to:
Sun Network QDR InfiniBand Gateway Switch
Sun Datacenter InfiniBand Switch 36
Information in this document applies to any platform.
Purpose
Explain the meaning of certain errors and counters visible at the IB fabric level.
Scope
Help to determine which errors and counters are harmful and which can be safely ignored.
Details
InfiniBand fabric errors and counters fall into two groups: 1) harmful, meaning they indicate a real problem; 2) harmless, meaning they can be ignored.
Harmful:
Error / Counter | Meaning
SymbolErrors | Total number of minor link errors detected on one or more physical lanes.
LinkErrorRecovery | Total number of times the Port Training state machine has successfully completed the link error recovery process.
PortRcvErrors | Total number of packets containing an error that were received on the port.
PortXmitConstraintErrors | Total number of packets not transmitted from the port for the following reasons: FilterRawOutbound is true and packet is raw; PartitionEnforcementOutbound is true and packet fails partition key check or IP version check.
PortRcvConstraintErrors | Total number of packets received on the port that are discarded for the following reasons: FilterRawInbound is true and packet is raw; PartitionEnforcementInbound is true and packet fails partition key check or IP version check.
LocalLinkIntegrityErrors | The number of times that the count of local physical errors exceeded the threshold specified by LocalPhyErrors.
ExcessiveBufferOverrunErrors | The number of times that OverrunErrors consecutive flow control update periods occurred, each having at least one overrun error.
PortRcvRemotePhysicalErrors | Total number of packets marked with the EBP delimiter received on the port.
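As an illustration of how the harmful list might be applied, here is a minimal Python sketch that picks out harmful counters with non-zero values for a single port. It assumes the counter values have already been collected into a dictionary keyed by counter name; the function name `harmful_nonzero`, the dictionary layout, and the sample values are hypothetical and not part of any switch tooling.

```python
# Counter names taken from the "Harmful" list in this document.
HARMFUL_COUNTERS = frozenset({
    "SymbolErrors",
    "LinkErrorRecovery",
    "PortRcvErrors",
    "PortXmitConstraintErrors",
    "PortRcvConstraintErrors",
    "LocalLinkIntegrityErrors",
    "ExcessiveBufferOverrunErrors",
    "PortRcvRemotePhysicalErrors",
})


def harmful_nonzero(counters):
    """Return the harmful counters in `counters` whose value is non-zero.

    `counters` is a hypothetical dict mapping counter name to an integer
    value for one port; how it is collected is outside this sketch.
    """
    return {
        name: value
        for name, value in counters.items()
        if name in HARMFUL_COUNTERS and value != 0
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = {"SymbolErrors": 12, "PortRcvErrors": 0, "PortXmitDiscards": 340}
    print(harmful_nonzero(sample))
```

Counters that are not in the harmful set are simply dropped from the result, whatever their value.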
Harmless: these counters can have non-zero values in our environments and can be safely ignored.
Error / Counter | Meaning
LinkDown | Total number of times the Port Training state machine has failed the link error recovery process and downed the link.
PortXmitDiscards | Total number of outbound packets discarded by the port because the port is down or congested.
VL15Dropped | Number of incoming VL15 packets dropped due to resource limitations (e.g., lack of buffers) in the port.
PortXmitData | Total number of data octets, divided by 4, transmitted on all VLs from the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. It excludes all link packets.
PortRcvData | Total number of data octets, divided by 4, received on all VLs at the port. This includes all octets between (and not including) the start of packet delimiter and the VCRC, and may include packets containing errors. It excludes all link packets.
PortXmitPkts | Total number of packets transmitted on all VLs from the port. This may include packets with errors and excludes link packets.
PortRcvPkts | Total number of packets, including packets containing errors and excluding link packets, received from all VLs on the port.
PortRcvSwitchRelayErrors | Total number of packets received on the port that were discarded because they could not be forwarded by the switch relay.
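Because PortXmitData and PortRcvData count data octets divided by 4, a raw reading has to be multiplied by 4 to get octets. The Python sketch below shows that arithmetic. The function names are hypothetical, and the difference between two readings assumes the counter was neither reset nor wrapped in between.

```python
OCTETS_PER_UNIT = 4  # PortXmitData / PortRcvData are data octets divided by 4


def data_counter_to_octets(counter_value):
    """Convert a raw PortXmitData or PortRcvData reading to octets."""
    return counter_value * OCTETS_PER_UNIT


def octets_between(first_reading, second_reading):
    """Octets transferred between two readings of the same counter.

    Assumes the counter was not reset and did not wrap between readings.
    """
    if second_reading < first_reading:
        raise ValueError("counter decreased: reset or wrap not handled here")
    return data_counter_to_octets(second_reading - first_reading)


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    print(data_counter_to_octets(250))
    print(octets_between(100, 350))
```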