Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1528106.1
Update Date: 2017-08-29
Keywords:

Solution Type: Problem Resolution

Solution 1528106.1: Oracle Fabric Interconnect :: "CS_TIMEOUT" in ESX / ESXi logs


Related Items
  • Oracle Fabric Interconnect F1-15
  • Oracle Fabric Interconnect F1-4
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Oracle Virtual Networking




In this Document
Symptoms
Cause
Solution


Applies to:

Oracle Fabric Interconnect F1-4 - Version All Versions to All Versions [Release All Releases]
Oracle Fabric Interconnect F1-15 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Seeing "CS_TIMEOUT" entries in ESX / ESXi logs. Although this article focuses on ESX/ESXi, CS_TIMEOUT also applies to other operating systems.

Cause

The following information describes what CS_TIMEOUT messages in ESX/ESXi /var/log/vmkernel or /var/log/messages files mean and how to resolve the condition that causes these messages.

Essentially, I/O commands waiting in the FC I/O card queue did not receive a response from storage within 20 seconds. The CS_TIMEOUT handling by the drivers and the vmkernel can temporarily make datastores unavailable. Lowering the queue depth is intended to avoid the CS_TIMEOUT condition.

Solution

When too many CS_TIMEOUTs occur, any of the following can happen on the ESX hosts:

A) VMFS filesystems on the shared LUNs are set to read-only to prevent data corruption
B) The ESX host can experience a PSOD because storage has disconnected from the host after too many CS_TIMEOUTs
C) Data corruption can occur

This is the syntax of the error messages:

vmkernel:May 27 09:50:29 srwp01vmw031 vmkernel: 0:01:16:08.114 cpu10:4106)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e027ec0, scsi_status=0x0
vmkernel.1:May 27 08:44:59 srwp01vmw031 vmkernel: 0:00:10:38.581 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e032ac0, scsi_status=0x0
vmkernel.1:May 27 08:45:00 srwp01vmw031 vmkernel: 0:00:10:38.883 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e0534c0, scsi_status=0x0
vmkernel.1:May 27 08:45:00 srwp01vmw031 vmkernel: 0:00:10:38.883 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e02fcc0, scsi_status=0x0

This indicates the ESX/ESXi shared storage is extremely busy and/or oversubscribed.
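As an illustration only (the log path and message format are taken from the sample lines above; the log excerpt is embedded so the pipeline is self-contained), CS_TIMEOUT entries can be tallied per vHBA to see whether the timeouts are concentrated on one path:

```shell
# Sample vmkernel lines in the format shown above, embedded here so the
# example is self-contained; on a live host you would read
# /var/log/vmkernel* instead.
cat > /tmp/vmkernel.sample <<'EOF'
vmkernel:May 27 09:50:29 srwp01vmw031 vmkernel: 0:01:16:08.114 cpu10:4106)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e027ec0, scsi_status=0x0
vmkernel.1:May 27 08:44:59 srwp01vmw031 vmkernel: 0:00:10:38.581 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e032ac0, scsi_status=0x0
EOF

# Count CS_TIMEOUT entries per vHBA: grep the timeout lines, extract the
# name inside the "<vhba NAME>" token, then sort and tally.
grep 'CS_TIMEOUT' /tmp/vmkernel.sample \
  | sed -n 's/.*<vhba \([^>]*\)>.*/\1/p' \
  | sort | uniq -c
```

A count that climbs on a single vHBA points at one oversubscribed path; counts across all vHBAs suggest the shared storage itself is saturated.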

For certain storage vendors, you need to have these questions answered:

1) How many hosts are accessing the shared LUNs?
2) How many VMs are on these shared LUNs?
3) What is the vhba queue depth set to?

Please review the following VMware KB articles on reducing the number of hosts sharing the same LUNs, limiting/balancing the number of VMs on the shared LUNs, and reducing the queue depth for certain manufacturers' storage devices:

1005010 http://kb.vmware.com/kb/1005010
1005011 http://kb.vmware.com/kb/1005011
1006001 http://kb.vmware.com/kb/1006001
1006002 http://kb.vmware.com/kb/1006002
1006003 http://kb.vmware.com/kb/1006003
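On the queue-depth question above, a hedged sketch: on ESXi the configured per-device maximum queue depth appears as the "Device Max Queue Depth" field in the output of `esxcli storage core device list`. The sample output below is fabricated to mimic that format so the parsing is self-contained; on a live host you would pipe the esxcli output directly.

```shell
# Fabricated excerpt mimicking "esxcli storage core device list" output
# (device name and values are examples, not from a real host).
cat > /tmp/devlist.sample <<'EOF'
naa.600601604550250018ea2d38073cdf11
   Display Name: Example Fibre Channel Disk
   Device Max Queue Depth: 64
   No of outstanding IOs with competing worlds: 32
EOF

# Extract the configured maximum queue depth for each listed device.
awk -F': ' '/Device Max Queue Depth/ {print $2}' /tmp/devlist.sample
```

Comparing this value against the vendor's recommendation (per the VMware KB articles above) shows whether the queue depth should be reduced for the shared LUNs.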


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.