Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1528106.1
Update Date: 2017-08-29
Keywords:

Solution Type: Problem Resolution

Solution 1528106.1: Oracle Fabric Interconnect :: "CS_TIMEOUT" in ESX / ESXi logs


Related Items
  • Oracle Fabric Interconnect F1-15
  • Oracle Fabric Interconnect F1-4
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Oracle Virtual Networking




In this Document
Symptoms
Cause
Solution


Applies to:

Oracle Fabric Interconnect F1-4 - Version All Versions to All Versions [Release All Releases]
Oracle Fabric Interconnect F1-15 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Seeing "CS_TIMEOUT" entries in ESX / ESXi logs. Although this article focuses on ESX/ESXi, CS_TIMEOUT also applies to other operating systems.

Cause

The following information describes what CS_TIMEOUT messages in ESX/ESXi /var/log/vmkernel or /var/log/messages files mean and how to resolve the condition that causes these messages.

Essentially, I/O commands waiting in the FC I/O card queue did not receive a response from storage within 20 seconds. The CS_TIMEOUT handling by the drivers and the vmkernel can temporarily make datastores unavailable. Lowering the queue depth is intended to avoid the CS_TIMEOUT condition.

Solution

When too many CS_TIMEOUTs occur, any of the following can happen on the ESX hosts:

A) VMFS filesystems on the shared LUNs are set to read-only to prevent data corruption
B) The ESX host can experience a PSOD because storage has disconnected from the host after too many CS_TIMEOUTs
C) Data corruption can occur

This is the syntax of the error messages:

vmkernel:May 27 09:50:29 srwp01vmw031 vmkernel: 0:01:16:08.114 cpu10:4106)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e027ec0, scsi_status=0x0
vmkernel.1:May 27 08:44:59 srwp01vmw031 vmkernel: 0:00:10:38.581 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e032ac0, scsi_status=0x0
vmkernel.1:May 27 08:45:00 srwp01vmw031 vmkernel: 0:00:10:38.883 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e0534c0, scsi_status=0x0
vmkernel.1:May 27 08:45:00 srwp01vmw031 vmkernel: 0:00:10:38.883 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e02fcc0, scsi_status=0x0

This indicates the ESX/ESXi shared storage is extremely busy and/or oversubscribed.
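As an illustration only (the log path and message format are taken from the sample lines above; the log excerpt is embedded so the pipeline is self-contained), CS_TIMEOUT entries can be tallied per vHBA to see whether the timeouts are concentrated on one path:

```shell
# Sample vmkernel lines in the format shown above, embedded here so the
# example is self-contained; on a live host you would read
# /var/log/vmkernel* instead.
cat > /tmp/vmkernel.sample <<'EOF'
vmkernel:May 27 09:50:29 srwp01vmw031 vmkernel: 0:01:16:08.114 cpu10:4106)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e027ec0, scsi_status=0x0
vmkernel.1:May 27 08:44:59 srwp01vmw031 vmkernel: 0:00:10:38.581 cpu15:4111)<vhba vhba1> process_status_entry: CS_TIMEOUT, cp=0x41000e032ac0, scsi_status=0x0
EOF

# Count CS_TIMEOUT entries per vHBA: grep the timeout lines, extract the
# name inside the "<vhba NAME>" token, then sort and tally.
grep 'CS_TIMEOUT' /tmp/vmkernel.sample \
  | sed -n 's/.*<vhba \([^>]*\)>.*/\1/p' \
  | sort | uniq -c
```

A count that climbs on a single vHBA points at one oversubscribed path; counts across all vHBAs suggest the shared storage itself is saturated.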

For certain storage vendors, you need to have these questions answered:

1) How many hosts are accessing the shared LUNs?
2) How many VMs are on these shared LUNs?
3) What is the vhba queue depth set to?

Please review the following VMware KB articles on reducing the number of hosts sharing the same LUNs, limiting/balancing the number of VMs on the shared LUNs, and reducing the queue depth for certain manufacturers' storage devices:

1005010 http://kb.vmware.com/kb/1005010
1005011 http://kb.vmware.com/kb/1005011
1006001 http://kb.vmware.com/kb/1006001
1006002 http://kb.vmware.com/kb/1006002
1006003 http://kb.vmware.com/kb/1006003
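On the queue-depth question above, a hedged sketch: on ESXi the configured per-device maximum queue depth appears as the "Device Max Queue Depth" field in the output of `esxcli storage core device list`. The sample output below is fabricated to mimic that format so the parsing is self-contained; on a live host you would pipe the esxcli output directly.

```shell
# Fabricated excerpt mimicking "esxcli storage core device list" output
# (device name and values are examples, not from a real host).
cat > /tmp/devlist.sample <<'EOF'
naa.600601604550250018ea2d38073cdf11
   Display Name: Example Fibre Channel Disk
   Device Max Queue Depth: 64
   No of outstanding IOs with competing worlds: 32
EOF

# Extract the configured maximum queue depth for each listed device.
awk -F': ' '/Device Max Queue Depth/ {print $2}' /tmp/devlist.sample
```

Comparing this value against the vendor's recommendation (per the VMware KB articles above) shows whether the queue depth should be reduced for the shared LUNs.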


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.