Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1582979.1
Update Date:2013-09-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1582979.1 :   Sun Storage 7000 Unified Storage System: SMB I/O Stalls when ICAP Virus Scan Engine ( vscan ) Scans File  


Related Items
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun Storage 7110 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-7710052231>

Applies to:

Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7210 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

The user has configured many different SMB shares.

When the ICAP virus scan engine (vscan) is set up to scan a share, I/O to all SMB shares appears to stall, including shares that do not have virus scanning enabled.

Changes

none

Cause

Both vscan-enabled and non-enabled shares become unavailable because smb_worker_thread threads are waiting in vscan_svc_scan_file() while holding smb_node_wrlock locks, and the rest of the smb_session_worker threads are waiting for those locks.

Solution

Virus scanning is enabled on only some of the shares, but all SMB shares lost access when the problem occurred, including those that did not have virus scanning enabled.

If the virus scan service is disabled, SMB access resumes immediately.


Collect the vscand core file and the "netstat -an" output captured while the issue is occurring, along with the support bundle.

Check the following logs from the bundle:

--from debug.sys----

logs$ cat debug.sys | grep "Virus scan request timeout" | wc -l
     1122

    Aug  2 10:44:31 fianna vscan: [ID 509523 kern.warning] WARNING: Virus scan request timeout
    ....
    Aug 23 01:12:00 fianna vscan: [ID 509523 kern.warning] WARNING: Virus scan request timeout


logs$ cat debug.sys | grep "Error receiving data from Scan Engine" | wc -l
      79
     ....
     Jul 29 01:02:31 fianna vscand: [ID 940187 daemon.error] Error receiving data from Scan Engine: Connection reset by peer
     ....
     Aug  9 08:35:09 fianna vscand: [ID 940187 daemon.error] Error receiving data from Scan Engine: Error 0


logs$ cat debug.sys | grep "ICAP protocol error" | wc -l
        7
     ....
     Aug  2 10:25:58 fianna vscand: [ID 586775 daemon.error] ICAP protocol error - unsupported scan result: Internal server error
     ....
     Aug 22 03:07:49 fianna vscand: [ID 586775 daemon.error] ICAP protocol error - unsupported scan result: Internal server error

 
--> Remaining vscand messages:
 

logs$ cat debug.sys | grep vscan | egrep -v "ICAP protocol error|Virus scan request timeout|Error receiving data from Scan Engine"
 Aug 10 07:13:36 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 Aug 10 07:14:17 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 Aug 10 07:15:29 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 Aug 10 07:16:34 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 Aug 10 07:17:40 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 Aug 10 07:17:55 fianna vscand: [ID 678180 daemon.notice] Scan Engine - connection error (10.5.109.53:1344) Connection timed out
 ........
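
The separate grep passes above can be combined into a single tally. The following is a minimal sketch, not part of the original case notes. The function name vscan_log_summary is hypothetical, and it assumes a POSIX awk (nawk on Solaris) and a debug.sys file containing the message strings exactly as shown above.

```shell
# Tally the vscan/vscand message classes found in a debug.sys log in one pass.
# vscan_log_summary is a hypothetical helper name, not an appliance command.
vscan_log_summary() {
    # $1: path to the log file (for example debug.sys from the support bundle)
    awk '
        /Virus scan request timeout/            { timeout++; next }
        /Error receiving data from Scan Engine/ { recv++;    next }
        /ICAP protocol error/                   { icap++;    next }
        /Scan Engine - connection error/        { conn++;    next }
        /vscan/                                 { other++ }
        END {
            # Counters that were never incremented print as 0.
            printf "scan_request_timeouts=%d\n", timeout
            printf "receive_errors=%d\n",        recv
            printf "icap_protocol_errors=%d\n",  icap
            printf "connection_errors=%d\n",     conn
            printf "other_vscan_messages=%d\n",  other
        }
    ' "$1"
}
```

It would be run as "vscan_log_summary debug.sys". Each line is counted in the first class it matches, so the totals should agree with the individual grep counts as long as no message matches two patterns.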



The vscand core (vscand_core.xxx) shows that vscand is waiting for a response from the scan engine.

There are 32 threads waiting for a response:
 

 cd33     UNPARKED <NONE>         32
          libsocket.so.1`recv+0x2a
          vs_icap_readline+0x2f
          vs_icap_read_resp_code+0x2f
          vs_icap_read_respmod_resp+0xe
          vs_icap_respmod_request+0x73
          vs_icap_scan_file+0x184
          vs_svc_scan_file+0xd1
          vs_svc_async_scan+0x22
          libc_hwcap1.so.1`_thrp_setup+0x9b
          libc_hwcap1.so.1`_lwp_start

 
 This matches the number of connections open to port 1344 on the scan engine:
 

 dropbox$ cat vscand_netstat-an.out | grep 1344
 10.5.104.81.34992    10.5.109.53.1344     16775936      0 1049800      0    ESTABLISHED
 10.5.104.81.64652    10.5.109.53.1344     10776064      0 1049800      0    ESTABLISHED
 10.5.104.81.41743    10.5.109.53.1344     13267456      0 1049800      0    ESTABLISHED
 10.5.104.81.53894    10.5.109.53.1344     1323008        0 1049800      0    ESTABLISHED


 

 dropbox$ cat vscand_netstat-an.out | grep 1344 | wc -l
          32
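
A plain "grep 1344" also matches any line where 1344 appears elsewhere, for example inside a longer port number or a window size. As a hedged alternative, the sketch below counts only lines whose remote address ends in the ICAP port and groups them by connection state. The function name icap_conn_states is hypothetical, and the sketch assumes the Solaris "netstat -an" layout shown above (remote address in the second column, state in the last) and an awk that supports -v (nawk on Solaris).

```shell
# Count TCP connections to the ICAP port, grouped by state, from saved
# "netstat -an" output. icap_conn_states is a hypothetical helper name.
icap_conn_states() {
    # $1: file with saved "netstat -an" output
    # $2: ICAP port (defaults to 1344)
    awk -v port="${2:-1344}" '
        # Solaris prints addresses as a.b.c.d.port; field 2 is the remote end.
        # The last field is the connection state.
        NF >= 7 && $2 ~ ("\\." port "$") { n[$NF]++ }
        END { for (s in n) print s, n[s] }
    ' "$1"
}
```

It would be run as "icap_conn_states vscand_netstat-an.out". The order of the output lines is not defined when more than one state is present.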

This behavior can be linked to BUG 15646191 - SUNBT6956260 CIFS shares not accessible / vscand related issue

 

Both vscan-enabled and non-enabled shares become unavailable because smb_worker_thread threads are waiting in vscan_svc_scan_file() while holding smb_node_wrlock locks, and the rest of the smb_session_worker threads are waiting for those locks.

The smb_session_worker threads come from a pool of 1024 threads. This pool is a global resource and can be exhausted, so further SMB I/O can proceed only once threads are freed.

By default, the call to vscan_svc_scan_file() has a 15-minute timeout, so the locks are released after 15 minutes. However, because of the backlog of requests, and because any new I/O to vscan-enabled shares also ends up waiting on the vscan call, the SMB service on the system appears hung.
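
As a rough, back-of-the-envelope illustration of how the pool size and the timeout interact, the sketch below computes the sustained rate of stuck requests that would keep the whole pool occupied. It rests on a simplifying assumption that is not taken from the case data: every stuck request holds one worker thread for the full timeout. The function name pool_saturation_rate is hypothetical.

```shell
# Estimate the sustained rate of stuck requests (per second) that would keep
# a worker pool fully occupied. Simplifying assumption, not from the case
# data: every stuck request holds one worker for the full timeout.
pool_saturation_rate() {
    # $1: number of worker threads, $2: timeout in seconds
    awk -v pool="$1" -v timeout="$2" 'BEGIN { printf "%.2f\n", pool / timeout }'
}

# 1024 worker threads, 15-minute timeout (900 seconds); prints 1.14
pool_saturation_rate 1024 900
```

Under that assumption, a little over one stuck request per second would be enough to keep all 1024 workers busy, which is consistent with the service appearing hung even though individual calls do eventually time out.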

This can be confirmed from a crash dump if the customer initiates an NMI while the issue is occurring.

Action for the customer: check with the scan engine vendor why the scan engine is not replying to vscand.
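
When working with the vendor, it may help to check independently whether the scan engine answers a basic ICAP request. The sketch below only builds an ICAP OPTIONS request in the form shown in RFC 3507; it does not reproduce what vscand sends. The function name icap_options_request is hypothetical, and the service name passed as the third argument is vendor-specific and must be taken from the scan engine configuration.

```shell
# Print a minimal ICAP OPTIONS request (request line, Host header, blank
# line, each terminated by CRLF). icap_options_request is a hypothetical
# helper name; the service name is vendor-specific.
icap_options_request() {
    # $1: scan engine host, $2: ICAP port, $3: ICAP service name
    printf 'OPTIONS icap://%s:%s/%s ICAP/1.0\r\n' "$1" "$2" "$3"
    printf 'Host: %s\r\n' "$1"
    printf '\r\n'
}
```

Assuming a netcat variant that supports the -w timeout option is available on a host that can reach the scan engine, the request could be sent with "icap_options_request 10.5.109.53 1344 example-service | nc -w 5 10.5.109.53 1344", where example-service is a placeholder. A responsive engine would be expected to return an ICAP status line; no reply within the timeout would be consistent with the stalled vscand threads described above.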

References

<NOTE:1173064.1> - Sun Storage 7000 Unified Storage System: How to generate NMI to collect a system core dump

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.