Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2390606.1
Update Date:2018-04-30
Keywords:

Solution Type  Problem Resolution Sure

Solution  2390606.1 :   VSM6, VSM7, VLE - Receiving command timeouts on an SSD or JBOD disk  


Related Items
  • StorageTek Virtual Storage Manager System 6 (VSM6)
  • StorageTek Virtual Storage Manager System 7 (VSM7)
  • Sun Virtual Library Extension (VLE)
Related Categories
  • PLA-Support>Sun Systems>TAPE>Virtual Tape>SN-TP: VSM6




In this Document
Symptoms
Changes
Cause
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Partners and Oracle only
Created from <SR 3-17344314401>

Applies to:

StorageTek Virtual Storage Manager System 6 (VSM6) - Version All Versions and later
StorageTek Virtual Storage Manager System 7 (VSM7) - Version 7.0.0 to 7.1.2 [Release 7.0]
Sun Virtual Library Extension (VLE) - Version 1.4 to 1.5 [Release 1.0]
Information in this document applies to any platform.

Symptoms

Command timeout errors are being reported

SSD fenced

SSD or JBOD disk status is offline or removed

Common questions related to an SSD or disk failure

Changes

 None

Cause

Faulty JBOD (just a bunch of disks) disk

Faulty SSD (solid-state drive)

Too many command timeouts for an SSD or JBOD drive (aka disk)

Solution

Command timeouts can occur on an SSD or JBOD disk in a VSM6, VSM7, or VLE. A command timeout is related to a failed read from, or write to, one of these devices. As on the VLE, the VSM6 and VSM7 subsystems run a monitoring script that tracks command timeouts and offlines the affected SSD or disk. When the timeouts reach the threshold of 5 within 15 minutes, the SSD or JBOD disk is typically placed in an offline status. In some instances the system reports "No offline done. Not able to get the disk status." for the SSD or disk; the device is then not placed offline, but is given a status of "removed" by the system.

In either case, an event is generated in the alarms.txt log and, if ASR is configured, an ASR is created identifying the failed SSD or disk. The VSM6 or VSM7 Installation and Configuration Service Guide has instructions for replacing the failed SSD or disk. An SSD that has been offlined or removed will also be fenced, and the guide contains procedures for that scenario.
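The threshold behavior described above can be pictured as a sliding-window counter kept for each device. The following Python sketch is an illustration only, not the monitoring script shipped with the VLE, VSM6, or VSM7. Assumptions: the class and field names, the `get_status` callback standing in for the disk-status query, the inclusive 15-minute boundary, and the `alarms` list standing in for the alarms.txt event and ASR are all hypothetical. The "removed" label models the case where the status cannot be read; on a real system that status is set by the system itself.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional

# Threshold stated in this note: 5 command timeouts within 15 minutes.
TIMEOUT_LIMIT = 5
WINDOW_SECONDS = 15 * 60


@dataclass
class TimeoutMonitor:
    """Hypothetical per-device command-timeout monitor (illustration only)."""

    # Assumed callback: returns a status string, or None if it cannot be read.
    get_status: Callable[[str], Optional[str]]
    history: Dict[str, Deque[float]] = field(default_factory=dict)
    failed: Dict[str, str] = field(default_factory=dict)
    alarms: List[str] = field(default_factory=list)

    def record_timeout(self, device: str, t: float) -> Optional[str]:
        """Record a command timeout at time t (seconds, assumed in time order).

        Returns "offline" or "removed" when this timeout reaches the
        threshold, otherwise None.
        """
        if device in self.failed:
            return None  # Device already failed; nothing more to do.
        window = self.history.setdefault(device, deque())
        window.append(t)
        # Discard timeouts that are more than 15 minutes older than this one.
        while t - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) < TIMEOUT_LIMIT:
            return None
        # Threshold reached. If the disk status cannot be read, no offline
        # is done and the device ends up "removed" instead of "offline".
        state = "removed" if self.get_status(device) is None else "offline"
        self.failed[device] = state
        # Hypothetical stand-in for the alarms.txt event and (optional) ASR.
        self.alarms.append(f"{device}: {state} after {len(window)} "
                           f"timeouts within {WINDOW_SECONDS // 60} minutes")
        return state
```

For example, four timeouts in eight minutes leave a device in service, and a fifth at the ten-minute mark returns "offline". Six timeouts spaced five minutes apart never place five inside one 15-minute window, so no action is taken.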

Some common questions related to an SSD or disk failure are answered below.

Questions about command timeouts on the SSDs of a VSM6 or VSM7:

  1. What is the life expectancy of an SSD?  Answer - There is no known, specific life expectancy for an SSD at the time of this writing.  Replace the SSD when it fails.
  2. How does the VSM6 or VSM7 control when an SSD or disk is to expire or fail?  Answer - The only criterion set or monitored within the VLE, VSM6, or VSM7 is the command timeout threshold.  When a device reaches 5 timeouts within 15 minutes, it is failed with a status of offline or removed.  Devices that do not exceed the threshold of 5 timeouts within 15 minutes do NOT need to be replaced; they are operating within Engineering-specified parameters.
  3. Is there a "lifetime access count" for an SSD?  Answer - There are no known "lifetime access counts" for an SSD or disk at the time of this writing.
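The answer to question 2 reduces to a check that can be run after the fact over a device's recorded timeout times: did any 5 timeouts fall within a 15-minute window? The following Python sketch assumes the timestamps (in seconds) have already been extracted from the system logs; the function name and that extraction step are hypothetical, and treating exactly 15 minutes as "within" the window is an assumption.

```python
from typing import Iterable

# Threshold stated in this note: 5 command timeouts within 15 minutes.
TIMEOUT_LIMIT = 5
WINDOW_SECONDS = 15 * 60


def exceeded_threshold(timestamps: Iterable[float],
                       limit: int = TIMEOUT_LIMIT,
                       window: float = WINDOW_SECONDS) -> bool:
    """Return True if any `limit` timeouts fall within `window` seconds."""
    ts = sorted(timestamps)
    # Once the times are sorted, the tightest group of `limit` timeouts is
    # always `limit` consecutive entries, so comparing ts[i] with
    # ts[i + limit - 1] covers every candidate window.
    for i in range(len(ts) - limit + 1):
        if ts[i + limit - 1] - ts[i] <= window:
            return True
    return False
```

A device that logs a dozen timeouts spread across a day, but never five inside any 15-minute span, returns False and, per the answer to question 2, does not need to be replaced.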

 

Background History

The Oracle VLE engineering team wrote the script that offlines a disk with too many command timeouts in order to avoid the complexity of automatically selecting a spare disk and initiating the resilver process. Offlining a disk stops I/O to the failing disk and avoids long delays that could further hang system operation or cause performance issues. By design, offlining the disk does not trigger resilvering, so a spare is not used.  The VSM6 and VSM7 engineering team incorporated this monitoring script to perform the same function as on the VLE subsystem.

References

<NOTE:1533980.1> - How to replace a drive in a VSM6 or VSM7 stpool (JBOD)
<NOTE:1959907.1> - VSM6 - Due to Part shortages for Gen3 73GB SSD's, a Gen4 73GB SSD may be substituted on VSM6 systems
<NOTE:2317184.1> - Oracle VSM6 and VSM7 Storage Appliance: Support Strategy for Replacing a 200GB SSD (October 2017)
<NOTE:1533979.1> - How to replace a VSM6 or VSM7 SSD drive

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.