Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2390606.1: VSM6, VSM7, VLE - Receiving command timeouts on an SSD or JBOD disk
Oracle Confidential PARTNER - Available to partners (SUN).

Applies to:

StorageTek Virtual Storage Manager System 6 (VSM6) - Version All Versions and later
StorageTek Virtual Storage Manager System 7 (VSM7) - Version 7.0.0 to 7.1.2 [Release 7.0]
Sun Virtual Library Extension (VLE) - Version 1.4 to 1.5 [Release 1.0]
Information in this document applies to any platform.

Symptoms

Command timeout errors are being reported
SSD fenced
SSD or JBOD disk status is offline or removed
Common questions related to an SSD or disk failure

Changes

None

Cause

Faulty JBOD (just a bunch of disks) disk
Faulty SSD (solid state device)
Too many command timeouts for an SSD or JBOD drive (aka disk)

Solution

Command timeouts can occur on an SSD or JBOD disk device in a VSM6, VSM7, or VLE. They indicate a failure to read from or write to one of these devices. As with the VLE, the VSM6 and VSM7 subsystems have a monitoring script that tracks command timeouts and offlines the SSD or drive. If the command timeouts reach the threshold of 5 timeouts within 15 minutes, the SSD or JBOD disk will typically be placed in an offline status. There are instances where the SSD or disk reports "No offline done. Not able to get the disk status."; in that case the device is not placed offline but is placed in a status of "removed" by the system. In either circumstance, an event is generated in the alarms.txt log and an ASR is created (if configured) identifying the failed SSD or disk.

The VSM6 or VSM7 Installation and Configuration Service Guide has instructions for replacing the failed SSD or disk. When an SSD has been offlined or removed, the SSD will be fenced; that guide includes procedures for this scenario.

Some common questions related to an SSD or disk failure:

Questions for command timeouts with regard to the SSDs on a VSM6 or VSM7?
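The threshold rule described in the Solution section (5 command timeouts within 15 minutes) can be illustrated with a sliding-window counter. The following is only a minimal sketch and not the actual VSM or VLE monitoring script: the class name `TimeoutMonitor`, its method names, and the choice to treat the 15-minute boundary as inclusive are all assumptions made for illustration.

```python
from collections import deque

# Values taken from the article: 5 command timeouts within 15 minutes.
TIMEOUT_THRESHOLD = 5
WINDOW_SECONDS = 15 * 60


class TimeoutMonitor:
    """Hypothetical sliding-window counter for a single SSD or JBOD disk."""

    def __init__(self, threshold=TIMEOUT_THRESHOLD, window=WINDOW_SECONDS):
        self.threshold = threshold
        self.window = window
        self.events = deque()  # timestamps (in seconds) of recent timeouts

    def record_timeout(self, now):
        """Record a command timeout at time `now` (seconds).

        Returns True when the number of timeouts inside the window has
        reached the threshold, i.e. the disk would be a candidate for
        being offlined.
        """
        self.events.append(now)
        # Discard timeouts older than the window (boundary is an assumption).
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

Timestamps are passed in explicitly rather than read from the system clock, so the behavior can be checked deterministically: five calls spaced one minute apart return True on the fifth call, while timeouts spread over more than 15 minutes never reach the threshold.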
Background History

The Oracle VLE engineering team wrote the script to offline a disk after too many command timeouts in order to avoid the complexity of automatically picking a spare disk and initiating the resilver process. The purpose of offlining a disk is to stop sending I/O to the failing disk and avoid long delays that can further hang system operation or cause performance issues. By design, offlining the disk does not induce drive resilvering, so a spare will not be used. The VSM6 and VSM7 engineering team incorporated this monitoring script to perform the same function as on the VLE subsystem.

References

<NOTE:1533980.1> - How to replace a drive in a VSM6 or VSM7 stpool (JBOD)
<NOTE:1959907.1> - VSM6 - Due to Part shortages for Gen3 73GB SSD's, a Gen4 73GB SSD may be substituted on VSM6 systems
<NOTE:2317184.1> - Oracle VSM6 and VSM7 Storage Appliance: Support Strategy for Replacing a 200GB SSD (October 2017)
<NOTE:1533979.1> - How to replace a VSM6 or VSM7 SSD drive

Attachments

This solution has no attachment