Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1450121.1
Update Date:2017-06-15
Keywords:

Solution Type: Technical Instruction

Solution 1450121.1: How to Resolve a Missing Hot Spare Drive


Related Items
  • Sun Storage Flexline 380 Array
  • Sun Storage 6580 Array
  • Sun Storage Flexline 240 Array
  • Sun Storage 2540-M2 Array
  • Sun Storage 6180 Array
  • Sun Storage Flexline 280 Array
  • Sun Storage 6130 Array
  • Sun Storage 2510 Array
  • Sun Storage 6540 Array
  • Sun Storage 2540 Array
  • Sun Storage 6780 Array
  • Sun Storage 2530-M2 Array
  • Sun Storage 2530 Array
  • Sun Storage Flexline 210 Array
  • Sun Storage 6140 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: 6140_6180


How to recover from and avoid a missing Hot Spare condition.

In this Document
Goal
Solution


Applies to:

Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6580 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6180 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 280 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 380 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Goal

This document provides the steps to resolve the alarm xx.66.1203 - Missing Hot Spare Drive, which can be raised by the array management software. Although this problem can occur with both 6.x and 7.x controller firmware, it is much more prevalent with 7.x. If you see this problem on an array running 6.x controller firmware, contact Oracle Support for a resolution. The remainder of this document assumes 7.x controller firmware.

This condition typically arises in one of two ways:

  1. A drive that is assigned as a Hot Spare Drive is physically removed.
  2. A failed Hot Spare Drive is replaced without first being unassigned.

Because the first cause is easily fixed by reinserting the removed Hot Spare Drive, this document focuses on the second cause. The second cause is also easily avoided: unassign the failed Hot Spare Drive from the Hot Spare list before replacing it.

Solution

 

NOTES:
A) The procedures in this section are non-disruptive and do not require an outage.
B) If you replaced the hot spare with a new disk and reassigned the same disk slot as a hot spare before clearing the 'missing' hot spare, the array now holds two entries for the same hot spare.
In this situation, perform the procedure twice: the first pass unassigns the new disk and the second pass unassigns the 'missing' hot spare.

 

  1. Collect a supportData bundle.

    1. Reference <Document 1002514.1> Collecting Sun Storage Common Array Manager Support Data for Arrays.
    2. Reference <Document 1014074.1> Collecting Support Data from Arrays using Sun StorageTek[TM] SANtricity Storage Manager.

     

  2. Verify the Missing Hot Spare Drive condition.

    1. Unzip the supportData bundle from step 1.
    2. For Sun Storage Common Array Manager (CAM) users, look in the alarms.txt file for an alarm with Grid Code xx.66.1203.
    3. For SANtricity users, look in recoveryGuruProcedures.html for HOT_SPARE_MISSING-Recovery Failure Type Code: 203.

     

  3. Attempt to unassign the hot spare drive.

    CAM Browser interface:

    1. Select the array in the navigation tree on the left.
    2. Click Service Advisor on the right side of the CAM masthead at the top of the page.
    3. Expand Portable Virtual Disk Management.
    4. Expand Unassign a Hot Spare.
    5. Navigate to and select the Drive in question.
    6. Follow the procedure in the main window to unassign the Hot Spare.

    The fault should clear within a few minutes of unassigning the drive. If the unassign fails or the fault has not cleared after five minutes, contact Oracle Support.


    CAM command line:

    # service -d <array_name> -c unassign -t tNNdYY


    where NN is the tray number and YY is the slot number of the drive in question.

    The service command is located in:
    Solaris: /opt/SUNWsefms/bin
    Linux: /opt/sun/cam/private/fms/bin
    Windows: \Program Files\Sun\Common Array Manager\Component\fms\bin
    Windows (32-bit CAM on 64-bit Windows): \Program Files (x86)\Sun\Common Array Manager\Component\fms\bin

    If the unassign option is not available, upgrade CAM to the latest version and try again.

    The fault should clear within a few minutes of unassigning the drive. If the unassign fails or the fault has not cleared after five minutes, contact Oracle Support.


    SANtricity:

    1. Open the Array Management Window for the array.
    2. Select Drive in the Menu bar.
    3. Select Hot Spare Coverage...
    4. Select View/change Hot Spare Coverage and click OK.
    5. In the right pane, select the drive that has (Missing) next to it; you may need to scroll to see all hot spare drives. (Note: the 'missing' drive may not be visible. In that case, try clicking the Tray box in the first empty row of the Hot Spare Drive table.)
    6. Click Unassign.

    SANtricity command line:

    # SMcli -n array_name -c 'set drive[trayID,slotID] hotSpare=FALSE;'

    where trayID and slotID are the tray and slot numbers of the drive in question.

    The fault should clear within a few minutes of unassigning the drive. If the unassign fails or the fault has not cleared after five minutes, contact Oracle Support.
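The check in step 2 can also be scripted when several bundles need reviewing. The following is a minimal Python sketch, not an Oracle tool, and its function names are hypothetical. It assumes the bundle has already been unzipped, and it only performs a substring search for the markers named in step 2, because the exact layout of alarms.txt and recoveryGuruProcedures.html is not documented here.

```python
from pathlib import Path

# Hypothetical helper. Markers come from step 2; the "xx" prefix of Grid Code
# xx.66.1203 varies, so only the ".66.1203" suffix is matched (an assumption
# about how the code is printed in alarms.txt).
MARKERS = {
    "alarms.txt": ".66.1203",                            # CAM
    "recoveryGuruProcedures.html": "HOT_SPARE_MISSING",  # SANtricity
}


def mentions_missing_hot_spare(file_name, text):
    """Return True if a supportData file's text reports a missing hot spare."""
    marker = MARKERS.get(file_name)
    return marker is not None and marker in text


def scan_bundle(bundle_dir):
    """Return paths of files in an unzipped bundle that report the fault."""
    hits = []
    for name in MARKERS:
        # rglob() also searches subdirectories of the unzipped bundle.
        for path in Path(bundle_dir).rglob(name):
            text = path.read_text(errors="replace")
            if mentions_missing_hot_spare(path.name, text):
                hits.append(str(path))
    return sorted(hits)
```

Calling scan_bundle() on the directory the bundle was unzipped into returns the matching file paths; an empty list means neither marker was found, which is a hint rather than proof that the condition is absent.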

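When the same unassign has to be repeated, it can help to assemble the command lines from step 3 programmatically and review them before running them by hand. The following Python sketch only builds the CAM and SANtricity commands shown above and does not execute them. Two details are assumptions, not documented behavior: the tray and slot in tNNdYY are zero-padded to two digits, and the executable in each CAM directory is simply named service. The helper names are hypothetical.

```python
import ntpath
import posixpath
import shlex

# Directories of the CAM service command, as listed in step 3.
SERVICE_DIRS = {
    "solaris": "/opt/SUNWsefms/bin",
    "linux": "/opt/sun/cam/private/fms/bin",
    "windows": r"\Program Files\Sun\Common Array Manager\Component\fms\bin",
    # "Program Files (x86)" holds 32-bit software on 64-bit Windows.
    "windows_x86": r"\Program Files (x86)\Sun\Common Array Manager\Component\fms\bin",
}


def cam_drive_target(tray, slot):
    """Format the tNNdYY target (two-digit zero padding is an assumption)."""
    return f"t{tray:02d}d{slot:02d}"


def cam_unassign_argv(platform, array_name, tray, slot):
    """Hypothetical builder for: service -d <array_name> -c unassign -t tNNdYY."""
    base = SERVICE_DIRS[platform]
    join = ntpath.join if platform.startswith("windows") else posixpath.join
    return [join(base, "service"), "-d", array_name,
            "-c", "unassign", "-t", cam_drive_target(tray, slot)]


def smcli_unassign_argv(array_name, tray_id, slot_id):
    """Hypothetical builder for: SMcli -n array_name -c 'set drive[...] hotSpare=FALSE;'."""
    script = f"set drive[{tray_id},{slot_id}] hotSpare=FALSE;"
    return ["SMcli", "-n", array_name, "-c", script]


# Print shell-quoted command lines for review.
print(shlex.join(cam_unassign_argv("solaris", "array01", 85, 3)))
# /opt/SUNWsefms/bin/service -d array01 -c unassign -t t85d03
print(shlex.join(smcli_unassign_argv("array01", 85, 3)))
# SMcli -n array01 -c 'set drive[85,3] hotSpare=FALSE;'
```

Per note B, a slot with a duplicate hot spare entry would simply have the same command run a second time.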
 


 


Important: The instructions in this document must be used by an Oracle support engineer who has received the required NetApp advanced training for shell access. If you are not one of these engineers, you are not authorized to use these commands without guidance from one of them; in that case, open a collaboration SR with a TSC L2 engineer. This procedure is internal only, so DO NOT copy these instructions for customer visibility.


If resolving this issue requires serial port commands, an L2 collaboration is required. If the controllers are running 7.x firmware, the fix is a relatively quick process that does not require an outage, but it still needs to be confirmed by L2. If the controllers are running 6.x firmware, escalate to L2: the resolution steps are unique to each case, and implementing the fix requires an outage. In either case, the recovery steps can be performed remotely or on site, but must be done by an Oracle badged person.

  1. Establish a connection to one of the controllers. Access to the shell requires a collaboration with L2. Two methods are available:
    1. Telnet: enable telnet access to the controllers with the service command, then telnet to the IP address of the specific controller.
      ./service -d arrayname -c enable -q remote -t b
      ./service -d arrayname -c enable -q remote -t a
      When you are done, turn off telnet access by using the disable option of the same command.
    2. Serial console: establish a console session via the serial port on the controller. See <Document 1400311.1> Sun Storage 2500, 2500-M2 and 6000 Arrays: How to Establish a Serial Connection to the Controller.
  2. Load the debug module, then display the missing or phantom hot spare with either vdmDrmShowHSDrives or vdmDrmShowMgr, depending on the firmware version. If the firmware does not recognize the command, it fails with "C interp: unknown symbol name 'vdmDrmShowXXXXXX'"; in that case, run the other command. The devnum of each offending entry consists only of the digits 0 and f, up to eight of them:

    -> loadDebug
    value = 0 = 0x0
    -> vdmDrmShowMgr
    =================
    m_HSDrives in DRM
    =================

    Drive:0x453b408 devnum:0xffff role:Standby
    Drive:0x453b81c devnum:0x1030f role:Standby
    Drive:0x434da1c devnum:0x1050f role:Standby
    Drive:0x46f9800 devnum:0x1010f role:Standby
    ->

    Depending on how it was created, the phantom may also appear in the output of vdmShowDriveList. Look up the line(s) matching the devnum(s) found above: the Tray/Slot column shows 0/0, and the State column begins with NP.
  3. In the example above, the first entry (devnum 0xffff) is the missing or phantom hot spare drive. Remove it with the deassignDrivesAsHotSpares_MT command and the devnum of the phantom:

    -> deassignDrivesAsHotSpares_MT 1,0xffff

    Multiple instances of phantom spares should be removed with a single execution of the command.
  4. Re-run the first command to confirm that the missing or phantom Hot Spare Drive is no longer listed and unload the debug module:

    -> vdmDrmShowMgr
    =================
    m_HSDrives in DRM
    =================
    Drive:0x453b81c devnum:0x1030f role:Standby
    Drive:0x434da1c devnum:0x1050f role:Standby
    Drive:0x46f9800 devnum:0x1010f role:Standby

    -> unld "Debug"
    value = 0 = 0x0
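The rule in step 2 for spotting the phantom (a devnum made up only of the digits 0 and f, at most eight of them) can be applied off-array to a saved copy of the console output, which helps when the list of spares is long. The sketch below is a hypothetical Python helper that reads captured text only and never talks to the controller. It assumes every entry follows the "Drive:... devnum:... role:..." layout shown in the vdmDrmShowMgr example; the vdmDrmShowHSDrives layout may differ.

```python
import re

# Hypothetical parser. One hot spare entry, as in the example above:
# "Drive:0x453b408 devnum:0xffff role:Standby"
ENTRY = re.compile(
    r"Drive:(0x[0-9a-fA-F]+)\s+devnum:(0x[0-9a-fA-F]+)\s+role:(\w+)")


def hot_spare_entries(output):
    """Return (drive, devnum, role) tuples found in the captured output."""
    return ENTRY.findall(output)


def is_phantom_devnum(devnum):
    """True if the hex digits of devnum are only 0s and fs, at most eight."""
    digits = devnum.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    return 0 < len(digits) <= 8 and set(digits) <= {"0", "f"}


def phantom_devnums(output):
    """Devnums that look like a missing or phantom hot spare."""
    return [devnum for _drive, devnum, _role in hot_spare_entries(output)
            if is_phantom_devnum(devnum)]


SAMPLE = """\
Drive:0x453b408 devnum:0xffff role:Standby
Drive:0x453b81c devnum:0x1030f role:Standby
Drive:0x434da1c devnum:0x1050f role:Standby
Drive:0x46f9800 devnum:0x1010f role:Standby
"""

print(phantom_devnums(SAMPLE))  # ['0xffff']
```

The devnum it reports is the value that step 3 passes to deassignDrivesAsHotSpares_MT; the helper itself changes nothing on the array.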

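The confirmation in step 4 comes down to comparing the hot spare listings captured before and after the deassign: the phantom devnum must be gone, and every legitimate spare must still be listed. A minimal, hypothetical Python check over saved output, assuming the same "devnum:0x..." layout as the vdmDrmShowMgr example:

```python
import re

# Hypothetical check; matches the "devnum:0x..." field of each entry.
DEVNUM = re.compile(r"devnum:(0x[0-9a-fA-F]+)")


def devnums(output):
    """All devnums listed in captured vdmDrmShowMgr output, lower-cased."""
    return [d.lower() for d in DEVNUM.findall(output)]


def removal_confirmed(before, after, removed):
    """True if the removed devnums are gone and all other entries remain."""
    removed = {d.lower() for d in removed}
    expected = [d for d in devnums(before) if d not in removed]
    return sorted(devnums(after)) == sorted(expected)


BEFORE = """\
Drive:0x453b408 devnum:0xffff role:Standby
Drive:0x453b81c devnum:0x1030f role:Standby
Drive:0x434da1c devnum:0x1050f role:Standby
Drive:0x46f9800 devnum:0x1010f role:Standby
"""
AFTER = """\
Drive:0x453b81c devnum:0x1030f role:Standby
Drive:0x434da1c devnum:0x1050f role:Standby
Drive:0x46f9800 devnum:0x1010f role:Standby
"""

print(removal_confirmed(BEFORE, AFTER, ["0xffff"]))  # True
```

Comparing sorted lists rather than sets keeps duplicate entries visible, so a slot that still appears twice (see note B) is not mistaken for a clean result.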

  Copyright © 2018 Oracle, Inc.  All rights reserved.