Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1017618.1
Update Date:2015-05-13
Keywords:

Solution Type  Troubleshooting Sure

Solution  1017618.1 :   All connectivity (data and management) is lost to Sun StorEdge 35XX / 33XX arrays. Both controller status LEDs are blinking green


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3310 Array
  • Sun Storage 3320 SCSI Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx
  • _Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
228794
This document describes how to resolve a RAID controller "race condition" on a Storage 3310,

Applies to:

Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
All Platforms

Purpose

This document explains the symptoms of, and the resolution for, a total loss of connectivity to a Sun StorEdge 3X00 array. The problem has become known as a "race condition": the array enters a state in which both controllers assume the primary role. You know this has occurred when:

  • Both controller status LEDs are blinking green.
  • The TCP/IP (ethernet) connection may not respond.
  • The serial console (Console Menu Interface) sends garbled characters.
  • You cannot reach the array via sccli.
  • I/O to the host may stop as well.

 

The same cause can also be seen in single RAID controller arrays (SE3310, SE3510 and SE3511).


Explanation:

An incorrect 3.2.X NVSRAM exists on a 4.X RAID controller. This prevents the crossload of RAID controller firmware and NVSRAM from the primary to the secondary controller during a power-on sequence or a controller reset (for example, one initiated by the sccli command "sccli> reset controller").



Cause:

This problem can be attributed either to an improper controller firmware upgrade or to a 3.2.X controller having been installed in a 4.X array. Specifically, the NVRAM on one controller does not match the NVRAM on the other.



 

Troubleshooting Steps

The following procedure is used to resolve race conditions. It requires:

  • Downtime (roughly 1 hour).
  • sccli access (preferably out-of-band).
  • Console access (you will need to tip to the serial port of the controller).
  • A previous show_configuration.xml file from either explorer or se3kxtr.
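
Two of the requirements above can be checked from the host before downtime starts. The following is a minimal sketch, not part of the sccli or s3kdlres tooling: check_prereqs is a hypothetical helper name, and the default sccli path is the one used elsewhere in this document. It only confirms that the saved configuration file is readable and non-empty and that the sccli binary is executable; it does not validate the XML content.

```shell
#!/bin/sh
# Pre-flight sketch (hypothetical helper, not shipped with sccli or s3kdlres).
# Usage: check_prereqs <show_configuration.xml> [path-to-sccli]
check_prereqs() {
    xml_file=$1                       # saved show_configuration.xml
    sccli_bin=${2:-/usr/sbin/sccli}   # sccli path as used in this document

    rc=0
    # The restore step needs a readable, non-empty configuration file.
    if [ ! -r "$xml_file" ] || [ ! -s "$xml_file" ]; then
        echo "missing or empty configuration file: $xml_file" >&2
        rc=1
    fi
    # The NVRAM reset steps need a working sccli binary on the host.
    if [ ! -x "$sccli_bin" ]; then
        echo "sccli not found or not executable: $sccli_bin" >&2
        rc=1
    fi
    return $rc
}
```

For example, `check_prereqs /var/tmp/show_configuration.xml && echo "ready"` prints "ready" only when both checks pass.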

 

Steps to Follow:

  • Stop all Host I/O. Start downtime. 
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Pull the BOTTOM RAID controller module part way out of the RAID chassis.
  • Wait 10 seconds, then power the RAID chassis on.
  • Wait for the TOP RAID controller module to boot up (approx. 90 seconds).
  • From the host, start an sccli session. In-band: /usr/sbin/sccli   Out-of-band: /usr/sbin/sccli <ip-address>
  • Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
    S/N  The Serial Number of the RAID chassis can be found on a sticker on the 
            bottom left side of the Disk Drive Bay of the RAID chassis. If the sticker has 
            the number 0451-0408008DB3, the S/N of the RAID chassis would be 008DB3.
  • Use the sccli command  sccli> reset nvram   When complete, exit sccli.
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Fully insert the BOTTOM RAID controller module (ensure it is fully seated).
  • Pull the TOP RAID controller module part way out of the RAID chassis.
  • Power the RAID chassis on.
  • Wait for the BOTTOM RAID controller module to boot up (approx. 90 seconds).
  • From the host, start an sccli session. In-band: /usr/sbin/sccli   Out-of-band: /usr/sbin/sccli <ip-address>
  • Select (or verify that sccli displays) the correct S/N for the RAID chassis you are working on.
    S/N  The Serial Number of the RAID chassis can be found on a sticker on the 
            bottom left side of the Disk Drive Bay of the RAID chassis. If the sticker has 
            the number 0451-0408008DB3, the S/N of the RAID chassis would be 008DB3.
  • Use the sccli command  sccli> reset nvram   When complete, exit sccli.
  • Power off the RAID chassis using the power switches on the two PCUs.
  • Reinsert the TOP RAID controller.
  • Power up the RAID array.
  • From sccli, issue the   sccli> show redundancy   command and verify that the Status is "Enabled".
  • Verify that the IP address is set correctly on the array with  sccli> show ip   If it is incorrect, reset the IP address from the console: in the serial Console Menu Interface, select "view and Edit Configuration Parameters" -> Communication Parameters -> TCP/IP Address.
  • Restore the configuration. In this example, the show_configuration.xml file is restored via the in-band sccli device c4t4d0:
    /opt/SUNWsscs/sbin/s3kdlres /var/tmp/show_configuration.xml --device=/dev/rdsk/c4t4d0s2 
    
  • See the Sun StorEdge 3000 Family RAID Controller Firmware Migration Guide for more details on the s3kdlres utility.
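
The S/N check in the steps above can be sketched in shell. This assumes, based only on the single example given (sticker 0451-0408008DB3, S/N 008DB3), that the chassis S/N is always the last six characters of the sticker number. chassis_serial is a hypothetical helper name, not an sccli feature.

```shell
#!/bin/sh
# Sketch: derive the chassis S/N from the sticker number.
# Assumption: the S/N is the final six characters, as in the example above.
chassis_serial() {
    sticker=$1
    # Keep only the last six characters; shorter input is printed unchanged.
    printf '%s\n' "$sticker" | sed 's/.*\(......\)$/\1/'
}

chassis_serial 0451-0408008DB3   # prints 008DB3
```

Compare the printed value with the S/N that sccli displays before issuing any reset command, so that the NVRAM reset is not sent to a different array.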
    

 

Internal Note:

To display the s3kdlres syntax:
/opt/SUNWsscs/sbin/s3kdlres -h
usage: /opt/SUNWsscs/sbin/s3kdlres <XMLfn>
           [--device=<dev>] [--password=<pwd>] [--cli=<fn>]
           [--pretend] [--quiet] [--restore=<config>]
           [--timeout=<timeout>] [--help]

     <XMLfn>            => XML configuration file
     --device=<dev>     => in-band or out-of-band device. Default:  192.168.1.1
     --password=<pwd>   => Password for OOB interface
     --cli=<fn>         => Location of the cli
     --pretend          => Don't run cli commands, just echo
     --quiet            => Do not echo cli commands as they are run
     --restore=<config> => Restore configuration section
                      <config> is one of "channels", "maps", "settings", or "all"
     --timeout          => Timeout for cli commands in seconds. Default:  1200
     --nolog            => Disable Logging.
     --help             => Displays this message

To recreate the SE3000 configuration from a previous show_configuration.xml file (from either explorer or se3kxtr), first issue:
/opt/SUNWsscs/sbin/s3kdlres [previous show_configuration.xml file] \
    --device=[ip-address of the array or /dev/dsk/[cxtxdxs2 or ses]] --restore=channels

Next:

/opt/SUNWsscs/sbin/s3kdlres [previous show_configuration.xml file] \
    --device=[ip-address of the array or /dev/dsk/[cxtxdxs2 or ses]] --restore=settings

Finally:

/opt/SUNWsscs/sbin/s3kdlres [previous show_configuration.xml file] \
    --device=[ip-address of the array or /dev/dsk/[cxtxdxs2 or ses]] --restore=maps
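
The three restore passes can be wrapped in a small loop so that the order (channels, settings, maps) is not mistyped. This is a sketch only: restore_in_order and the S3KDLRES variable are hypothetical names. It uses only the --device, --restore and --pretend options shown in the usage text, and it assumes that s3kdlres exits with a non-zero status on failure, which this document does not confirm.

```shell
#!/bin/sh
# Sketch: run the three s3kdlres restore passes in the documented order.
# S3KDLRES may be overridden; the default is the path used in this document.
S3KDLRES=${S3KDLRES:-/opt/SUNWsscs/sbin/s3kdlres}

# Usage: restore_in_order <XMLfn> <device> [extra s3kdlres options]
restore_in_order() {
    xml_file=$1    # previous show_configuration.xml
    device=$2      # array IP address or in-band device path
    shift 2        # remaining arguments (e.g. --pretend) are passed through

    # Order taken from the note above: channels, then settings, then maps.
    for section in channels settings maps; do
        echo "Restoring section: $section"
        # Assumption: a non-zero exit status means the pass failed.
        "$S3KDLRES" "$xml_file" --device="$device" --restore="$section" "$@" || {
            echo "s3kdlres failed while restoring $section" >&2
            return 1
        }
    done
}
```

A cautious approach is to run `restore_in_order /var/tmp/show_configuration.xml /dev/rdsk/c4t4d0s2 --pretend` first, which per the usage text only echoes the cli commands, and then repeat the call without --pretend.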

 

 


 

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.