Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1021054.1
Update Date:2018-01-10

Solution Type  Troubleshooting Sure

Solution  1021054.1 :   Troubleshooting Sun Storage Array SMART Battery Faults  


Related Items
  • Sun Storage 6580 Array
  • Sun Storage 2540-M2 Array
  • Sun Storage 2510 Array
  • Sun Storage 2540 Array
  • Sun Storage 6180 Array
  • Sun Storage 6780 Array
  • Sun Storage 2530 Array
  • Sun Storage 2530-M2 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: ST25xx

PreviouslyPublishedAs
270028


Applies to:

Sun Storage 2530 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6580 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2510 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2540 Array - Version Not Applicable to Not Applicable [Release N/A]
All Platforms

Purpose

The purpose of this document is to help users identify problems with batteries in the StorageTek 2500, Sun Storage 2500-M2 or Sun Storage 6x80 arrays.  These batteries provide power to the controller's data cache in the event of a power outage.  If you have a Sun Storage Flexline 210, 240, 280, 380 or Sun StorageTek 6130, 6140 or 6540 array, please refer to Document 1392919.1 Troubleshooting Sun Storage[TM] Array non-SMART Battery Faults.

Troubleshooting Steps

The following table contains the most common faults seen by Common Array Manager (CAM) or SANtricity:

Grid ID | CAM Critical Fault | SANtricity Critical Fault
xx.66.1005 | Battery Near Expiration | BATTERY_NEAR_EXPIRATION
xx.66.1006 | There has been a failure in the ICC battery pack / Battery has failed | FAILED_BATTERY
xx.66.1039 | Controller Cache Battery Near Expiration | NON_FRU_BATTERY_NEAR_EXPIRATION or INTEGRATED_BATTERY_NEAR_EXPIRATION
xx.66.1040 | A controller cache backup battery has failed | NON_FRU_FAILED_BATTERY or FAILED_INTEGRATED_BATTERY
xx.66.1091 | Battery Tray.xx.Battery.xx has transitioned to an unknown state | BATTERY_UNKNOWN_STATE
xx.66.1101 | There has been a failure in the ICC battery pack | FAILED_BATTERY_SYSTEM
xx.66.1176 | Battery has a full charge capacity below the replacement capacity threshold | BATTERY_REPLACEMENT_REQUIRED
xx.66.1254 | Battery has expired | EXPIRED_BATTERY
xx.66.1255 | Battery has expired | EXPIRED_INTEGRATED_BATTERY
xx.66.1261 | Battery is over temperature | BATTERY_OVERTEMP


Other possible conditions include:

  • You just replaced the battery, but it still shows as failed.
  • Amber LED lit on battery
  • Amber LED lit on array


Batteries are monitored by two methods, an Expiration Timer and SMART (Self-Monitoring Analysis and Reporting Technology) battery technology.  In some cases it is possible to have both methods active on the same array.

  • Only the SMART battery technology is used if:
    • the array type is a 25x0-M2
    • the array type is a 6x80 with controller firmware version 07.77.13.10 and higher
    • the array type is a 25x0 with controller firmware version 07.35.74.10 and higher
  • Otherwise, both SMART battery technology and Expiration Timer are in use (including all 25x0 arrays on any version of 06.xx controller firmware)
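
The rules above can be sketched as a small lookup. This is only an illustration: the function names `parse_fw` and `monitoring_methods` and the array-family labels are assumptions made for this example and are not part of CAM or SANtricity. Firmware strings are assumed to be dotted numeric versions such as 07.35.74.10.

```python
def parse_fw(version):
    """Turn a dotted firmware string such as '07.35.74.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Minimum controller firmware at which only SMART monitoring is used,
# taken from the list above. The dictionary keys are hypothetical labels.
SMART_ONLY_MIN_FW = {
    "25x0": parse_fw("07.35.74.10"),
    "6x80": parse_fw("07.77.13.10"),
}


def monitoring_methods(array_family, firmware):
    """Return the battery monitoring methods in use for an array.

    array_family is one of '25x0', '25x0-M2' or '6x80'.
    """
    if array_family == "25x0-M2":
        # 25x0-M2 arrays use SMART only, regardless of firmware.
        return ["SMART"]
    threshold = SMART_ONLY_MIN_FW.get(array_family)
    if threshold is None:
        raise ValueError(f"unknown array family: {array_family}")
    if parse_fw(firmware) >= threshold:
        return ["SMART"]
    # Older firmware (including all 06.xx on 25x0) uses both methods.
    return ["SMART", "Expiration Timer"]
```

Comparing versions as integer tuples avoids the pitfalls of string comparison (for example, "07.9" sorting after "07.35").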

The Expiration Timer is a simple counter, whereas the newer SMART battery technology internally tests the ability of the batteries to hold a charge.  Both are used to determine when a battery needs replacement, but the Expiration Timer is susceptible to outside conditions (for example, an incorrect array system time) that can lead to reports of premature failures.

Oracle will no longer replace SMART batteries unless there is sufficient evidence that the battery has actually failed.  For additional details on SMART battery technology, see <Document 1207186.1> SMART Battery Functionality in 2500 and 6000 Arrays.

  1. Verify the Array Type, Firmware Version and Fault.

    Since the steps to resolve battery issues will differ based on the Hardware and Firmware involved, it is necessary to gather this information in order to determine the proper troubleshooting steps.

    • To determine the Array type, see <Document 1021066.1> Verify Sun Storage[TM] Array Array type via the User Interface.
    • To determine the Firmware version, see <Document 1021067.1> Verify Storage[TM] Array Firmware via the User Interface.
    • To determine the Faults, see <Document 1021057.1> Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays.

    If there are no faults but the battery is not optimal, go to step 9.

    The following table lists the most common faults associated with batteries.  If you have an array with redundant batteries and both batteries have a fault, each fault should be evaluated on its own.  Sometimes a single remedy will fix multiple faults.  If you have a single battery with multiple faults, go to step 9 (Contact Oracle Support).

    Critical Fault | Array Type | Firmware Version | Remedy
    Battery Near Expiration | 6x80 | < 7.77 | Go to step 6
    Battery Near Expiration | 25x0 | Any | Go to step 6
    Battery Expired | 25x0/6x80 | Any | Go to step 5
    Over Temperature | 25x0/25x0-M2/6x80 | Any | Go to step 2
    Replacement Required | 25x0/25x0-M2/6x80 | Any | Go to step 9
    Battery Failed | 25x0 | >=7.35.67.10 | Go to step 9
    Battery Failed | 25x0 | >=7.35.10.10 and <7.35.67.10 | Go to step 3
    Battery Failed | 25x0 | 6.x | Go to step 4
    Battery Failed | 25x0-M2/6180 | >=7.80.51.10 | Go to step 9
    Battery Failed | 25x0-M2/6180 | >=7.77 | Go to step 3
    Battery Failed | 6180 | < 7.77 | Go to step 3
    Battery Failed | 6580/6780 | < 7.77 | Go to step 3
    Battery Failed | 6580/6780 | >= 7.77 | Go to step 9
    Unknown | 25x0/25x0-M2/6x80 | Any | Go to step 8
    Full charge capacity below replacement threshold | 25x0/25x0-M2/6x80 | Any | Replace Battery

    If you do not see your critical fault in the above list, proceed to step 2.
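
    As a rough illustration, the "Battery Failed" rows of the table above can be expressed as an ordered rule list. This is a sketch under stated assumptions: the names `BATTERY_FAILED_ROWS` and `battery_failed_step` are hypothetical, "6.x" is interpreted as any version from 6 up to but not including 7, rows are checked top to bottom so the first matching row wins, and combinations not listed in the table return None.

```python
def parse_fw(version):
    """Turn a dotted firmware string such as '07.35.67.10' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# (array types, minimum firmware inclusive, maximum firmware exclusive, step)
# None means "no bound". Order mirrors the table; the first match wins.
BATTERY_FAILED_ROWS = [
    ({"25x0"}, "7.35.67.10", None, 9),
    ({"25x0"}, "7.35.10.10", "7.35.67.10", 3),
    ({"25x0"}, "6", "7", 4),  # "6.x" firmware (assumed meaning)
    ({"25x0-M2", "6180"}, "7.80.51.10", None, 9),
    ({"25x0-M2", "6180"}, "7.77", None, 3),
    ({"6180"}, None, "7.77", 3),
    ({"6580", "6780"}, None, "7.77", 3),
    ({"6580", "6780"}, "7.77", None, 9),
]


def battery_failed_step(array_type, firmware):
    """Return the step number for a Battery Failed fault, or None if unlisted."""
    fw = parse_fw(firmware)
    for types, low, high, step in BATTERY_FAILED_ROWS:
        if array_type not in types:
            continue
        if low is not None and fw < parse_fw(low):
            continue
        if high is not None and fw >= parse_fw(high):
            continue
        return step
    return None
```

    Because the rows are evaluated in order, a 6180 at 7.80.51.10 or later resolves to step 9 before the broader ">=7.77" row is considered.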
  2. Battery Temperature is Out of Range.

    The 6180 and 25x0-M2 arrays running 7.77 firmware can falsely report this error.  It will typically clear by itself, but if it happens during an array check, the alarm can be logged.  A resolution for this issue can be found in <Document 1392313.1> Random "BBU Overheated" for Battery Backup Unit on Sun Storage 2500-M2 and 6180 Arrays.  This is a known issue, <Bug 15762521>.

    If the problem remains after implementing the fix for this issue, go to step 9.
  3. Check for Failures in Battery Learn Cycle.

    Please refer to <Document 1312148.1> Troubleshooting 25xx and 6180 Storage[TM] Array Battery Failures During Learn Cycle.  If this does not resolve the problem and the array type is 25x0 running firmware version 6.x, proceed to step 4.  For all other arrays,  proceed to step 5.
  4. Check Life Remaining.

    On a 25x0 array running firmware version 6.x, SMART battery hardware is used, but down-rev (older) firmware can cause a premature failure.  Run the command below to obtain the Life Remaining value.

    # sscs list -d myarray -t Battery fru Tray.85.Battery.A

    Element Name        : Tray.85.Battery.A
    Element Status      : Optimal
    Enabled State       : Enabled
    FRU Number          : 1T71200441PS
    FRU Type            : Battery
    Firmware            : N/A
    Id                  : SUN.371-2482-01.1T71200441PS
    IdentifyingNumber   : 1T71200441PS
    Life Remaining      : 288 Days
    ManufactureDate     : Wed Feb 28 19:00:00 EST 2007
    Model    

    • If the Life Remaining value is 0, go to step 6.
    • If the Life Remaining value is between 1 and 1095, go to step 9.
    • If the Life Remaining value is greater than 1095 or negative, go to step 5.
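
    The thresholds above can be sketched as follows. The function names are hypothetical, and the regular expression assumes the "Life Remaining : N Days" line format shown in the sample sscs output.

```python
import re


def life_remaining_days(sscs_output):
    """Extract the integer day count from a 'Life Remaining' line, or None."""
    match = re.search(r"Life Remaining\s*:\s*(-?\d+)\s*Days", sscs_output)
    return int(match.group(1)) if match else None


def next_step(days):
    """Map a Life Remaining value (in days) to the next troubleshooting step."""
    if days == 0:
        return 6  # reset the battery counter
    if 1 <= days <= 1095:
        return 9  # contact Oracle Support
    return 5  # greater than 1095 or negative: check the array system time
```

    For the sample output in this step (288 days), `next_step(288)` returns 9.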

     

  5. Confirm Array System Time is correct.

    Batteries that have the Expiration Timer active are subject to premature failures if the array system time gets improperly set. Typically this is the result of a rogue NTP server.  Use <Document 1021108.1> Verifying and Setting Sun Storage[TM] Array System Time, to verify the array system time.  If the system time is incorrect, search the majorEventLog.txt (from supportdata bundle) to see if a rogue NTP server is the cause.

    # grep NTP majorEventLog.txt
    Description: Controller clocks set via NTP or SNTP
    Description: Controller clocks set via NTP or SNTP
    #

    If you find any instances of the above, you can reset the array system time but the problem is likely to return unless the rogue NTP server is addressed.  Reset the array system time and wait 5 minutes.

    • If the critical fault clears, no further action is needed.
    • If the critical fault remains and you recently replaced the battery, or the battery is expired, go to step 6.
    • If the critical fault remains, the array system time is correct and the battery has not recently been replaced, go to step 9.
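
    The majorEventLog.txt search used earlier in this step can also be scripted, for example to count how often the clocks were set via NTP. This is a minimal sketch: the function names are hypothetical, and it matches only the exact description text shown in the grep output above, which is narrower than `grep NTP`.

```python
NTP_MARKER = "Controller clocks set via NTP or SNTP"


def count_ntp_clock_sets(log_text):
    """Count log lines that record the controller clocks being set via NTP/SNTP."""
    return sum(1 for line in log_text.splitlines() if NTP_MARKER in line)


def count_ntp_clock_sets_in_file(path):
    """Same count, reading the log from a file such as majorEventLog.txt."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_ntp_clock_sets(handle.read())
```

    A non-zero count only shows that NTP has set the clocks; whether the NTP server is rogue still has to be judged from the times it supplied.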

    The Real Time Clock (RTC) on the array may also cause premature battery expirations and faults.  Use <Document 1581167.1> How to Check and Repair the Real-Time Clock (RTC) of the Sun Storage 2500, 2500-M2, and 6000 Arrays to verify that this internal clock is correct.  The RTC may be incorrect if:

    • battery ages reflect unrealistic date and time stamps and no rogue NTP servers were identified in the majorEventLog.txt
    • you successfully reset the battery age, only to have the battery immediately fail again with an unrealistic age.

  6. Reset the Battery Counter.
    Go to <Document 1021695.1> How to Reset the Cache Backup Battery Age for Sun Storage 2500, 2500-M2, and 6000 Arrays.  If the reset of the age fails, proceed to step 7.


    For 25x0 arrays it is possible to edit the NVSRAM to negate the Expiration Timer.  This edit will be permanent unless the array's NVSRAM is reloaded or upgraded. 


    The `service` command is located under:
      Solaris: /opt/SUNWsefms/bin/
      Linux:   /opt/sun/cam/private/fms/bin/
      Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\

    # service -d <arrayname> -c read -q nvsram region=0xEE
    # service -d <arrayname> -c set -q nvsram region=0xEE offset=0x2D value=0xFF
    # service -d <arrayname> -c set -q nvsram region=0xEE offset=0x2E value=0xFF
    # service -d <arrayname> -c read -q nvsram region=0xEE

    Example:

    # service -d myarray -c read -q nvsram region=0xEE
    Executing the read command on myarray

            Controller A Region Id = (238) REGION_USER_CONFIG_DATA

              0000: 0000 c220 0000 0000 0050 0600 0000 0000    ... .....P......
              0010: 0000 0000 0000 0000 f001 0000 8080 0000    ................
              0020: 0000 0000 0000 0000 8c86 008a 0000 0000    ................
              0030: 80be 9f41 1300 2000 0f00 1400 0000 0000    ...A.. .........

            Controller B Region Id = (238) REGION_USER_CONFIG_DATA

              0000: 0000 c220 0000 0000 0050 0600 0000 0000    ... .....P......
              0010: 0000 0000 0000 0000 f001 0000 8080 0000    ................
              0020: 0000 0000 0000 0000 8c86 008a 0000 0000    ................
              0030: 80be 9f41 1300 2000 0f00 1400 0000 0000    ...A.. .........

    Completion Status: Success
    #
    # service -d myarray -c set -q nvsram region=0xEE offset=0x2D value=0xFF
    Executing the set command on myarray
    Completion Status: Success
    #
    # service -d myarray -c set -q nvsram region=0xEE offset=0x2E value=0xFF
    Executing the set command on myarray
    Completion Status: Success
    #
    # service -d myarray -c read -q nvsram region=0xEE
    Executing the read command on myarray

            Controller A Region Id = (238) REGION_USER_CONFIG_DATA

              0000: 0000 c220 0000 0000 0050 0600 0000 0000    ... .....P......
              0010: 0000 0000 0000 0000 f001 0000 8080 0000    ................
              0020: 0000 0000 0000 0000 8c86 008a 00ff ff00    ................
              0030: 80be 9f41 1300 2000 0f00 1400 0000 0000    ...A.. .........

            Controller B Region Id = (238) REGION_USER_CONFIG_DATA

              0000: 0000 c220 0000 0000 0050 0600 0000 0000    ... .....P......
              0010: 0000 0000 0000 0000 f001 0000 8080 0000    ................
              0020: 0000 0000 0000 0000 8c86 008a 00ff ff00    ................
              0030: 80be 9f41 1300 2000 0f00 1400 0000 0000    ...A.. .........

    Completion Status: Success
    #
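
    The read-back in the example above can be checked programmatically. The sketch below assumes the dump format shown in the example: each row starts with a four-digit hexadecimal offset and a colon, followed by up to eight four-digit groups, each holding two consecutive bytes in address order. Under that reading, offsets 0x2D and 0x2E are the low byte of the seventh group and the high byte of the eighth group on row 0020, which matches the "00ff ff00" seen after the edit. The function names are hypothetical.

```python
import re

HEADER = re.compile(r"Controller\s+(\w+)\s+Region Id")
ROW = re.compile(r"^\s*([0-9a-fA-F]{4}):((?:\s+[0-9a-fA-F]{4}){1,8})")


def parse_nvsram_read(output):
    """Return {controller: {byte_offset: value}} from `service ... -c read` output."""
    regions = {}
    current = None
    for line in output.splitlines():
        header = HEADER.search(line)
        if header:
            current = regions.setdefault(header.group(1), {})
            continue
        row = ROW.match(line)
        if row and current is not None:
            base = int(row.group(1), 16)
            for index, word in enumerate(row.group(2).split()):
                # Each four-digit group holds two bytes in address order.
                current[base + 2 * index] = int(word[:2], 16)
                current[base + 2 * index + 1] = int(word[2:], 16)
    return regions


def nvsram_edit_applied(output):
    """True if bytes 0x2D and 0x2E read 0xFF on every controller in the output."""
    regions = parse_nvsram_read(output)
    return bool(regions) and all(
        region.get(0x2D) == 0xFF and region.get(0x2E) == 0xFF
        for region in regions.values()
    )
```

    Checking every controller section guards against the case where the edit took effect on only one controller.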
    If the storageArrayProfile.txt file (from the supportdata bundle) indicates that Days until replacement is -1, then the Expiration Timer is NOT being used on the array.  It has either been turned off (per the above procedure) or is not used by this version of firmware.

          Battery status: Optimal
             Location: Controller in slot A
             Age: 365 day(s)
             Days until replacement: -1
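
    A quick way to apply this check to a saved profile is sketched below. The function names are hypothetical, and the pattern assumes the "Days until replacement: N" line format shown in the excerpt above.

```python
import re


def days_until_replacement(profile_text):
    """Return every 'Days until replacement' value found in the profile text."""
    return [
        int(value)
        for value in re.findall(r"Days until replacement:\s*(-?\d+)", profile_text)
    ]


def expiration_timer_in_use(profile_text):
    """True if any battery reports a value other than -1."""
    return any(value != -1 for value in days_until_replacement(profile_text))
```

    An empty list from `days_until_replacement` means the field was not found at all, which is different from a reported value of -1.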
     
  7. Reseat the Battery.

    Batteries in 25x0 and 25x0-M2 array controllers are not externally accessible and require that the controller be removed in order to reseat the battery. This will create an interruption of the datapath to that controller. On an active system, you must have a dual controller configuration and a healthy multipathing environment configured to allow the IO access through the surviving controller. Single controller (Simplex) arrays (some 2510/2530/2540 arrays) require an outage as there is no redundant datapath.

    Batteries in 6x80 array controllers may be reseated without interrupting the datapath.


    After a premature failure, the battery may remain in the failed state until it is reseated.  Reseating the battery resets the Battery Installation Date.  Use the CAM Service Advisor or SANtricity Recovery Guru for the specific steps.  Once this is completed, repeat step 6 to reset the battery age.
  8. Battery in an Unknown State.

    Please see <Document 1283914.1> Sun Storage Common Array Manager (CAM) Reports "Battery has transitioned to an unknown state" for further troubleshooting.
  9. Contact Oracle Support.

    Either the battery needs to be physically replaced, or further investigation by Oracle Support is required.

    Please collect the Array Support Data output:

    • <Document 1002514.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] Common Array Manager.
    • <Document 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.

    Log a Service Request, including the Array Support Data output.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.