Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1312148.1
Update Date:2017-03-23

Solution Type  Problem Resolution Sure Solution

1312148.1 :   Sun Storage 25xx and 6180 Arrays: Troubleshooting Battery Failures During Learn Cycle


Related Items
  • Sun Storage 2540-M2 Array
  • Sun Storage 2510 Array
  • Sun Storage 2540 Array
  • Sun Storage 6180 Array
  • Sun Storage 2530 Array
  • Sun Storage 2530-M2 Array
Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: ST25xx




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Sun Storage 2540-M2 Array - Version Not Applicable and later
Sun Storage 2510 Array - Version Not Applicable and later
Sun Storage 2530 Array - Version Not Applicable and later
Sun Storage 2540 Array - Version Not Applicable and later
Sun Storage 2530-M2 Array - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

A Sun Storage 25xx array running controller firmware 07.35.xx.xx reports Learn Cycle notifications and battery cache failures such as:

A:Fri Oct 15 16:46:42 MEST 2010 : 639 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
B:Fri Oct 15 16:46:20 MEST 2010 : 640 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
B:Fri Oct 15 23:17:37 MEST 2010 : 647 : 0/0/0 : 730E : Notification : Battery : tray85 : Battery capacity is sufficient

Alternatively, a Sun Storage 25xx-M2 or 6180 array reports Learn Cycle notifications and battery cache failures such as:

B:Mon Jan 16 15:35:34 CET 2012 : 1048 : 0/0/0 : 7310 : Notification : Battery : Tray.99.Controller.B.Battery.B : Learn Cycle Started
A:Mon Jan 16 15:35:34 CET 2012 : 1049 : 0/0/0 : 7310 : Notification : Battery : Tray.99.Controller.A.Battery.A : Learn Cycle Started
B:Mon Jan 16 19:03:03 CET 2012 : 1050 : 0/0/0 : 7302 : Notification : Battery : Tray.99.Controller.B.Battery.B : Battery Capacity Low
A:Mon Jan 16 20:38:26 CET 2012 : 1053 : 0/0/0 : 210C : Notification : Battery : Tray.99.Controller.A.Battery.A : Controller cache battery failed
B:Mon Jan 16 20:38:35 CET 2012 : 1056 : 0/0/0 : 730F : Notification : Battery : Tray.99.Controller.B.Battery.B : Incomplete Learn Cycle
A:Mon Jan 16 20:38:37 CET 2012 : 1057 : 0/0/0 : 730F : Notification : Battery : Tray.99.Controller.A.Battery.A : Incomplete Learn Cycle


These events are followed by critical alarms:

  • Common Array Manager alarm code xx.66.1006:  A cache backup battery has failed. 
  • Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed.

Refer to <Document 1021057.1> Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays.

Changes

No changes have been made to the array. The only activity is that the automated Learn Cycle has run.

Cause

For the 2500 Array only:

  • Bug 15671399 - Exmoor smart battery gets failed during a learn cycle due to i2c bus errors. The controller firmware falsely raises i2c bus faults, which causes the battery to be marked as failed. This is observed only on Sun Storage 2510, 2530, and 2540 arrays running 07.35.xx.xx firmware. No other models, including the 2530-M2 and 2540-M2, and no other firmware releases are affected. The fix is bundled into firmware 07.35.67.10, starting with release 6.8.1.4 of Common Array Manager. Reference <Bug 15671399>

For the 2530-M2, 2540-M2 and the 6180 Array:

  • Bug 15767945 2500-M2, 6180: Battery Failures for Incomplete Learn Cycle. Reference <Bug 15767945>
  • Bug 15767948 2500-M2, 6180: Rev 07 battery failed during learn cycle. Reference <Bug 15767948>
  • Bug 15752660 Battery Failures for Incomplete Learn Cycle. Reference <Bug 15752660>
  • Bug 15752659 Write cache on StorageTek 6180 becomes disabled while Battery Learning Cycle processing. Reference <Bug 15752659>
  • Bug 15769192 2500-M2/6180: Incomplete battery learn cycle due to inhibit charge. Reference <Bug 15769192>

The fix for all of these issues is bundled into firmware 07.80.62.10, starting with release 6.9.0.18 of Common Array Manager. The upgrade only prevents the problem from recurring. Follow the steps in this document to address the current failure.

Solution

Each battery should be analyzed individually for this fault. The 2500 systems can have one or two batteries depending on the controller configuration. The 25xx-M2 and the 6180 arrays have two batteries.


1.  Verify that you have a critical battery fault (xx.66.1006 or xx.66.1040)

      Refer to <Document 1021057.1> Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays.

  • Common Array Manager alarm code xx.66.1006:  A cache backup battery has failed.
  • Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed.
  • If there are no critical faults for battery failure, you may have a different issue. Refer to <Document 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
  • If one of the battery failure faults shown above is present on a 2500, 2500-M2, or 6180 array, continue to Step 2.

2.   Verify array firmware

      Reference: <Document 1021067.1> Verify Storage Array Firmware via the User Interface.

  • If the firmware is 06.xx.xx.xx, you may have a different issue. Refer to <Document 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
  • For 2500 array, if the firmware is 07.35.xx.xx and below 07.35.67.10, then continue to Step 3.
  • For 2500 array, if the firmware is 07.35.67.10 or above, you can stop here. Your array is not impacted by the issue in scope of this document.
  • For 2500-M2 or 6180 array, if the firmware is 07.80.62.10 or above, you can stop here. Your array is not impacted by the issue in scope of this document.
  • For 2500-M2 or 6180 array, if the firmware is below 07.80.62.10, then continue to Step 3.
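
The firmware checks in Step 2 can be sketched as a small Python helper. This is only an illustration of the decision rules listed above, not an Oracle tool: the function names and the family labels "2500" and "M2/6180" are made up for this sketch, and it assumes the firmware string has already been read from CAM in the four-part dotted form used in this document.

```python
import re

# Fixed firmware levels quoted in this document.
FIXED_2500 = (7, 35, 67, 10)
FIXED_M2_6180 = (7, 80, 62, 10)


def parse_firmware(version):
    """Turn a dotted string such as '07.35.67.10' into a tuple of ints."""
    if not re.fullmatch(r"\d+(?:\.\d+){3}", version):
        raise ValueError(f"unexpected firmware format: {version!r}")
    return tuple(int(part) for part in version.split("."))


def step2_decision(family, version):
    """Return the Step 2 outcome.

    family is '2500' or 'M2/6180'; both labels are hypothetical
    names used only in this sketch.
    """
    fw = parse_firmware(version)
    if fw[0] == 6:
        return "different issue: see Document 1021054.1"
    if family == "2500":
        if fw >= FIXED_2500:
            return "stop: not impacted"
        if fw[:2] == (7, 35):
            return "continue to Step 3"
        # Step 2 does not describe 07.xx releases below 07.35.
        return "not covered by this document"
    if family == "M2/6180":
        if fw >= FIXED_M2_6180:
            return "stop: not impacted"
        return "continue to Step 3"
    raise ValueError(f"unknown array family: {family!r}")
```

Comparing the versions as integer tuples avoids the pitfalls of comparing the dotted strings directly, since leading zeros and multi-digit fields do not sort reliably as text.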

3.  Verify the LastLearnStart date from the stateCaptureData output

  • Create a stateCaptureData file for your array using the service command from your CAM management host. The command is located in:
    Solaris: /opt/SUNWsefms/bin/
    Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin
    Linux: /opt/sun/cam/private/fms/bin

    service -d <array-name> -c save -t state -p <directory where the output should be saved> -o <name of the output file>

    Example:

    service -d 2540-array -c save -t state -p /var/tmp -o stateCaptureData.dmp

    Executing the save command on 2540-array

    Stats Capture data was written to /var/tmp/stateCaptureData.dmp

    Completion Status: Success

  • List your alarms.

    Refer to <Document 1021057.1> Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays. Get the date of the fault from the alarm:

Example:

Alarm ID   : alarm9
Description: A cache backup battery has failed Tray.85.Battery.A
Severity   : Critical
Element    : t85bat1
GridCode   : 70.66.1006
Date       : 2010-08-23 22:54:25
  • Get the LastLearnStart date from the battery for the controller specified, by opening the stateCaptureData.dmp file you created, and searching for the keyword "LastLearnStart":
    Example:
    Controller          = CTLR_A
    Local Battery Slot  = 1
    DOMI Agent          = Initialized
    Bmgr WakeUp Time    = 07/19/2011 13:04:15
    LastLearnStart Time = 08/24/2010 08:27:23

    Each battery has an entry, and the entries are repeated for each controller. With two RAID controllers, the file contains four entries in total (two for each battery). Checking one entry for the battery, from either controller, is sufficient.
  • If the stateCaptureData file does not contain a LastLearnStart entry, go to Step 4.
  • If the difference between the alarm date and the LastLearnStart time for the battery slot is less than or equal to 24 hours (as in the example above), go to Step 5.
  • If the difference between the alarm date and the LastLearnStart time for the battery slot is greater than 24 hours, contact Oracle to have the battery replaced.
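
The comparison in Step 3 can be sketched in Python as follows. This is a minimal illustration, not part of CAM: the function names are hypothetical, the dump is assumed to contain lines in the "LastLearnStart Time = MM/DD/YYYY HH:MM:SS" form shown in the example, and the alarm date is assumed to use the YYYY-MM-DD form shown in the alarm listing. Because the example has the LastLearnStart time slightly later than the alarm date, the sketch compares the absolute difference, which is also an assumption.

```python
import re
from datetime import datetime, timedelta

# Matches the "LastLearnStart Time = 08/24/2010 08:27:23" lines
# shown in the example above.
LEARN_RE = re.compile(
    r"LastLearnStart Time\s*=\s*(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
)


def last_learn_starts(dump_text):
    """Return every LastLearnStart timestamp found in the dump text."""
    return [
        datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        for stamp in LEARN_RE.findall(dump_text)
    ]


def step3_decision(alarm_date, learn_start):
    """alarm_date: 'YYYY-MM-DD HH:MM:SS' string; learn_start: datetime or None."""
    if learn_start is None:
        return "go to Step 4"
    alarm = datetime.strptime(alarm_date, "%Y-%m-%d %H:%M:%S")
    # Assumption: use the absolute difference between the two times.
    gap = abs(alarm - learn_start)
    if gap <= timedelta(hours=24):
        return "go to Step 5"
    return "contact Oracle to have the battery replaced"


sample = """\
Controller          = CTLR_A
Local Battery Slot  = 1
LastLearnStart Time = 08/24/2010 08:27:23
"""
starts = last_learn_starts(sample)
print(step3_decision("2010-08-23 22:54:25", starts[0] if starts else None))
```

With the alarm and LastLearnStart values from the examples above, the two times are 9 hours 32 minutes 58 seconds apart, so the sketch prints "go to Step 5". Selecting the entry that belongs to the failed battery slot is left to the reader, since the full layout of the dump file is not shown in this document.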

4. Verify Event Sequences.

  • For the 2500 Array, verify the event sequence 7310, followed by 730E, followed by 210C.
  • For the 2500-M2 Array or the 6180 Array, verify the event sequence 7310, followed by 7302, followed by 210A, followed by 210C and 730F.
  • To view the events in Sun Storage Common Array Manager (CAM):
    • Expand Storage Arrays in the left menu pane.
    • Expand your storage array name in the left menu pane.
    • Expand Troubleshooting in the left menu pane.
    • Click on Events.
    • In the right pane, click on the -|-> icon.  If you mouse over it, it will state Advanced Filter.
    • Set Event to Log Events.
    • Set Event Type to Component.
    • Set Read the last X Kbytes From Log File to 100.
    • Set String Filter to Battery.
    • Click on the Details of any event that is shown.
    • Review the Description Field.
    • Get the value of the array log event ID from the description.
NOTE: String Filters are case-sensitive.
Example:

Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed
  • To view using SSCS CLI:
    • Get the list of events using the sscs utility, found in:
      Solaris: /opt/SUNWstkcam/bin/
      Linux: /opt/sun/cam/bin/
      Windows: C:\Program Files\Sun\Common Array Manager\bin
      sscs list -d array_name -t LogEvent -f Battery event
    • Get the event details:
      sscs list -d array_name event event_id
    • Get the value of the array log event ID from the description:

      Example:
      Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
      Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
      Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed

 

For the 2500 Array:

  • If event 0x210C follows 0x730E, the failure is due to Bug 15671399 and a firmware upgrade to 07.35.67.10 is required. The upgrade only prevents the problem from recurring. Continue to Step 5 to clear the battery state. Reference <Bug 15671399>
  • If event 0x210C does not follow a 0x730E event, contact Oracle to have the battery replaced.

For the 2500-M2 or 6180 Array:

  • If events 0x210C and 0x730F follow 0x7302, the failure is due to Bug 15767948 and a firmware upgrade to 07.80.62.10 is required. The upgrade only prevents the problem from recurring. Continue to Step 5 to clear the battery state. Reference <Bug 15767948>
  • If events 0x210C and 0x730F do not follow a 0x7302 event, contact Oracle to have the battery replaced.
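
The event-sequence rules above can be sketched in Python. This is an illustration only: the function names and the family labels "2500" and "M2/6180" are hypothetical, the Description lines are assumed to carry the "[ID 0x....]" tag shown in the examples, and the lines are assumed to be supplied in chronological order.

```python
import re

# Pulls the four-digit event ID out of "[ID 0x7310]" style tags.
EVENT_ID_RE = re.compile(r"\[ID 0x([0-9A-Fa-f]{4})\]")


def event_ids(description_lines):
    """Return the event IDs, in input order, as upper-case hex strings."""
    ids = []
    for line in description_lines:
        match = EVENT_ID_RE.search(line)
        if match:
            ids.append(match.group(1).upper())
    return ids


def step4_decision(family, ids):
    """Apply the Step 4 rules; ids must be in chronological order."""
    if family == "2500":
        trigger, required = "730E", {"210C"}
    elif family == "M2/6180":
        trigger, required = "7302", {"210C", "730F"}
    else:
        raise ValueError(f"unknown array family: {family!r}")
    if trigger in ids:
        # Events logged after the first occurrence of the trigger event.
        later = set(ids[ids.index(trigger) + 1:])
        if required <= later:
            return "firmware bug: upgrade, then continue to Step 5"
    return "contact Oracle to have the battery replaced"


lines = [
    "Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started",
    "Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient",
    "Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed",
]
print(step4_decision("2500", event_ids(lines)))
```

For the three example Description lines, the sketch prints "firmware bug: upgrade, then continue to Step 5". If the event list is displayed newest first, reverse it before passing it in.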

5.  Reset the controller of the failed battery

Based on the information gathered in the previous steps, the battery was marked as failed because of a firmware bug, and the battery itself is healthy. The workaround is to reset the controller that reports the cache battery failure.

Browser:

  1. Select Physical Devices -> select Controllers.
  2. Select Reset Controller for the controller reporting the Cache Battery Failure.

SSCS:

sscs reset -a array_name controller [A or B]


For example, based on the following alarm, reset Controller A:

Alarm ID   : alarm9
Description: A cache backup battery has failed Tray.85.Battery.A
Severity   : Critical
Element    : t85bat1
GridCode   : 70.66.1006
Date       : 2010-08-23 22:54:25

 

NOTE:  For SIMPLEX arrays (single controller), an outage is required, as the data path will be unavailable during the reset.

 

  • If the battery failure clears after resetting the controller, no further work is required.
  • If the battery failure does not clear, contact Oracle to have the battery replaced.

 

  Copyright © 2018 Oracle, Inc.  All rights reserved.