How to Decode Common Array Manager Alarms using the Event or Grid Code

Asset ID:	1-71-1498053.1
Update Date:	2016-12-19

Solution Type Technical Instruction Sure

Solution 1498053.1 : How to Decode Common Array Manager Alarms using the Event or Grid Code

Applies to:

Sun Storage 6140 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2510 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6580 Array - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Goal

This document explains the format of the Event Codes seen in arrays managed by Sun Storage Common Array Manager (CAM), and how to extract more information from these codes.

Solution

CAM alarms contain useful information about problems that a given array may be seeing. One component of the alarm is the "Grid Code" (shown as the Event Code in other forms of the alarm). Here is a typical alarm; the Grid Code appears in the GridCode field:

Alarm ID   : alarm137
Description: Drive Tray.08.Drive.10 failed.
Severity   : Critical
Element    : t8drive10
GridCode   : 63.66.1023
Date       : 2012-10-11 18:38:04

Using the Grid or Event Code, we can decipher additional information about a problem. The code consists of three parts separated by periods. In the example above, those parts are 63, 66 and 1023. The first part identifies the Source. The value 63 corresponds to a Sun StorageTek 6540 array (other possible values are listed in the first table below). The second part is the Event Type. The value 66 corresponds to a "ProblemEvent" (other possible values are listed in the second table below). The last part identifies the Description of the event. The complete list of possible Description values is too large to include in this document, and the Description itself is generally self-evident.
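
The three-part layout described above can be sketched in a few lines of Python. This is only an illustration and is not part of CAM; the names GridCode and parse_grid_code are hypothetical.

```python
import re
from typing import NamedTuple

# Three groups of decimal digits separated by periods, e.g. 63.66.1023
_GRID_CODE = re.compile(r"(\d+)\.(\d+)\.(\d+)", re.ASCII)


class GridCode(NamedTuple):
    source: int       # first part: the product that raised the event
    event_type: int   # second part: the kind of event (66 = ProblemEvent)
    description: int  # third part: identifies the specific description


def parse_grid_code(code: str) -> GridCode:
    """Split a Grid Code such as '63.66.1023' into its three numeric parts."""
    match = _GRID_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"not a three-part Grid Code: {code!r}")
    return GridCode(*(int(part) for part in match.groups()))


print(parse_grid_code("63.66.1023"))
# GridCode(source=63, event_type=66, description=1023)
```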


Source Value	Source
7.yy.zzzz	Management Host
48.yy.zzzz	Sun StorageTek 6130
57.yy.zzzz	Sun StorageTek 6140
59.yy.zzzz	StorageTek Flexline 380
63.yy.zzzz	Sun StorageTek 6540
69.yy.zzzz	Sun StorageTek 2530
70.yy.zzzz	Sun StorageTek 2540
72.yy.zzzz	StorageTek Flexline 280
73.yy.zzzz	Sun StorageTek 2510
74.yy.zzzz	StorageTek Flexline 240
77.yy.zzzz	Sun Storage J4200
78.yy.zzzz	Sun Storage J4400
79.yy.zzzz	Sun StorageTek 6580
80.yy.zzzz	Sun StorageTek 6780
83.yy.zzzz	Sun Storage J4500
86.yy.zzzz	Sun StorageTek F5100
90.yy.zzzz	Sun StorageTek 6180
92.yy.zzzz	Sun StorageTek 2530M2
93.yy.zzzz	Sun StorageTek 2540M2
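
As a rough sketch, the Source table above can be turned into a lookup keyed on the first part of the Grid Code. The dictionary contents come from the table; the names SOURCES and source_name are hypothetical and are not part of CAM.

```python
# Source values from the table above, keyed by the first part of the Grid Code.
SOURCES = {
    7: "Management Host",
    48: "Sun StorageTek 6130",
    57: "Sun StorageTek 6140",
    59: "StorageTek Flexline 380",
    63: "Sun StorageTek 6540",
    69: "Sun StorageTek 2530",
    70: "Sun StorageTek 2540",
    72: "StorageTek Flexline 280",
    73: "Sun StorageTek 2510",
    74: "StorageTek Flexline 240",
    77: "Sun Storage J4200",
    78: "Sun Storage J4400",
    79: "Sun StorageTek 6580",
    80: "Sun StorageTek 6780",
    83: "Sun Storage J4500",
    86: "Sun StorageTek F5100",
    90: "Sun StorageTek 6180",
    92: "Sun StorageTek 2530M2",
    93: "Sun StorageTek 2540M2",
}


def source_name(grid_code: str) -> str:
    """Return the Source for a Grid Code, based on its first part."""
    source = int(grid_code.strip().split(".")[0])
    return SOURCES.get(source, f"Unknown source ({source})")


print(source_name("63.66.1023"))  # Sun StorageTek 6540
```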

INTERNAL ONLY:

Additional source values exist for arrays that CAM was intended to support but never did. These values should never be seen:


Source Value	Source
84	B6000
85	NEM
94	6190
95	6590

xx.66.9999 is a dummy GridCode that CAM returns whenever it cannot match the Fault ID from SYMbol to the array model. For an example, see <Document 1519083.1> Sun Storage Common Array Manager (CAM) Returns the Alarm with Event Code 93.66.9999 and Fault ID 434 for a Sun Storage 2500-M2 Array.
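
A script that post-processes alarms could flag this dummy code before attempting any further lookup. The following is a minimal sketch; the function name is_dummy_grid_code is hypothetical.

```python
def is_dummy_grid_code(grid_code: str) -> bool:
    """True if the code has the form xx.66.9999, whatever the source value."""
    parts = grid_code.strip().split(".")
    return len(parts) == 3 and parts[1:] == ["66", "9999"]


print(is_dummy_grid_code("93.66.9999"))  # True
print(is_dummy_grid_code("63.66.1023"))  # False
```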


Event Value	Event Type	Notes
xx.4.zzzz	Value Change Event Resolved	Reporting Change to Optimal
xx.5.zzzz	Value Change Event Problem	Reporting Change to Non-Optimal
xx.10.zzzz	Audit Event	Weekly Internal Audit
xx.11.zzzz	Communications Established Event	Management Communication
xx.12.zzzz	Communications Lost Event	Management Communication
xx.14.zzzz	Discovery Event	Initial Array Discovery
xx.19.zzzz	Location Change Event	Changes to Customer Information Page
xx.20.zzzz	Log Event	Typically a Diagnostic Test Issue
xx.22.zzzz	Quiesce End Event	IO Successfully Quiesced
xx.23.zzzz	Quiesce Start Event	IO Quiescence Started
xx.25.zzzz	State Change Event	Reporting Change to Optimal
xx.26.zzzz	State Change Event	Reporting Change to Non-Optimal
xx.40.zzzz	Component Insert Event	Component Inserted
xx.41.zzzz	Component Remove Event	Component Removed
xx.64.zzzz	Problem Change Event	Previously Reported Problem Changed
xx.65.zzzz	Problem Clear Event	Previously Reported Problem Fixed
xx.66.zzzz	Problem Event	Problem being Reported
xx.74.zzzz	Revision Baseline Event	Array is at Firmware Baseline
xx.75.zzzz	Revision Delta Event	Array is not at Firmware Baseline
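
The Event Type table can be expressed the same way, keyed on the second part of the Grid Code. This is an illustrative sketch built only from the rows above; EVENT_TYPES and event_type are hypothetical names.

```python
# (event type, notes) from the table above, keyed by the second part of the code.
EVENT_TYPES = {
    4: ("Value Change Event Resolved", "Reporting Change to Optimal"),
    5: ("Value Change Event Problem", "Reporting Change to Non-Optimal"),
    10: ("Audit Event", "Weekly Internal Audit"),
    11: ("Communications Established Event", "Management Communication"),
    12: ("Communications Lost Event", "Management Communication"),
    14: ("Discovery Event", "Initial Array Discovery"),
    19: ("Location Change Event", "Changes to Customer Information Page"),
    20: ("Log Event", "Typically a Diagnostic Test Issue"),
    22: ("Quiesce End Event", "IO Successfully Quiesced"),
    23: ("Quiesce Start Event", "IO Quiescence Started"),
    25: ("State Change Event", "Reporting Change to Optimal"),
    26: ("State Change Event", "Reporting Change to Non-Optimal"),
    40: ("Component Insert Event", "Component Inserted"),
    41: ("Component Remove Event", "Component Removed"),
    64: ("Problem Change Event", "Previously Reported Problem Changed"),
    65: ("Problem Clear Event", "Previously Reported Problem Fixed"),
    66: ("Problem Event", "Problem being Reported"),
    74: ("Revision Baseline Event", "Array is at Firmware Baseline"),
    75: ("Revision Delta Event", "Array is not at Firmware Baseline"),
}


def event_type(grid_code: str) -> tuple:
    """Return (event type, notes) for the second part of a Grid Code."""
    value = int(grid_code.strip().split(".")[1])
    return EVENT_TYPES.get(value, (f"Unknown event type ({value})", ""))


print(event_type("63.66.1023"))
# ('Problem Event', 'Problem being Reported')
```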

Events that are not errors will not generate alarms. For example, a Revision Baseline Event indicates that the array is at the firmware baseline, so no further action is needed.

Using the command ras_admin and the Grid Code, it is possible to obtain additional information about the failure, including the array type and what to do to resolve the problem:

# ./ras_admin advisor -e 90.66.1023
Event Code         : 90.66.1023
Event Type         : 6180.ProblemEvent.REC_FAILED_DRIVE
Severity           : 0
Sample Description : Drive {0} failed.
Probable Cause     : A drive has failed.
Recommended Action : Replace the disk drive.
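
If the advisor output needs to be processed by a script, its "Field : Value" lines can be read into a dictionary. The sketch below assumes the output always has the layout shown above; the function name parse_advisor_output is hypothetical, and the sample text is copied from this document rather than captured from a live system.

```python
SAMPLE = """\
# ./ras_admin advisor -e 90.66.1023
Event Code         : 90.66.1023
Event Type         : 6180.ProblemEvent.REC_FAILED_DRIVE
Severity           : 0
Sample Description : Drive {0} failed.
Probable Cause     : A drive has failed.
Recommended Action : Replace the disk drive.
"""


def parse_advisor_output(text: str) -> dict:
    """Turn 'Field : Value' lines into a dict, skipping the prompt line."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("#") or ":" not in line:
            continue  # shell prompt or a line without a field separator
        key, _, value = line.partition(":")  # split at the first colon only
        fields[key.strip()] = value.strip()
    return fields


info = parse_advisor_output(SAMPLE)
print(info["Event Type"])          # 6180.ProblemEvent.REC_FAILED_DRIVE
print(info["Recommended Action"])  # Replace the disk drive.
```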

The ras_admin command can be found in the following locations:

Solaris: /opt/SUNWsefms/bin
Linux: /opt/sun/cam/private/fms/bin
Windows: <Drive>:\Program Files\Sun\Common Array Manager\Component\fms\bin
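
A wrapper script could select the directory by platform, as in the sketch below. The paths are the ones listed above. The function name ras_admin_dir is hypothetical, the Windows drive letter must be supplied by the caller because it depends on the installation, and the use of "SunOS" relies on that being the value Python's platform.system() reports on Solaris.

```python
import platform


def ras_admin_dir(system=None, drive=None):
    """Return the directory that holds ras_admin on the given platform."""
    system = system or platform.system()
    if system == "SunOS":  # platform.system() value on Solaris
        return "/opt/SUNWsefms/bin"
    if system == "Linux":
        return "/opt/sun/cam/private/fms/bin"
    if system == "Windows":
        if not drive:
            raise ValueError("a drive such as 'C:' is required on Windows")
        return drive + r"\Program Files\Sun\Common Array Manager\Component\fms\bin"
    raise ValueError(f"no known ras_admin location for {system!r}")


print(ras_admin_dir("Linux"))  # /opt/sun/cam/private/fms/bin
```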

Because ras_admin processes only the Event Code and not an actual alarm, details such as location will not match the actual alarm. In this case, ras_admin cannot know which drive location failed, so it leaves a placeholder in braces ({0}). If the description contains more than one placeholder, the subsequent ones are numbered incrementally ({1}, {2}, and so on).
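
To illustrate how the numbered placeholders relate to a real alarm, the sketch below fills the sample description with the location from the example alarm at the top of this document. It assumes the simple {0}, {1} form shown here, which happens to match Python's positional format fields; fill_description is a hypothetical helper, not a CAM function.

```python
def fill_description(template: str, *values: str) -> str:
    """Substitute positional placeholders ({0}, {1}, ...) with actual values."""
    return template.format(*values)


print(fill_description("Drive {0} failed.", "Tray.08.Drive.10"))
# Drive Tray.08.Drive.10 failed.
```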

