Asset ID: 1-75-1388897.1
Update Date: 2017-05-12
Solution Type: Troubleshooting Sure
Solution 1388897.1: Troubleshooting Sun Storage[TM] 2500 and 6000 Array Drive Tray Lost Redundancy Events
Related Items
- Sun Storage 6180 Array
- Sun Storage 6580 Array
- Sun Storage 2540 Array
- Sun Storage 2510 Array
- Sun Storage 6780 Array
- Sun Storage 2540-M2 Array
- Sun Storage 6140 Array
- Sun Storage 2530 Array
- Sun Storage 2530-M2 Array
- Sun Storage 6540 Array
- Sun Storage 6130 Array
Related Categories
- PLA-Support>Sun Systems>DISK>Arrays>SN-DK: 6140_6180
- _Old GCS Categories>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
Applies to:
Sun Storage 6140 Array - Version Not Applicable and later
Sun Storage 6180 Array - Version Not Applicable and later
Sun Storage 2540 Array - Version Not Applicable and later
Sun Storage 6130 Array - Version Not Applicable and later
Sun Storage 2530 Array - Version Not Applicable and later
Information in this document applies to any platform.
Purpose
The purpose of this document is to help troubleshoot Drive/Drive-Tray Lost Redundancy events for Sun Storage[TM] 2500 and 6000 Arrays.
Symptoms include:
- Critical Fault for Drive <Tray.xx.Drive.xx> lost redundancy (xx.66.1032) or REC_LOST_REDUNDANCY_DRIVE
- Critical Fault for Enclosure tray <Tray.xx> lost redundancy (xx.66.1033) or REC_LOST_REDUNDANCY_TRAY
- Critical Fault for Lost communication with <Tray.xx.IOM.x> (xx.66.1034) or REC_LOST_REDUNDANCY_ESM
Please validate each troubleshooting step below against your environment. Each step links to a document with instructions for validating the step and taking corrective action as necessary. The steps are ordered to isolate the issue and identify the proper resolution, so please do not skip a step.
Troubleshooting Steps
1. Verify the Critical Faults on the array.
Reference <Document 1021057.1> Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays.
- If the critical fault is only for a Drive, collect Supportdata and proceed to Step 2.
- If there is a critical fault for an IOM/Tray in addition to the Drive fault, collect a Cabling diagram along with Supportdata and proceed to Step 2.
- If the critical fault is only for an IOM/Tray, collect a Cabling diagram along with Supportdata and proceed to Step 3.
Reference <Document 1002514.1> Collecting Sun Storage Common Array Manager Array Support Data.
Reference <Document 1014074.1> Collecting Support Data for Arrays Using Sun StorageTek[TM] SANtricity Storage Manager.
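The branching in Step 1 can be sketched as a small lookup. This is only an illustration, not part of any Oracle tool: the function name next_step and the returned labels are hypothetical, and the sketch assumes the fault type can be read from the last field of the grid code (1032 for Drive, 1033 for Tray, 1034 for IOM), as listed under Symptoms.

```python
# Hypothetical helper: map the fault codes found in Step 1 to the next step.
# Assumes the last field of the grid code identifies the fault type.
DRIVE, TRAY, IOM = "1032", "1033", "1034"


def next_step(grid_codes):
    """Return (next step number, data to collect) for codes such as '80.66.1032'."""
    suffixes = {code.rsplit(".", 1)[-1] for code in grid_codes}
    has_drive = DRIVE in suffixes
    has_tray_or_iom = bool(suffixes & {TRAY, IOM})
    if has_drive and has_tray_or_iom:
        return 2, ["supportdata", "cabling diagram"]
    if has_drive:
        return 2, ["supportdata"]
    if has_tray_or_iom:
        return 3, ["supportdata", "cabling diagram"]
    return None, []  # no lost-redundancy fault recognized


print(next_step(["80.66.1032", "80.66.1034"]))
```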
2. Identify Drive Details from alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
- If you use Sun Storage Common Array Manager:
- Extract and open the alarms.txt file from the supportdata.
- Get TrayID, DriveID and Channel information from the alarm xx.66.1032.
Example:
Alarm ID : alarm1
Description: Drive Tray.45.Drive.05 lost redundancy, IOM N/A, working channel: 5.
Severity : Critical
Element : t45drive5
GridCode : 80.66.1032
Date : xx-xx-xx
- If you use Sun StorageTek[TM] SANtricity Storage Manager:
- Extract and open the recoveryGuruProcedures.html file from the supportdata.
- Get TrayID, DriveID and Channel information from the Failure Entry NO_REDUNDANCY_DRIVE.
Example:
Storage array: ST6540
Component reporting problem: Drive in slot 8
Status: Optimal
Location: Drive tray 1
Component requiring service: 8
Service action (removal) allowed: No
Service action LED on component: No
Working channel: 2
- Proceed to Step 4 to identify Working and Affected Channels.
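For Common Array Manager, the TrayID, DriveID and Working Channel can also be pulled out of the alarm Description line with a regular expression. The following is a minimal sketch that assumes the Description text has the same layout as the alarms.txt example above; the names ALARM_RE and parse_drive_alarm are hypothetical, and other firmware or CAM versions may word the line differently.

```python
import re

# Pattern modeled on the example Description line in alarms.txt.
ALARM_RE = re.compile(
    r"Drive Tray\.(?P<tray>\d+)\.Drive\.(?P<drive>\d+) lost redundancy,"
    r".*?working channel:\s*(?P<channel>\d+)"
)


def parse_drive_alarm(description):
    """Return tray, drive and working channel as integers, or None if no match."""
    match = ALARM_RE.search(description)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


line = "Description: Drive Tray.45.Drive.05 lost redundancy, IOM N/A, working channel: 5."
print(parse_drive_alarm(line))
```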
3. Identify Tray Details from alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
Reference Examples in Step 2 to get TrayID, Working Channel information from alarms.txt or recoveryGuruProcedures.html.
Proceed to Step 4 to identify Working and Affected Channels.
4. Identify Affected and Working Channels:
Locate the 'luall' output by opening the stateCaptureData.dmp file, and searching for the keyword 'luall'. If 'luall' output is not available (as in the case of old firmware revisions 6.x), search for 'iditnall'.
Locate the Affected Drive/Tray as mentioned in the previous steps, and identify the Affected and Working Channels by following one of the examples below:
Example 1 (for array firmware 7.x):
Executing luall(0,0,0,0,0,0,0,0,0,0) on controller A:
.......Logical Unit........: :.Channels..:Que ............IOs............:
Devnum Location Role :ORP : 0 1 2 3 4 :Dep Qd Open Completed Errs : OldestCmdAge(ms)
---------- -------- ------ :--- : - - - - - :--- --- ----- ---------- ----- : ----------------
00020000 t0 Encl :++ : A B : 1 0 0 38399 3 0
00010100 t0,s1 FCdr :+++ : * + : 16 0 0 5934 2 0
00010101 t0,s2 FCdr :+++ : + * : 16 0 0 5935 4 0
Important fields to look at here:
'Location' Column - t0,s1 - indicates Tray0, Slot1
'Channels' Column
0 1 2 3 4 . . . - Drive Channel information. Numbering here starts from 0, so Channel-0 here represents Channel-1 in the storageArrayProfile or alarms.txt output, and so on.
'A' or 'B' under Channels - Reported only for Trays. Having both A and B for a tray indicates redundancy.
'*' under Channels - Active Path
'+' under Channels - Standby Path
'D' or 'd' or '-' or ' ' (No character) under Channels - Standby path is not available and needs further investigation.
Note1: The Working Channel will always be shown with '*'.
Note2: For a Simplex (Single Controller) Array configuration, only the Active path is expected; the Standby path will not be seen.
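The channel numbering offset and the path symbols described for Example 1 can be combined in a short sketch. It assumes the symbols have been transcribed by hand from the fixed-width luall output into a mapping of column number to symbol; the function names and the shape of the input are hypothetical.

```python
def to_profile_channel(luall_column):
    """luall numbers channels from 0; storageArrayProfile and alarms.txt start at 1."""
    return luall_column + 1


def summarize_paths(columns, simplex=False):
    """columns maps a luall column number to its symbol, e.g. {0: '*', 1: '+'}."""
    working = [to_profile_channel(c) for c, s in sorted(columns.items()) if s == "*"]
    standby = [to_profile_channel(c) for c, s in sorted(columns.items()) if s == "+"]
    # A missing standby path is expected only on a simplex (single controller) array.
    needs_investigation = not standby and not simplex
    return {
        "working": working,
        "standby": standby,
        "needs_investigation": needs_investigation,
    }


print(summarize_paths({0: "*", 1: "+"}))
print(summarize_paths({2: "*", 3: "-"}))
```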
Example 2 (for array firmware 6.x):
iditnall...
...IDITN..:.......Logical Unit........:Port :............IOs...........
iditn Ch : Devnum Location Role :Byte : Qd Open Completed Errs
------ -- :---------- -------- ------ :---- :--- ----- ---------- -----
73 3 :d<00020001 t1 Encl :0x71 : 0 0 19181 2 0
-< 72 1 :d<00010200 t1,s1 FCdr :0x10 : 0 0 5151 4 0
69 3 :d<00010200 t1,s1 FCdr :0x10 : 0 0 5151 4 0
-< 71 1 :d<00010201 t1,s2 FCdr :0x11 : 0 0 5151 4 0
68 3 :d<00010201 t1,s2 FCdr :0x11 : 0 0 5151 4 0
Important fields to look at here:
'Location' Column - t1,s1 - indicates Tray1, Slot1
'Ch' column - Drive channel that the tray/slot participates in.
Symbols before "iditn" and "Devnum":
-< Disconnected
=< Rejecting IO requests
#< Restricted, suspended, blocked, reserved
d< Degraded
In this example output, Tray1 is working on Channel 3, but its redundancy is degraded, as indicated by the symbol "d<". Channel 1 is affected because it is disconnected, as indicated by the symbol "-<".
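The same reading of the iditnall rows can be sketched in code. This is a rough illustration that assumes each row follows the layout of Example 2 (optional symbol, iditn, channel, then the Devnum and Location fields); the names ROW_RE and channels_by_location are hypothetical, and real stateCaptureData.dmp output may need a more tolerant pattern.

```python
import re
from collections import defaultdict

# Row layout taken from Example 2: [symbol] iditn Ch :[symbol]Devnum Location ...
ROW_RE = re.compile(
    r"^\s*(?P<itn_state>-<|=<|#<|d<)?\s*(?P<iditn>\d+)\s+(?P<ch>\d+)\s*:"
    r"\s*(?P<lu_state>-<|=<|#<|d<)?\s*(?P<devnum>[0-9A-Fa-f]{8})\s+"
    r"(?P<loc>t\d+(?:,s\d+)?)\s"
)

SAMPLE = """
73 3 :d<00020001 t1 Encl :0x71 : 0 0 19181 2 0
-< 72 1 :d<00010200 t1,s1 FCdr :0x10 : 0 0 5151 4 0
69 3 :d<00010200 t1,s1 FCdr :0x10 : 0 0 5151 4 0
-< 71 1 :d<00010201 t1,s2 FCdr :0x11 : 0 0 5151 4 0
68 3 :d<00010201 t1,s2 FCdr :0x11 : 0 0 5151 4 0
"""


def channels_by_location(text):
    """Group channel numbers per location by the symbol shown before the iditn."""
    result = defaultdict(lambda: {"working": [], "disconnected": [], "other": []})
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # header, separator or blank line
        state = match["itn_state"]
        if state is None:
            key = "working"
        elif state == "-<":
            key = "disconnected"
        else:
            key = "other"  # '=<', '#<' or 'd<' before the iditn
        result[match["loc"]][key].append(int(match["ch"]))
    return dict(result)


for location, channels in channels_by_location(SAMPLE).items():
    print(location, channels)
```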
Detailed Explanation of symbols for Oracle TSE:
Symbols appearing before Device numbers:
-< = no IT Nexus connected
=< = logical unit rejecting IO requests
#< = logical unit restricted or suspended
d< = logical unit degraded... look at the ORP
ORP Column = Operation, Redundancy, Performance
Operation = the state of the ITN currently chosen
+ = chosen itn is not degraded
d = chosen itn is degraded
Redundancy = the state of the redundant ITN
+ = alternate itn is up
d = alternate itn is degraded
- = alternate itn is down
x = there is no alternate itn
Performance = Are we using the preferred path?
+ = chosen itn is preferred
- = chosen itn is not preferred
= no itn preferences
Channels column indicates the state of the itn on that channel
* = up and chosen
+ = up and not chosen
D = degraded and chosen
d = degraded and not chosen
- = down
x = not present
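The ORP legend above can be expressed as three lookup tables. This is a minimal sketch using only the symbols listed in the legend; the function name decode_orp is hypothetical, and a two-character value such as "++" is assumed to have a blank Performance field.

```python
OPERATION = {"+": "chosen itn is not degraded", "d": "chosen itn is degraded"}
REDUNDANCY = {
    "+": "alternate itn is up",
    "d": "alternate itn is degraded",
    "-": "alternate itn is down",
    "x": "there is no alternate itn",
}
PERFORMANCE = {
    "+": "chosen itn is preferred",
    "-": "chosen itn is not preferred",
    " ": "no itn preferences",
}


def decode_orp(orp):
    """Decode an ORP value such as '+++' or '+-+' into its three meanings."""
    padded = orp.ljust(3)[:3]  # a missing third character means blank
    unknown = "unknown symbol"
    return {
        "operation": OPERATION.get(padded[0], unknown),
        "redundancy": REDUNDANCY.get(padded[1], unknown),
        "performance": PERFORMANCE.get(padded[2], unknown),
    }


print(decode_orp("+-+"))
```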
- If only one drive is seen with a single path and all the other drives in the same tray have both paths available, the drive may need to be replaced. Follow <Document 1021055.1> Troubleshooting Sun Storage[TM] 2500 and 6000 RAID Array Disk Failures.
- If all the drives are seen with a single path and the tray is a controller tray, one of the controllers may not be working properly. To check for a controller issue and take appropriate action(s), follow <Document 1021113.1> Sun Storage 2500, 2500-M2, 6000 and Flexline Arrays: Troubleshooting RAID Controller Failures.
- If all the drives are seen with a single path and the tray is an expansion tray, proceed to Step 5.
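The three outcomes at the end of Step 4 can be sketched as a decision function. This is an illustration only: the function name, the input shape (drive slot mapped to the number of paths seen) and the returned strings are hypothetical, and combinations the document does not describe are reported as not covered.

```python
def step4_outcome(paths_per_drive, tray_type):
    """paths_per_drive maps slot -> paths seen (1 or 2); tray_type is
    'controller' or 'expansion'."""
    single_path = [slot for slot, paths in paths_per_drive.items() if paths < 2]
    if not single_path:
        return "no single-path drives found"
    if len(single_path) == len(paths_per_drive):
        if tray_type == "controller":
            return "check controllers (Document 1021113.1)"
        return "proceed to Step 5"
    if len(single_path) == 1:
        return "drive may need replacement (Document 1021055.1)"
    return "not covered by this document"


print(step4_outcome({1: 2, 2: 1, 3: 2}, "expansion"))
```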
5. Verify other alerts in alarms.txt (recoveryGuruProcedures.html in case of SANtricity).
- If any alarm exists for Failed IOM on the same tray, the IOM may need to be replaced. Proceed to Step 11.
- If any alarm exists for Minihub failed and/or SFP failed, the SFP may need to be replaced. Proceed to Step 11.
- If no such alarm exists, proceed to Step 6.
6. Physically locate the Affected Channel using information collected in Step 4.
- Default channel numbers and their location for 6130 array:

- Default channel numbers and their location for 6140 array:

- Default channel numbers and their location for 6180 array:

- Default channel numbers and their location for 6540 array:

- Default channel numbers and their location for 6580/6780 array:

- Default channel numbers and their location for 2540/2530/2510 array:

7. Trace the cable connectivity from the Affected Tray in the Affected Channel.
CAUTION: Do not disconnect any cables on the working channel. Doing so may cause loss of data accessibility.
- If the array is 6000 series, proceed to Step 8.
- If the array is 2500 series, proceed to Step 9.
8. Verify the 7 segment LED status code of IOM.
Internal Note for Oracle Support Engineers:
a. CSM200 Tray has 7 segment LED display. To identify Tray/IOM type, click here.
b. For a detailed description of the LED status codes, refer to <Document 1021109.1> Sun StorageTek[TM] 6140, 6540, and Flexline 380 Array Controller 7-Segment LED
9. Verify the Port Status LED.
Reference Port Status LEDs for 2500 Series - Check for "Link Fault" LED status.
Reference Port Status LEDs for 6000 Series - Check for "Port Bypass" LED status
- If Amber LED is ON, proceed to Step 10.
- If the LED is OFF, proceed to Step 11.
10. Check the cable going IN to the array in the cabling sequence.
If the cable and/or its corresponding IOM is not connected properly, connect and re-evaluate the alarm.
- If the alarm is cleared, the issue is resolved.
- If the issue is not resolved and the Amber LED is still ON, the cable and/or SFP may need to be replaced. Proceed to Step 11.
11. Contact Oracle Support and supply:
- Supportdata Collection
- Cabling Diagram (if applicable)
- Results of the above steps (if applicable)
1021055.1