Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1534963.1: How to Verify a Bad Disk in a Cougar-Controlled RAID 5 Array

This document shows where to look in order to verify a bad disk in a Cougar-controlled RAID 5 array.
Created from <SR 3-6900218801>

Applies to:
Sun SPARC Enterprise T5120 Server - Version All Versions and later
Sun Netra T5440 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5220 Server - Version All Versions and later
Sun SPARC Enterprise T5240 Server - Version All Versions and later
Sun SPARC Enterprise T5140 Server - Version All Versions and later
Information in this document applies to any platform.

Symptoms
The customer reports that disk 0,11 (channel 0, device 11) in a Cougar-controlled array has disappeared from the configuration.

Cause
First, check the output of "/opt/StorMan/arcconf getconfig 1" and look for the missing disk. In this case, it is a member of Logical Device 2 (LD_2):
...
...
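When a disk has hard-failed, it can be absent from the device list altogether rather than listed with a bad state, so one quick check is to compare the `Reported Channel,Device` entries in the `getconfig` output against the disks you expect to see. The Python sketch below assumes the `getconfig` output has been saved to text; `reported_disks` and `missing_disks` are hypothetical helpers (not part of StorMan), and the expected inventory is something you supply yourself.

```python
import re

# Matches lines such as "Reported Channel,Device : 0,10" in saved
# "arcconf getconfig 1" output (layout taken from the excerpt in this note).
CHAN_DEV_RE = re.compile(r"Reported Channel,Device\s*:\s*(\d+),(\d+)")


def reported_disks(getconfig_text):
    """Return the set of (channel, device) pairs the controller reports."""
    return {(int(c), int(d)) for c, d in CHAN_DEV_RE.findall(getconfig_text)}


def missing_disks(getconfig_text, expected):
    """Return the expected (channel, device) pairs absent from the output.

    `expected` is a hypothetical inventory you maintain, for example the
    result of reported_disks() on a snapshot taken while the array was healthy.
    """
    return sorted(set(expected) - reported_disks(getconfig_text))
```

Taking the expected set from a `getconfig` snapshot saved while the array was healthy, i.e. `missing_disks(current_text, reported_disks(baseline_text))`, avoids hard-coding channel and device numbers.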
The disk does not appear in the LD_2 output or in the Physical Device information section:
...
Device #2
   Device is a Hard drive
   State : Online
   Supported : Yes
   Transfer Speed : SAS 3.0 Gb/s
   Reported Channel,Device : 0,10
   Reported Location : Enclosure 0, Slot 2
   Reported ESD : 2,0
   Vendor : SEAGATE
   Model : ST930003SSUN300G
   Firmware : 0868
   Serial number : 00100371LSP2 3SE1LSP2
   World-wide name : 5000C5001D2033BC
   Size : 286102 MB
   Write Cache : Disabled (write-through)
   FRU : None
   S.M.A.R.T. : No
Device #3
   Device is a Hard drive
   State : Online
   Supported : Yes
   Transfer Speed : SAS 3.0 Gb/s
   Reported Channel,Device : 0,12
   Reported Location : Enclosure 0, Slot 4
   Reported ESD : 2,0
   Vendor : SEAGATE
   Model : ST930003SSUN300G
   Firmware : 0868
   Serial number : 00100271HQ7F 3SE1HQ7F
   World-wide name : 5000C5001D097D34
   Size : 286102 MB
   Write Cache : Disabled (write-through)
   FRU : None
   S.M.A.R.T. : No

Normally, a failed disk would still appear in the device output above. Here, disk 0,11 would fall between Device #2 (0,10, Slot 2) and Device #3 (0,12, Slot 4), but it has disappeared from the configuration entirely. This most likely indicates a hard failure in which the Cougar card can no longer reach the disk.

However, the RAID Event Log (/opt/StorMan/RaidEvtA.log) recorded the disk failure events:

March 4, 2013 9:22:08 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 1c 54 58 00 00 02 00 00 00 00] <------------------ Original error.
March 4, 2013 9:22:18 PM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 1c 71 96 00 00 02 00 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 4, 2013 9:28:10 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 14 b9 c8 00 00 02 00 00 00 00]
March 4, 2013 9:28:21 PM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 00 01 15 eb 00 00 10 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 4, 2013 9:30:53 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [2a 00 00 06 5c 00 00 02 00 00 00 00]
March 4, 2013 9:30:58 PM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [2a 00 1c 60 04 00 00 02 00 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 4, 2013 9:32:11 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 22 ec b1 e0 00 00 01 00 00 00]
March 4, 2013 9:32:27 PM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 00 cb fe 00 00 02 00 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 4, 2013 9:33:34 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 00 01 a6 00 00 02 00 00 00 00]
March 4, 2013 9:33:55 PM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 22 ec b2 23 00 00 01 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 4, 2013 9:53:41 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [28 00 1c 5d 94 f6 00 00 10 00 00 00]
March 4, 2013 9:54:05 PM CST ERR ILHH241FDA Drive in a RAID-5 set failed: controller 1, logical device 2 <-------------- RAID 5 failover
March 4, 2013 9:54:05 PM CST ERR ILHH241FDA Disk failed: controller 1, channel 0, SCSI device ID 11
March 4, 2013 9:54:40 PM CST INF ILHH241FDA Container changed: controller 1, logical device 2
March 4, 2013 9:54:40 PM CST INF ILHH241FDA PPI update. Age 278
March 4, 2013 9:54:40 PM CST INF ILHH241FDA PPI update. Age 279
March 4, 2013 9:54:45 PM CST INF ILHH241FDA Container changed: controller 1, logical device 2
March 4, 2013 9:54:46 PM CST INF ILHH241FDA PPI update. Age 280
March 4, 2013 9:54:46 PM CST INF ILHH241FDA Failover disk changed: controller 1, logical device 2
March 4, 2013 9:54:46 PM CST INF ILHH241FDA Failover and rebuild operation started on a RAID-5 set: controller 1, logical device 2
March 4, 2013 9:54:46 PM CST INF ILHH241FDA Container changed: controller 1, logical device 2
March 4, 2013 9:54:46 PM CST INF ILHH241FDA Configuration has changed.
March 4, 2013 9:55:01 PM CST WRN 301:A01C-S--L02 ILHH241FDA Logical device is degraded: controller 1, logical device 2 ("LD_2").
March 4, 2013 9:55:01 PM CST ERR 401:A01C0S11L-- ILHH241FDA Failed drive: controller 1, enclosure 0, slot 3, S/N 00100271HQVC 3SE1HQVC (Vendor: SEAGATE Model: ST930003SSUN300G). <--------------- Controller acknowledging the disk is bad.
March 4, 2013 9:55:06 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 0%. Controller 1, logical device 2 <------------------- Rebuilding the RAID.
March 4, 2013 9:55:06 PM CST INF 304:A01C-S--L02 ILHH241FDA Rebuilding: controller 1, logical device 2 ("LD_2").
March 4, 2013 9:57:33 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 5%. Controller 1, logical device 2
March 4, 2013 10:00:09 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 10%. Controller 1, logical device 2
March 4, 2013 10:02:50 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 15%. Controller 1, logical device 2
March 4, 2013 10:06:03 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 20%. Controller 1, logical device 2
March 4, 2013 10:08:39 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 25%. Controller 1, logical device 2
March 4, 2013 10:11:20 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 30%. Controller 1, logical device 2
March 4, 2013 10:14:51 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 35%. Controller 1, logical device 2
March 4, 2013 10:17:43 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 40%. Controller 1, logical device 2
March 4, 2013 10:20:24 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 45%. Controller 1, logical device 2
March 4, 2013 10:23:31 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 50%. Controller 1, logical device 2
March 4, 2013 10:26:31 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 55%. Controller 1, logical device 2
March 4, 2013 10:27:46 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 1c 00 10 00 00 02 00 00 00 00]
March 4, 2013 10:30:39 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 60%. Controller 1, logical device 2
March 4, 2013 10:33:26 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 02 00 00 00 00]
March 4, 2013 10:33:26 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 02 00 00 00 00]
March 4, 2013 10:34:59 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 65%. Controller 1, logical device 2
March 4, 2013 10:38:08 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 70%. Controller 1, logical device 2
March 4, 2013 10:41:15 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 75%. Controller 1, logical device 2
March 4, 2013 10:44:32 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 80%. Controller 1, logical device 2
March 4, 2013 10:47:51 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 85%. Controller 1, logical device 2
March 4, 2013 10:51:29 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 90%. Controller 1, logical device 2
March 4, 2013 10:55:37 PM CST INF ILHH241FDA Running: RAID 5 rebuild - 95%. Controller 1, logical device 2
March 4, 2013 10:59:36 PM CST INF ILHH241FDA Complete: RAID 5 rebuild - 100%. Controller 1, logical device 2
March 4, 2013 10:59:36 PM CST INF ILHH241FDA PPI update. Age 281
March 4, 2013 10:59:36 PM CST INF ILHH241FDA Container changed: controller 1, logical device 2
March 4, 2013 10:59:36 PM CST INF ILHH241FDA PPI update. Age 282
March 4, 2013 10:59:36 PM CST INF ILHH241FDA RAID-5 rebuild operation completed successfully: controller 1, logical device 2
March 4, 2013 10:59:36 PM CST INF ILHH241FDA Complete: RAID 5 rebuild - 100%. Controller 1, logical device 2
March 4, 2013 10:59:36 PM CST INF ILHH241FDA Configuration has changed.
March 4, 2013 10:59:37 PM CST INF 345:A01C-S--L02 ILHH241FDA Logical device is normal: controller 1, logical device 2 ("LD_2").
March 4, 2013 10:59:37 PM CST INF 305:A01C-S--L02 ILHH241FDA Rebuild complete: controller 1, logical device 2 ("LD_2").
March 4, 2013 11:53:13 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 02 00 00 00 00]
March 4, 2013 11:53:59 PM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 02 00 00 00 00]
March 5, 2013 1:43:28 AM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 00 01 00 00 00]
March 5, 2013 1:44:20 AM CST INF ILHH241FDA Sense data: Unit attention (BUS DEVICE RESET FUNCTION OCCURRED). Controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [1a 08 3f 00 ff 00 00 00 01 00 00 00], data [70 00 06 00 00 00 00 00 00 00 00 00 29 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
March 5, 2013 1:44:56 AM CST WRN ILHH241FDA Command timeout: controller 1, channel 0, SCSI device ID 11, LUN 0, cdb [00 00 00 00 00 00 00 00 01 00 00 00]
March 5, 2013 1:50:18 AM CST INF 408:A01C0S11L-- ILHH241FDA Physical drive removed: controller 1, enclosure 0, slot 3, S/N 00100271HQVC 3SE1HQVC. <----------------- Shows the drive was removed.
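To pull the relevant entries out of a long RaidEvtA.log, a small filter keyed on the missing disk's SCSI device ID (11) and slot (3) can help. The Python sketch below is a hypothetical helper, not a StorMan tool: the line layout is inferred from the log excerpt in this note, and the opcode and sense-key tables cover only a small subset of the SCSI standard. It also decodes the raw `cdb` and fixed-format sense `data` bytes; for example, sense key 06h with ASC/ASCQ 29h/03h is UNIT ATTENTION, BUS DEVICE RESET FUNCTION OCCURRED, which matches the text the controller logs next to it.

```python
import re

# Line layout inferred from the RaidEvtA.log excerpt in this note (an assumption):
# "<Month D, YYYY H:MM:SS AM|PM TZ> <INF|WRN|ERR> [code] <host> <text>"
LINE_RE = re.compile(
    r"^(?P<ts>\w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M \w+) "
    r"(?P<sev>INF|WRN|ERR) (?:(?P<code>\d+:\S+) )?(?P<host>\S+) (?P<text>.*)$"
)
DEV_RE = re.compile(r"SCSI device ID (\d+)")
SLOT_RE = re.compile(r"slot (\d+)")
CDB_RE = re.compile(r"cdb \[([0-9a-fA-F ]+)\]")
DATA_RE = re.compile(r"data \[([0-9a-fA-F ]+)\]")

# Only the SCSI opcodes (first CDB byte) that occur in this excerpt.
OPCODES = {0x00: "TEST UNIT READY", 0x1A: "MODE SENSE(6)",
           0x28: "READ(10)", 0x2A: "WRITE(10)"}
# A subset of the SCSI sense keys (low nibble of sense byte 2).
SENSE_KEYS = {0x2: "NOT READY", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x6: "UNIT ATTENTION", 0xB: "ABORTED COMMAND"}


def parse_hex(field):
    """Convert '28 00 1c' into [0x28, 0x00, 0x1c]."""
    return [int(b, 16) for b in field.split()]


def decode_sense(data):
    """Return (sense key name, ASC, ASCQ) for fixed-format sense data.

    Reads ASC/ASCQ straight from bytes 12 and 13; it does not check the
    additional-length byte (byte 7), which is 00 in this excerpt.
    """
    if len(data) < 14 or (data[0] & 0x7F) not in (0x70, 0x71):
        return None
    key = data[2] & 0x0F
    return SENSE_KEYS.get(key, f"sense key 0x{key:X}"), data[12], data[13]


def scan(lines, device_id, slot=None):
    """Yield (timestamp, severity, summary) for every ERR event and every
    event that names the given SCSI device ID or enclosure slot."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # lines in an unexpected format are skipped
        text = m["text"]
        dev = DEV_RE.search(text)
        slt = SLOT_RE.search(text)
        if not (m["sev"] == "ERR"
                or (dev is not None and int(dev[1]) == device_id)
                or (slot is not None and slt is not None and int(slt[1]) == slot)):
            continue
        parts = [text.split(":", 1)[0]]  # event title, e.g. "Command timeout"
        cdb = CDB_RE.search(text)
        if cdb:
            op = parse_hex(cdb[1])[0]
            parts.append(OPCODES.get(op, f"opcode 0x{op:02X}"))
        data = DATA_RE.search(text)
        if data:
            sense = decode_sense(parse_hex(data[1]))
            if sense:
                key, asc, ascq = sense
                parts.append(f"{key} ASC/ASCQ {asc:02X}/{ascq:02X}")
        yield m["ts"], m["sev"], " | ".join(parts)
```

Passing an open file object for a copy of /opt/StorMan/RaidEvtA.log with `device_id=11, slot=3` should surface the timeouts, the bus-device resets, the "Disk failed" and "Failed drive" entries and the final "Physical drive removed" event, while skipping the rebuild progress messages. Every ERR event on the controller is kept as well, so errors for other disks would also be listed.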
Solution
Replace the failed disk drive (channel 0, device 11; enclosure 0, slot 3) with the appropriate part.

References
<NOTE:1509311.1> - How to isolate disk problems on an Adaptec RAID controller (Cougar)