Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1019826.1: Critical Faults for Unreadable Sectors on a Sun Storage 2500, 2500-M2, 6000 or Flexline RAID Array
Previously Published As 246986

Applies to:
Sun Storage Common Array Manager (CAM) - Version 5.0 and later
Sun Storage Flexline 240 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 2530-M2 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage Flexline 280 Array - Version Not Applicable to Not Applicable [Release N/A]
Sun Storage 6140 Array - Version Not Applicable to Not Applicable [Release N/A]
All Platforms

Symptoms
A critical fault is generated similar to the following:

Alarm ID : alarm1
Description: The unreadable sectors database is full. Sector count is 1000
Severity : Critical
Element :
GridCode : 57.66.1074
Date : 2008-12-03 12:33:53

Alarm ID : alarm2
Description: Unreadable sectors exist. Current count is 1024
Severity : Critical
Element :
GridCode : 57.66.1075
Date : 2008-12-03 12:33:55
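When many alarms are listed, the two unreadable-sector faults can be picked out by their GridCode. The following is a minimal Python sketch, assuming the alarm text has been saved with one "Field : value" pair per line, as in the example above. The function names and the SAMPLE text are illustrative only and are not part of CAM.

```python
# Grid codes copied from the two example alarms in this document.
UNREADABLE_SECTOR_GRIDCODES = {"57.66.1074", "57.66.1075"}

# Illustrative input: the example alarms, one field per line (an assumption).
SAMPLE = """\
Alarm ID : alarm1
Description: The unreadable sectors database is full. Sector count is 1000
Severity : Critical
Element :
GridCode : 57.66.1074
Date : 2008-12-03 12:33:53
Alarm ID : alarm2
Description: Unreadable sectors exist. Current count is 1024
Severity : Critical
Element :
GridCode : 57.66.1075
Date : 2008-12-03 12:33:55
"""


def parse_alarms(text):
    """Return one dict per alarm; a new alarm starts at each 'Alarm ID' line."""
    alarms = []
    for line in text.splitlines():
        # Split on the first colon only, so timestamps stay intact in the value.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Field : value' line
        key, value = key.strip(), value.strip()
        if key == "Alarm ID":
            alarms.append({})
        if alarms:
            alarms[-1][key] = value
    return alarms


def unreadable_sector_alarms(alarms):
    """Keep only alarms whose GridCode is one of the two codes above."""
    return [a for a in alarms if a.get("GridCode") in UNREADABLE_SECTOR_GRIDCODES]


if __name__ == "__main__":
    for alarm in unreadable_sector_alarms(parse_alarms(SAMPLE)):
        print(alarm["Alarm ID"], alarm["GridCode"], alarm["Description"])
```

Matching on GridCode rather than on the description text avoids depending on the exact wording or sector count in the message.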
Cause
The term "unreadable sector" refers to a volume logical block address that has been rendered completely unreadable due to a disk media-related double fault condition on redundant volumes, or a disk media-related single fault condition on non-redundant volumes (RAID 0). Any user data contained within the unreadable sector is unrecoverable and should be considered lost. (These types of faults are more commonly referred to as "a two-disk failure in a RAID 5" or "a read error on a RAID 0".)

Solution
The unreadable sector database records the unreadable logical block addresses (LBAs) on a given volume. It can hold only about 1024 entries in total.

1. Verify the Critical Fault.
2. Identify the Volume(s) that have the unreadable sectors.
This first example comes from a supportdata created with SANtricity. The unzipped file is called unreadableSectors.txt. We see PHYSICAL errors to disk drive 85.5 (tray 85, slot 5).

Volume   Date/Time                     Volume LBA  Tray,Slot  Drive LBA  Failure Type
-------  ----------------------------  ----------  ---------  ---------  ------------
ora-vol  Sun Mar 13 02:59:57 GMT 2011  276637252   85,5       276637252  PHYSICAL

This second example comes from a supportdata created by CAM. The unzipped file is called badBlocksData.txt. In this file we see multiple LOGICAL errors to disk 85.5.

Volume              Date/Time                     Volume LBA  Tray,Slot  Drive LBA  Failure Type
------------------  ----------------------------  ----------  ---------  ---------  ------------
NTINV10CLUST2-vol2  Sat Sep 17 12:11:16 BST 2016  347605573   85,5       173803077  LOGICAL
NTINV10CLUST2-vol2  Sat Sep 17 12:11:15 BST 2016  347606597   85,12      173803077  PHYSICAL
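For files with many entries, it can help to tally the rows per drive and failure type. The following is a minimal Python sketch, assuming each data row follows the column layout of the two excerpts above (volume name, timestamp, volume LBA, tray,slot, drive LBA, failure type). The regular expression, function names and SAMPLE text are illustrative, and real files may contain lines this pattern does not cover; such lines are simply skipped.

```python
import re
from collections import Counter

# One data row, e.g.:
# ora-vol  Sun Mar 13 02:59:57 GMT 2011  276637252  85,5  276637252  PHYSICAL
ROW = re.compile(
    r"^(?P<volume>\S+)\s+"
    r"(?P<when>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4})\s+"
    r"(?P<volume_lba>\d+)\s+"
    r"(?P<tray>\d+),(?P<slot>\d+)\s+"
    r"(?P<drive_lba>\d+)\s+"
    r"(?P<failure>PHYSICAL|LOGICAL)\s*$"
)

# Illustrative input copied from the badBlocksData.txt excerpt in this document.
SAMPLE = """\
Volume              Date/Time                     Volume LBA  Tray,Slot  Drive LBA  Failure Type
------------------  ----------------------------  ----------  ---------  ---------  ------------
NTINV10CLUST2-vol2  Sat Sep 17 12:11:16 BST 2016  347605573   85,5       173803077  LOGICAL
NTINV10CLUST2-vol2  Sat Sep 17 12:11:15 BST 2016  347606597   85,12      173803077  PHYSICAL
"""


def parse_rows(text):
    """Return a list of dicts, one per data row; header and rule lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def count_by_drive(rows):
    """Count entries per (tray, slot, failure type)."""
    return Counter((int(r["tray"]), int(r["slot"]), r["failure"]) for r in rows)


if __name__ == "__main__":
    for (tray, slot, failure), n in sorted(count_by_drive(parse_rows(SAMPLE)).items()):
        print(f"tray {tray} slot {slot}: {n} {failure}")
```

The tally only summarizes the file; it does not decide whether a drive should be replaced.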
A PHYSICAL error is reported by the disk drive itself.
A LOGICAL error is discovered and reported by the controller against the identified disk during rebuild/reconstruction of a Volume.

Refer to <Document 1021055.1> Troubleshooting Sun Storage[TM] 2500 and 6000 RAID Array Disk Failures to identify whether any disk needs to be replaced.

A disk identified with a PHYSICAL error needs to be replaced, and data integrity checks (including a potential restore) need to be done.

A LOGICAL error against a disk does not, by itself, indicate that the drive has a hardware fault and needs to be replaced. Further investigation of the major event log (MEL) is needed to see which disks were involved at the time the error occurred.

If the badBlocksData.txt or unreadableSectors.txt file does not have any entries, collect a supportdata and contact Oracle Support for further instruction. Otherwise, proceed to step 3.

There are certain circumstances where the badBlocksData.txt or unreadableSectors.txt file will be empty, but the alarm still exists. When this is the case, use the following shell procedure to clear the alarm from the array.

Important: The instructions in this document must be used only by an Oracle support engineer who has received the required NetApp advanced training to access the shell. If you are not one of these engineers, you are not authorized to use these commands without guidance from one of them. In that case, please open a collaboration SR with a TSC L2 engineer.
This first command reports all alarms on the array. It is possible that only one controller is reporting the alarm. Subsequent repairs should be run from that controller.
-> getRecoveryFailureList_MT

See if there are stale entries in the unreadable sectors database.
-> vdAll usmShowUnreadableSectorTable

07.xx.xx.xx firmware
-> readUnreadableSectorDatabase_MT

Clear the entries from the database.
-> clearUnreadableSectors_MT

If the problem still persists, fix any alarms for Volumes Not On Preferred Path, and repeat the procedure. You may also need to reboot the controller reporting the alarm.

3. Recover and Restore the volume(s) that have the LBA errors
In theory, the unreadable sectors could be in a part of the RAID volume that holds no data. When this is the case, the data may actually be intact. For situations where a restore is not a viable solution, a server-side check of the data may be an option. For example, a successful Solaris fsck of the UFS or VxFS file system, or a successful zpool scrub of the ZFS pool, may be enough proof that the data is valid. If so, a restore is not required.

4. Clear the unreadable sectors list if it has not been cleared by Step 3

Clear the alarm with the CAM GUI
Clearing the alarm with the CLI

Location for sscs:
Solaris: /opt/SUNWstkcam/bin/
Linux: /opt/sun/cam/bin/
Windows: C:\Program Files\Sun\Common Array Manager\bin

Location for service:
Solaris: /opt/SUNWsefms/bin/
Linux: /opt/sun/cam/private/fms/bin/
Windows: C:\Program Files\Sun\Common Array Manager\Component\fms\bin\

Get the array name.
Clearing the alarm with SANtricity
If the issue persists, collect a supportData and contact Oracle Support:
References
<NOTE:1021057.1> - Sun Storage Common Array Manager (CAM): How to Verify Critical Faults for Sun Storage 2500, 2500-M2, 6000 and J4000 Arrays
<NOTE:1021055.1> - Troubleshooting Sun Storage[TM] 2500 and 6000 RAID Array Disk Failures
<NOTE:1014074.1> - Collecting Support Data for Arrays Using Sun StorageTek SANtricity Storage Manager

Attachments
This solution has no attachment