Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Troubleshooting Sure Solution 1353887.1 : Sun Storage J4000 JBOD Array: Troubleshooting Disk Failures
Applies to:
Sun Storage J4400 Array - Version Not Applicable and later
Sun Storage J4500 Array - Version Not Applicable and later
Sun Storage J4200 Array - Version Not Applicable and later
Information in this document applies to any platform.
Purpose
The purpose of this document is to help troubleshoot disk failure symptoms in Sun Storage J4000 JBOD arrays. Symptoms may include:
This document mainly deals with the Solaris Operating System environment. The instructions may vary for other OS environments. This document does not cover J4000 JBOD arrays connected to Sun Storage 7000 Unified Storage Systems; for those, refer to the Sun Storage 7000 Unified Storage System documentation.
Troubleshooting Steps
1. Verify host logs to identify the fault(s), and the details of each fault
3. Verify '/var/adm/messages*' file(s) for any SCSI errors
Apr 22 04:39:58 host01 scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,378@b/pci1000,3150@0 (mpt0):
Apr 22 04:39:58 host01 scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,378@b/pci1000,3150@0 (mpt0):
Apr 22 04:39:58 host01 Disconnected command timeout for Target 17
Apr 22 04:39:58 host01 Disconnected command timeout for Target 17
Apr 22 04:40:00 host01 scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,378@b/pci1000,3150@0 (mpt0):
Apr 22 04:40:00 host01 scsi: [ID 365881 kern.info] /pci@7c,0/pci10de,378@b/pci1000,3150@0 (mpt0):
Apr 22 04:40:00 host01 mpt_check_task_mgt: Task 3 failed. ioc status = 4a target= 17
Apr 22 04:40:00 host01 Log info 31140000 received for target 17.
Apr 22 04:40:00 host01 scsi_status=0, ioc_status=8048, scsi_state=c
Apr 22 04:40:00 host01 scsi: [ID 107833 kern.warning] WARNING: /pci@7c,0/pci10de,378@b/pci1000,3150@0 (mpt0):
Apr 22 04:40:00 host01 mpt_check_task_mgt: Task 3 failed. ioc status = 4a target= 17
(or)
Mar 12 10:01:10 host02 scsi: [ID 107833 kern.warning] WARNING: /pci@0,0/pci8086,3410@9/pci1000,3150@0/sd@b,0 (sd6):
Mar 12 10:01:10 host02 Error for Command: read(10) Error Level: Retryable
Mar 12 10:01:10 host02 scsi: [ID 107833 kern.notice] Requested Block: 55060475 Error Block: 55060539
Mar 12 10:01:10 host02 scsi: [ID 107833 kern.notice] Vendor: SEAGATE Serial Number: 01234XXXXX
Mar 12 10:01:10 host02 scsi: [ID 107833 kern.notice] Sense Key: Media Error
Mar 12 10:01:10 host02 scsi: [ID 107833 kern.notice] ASC: 0x11 (unrecovered read error), ASCQ: 0x0, FRU: 0x0
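On a busy host the messages files can be long, so it may help to condense them before reading line by line. The following is a minimal sketch, not an Oracle-supplied tool: the function name `summarize_scsi_errors` is hypothetical, the patterns are taken only from the sample messages above, and it assumes an awk that provides `match()` (on Solaris this may mean `nawk` or `/usr/xpg4/bin/awk` rather than the default `awk`). It counts mpt command timeouts per target and sense keys per sd instance.

```shell
# Summarise SCSI errors from Solaris syslog text read on stdin.
# Hypothetical helper; patterns are based only on the sample messages above.
summarize_scsi_errors() {
    awk '
        # Remember the most recent sd instance named in a WARNING line, e.g. "(sd6):"
        match($0, /\(sd[0-9]+\):/) { dev = substr($0, RSTART + 1, RLENGTH - 3) }

        # mpt command timeouts: the target number is the last field
        /Disconnected command timeout for Target/ { timeouts[$NF]++ }

        # A sense key line is attributed to the sd instance seen most recently
        /Sense Key: / {
            key = substr($0, index($0, "Sense Key: ") + 11)
            sense[dev "|" key]++
        }

        END {
            for (t in timeouts)
                printf "timeout target=%s count=%d\n", t, timeouts[t]
            for (s in sense) {
                split(s, p, "|")
                printf "sense dev=%s key=%s count=%d\n", p[1], p[2], sense[s]
            }
        }
    ' | sort
}

# Example (run on the affected host):
#   cat /var/adm/messages* | summarize_scsi_errors
```

Each output line names one target or one sd instance, so a single glance shows whether the errors are confined to one drive or spread across several, and whether any of them carry a Media Error sense key.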
4. Verify whether the Common Array Manager (CAM) application is installed on the host
SCSI errors are reported on hosts that have Common Array Manager (CAM) installed and are connected to J4000 JBOD array(s) through a Pandora HBA, due to Bug 15638598 - mpt Disconnected timeouts - Pandora HBA connected to two J4500 continually reset. Pandora is an external 8-port 3Gbps SAS/SATA HBA, model number SG-XPCIE8SAS-E-Z.
Verify whether the CAM application is installed on the host. Use the pkginfo command as shown below:
# pkginfo -l SUNWsefms
PKGINST: SUNWsefms
NAME: Sun Storage Common Array Manager Fault Management Services
CATEGORY: application
ARCH: all
VERSION: 6.8.0,REV=2011.06.04.08.08.24
BASEDIR: /opt
VENDOR: Oracle Corporation
DESC: The Sun Storage Common Array Manager Fault Management Services
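When several hosts need the same check, the pkginfo output can be reduced to the one field of interest. This is a minimal sketch under stated assumptions: the function name `cam_pkg_version` is hypothetical, and it assumes the `pkginfo -l` output has the `VERSION:` label followed by the value, as in the sample above.

```shell
# Print the VERSION field from `pkginfo -l` output read on stdin.
# Hypothetical helper; prints nothing if no VERSION line is present.
cam_pkg_version() {
    awk '$1 == "VERSION:" { print $2 }'
}

# Example (run on the host being checked):
#   pkginfo -l SUNWsefms 2>/dev/null | cam_pkg_version
# Empty output suggests the CAM fault management package is not installed.
```

If only a yes/no answer is needed, `pkginfo -q SUNWsefms` returns a zero exit status when the package is installed and prints nothing.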
5. Implement the fix for Bug 15638598
Bug 15638598 is fixed in HBA firmware 01.33.03.00, located here. Verify the status of 'fmservice':
# svcs fmservice
STATE STIME FMRI
online Aug_24 svc:/system/fmservice:default
Oracle HBA Engineering Guidance:
We recommend that the fmservice daemon of CAM not be used with topologies that include SATA disks, or, if it must be used to perform maintenance activity, that the daemon be disabled after the maintenance. While the daemon is running, SATA PASSTHRU commands are issued to each attached device and, due to the nature of the SATA protocol, this disrupts pending read/write I/O activity, leading to a drop in performance. Additionally, you may encounter messages of the following form in '/var/adm/messages'. These are to be expected while CAM is running the fmservice daemon and will *not* be fixed:
scsi: [ID 243001 kern.info] /pci@78,0/pci8086,e08@3/pci1000,3150@0 (mpt0):
mpt_check_scsi_io: IOCStatus=0x4b IOCLogInfo=0x31112000
scsi: [ID 107833 kern.warning] WARNING: /pci@78,0/pci8086,e08@3/pci1000,3150@0/sd@13,0 (sd25):
Error for Command: read Error Level: Retryable
scsi: [ID 107833 kern.notice] Requested Block: 280333 Error Block: 280333
scsi: [ID 107833 kern.notice] Vendor: ATA Serial Number: ABCD1234
scsi: [ID 107833 kern.notice] Sense Key: Unit_Attention
scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
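The guidance above amounts to checking whether fmservice is online and, if so, disabling it once maintenance is complete. The sketch below illustrates that decision only. Both function names are hypothetical, the parsing assumes the default three-column `svcs` output shown in step 5, and the use of `svcadm disable` as the way to stop the daemon is an assumption based on standard SMF usage rather than something this document states. The second function only prints a suggested command and does not run it.

```shell
# Extract the STATE column for fmservice from default `svcs` output on stdin.
# Hypothetical helper; the header line is skipped because its last field is "FMRI".
fmservice_state() {
    awk '$NF ~ /fmservice/ { print $1 }'
}

# Print the command an administrator might run for a given state.
# Assumption: standard SMF handling; nothing is executed here.
suggest_fmservice_action() {
    case "$1" in
        online) echo "svcadm disable fmservice" ;;
        *)      echo "none" ;;
    esac
}

# Example (run on the host after maintenance is complete):
#   state=$(svcs fmservice | fmservice_state)
#   suggest_fmservice_action "$state"
```

Keeping the suggestion separate from execution leaves the final decision with the administrator, which seems appropriate where CAM may still be needed for a maintenance activity.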
6. Check the SCSI errors for any media errors
7. Verify whether the fault(s) is/are observed for multiple drives
8. Verify the physical LED indications of the drive
9. Check the cable connectivity
10. Adjust the cabling as per the documentation and verify whether the host can access the drives properly
Note: Cabling cannot be adjusted while the host is online and accessing other enclosure drives. Plan a maintenance window to correct the cabling.
13. Open a call for further analysis
Do you still have questions? You can use My Oracle Support Communities. Communities put you in touch with industry professionals like yourself. They are monitored by Oracle support engineers, so you can expect reliable and correct answers. Ask questions and see what others are asking about in the Disk Storage 2000, 3000, 6000 RAID Arrays & JBODs Community.