Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1998049.1: Solaris 10 Hung While Booting With Errors "mpt_cmd_timeout: Restarting HBA" - Failed Internal Disk Controlled by mpt Driver
Created from <SR 3-10481332201>

Applies to:
- Sun SPARC Enterprise M9000-32 Server - Version All Versions and later
- Solaris Operating System - Version 10 9/10 U9 and later
- Sun SPARC Enterprise M8000 Server - Version All Versions and later
- Sun SPARC Enterprise M5000 Server - Version All Versions and later
- Sun SPARC Enterprise M4000 Server - Version All Versions and later

Information in this document applies to any platform.

Symptoms

A Solaris 10 server logs continuous disk error messages "Disconnected command timeout for Target 1", and the format command takes a very long time to complete. These are some of the error messages observed:

    Mar 27 13:37:57 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 27 13:37:57 server01 passthrough command timeout
    Mar 27 13:37:57 server01 scsi: [ID 365881 kern.info] /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 27 13:37:57 server01 Rev. 2 LSI, Inc. 1064 found.
    Mar 27 13:37:57 server01 scsi: [ID 365881 kern.info] /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 27 13:37:57 server01 mpt2 supports power management.
    Mar 27 13:37:58 server01 scsi: [ID 365881 kern.info] /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 27 13:37:58 server01 mpt2: IOC Operational.
    Mar 27 13:39:15 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 27 13:39:15 server01 passthrough command timeout

    Mar 31 11:06:09 server01 scsi: WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 31 11:06:09 server01 Disconnected command timeout for Target 1
    Mar 31 11:07:19 server01 scsi: WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 31 11:07:19 server01 Disconnected command timeout for Target 1
    Mar 31 11:08:30 server01 scsi: WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 31 11:08:30 server01 Disconnected command timeout for Target 1

    Mar 31 11:16:53 server01 scsi: [ID 107833 kern.warning] WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 31 11:16:53 server01 Disconnected command timeout for Target 1
    Mar 31 11:16:54 server01 scsi: [ID 243001 kern.info] /pci@10,600000/pci@0/scsi@1 (mpt2):
    Mar 31 11:16:54 server01 mpt_check_scsi_io: IOCStatus=0x48 IOCLogInfo=0x31140000
Disk c2t1d0 is not visible in the format output:

    AVAILABLE DISK SELECTIONS:
           0. c0t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0,600000/pci@0/scsi@1/sd@0,0
           1. c0t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@0,600000/pci@0/scsi@1/sd@1,0
           2. c2t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@10,600000/pci@0/scsi@1/sd@0,0
           3. c3t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@14,600000/pci@0/scsi@1/sd@0,0
           4. c3t1d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
              /pci@14,600000/pci@0/scsi@1/sd@1,0
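The missing disk can be tied back to the timeout warnings by building the expected sd child path from the HBA path and target number and looking it up in the format listing. This Python sketch is illustrative only: the function names are hypothetical, and it assumes the sd unit address is written as "<target>,<lun>" in hexadecimal with LUN 0, matching entries such as .../scsi@1/sd@1,0 above.

```python
import re

# One entry of format's "AVAILABLE DISK SELECTIONS" list, e.g.
#   "2. c2t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>"
#   "   /pci@10,600000/pci@0/scsi@1/sd@0,0"
ENTRY_RE = re.compile(r"\d+\.\s+(c\d+t\d+d\d+)\s+<[^>]*>\s+(/\S+)")


def disks_by_path(format_output):
    """Map each physical device path listed by format to its cXtYdZ name."""
    return {path: name for name, path in ENTRY_RE.findall(format_output)}


def expected_disk_path(hba_path, target, lun=0):
    """Build the sd child path for a target (assumed hex 'target,lun' address)."""
    return f"{hba_path}/sd@{target:x},{lun:x}"


def is_target_visible(format_output, hba_path, target, lun=0):
    """Return True if format lists a disk at the expected path for this target."""
    return expected_disk_path(hba_path, target, lun) in disks_by_path(format_output)


if __name__ == "__main__":
    listing = """AVAILABLE DISK SELECTIONS:
       2. c2t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@10,600000/pci@0/scsi@1/sd@0,0
       3. c3t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@14,600000/pci@0/scsi@1/sd@0,0
"""
    hba = "/pci@10,600000/pci@0/scsi@1"  # HBA path from the mpt2 warnings
    print(is_target_visible(listing, hba, 0))  # True  (c2t0d0)
    print(is_target_visible(listing, hba, 1))  # False (no disk at target 1)
```

With the listing above, target 0 under /pci@10,600000/pci@0/scsi@1 resolves to c2t0d0, while target 1, the one reporting timeouts, has no entry, which is consistent with c2t1d0 being absent.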
The raidctl command also takes a very long time to complete; the following output appeared only after 90 minutes (controller 2 reports a single disk):

    # raidctl -l
    Controller: 0
            Disk: 0.0.0
            Disk: 0.1.0
    Controller: 2
            Disk: 0.0.0
    Controller: 3
            Disk: 0.0.0
            Disk: 0.1.0
    Controller: 7
There are known mpt driver problems that produce "passthrough command timeout" errors and system hangs. In particular, kernel patch 150400-22 and later revisions contain fixes for the relevant bugs (see References).
After a Solaris 10 recommended patch cluster including kernel patch 150400-17 was installed, the system hung while booting with these errors on the console:

    Apr 04 18:40:50 CEST 2015 WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Apr 04 18:40:50 CEST 2015 mpt_cmd_timeout: Restarting HBA
    Apr 04 18:41:00 CEST 2015 WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Apr 04 18:41:00 CEST 2015 mpt_cmd_timeout: Restarting HBA
The original errors for the failed disk, together with the kernel release and mpt driver version reported on the system:

    SunOS server01 5.10 Generic_147440-19 sun4u sparc SUNW,SPARC-Enterprise

    modinfo.out:163 7b284000 3a1c0 213 1 mpt (MPT HBA Driver v1.113)

    Apr 04 19:52:00 CEST 2015 Apr 4 19:52:00 server01 scsi: WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Apr 04 19:52:01 CEST 2015 Apr 4 19:52:00 server01 Disconnected command timeout for Target 1
    Apr 04 19:53:11 CEST 2015 Apr 4 19:53:11 server01 scsi: WARNING: /pci@10,600000/pci@0/scsi@1 (mpt2):
    Apr 04 19:53:11 CEST 2015 Apr 4 19:53:11 server01 Disconnected command timeout for Target 1

Cause

This server boots from the internal disks c0t0d0 and c2t0d0 (mirrored under SVM), but c2t1d0, which sits on the same mpt controller (mpt2) as boot disk c2t0d0, is defective.

A possible alternative cause is Bug 17594186 (mpt_accept_tx_waitq: failed to accept cmd on queue and hang), which is fixed by kernel patch 150400-22; that patch had not been installed on this system. The bug was closed as a duplicate of Bug 20237135.

Solution

Replace internal disk c2t1d0 (on this M9000 server it was located at IOU#1/HDD#1). Replacing the disk resolved the issue.
If the system cannot boot because of the failed disk, physically remove the disk; the system should then be able to boot.
Bugs 20475363, 21348068 and 20237135 have been observed with kernel patch 150400 at revisions 9 and 17, that is, at revisions lower than 19. Several mpt bugs are fixed in kernel patches 150400-19 and 150400-22 and, finally, 150400-28. If you are installing a new kernel patch on a system with disks under the mpt driver, install the following patches to avoid this issue:

Solaris 10:
- SPARC:
  - <SunPatch:150400-28> (or above) SunOS 5.10 SPARC: Kernel Patch
  - <SunPatch:150309-02> SunOS 5.10 SPARC: mpt.so patch
- x86:
  - <SunPatch:150401-28> (or above) SunOS 5.10 x86: Kernel Patch
  - <SunPatch:148877-04> SunOS 5.10 x86: mpt.so patch

Solaris 11:
- See Oracle Solaris 11.2 Support Repository Updates (SRU) Index (Doc ID 1672221.1).

References

- <BUG:17594186> - MPT_ACCEPT_TX_WAITQ: FAILED TO ACCEPT CMD ON QUEUE AND HANG
- <BUG:20475363> - MPT_CMD_TIMEOUT: RESTARTING HBA CAUSING DOMAIN HANG
- <BUG:21348068> - STUCK COMMAND FOLLOWING MPT_DO_SCSI_RESET
- <BUG:20237135> - STUCK COMMAND FOLLOWING MPT_DO_SCSI_RESET
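To check whether a Solaris 10 system already meets the minimum patch revisions recommended in the Solution, compare the installed patch IDs against them. The Python sketch below is hypothetical: it takes a plain list of installed patch IDs (gathering that list, for example from `showrev -p`, is left out), and it does not recognize a later patch that obsoletes one of the required IDs, so any reported gap should be checked against patch obsolescence before acting on it.

```python
# Minimum Solaris 10 patch revisions recommended in this note.
REQUIRED = {
    "sparc": ["150400-28", "150309-02"],  # kernel patch, mpt.so patch
    "x86": ["150401-28", "148877-04"],    # kernel patch, mpt.so patch
}


def parse_patch(patch_id):
    """Split a Solaris patch ID such as '150400-28' into (150400, 28)."""
    base, rev = patch_id.strip().split("-")
    return int(base), int(rev)


def missing_patches(installed, arch):
    """Return required patch IDs whose installed revision is absent or lower.

    'installed' is a list of patch IDs; obsoleting patches are NOT considered.
    """
    highest = {}
    for patch_id in installed:
        base, rev = parse_patch(patch_id)
        highest[base] = max(rev, highest.get(base, -1))
    missing = []
    for required in REQUIRED[arch]:
        base, rev = parse_patch(required)
        if highest.get(base, -1) < rev:
            missing.append(required)
    return missing


if __name__ == "__main__":
    # Kernel patch level on which the boot hang in this note was seen.
    print(missing_patches(["150400-17", "150309-02"], "sparc"))  # ['150400-28']
```

Keeping only the highest installed revision per base patch ID mirrors the "(or above)" wording: any revision at or beyond the listed one satisfies the requirement.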