Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2113367.1
Update Date:2017-08-28

Solution Type  Problem Resolution Sure

Solution  2113367.1 :   Sun Fire[TM] V440 or Netra 440 Server: System went into Hung state due to Single Internal Disk Failure  


Related Items
  • Sun Netra 440 Server
  • Sun Fire V440 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-V4x0




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-12250369931>

Applies to:

Sun Netra 440 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire V440 Server - Version Not Applicable to Not Applicable [Release NA]
Oracle Solaris on SPARC (64-bit)

Symptoms

The system hung after a single internal disk failed. Normally, a single disk failure is isolated by the Solaris OS or by Solaris Volume Manager (SVM) and does not hang the system.

In this case, the messages file gives no indication of a problem:

Feb 26 11:00:08 xxxxxxx genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_150400-07 64-bit
Feb 26 11:00:08 xxxxxxx genunix: [ID 700403 kern.notice] Copyright (c) 1983, 2013, Oracle and/or its affiliates. All rights reserved.
Feb 26 11:00:08 xxxxxxx genunix: [ID 678236 kern.info] Ethernet address = 0:14:4f:5c:1d:4e
Feb 26 11:00:08 xxxxxxx unix: [ID 673563 kern.info] NOTICE: Kernel Cage is ENABLED
Feb 26 11:00:08 xxxxxxx unix: [ID 389951 kern.info] mem = 8388608K (0x200000000)
Feb 26 11:00:08 xxxxxxx unix: [ID 930857 kern.info] avail mem = 8362975232
Feb 26 11:00:08 xxxxxxx rootnex: [ID 466748 kern.info] root nexus = Netra 440

 

The console showed command timeout messages from the mpt driver, each followed by an HBA restart:

Feb 26 04:30:29 xxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2 (mpt0):
Feb 26 04:30:29 xxxxxxx mpt_cmd_timeout: Restarting HBA

Feb 26 05:09:09 xxxxxxx scsi: [ID 243001 kern.warning] WARNING: /pci@1f,700000/scsi@2 (mpt0):
Feb 26 05:09:09 xxxxxxx Disconnected command timeout for target 0 cmd=0x2a (write(10)) pkt_time=60 abort_count=235
Feb 26 05:09:09 xxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@1f,700000/scsi@2 (mpt0):
Feb 26 05:09:09 xxxxxxx mpt_cmd_timeout: Restarting HBA
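
When a saved console log is long, the two message types shown above can be counted with a short script. The following is a minimal sketch, not an Oracle tool: the function name summarize_mpt_events is hypothetical, and it assumes one message per line in the same layout as the excerpts above.

```python
import re

# Patterns taken from the console excerpts above. The one-message-per-line
# layout is an assumption about how the console log was captured.
RESTART_RE = re.compile(r"mpt_cmd_timeout: Restarting HBA")
TIMEOUT_RE = re.compile(
    r"Disconnected command timeout for target (?P<target>\d+)"
    r".*?abort_count=(?P<aborts>\d+)"
)


def summarize_mpt_events(lines):
    """Count HBA restarts and record the highest abort_count seen per target."""
    restarts = 0
    aborts_by_target = {}
    for line in lines:
        if RESTART_RE.search(line):
            restarts += 1
        match = TIMEOUT_RE.search(line)
        if match:
            target = int(match.group("target"))
            aborts = int(match.group("aborts"))
            aborts_by_target[target] = max(aborts_by_target.get(target, 0), aborts)
    return restarts, aborts_by_target


# Sample lines copied from the console output in this document.
sample = [
    "Feb 26 04:30:29 xxxxxxx mpt_cmd_timeout: Restarting HBA",
    "Feb 26 05:09:09 xxxxxxx Disconnected command timeout for target 0 "
    "cmd=0x2a (write(10)) pkt_time=60 abort_count=235",
    "Feb 26 05:09:09 xxxxxxx mpt_cmd_timeout: Restarting HBA",
]
print(summarize_mpt_events(sample))
```

A high abort count against a single target, combined with repeated HBA restarts, matches the pattern described in this document; the script only summarizes the log and does not diagnose the disk.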

 

Cause

This issue is caused by a known bug:

  Bug ID 20475363 - mpt_cmd_timeout: Restarting HBA causing domain hang (closed as a duplicate of Bug ID 21348068)

  Bug ID 21348068 - stuck command following mpt_do_scsi_reset (fixed by kernel patch 150400-28)

These bugs have been observed with kernel patch 150400 at revisions lower than -28.
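
The kernel patch revision appears in the version banner, for example "Generic_150400-07" in the messages file excerpt in the Symptoms section. The sketch below compares that revision with -28. The function names are hypothetical, and the script makes no statement about kernels that are not at patch 150400, since this document does not cover them.

```python
import re

FIXED_REVISION = 28  # Bug ID 21348068 is fixed by kernel patch 150400-28


def kernel_patch_revision(version_string):
    """Return the 150400 revision found in a kernel version string, or None."""
    match = re.search(r"Generic_150400-(\d+)", version_string)
    return int(match.group(1)) if match else None


def is_exposed(version_string):
    """True if the string names a 150400 kernel older than the fixed revision.

    Returns False for any other kernel string, which means "not covered by
    this check", not "known to be safe".
    """
    revision = kernel_patch_revision(version_string)
    return revision is not None and revision < FIXED_REVISION


# Banner text copied from the messages file excerpt in this document.
print(is_exposed("SunOS Release 5.10 Version Generic_150400-07 64-bit"))
```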

 

Solution

This issue is resolved by the following patches:

SunPatch:150400-32 (or above) SunOS 5.10 Sparc : Kernel Patch
SunPatch:150309-02 SunOS 5.10 Sparc : mpt.so patch

Install these patches at the listed revisions or higher.
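
Installed patch levels can be compared against the revisions listed above. The sketch below assumes its input is a list of lines from "showrev -p" in which each entry begins with "Patch: <id>-<revision>"; that layout, the function names, and the sample line are assumptions for illustration and should be checked against real output from the system.

```python
import re

# Minimum revisions listed in the Solution section of this document.
REQUIRED = {"150400": 32, "150309": 2}


def installed_revisions(showrev_lines):
    """Map each patch base ID to the highest installed revision.

    Assumes every relevant line starts with 'Patch: <base>-<rev>'.
    """
    found = {}
    for line in showrev_lines:
        match = re.match(r"\s*Patch:\s*(\d+)-(\d+)", line)
        if match:
            base, rev = match.group(1), int(match.group(2))
            found[base] = max(found.get(base, -1), rev)
    return found


def missing_patches(showrev_lines):
    """Return required patches that are absent or below the minimum revision."""
    found = installed_revisions(showrev_lines)
    return sorted(
        f"{base}-{minimum:02d}"
        for base, minimum in REQUIRED.items()
        if found.get(base, -1) < minimum
    )


# Hypothetical input line: a system still at kernel patch 150400-07.
sample = ["Patch: 150400-07 Obsoletes: Requires: Incompatibles: Packages: SUNWcsu"]
print(missing_patches(sample))
```

An empty list means both patches are present at or above the listed revisions.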


References

<BUG:20475363> - MPT_CMD_TIMEOUT: RESTARTING HBA CAUSING DOMAIN HANG
<BUG:21348068> - STUCK COMMAND FOLLOWING MPT_DO_SCSI_RESET
<NOTE:1998049.1> - Solaris 10 Hung While Booting With Errors mpt_cmd_timeout: Restarting HBA - Failed Internal Disk Controlled by Mpt Driver

  Copyright © 2018 Oracle, Inc.  All rights reserved.