Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1512111.1
Update Date: 2018-05-17

Solution Type: Technical Instruction Sure Solution

1512111.1: Identifying internal disk failures and issues on T5120/T5220/T5140/T5240/T5440 and Netra


Related Items
  • Sun SPARC Enterprise T5240 Server
  • Sun SPARC Enterprise T5220 Server
  • Sun Fire T2000 Server
  • Sun SPARC Enterprise T5120 Server
  • Sun SPARC Enterprise T5140 Server

Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5xx0


An internal disk has failed in a SPARC Enterprise CMT system. This document provides basic steps for identifying the failed disk and opening a Service Request.

Created from <SR 3-6532299447>

Applies to:

Sun SPARC Enterprise T5140 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5240 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5120 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5220 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire T2000 Server
Oracle Solaris on SPARC (64-bit)

Goal

 

  • Identify the problem (disk failure) using Solaris commands and log files, and determine the disk type and location (internal or external to the system)
  • Open a Service Request and submit data for analysis

 

 

Solution

1) Identify the problem (disk failure) using Solaris commands and log files, and determine the disk type and location (internal or external to the system)

a) Look for SCSI error messages in the /var/adm/messages(.x) files

Example:

Nov 27 03:28:29 myhost  scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@0/pci@8/scsi@0 (mpt0):
Nov 27 03:28:29 myhost  Disconnected command timeout for Target 1
Nov 27 03:28:31 myhost  scsi: [ID 365881 kern.info] /pci@400/pci@0/pci@8/scsi@0 (mpt0):
Nov 27 03:28:31 myhost  Log info 31140000 received for target 1.
Nov 27 03:28:31 myhost  scsi_status=0, ioc_status=8048, scsi_state=c

 

  • Take note of the path shown in the error messages (in this example /pci@400/pci@0/pci@8/scsi@0), the error message itself ("Disconnected command timeout") and the affected target (1)
  • To determine whether the errors are logged for an internal disk, search for the path in Document 1005907.1 "Matrix of Recognized Device Paths for SPARC systems"
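
The note-taking step above can be sketched in Python. This is a minimal illustration that assumes the messages file has been copied to a workstation with Python 3 and that the log lines follow the layout of the example; the function name scsi_events and the regular expressions are hypothetical and are not part of any Oracle tool.

```python
import re

# Sample lines copied from the example above.
SAMPLE = """\
Nov 27 03:28:29 myhost  scsi: [ID 107833 kern.warning] WARNING: /pci@400/pci@0/pci@8/scsi@0 (mpt0):
Nov 27 03:28:29 myhost  Disconnected command timeout for Target 1
Nov 27 03:28:31 myhost  scsi: [ID 365881 kern.info] /pci@400/pci@0/pci@8/scsi@0 (mpt0):
Nov 27 03:28:31 myhost  Log info 31140000 received for target 1.
Nov 27 03:28:31 myhost  scsi_status=0, ioc_status=8048, scsi_state=c
"""

# Header line: "scsi: [ID ...] [WARNING: ]<device path> (<driver instance>):"
HEADER = re.compile(
    r"scsi: \[ID \d+ kern\.\w+\] (?:WARNING: )?(?P<path>/\S+) \((?P<inst>\w+)\):"
)
# Detail lines name the target as "Target 1" or "target 1".
TARGET = re.compile(r"\btarget (?P<target>\d+)", re.IGNORECASE)


def scsi_events(text):
    """Return a sorted list of (device_path, driver_instance, target) tuples."""
    found = set()
    current = None  # (path, instance) from the most recent header line
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = (header.group("path"), header.group("inst"))
            continue
        target = TARGET.search(line)
        if target and current:
            found.add(current + (int(target.group("target")),))
    return sorted(found)
```

Calling scsi_events() on the contents of a messages file gives one tuple per controller path and target, which can then be looked up in Document 1005907.1.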

 

Note: Internal disks in some systems may be connected to an Adaptec PCI RAID HBA (aka Cougar). These disks are shown with a different path than the default.
Refer to Document 1509311.1 "How to isolate disk problems on an Adaptec RAID controller (Cougar)" to troubleshoot them.

b) Use commands provided by Storage Management (Software) to identify a failed disk

 

  • Depending on the method used to manage the disks, you can identify a failed disk with the commands provided, for example:

Solaris Volume Manager
  • metastat: look for maintenance entries
  • Reference: Solaris Volume Manager Command Line Reference (Doc ID 1011732.1)

Veritas Volume Manager
  • vxdisk list: look for failed, failing entries
  • vxprint -th: look for disabled, failed, failing and nodevice entries
  • Reference: Veritas Volume Manager - What logs to gather for troubleshooting (see References)

Solaris ZFS
  • fmadm faulty: look for faults tagged with ZFS
  • zpool status -v <Poolname>: look for degraded and unavail entries
  • Reference: How to replace a drive in Solaris[TM] ZFS (Doc ID 1002753.1)

Internal SAS HBA, Hardware RAID
  • raidctl: lists RAID volumes and disks
  • raidctl -l <Volume-ID>: shows whether a disk in a volume has failed
  • Reference: How To Use Hardware RAID on the T2000 T1000 Systems (Doc ID 1009346.1) (commands applicable to T5x40)
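
The "look for" hints above amount to a keyword search over saved command output. The sketch below is one hedged way to do that in Python. The keywords come from this document; the names SUSPECT_KEYWORDS and suspect_lines are hypothetical. It matches plain case-insensitive substrings and does not interpret the output, so every hit still needs to be read in context.

```python
# Keywords per command, taken from the list in this document.
SUSPECT_KEYWORDS = {
    "metastat": ["maintenance"],
    "vxdisk list": ["failed", "failing"],
    "vxprint -th": ["disabled", "failed", "failing", "nodevice"],
    "fmadm faulty": ["zfs"],
    "zpool status": ["degraded", "unavail"],
}


def suspect_lines(command, output):
    """Return the lines of `output` that contain a keyword listed for `command`.

    Matching is a case-insensitive substring test, so a hit only marks a
    line worth reading; it is not a diagnosis.
    """
    keywords = SUSPECT_KEYWORDS[command]
    hits = []
    for line in output.splitlines():
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append(line.strip())
    return hits
```

For example, suspect_lines("zpool status", text) returns the lines of text that mention a degraded or unavailable device, where text is the saved output of zpool status -v.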

 


c) Search the output of the "format" command for disks listed on the path /pci@400/pci@0/pci@8/scsi@0, or for the disk ID that the Storage Management Software commands identified as failed

  • The error message shows that "Target 1" has errors. If MPxIO is not configured, this disk appears as cXt1d0 in the list
  • To troubleshoot disk failures on systems with MPxIO, see Document 1002465.1 "How to verify the health of SCSI logical unit managed under Solaris Multipathing Software (MPxIO)"

 

Example:

AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@400/pci@0/pci@8/scsi@0/sd@0,0
       1. c1t1d0 <drive not available>
          /pci@400/pci@0/pci@8/scsi@0/sd@1,0
       2. c1t2d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@400/pci@0/pci@8/scsi@0/sd@2,0
       3. c1t3d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@400/pci@0/pci@8/scsi@0/sd@3,0
       4. c5t5006048452A7FB46d0 <EMC-SYMMETRIX-5772 cyl 1 alt 2 hd 15 sec 128>
          /pci@400/pci@0/pci@d/SUNW,qlc@0,1/fp@0,0/ssd@w5006048452a7fb46,0
       5. c9t5006048452A7FB49d0 <EMC-SYMMETRIX-5772 cyl 1 alt 2 hd 15 sec 128>
          /pci@500/pci@0/pci@d/SUNW,qlc@0,1/fp@0,0/ssd@w5006048452a7fb49,0

 

  • The format output lists disk c1t1d0 as "drive not available"; the disk type is not shown
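
Matching the controller path and target against the format listing can also be sketched in Python. This is an illustration only, assuming the listing was saved to text and that the disks use cXtYd0 names (no MPxIO). The names parse_format and disks_on_controller are hypothetical, and the sample is the first two entries of the listing above with each path on its own line.

```python
import re

# First two entries of the "format" listing in this document.
SAMPLE = """\
AVAILABLE DISK SELECTIONS:
       0. c1t0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
          /pci@400/pci@0/pci@8/scsi@0/sd@0,0
       1. c1t1d0 <drive not available>
          /pci@400/pci@0/pci@8/scsi@0/sd@1,0
"""

# Entry line: "<index>. <device> <label>", optionally followed by the path.
ENTRY = re.compile(
    r"^\s*\d+\.\s+(?P<dev>\S+)\s+<(?P<label>[^>]*)>\s*(?P<path>/\S+)?"
)


def parse_format(text):
    """Return a list of dicts with the device name, label and device path."""
    disks = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            disks.append({"device": m.group("dev"),
                          "label": m.group("label"),
                          "path": m.group("path")})
        elif disks and disks[-1]["path"] is None and line.strip().startswith("/"):
            # The path is on the line after the entry.
            disks[-1]["path"] = line.strip()
    return disks


def disks_on_controller(disks, controller_path, target=None):
    """Select disks below controller_path, optionally only target Y of cXtYd0."""
    selected = []
    for disk in disks:
        if not (disk["path"] or "").startswith(controller_path + "/"):
            continue
        name = re.match(r"c\d+t(\d+)d\d+$", disk["device"])
        if target is not None and (name is None or int(name.group(1)) != target):
            continue
        selected.append(disk)
    return selected
```

With the sample above, disks_on_controller(parse_format(SAMPLE), "/pci@400/pci@0/pci@8/scsi@0", target=1) selects the c1t1d0 entry, whose label is "drive not available".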

 

d) Use the command "iostat -En c1t1d0" to determine the disk type

Example:

c1t1d0 Soft Errors: 1 Hard Errors: 1188 Transport Errors: 34953 
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Device Id: id1,sd@n5000c5000bb1d26f 
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1033 Recoverable: 1 
Illegal Request: 16 Predictive Failure Analysis: 5 

 

  • iostat shows Product ST914602SSUN146G for c1t1d0, which is a 146GB disk
  • To identify the disk part number, enter ST914602SSUN146G in the System Handbook search field and select your system in the results
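
The product string and error counters can be pulled out of saved iostat -En output with a short Python sketch. It assumes the output has the layout of the example above; the name parse_iostat_en and both regular expressions are hypothetical.

```python
import re

# Output of "iostat -En c1t1d0" copied from the example in this document.
SAMPLE = """\
c1t1d0 Soft Errors: 1 Hard Errors: 1188 Transport Errors: 34953
Vendor: SEAGATE  Product: ST914602SSUN146G Revision: 0603 Device Id: id1,sd@n5000c5000bb1d26f
Size: 146.81GB <146810536448 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 1033 Recoverable: 1
Illegal Request: 16 Predictive Failure Analysis: 5
"""

# "Name: 123" pairs on the counter lines.
COUNTER = re.compile(r"([A-Z][A-Za-z ]*?): (\d+)")
# Vendor and product on the identification line.
IDENT = re.compile(r"Vendor: (.*?)\s+Product: (.*?)\s+Revision: (\S+)")


def parse_iostat_en(text):
    """Return the device name, vendor, product and error counters for one disk."""
    info = {"device": None, "vendor": None, "product": None, "counters": {}}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Size:"):
            continue  # blank line or size line: no counters here
        if line.startswith("Vendor:"):
            ident = IDENT.match(line)
            if ident:
                info["vendor"], info["product"] = ident.group(1), ident.group(2)
            continue
        if info["device"] is None:
            info["device"] = line.split()[0]  # first token of the first line
        for name, value in COUNTER.findall(line):
            info["counters"][name] = int(value)
    return info
```

The returned product value is the string to enter in the System Handbook search field, and the counters dictionary holds the error counts by name.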

 

2) Open a Service Request and submit data for analysis

 

  • Open a Service Request on My Oracle Support or call your local Customer Support Contact
  • State the Hardware Serial Number of the system or external storage device that contains the failed disk (if the disk cannot be identified as external, use the Hardware Serial Number of the system)

If you are opening the Service Request in My Oracle Support, select the following so that the Service Request is transferred to the Hardware Support Group:

  • the Hardware Product in the Product field, for example: Sun SPARC Enterprise T5240 Server
  • "Errors or Missing Components" in the Component field
  • "Disk Issues" in the Sub Component field

Provide an Explorer output for analysis or, if Explorer cannot be run on the system, provide a transcript of the commands you typed to identify the failed disk.
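
If Explorer cannot be run, one possible way to produce the transcript is to capture each command and its output in a single text file. The Python sketch below assumes a Python 3 interpreter is available on the system, which may not be the case on older Solaris releases. The function name run_transcript and the contents of COMMANDS are illustrative; adjust the list to the commands you actually used.

```python
import subprocess

# Commands named in this document; adjust to the storage management in use.
COMMANDS = [
    ["iostat", "-En"],
    ["fmadm", "faulty"],
    ["zpool", "status", "-v"],
]


def run_transcript(commands):
    """Run each command and return one text transcript.

    Each section holds the command line, its combined stdout/stderr and
    the exit status. A command that cannot be started is noted, not fatal.
    """
    parts = []
    for argv in commands:
        parts.append("# " + " ".join(argv))
        try:
            result = subprocess.run(argv,
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT,
                                    universal_newlines=True)
            parts.append(result.stdout.rstrip())
            parts.append("(exit status %d)" % result.returncode)
        except OSError as exc:
            parts.append("(could not run: %s)" % exc)
    return "\n".join(parts) + "\n"
```

Writing the return value of run_transcript(COMMANDS) to a file gives a single attachment for the Service Request.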



References

<NOTE:1017301.1> - Veritas Volume Manager - What logs to gather for troubleshooting

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.