Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2017748.1
Update Date:2017-11-27

Solution Type  Problem Resolution Sure

Solution  2017748.1 :   How to Identify and Remove a Failed Internal Disk in Sun Fire V480, V490 Under MPxIO Control  


Related Items
  • Sun Fire V480 Server
  • Sun Fire V490 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-Vx90




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-10390214611>

Applies to:

Sun Fire V490 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire V480 Server - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms

An internal disk in a Sun Fire V480 or V490 server, configured under the MPxIO multipathing software, has failed.

Changes

  In this environment, the X6727A (375-3030) Crystal+ PCI to FC-AL adapter provides connectivity to the internal disk backplane, which gives redundant data paths to the internal disks.

  The V480 and V490 use a very simple implementation: 2 disks on a single backplane.

  With the Crystal+ card installed, the internal disks can be accessed by 2 paths: the internal controller, or an optional second path through the PCI card. This provides path redundancy.

  Internal Controller Path:

  • Disk 1       /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@WWN,0 (cXt0d0 lower disk)
  • Disk 2       /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@WWN,0 (cXt1d0 upper disk)
   Reference: SPARC Platforms: Matrix of Recognized Device Paths (Doc ID 1005907.1)

Cause

   The failed internal disk is under MPxIO control, so its device name does not directly identify the physical slot.

Solution

  The following example is from a Sun Fire V490 server.

  Disk Identification:

  An internal disk failure is reported in the iostat error statistics and in the system messages:

c9t500000E014036270d0 Soft Errors: 1 Hard Errors: 20 Transport Errors: 2 
Vendor: FUJITSU Product: MAX3147FCSUN146G Revision: 1103 Serial No: 0642G02S2P 
Size: 146.81GB <146810536448 bytes>
Media Error: 10 Device Not Ready: 0 No Device: 10 Recoverable: 1
Illegal Request: 2 Predictive Failure Analysis: 2

Mar 8 07:24:07 syd0902 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g500000e014036270 (ssd250):
Mar 8 07:24:07 syd0902 scsi: [ID 107833 kern.notice] Requested Block: 118721463 Error Block: 118721463
Mar 8 07:24:07 syd0902 scsi: [ID 107833 kern.notice] Vendor: FUJITSU Serial Number: 0642G02S2P
Mar 8 07:24:07 syd0902 scsi: [ID 107833 kern.notice] Sense Key: Soft_Error
Mar 8 07:24:07 syd0902 scsi: [ID 107833 kern.notice] ASC: 0x5d (hardware impending failure data error rate too high), ASCQ: 0x12, FRU: 0x0
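
The error counters in the iostat output above can also be checked programmatically. The following is a minimal Python sketch, not part of the original procedure. The function names and the rule used to flag a disk (any hard errors or any Predictive Failure Analysis count) are assumptions made for illustration, and the sample text is copied from the output shown above.

```python
import re

# Sample text copied from the iostat output shown in this document.
SAMPLE = """\
c9t500000E014036270d0 Soft Errors: 1 Hard Errors: 20 Transport Errors: 2
Vendor: FUJITSU Product: MAX3147FCSUN146G Revision: 1103 Serial No: 0642G02S2P
Size: 146.81GB <146810536448 bytes>
Media Error: 10 Device Not Ready: 0 No Device: 10 Recoverable: 1
Illegal Request: 2 Predictive Failure Analysis: 2
"""

# A device stanza starts with a cXtYdZ name at the beginning of the line.
DEVICE = re.compile(r"^(c\d+t[0-9A-Fa-f]+d\d+)\s")
# Only the counters used by the (hypothetical) rule below are collected.
COUNTER = re.compile(
    r"(Soft Errors|Hard Errors|Transport Errors|Media Error|"
    r"Predictive Failure Analysis): (\d+)"
)


def parse_iostat_errors(text):
    """Return {device_name: {counter_name: value}} for each device stanza."""
    result = {}
    current = None
    for line in text.splitlines():
        match = DEVICE.match(line)
        if match:
            current = match.group(1)
            result[current] = {}
        if current is None:
            continue
        for name, value in COUNTER.findall(line):
            result[current][name] = int(value)
    return result


def suspect_disks(stats):
    """Hypothetical rule: flag disks with hard errors or PFA counts."""
    return sorted(
        dev for dev, counters in stats.items()
        if counters.get("Hard Errors", 0) > 0
        or counters.get("Predictive Failure Analysis", 0) > 0
    )


stats = parse_iostat_errors(SAMPLE)
print(suspect_disks(stats))
```

A flagged name is only a starting point. The physical slot still has to be determined as described in the rest of this document.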

From format, the internal disks under MPxIO control are listed as follows:

AVAILABLE DISK SELECTIONS:
0. c9t500000E015679BE0d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>
/scsi_vhci/ssd@g500000e015679be0
1. c9t500000E014036270d0 <SUN146G cyl 14087 alt 2 hd 24 sec 848>               <<<<<
/scsi_vhci/ssd@g500000e014036270

  This data alone is not enough to determine which physical internal disk has failed, because the internal disks are under multipath control.

 Caution: Do not assume that c9t500000E014036270d0 is HDD1 merely because it is listed second in the format output.
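
The MPxIO device name carries the disk's node WWN, which is the value used in the later lookups. As a minimal Python sketch (the function names are hypothetical and not part of any Solaris tool), the WWN can be extracted from a name such as c9t500000E014036270d0 and matched against the /scsi_vhci path shown by format:

```python
import re

# cN t<16 hex digit WWN> dN, optionally followed by a slice (sN).
MPXIO_NAME = re.compile(r"^c(\d+)t([0-9A-Fa-f]{16})d(\d+)(?:s(\d+))?$")


def wwn_from_device(name):
    """Return the lower-case node WWN embedded in an MPxIO device name."""
    match = MPXIO_NAME.match(name)
    if match is None:
        raise ValueError("not an MPxIO-style device name: %r" % name)
    return match.group(2).lower()


def vhci_path(name):
    """Build the /scsi_vhci path in the form that format prints under each disk."""
    return "/scsi_vhci/ssd@g" + wwn_from_device(name)


print(wwn_from_device("c9t500000E014036270d0"))
print(vhci_path("c9t500000E014036270d0"))
```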

  * From the iostat output, note the WWN of the faulty disk (500000E014036270).

  * From the explorer output or the luxadm display command, the two paths to the failed disk are listed as follows:

$ more luxadm_display_500000e014036270.out
DEVICE PROPERTIES for disk: 500000e014036270
Status(Port A): O.K.
Status(Port B): O.K.
Vendor: FUJITSU
Product ID: MAX3147FCSUN146G
WWN(Node): 500000e014036270
WWN(Port A): 500000e014036271
WWN(Port B): 500000e014036272
Revision: 1103
Serial Num: 000642G02S2P DU22P7102S2P
Unformatted capacity: 140009.438 MBytes
Read Cache: Enabled
Minimum prefetch: 0x0
Maximum prefetch: 0x0
Device Type: Disk device
Path(s):
/dev/rdsk/c9t500000E014036270d0s2
/devices/scsi_vhci/ssd@g500000e014036270:c,raw
Controller /devices/pci@8,700000/pci@3/SUNW,qlc@4/fp@0,0     <<====== Path 1
Device Address 500000e014036272,0
Host controller port WWN 210100e08bb7e2fb
Class primary
State ONLINE
Controller /devices/pci@9,600000/SUNW,qlc@2/fp@0,0           <<====== Path 2
Device Address 500000e014036271,0
Host controller port WWN 210000144f3b3335
Class primary
State ONLINE

  From the output above, the two paths are:

   pci@8,700000-pci@3-SUNW,qlc@4 (path from HBA)
   pci@9,600000-SUNW,qlc@2 (internal path)
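
The two controller paths can be pulled out of the luxadm display output with a short script. The following Python sketch is illustrative only: the function names are hypothetical, the sample text is copied from the output above, and the generated command string follows the /devices/...:devctl form reflected in the explorer file names used in this document.

```python
# Sample text copied from the "Path(s):" section of the luxadm display output.
SAMPLE = """\
Path(s):
/dev/rdsk/c9t500000E014036270d0s2
/devices/scsi_vhci/ssd@g500000e014036270:c,raw
Controller /devices/pci@8,700000/pci@3/SUNW,qlc@4/fp@0,0
Device Address 500000e014036272,0
Host controller port WWN 210100e08bb7e2fb
Class primary
State ONLINE
Controller /devices/pci@9,600000/SUNW,qlc@2/fp@0,0
Device Address 500000e014036271,0
Host controller port WWN 210000144f3b3335
Class primary
State ONLINE
"""


def parse_paths(text):
    """Return one dict per 'Controller' stanza in luxadm display output."""
    paths = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Controller":
            paths.append({"controller": parts[1]})
        elif paths and parts[:2] == ["Device", "Address"]:
            paths[-1]["device_address"] = parts[2]
        elif paths and parts[0] == "State":
            paths[-1]["state"] = parts[1]
    return paths


def dump_map_command(controller):
    """Build the dump_map command line for one controller path."""
    return "luxadm -e dump_map %s:devctl" % controller


paths = parse_paths(SAMPLE)
for path in paths:
    print(path["state"], dump_map_command(path["controller"]))
```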

 The slot position of the internal disks can be determined from the explorer output or the `luxadm -e dump_map` command.

   

$ cat luxadm_-e_dump_map_-devices-pci@8,700000-pci@3-SUNW,qlc@4-fp@0,0:devctl.out
Pos AL_PA ID Hard_Addr Port WWN         Node WWN         Type
0   1     7d 0         210100e08bb7e2fb 200100e08bb7e2fb 0x1f (Unknown Type,Host Bus Adapter)
1   ef    0  ef        500000e014036272 500000e014036270 0x0 (Disk device)    <<<<<<<<<
2   e8    1  e8        500000e015679be2 500000e015679be0 0x0 (Disk device)

$ cat luxadm_-e_dump_map_-devices-pci@9,600000-SUNW,qlc@2-fp@0,0:devctl.out
Pos AL_PA ID Hard_Addr Port WWN         Node WWN         Type
0   1     7d 0         270000144f3b3335 260000144f3b3335 0x1f (Unknown Type,Host Bus Adapter)
1   ef    0  ef        500000e014036271 500000e014036270 0x0 (Disk device)    <<<<<<<<<
2   e8    1  e8        500000e015679be1 500000e015679be0 0x0 (Disk device)

 


   In the luxadm output above, find the row whose Node WWN matches the failed disk and compare its AL_PA with the AL_PA-to-slot chart below to determine the disk slot.
  _____________________________________________
   Base Backplane    Sel ID    AL_PA
  _____________________________________________
   Disk 0            00        EF        <<<<<<<<<
   Disk 1            01        E8
  _____________________________________________

 Disk WWN 500000e014036270 has an AL_PA of ef, and an AL_PA of ef corresponds to Disk Slot 0, so the failed disk is in slot 0.
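
The lookup just described (Node WWN to AL_PA, then AL_PA to slot) can be sketched in Python as follows. The function names are hypothetical, the sample rows are copied from the dump_map output above, and the AL_PA table contains only the two entries from the chart in this document.

```python
# AL_PA to slot mapping taken from the chart in this document (V480/V490).
ALPA_TO_SLOT = {"ef": 0, "e8": 1}

# Sample text copied from the dump_map output for the HBA path.
SAMPLE = """\
Pos AL_PA ID Hard_Addr Port WWN Node WWN Type
0 1 7d 0 210100e08bb7e2fb 200100e08bb7e2fb 0x1f (Unknown Type,Host Bus Adapter)
1 ef 0 ef 500000e014036272 500000e014036270 0x0 (Disk device)
2 e8 1 e8 500000e015679be2 500000e015679be0 0x0 (Disk device)
"""


def parse_dump_map(text):
    """Return one dict per data row; the header line is skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        rows.append({
            "pos": int(fields[0]),
            "al_pa": fields[1].lower(),
            "port_wwn": fields[4].lower(),
            "node_wwn": fields[5].lower(),
        })
    return rows


def slot_for_wwn(rows, node_wwn):
    """Return the backplane slot for a node WWN, or None if it is not found."""
    for row in rows:
        if row["node_wwn"] == node_wwn.lower():
            return ALPA_TO_SLOT.get(row["al_pa"])
    return None


rows = parse_dump_map(SAMPLE)
print(slot_for_wwn(rows, "500000E014036270"))
```

Running the same lookup against the dump_map output of the second path should give the same slot, which is a useful cross-check before a disk is pulled.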

 Disk removal procedure in Solaris OS:

 ---------------------------------------

 The luxadm remove_device command is used to remove the disk drive from the OS:

  

 # luxadm remove_device <Disk_Drive>
     where Disk_Drive is /dev/rdsk/cXtXdXs2

  

  Because multipathing is enabled, format does not show a separate cXtXdX entry for each physical path.
  The disk name in format is the name by which the OS recognizes the disk. The disk may have more than one path, but it is still a single disk.
  Use the name displayed in format, which is the c9 name, with the s2 slice appended:

  

   # luxadm remove_device /dev/rdsk/c9t500000E014036270d0s2 

  

Note: Controller 'c9' is a logical name: it is the MPxIO virtual controller path created when multipathing is enabled. It therefore does not appear in the output of cfgadm or luxadm display <enclosure_name>.
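
The naming rule above (take the name shown by format and append slice 2) can be sketched in Python as a small helper that builds the command string. The function name is hypothetical. The helper only formats text and does not run luxadm.

```python
import re

# Name as shown by format for an MPxIO disk: cN t<16 hex digit WWN> dN.
FORMAT_NAME = re.compile(r"^c\d+t[0-9A-Fa-f]{16}d\d+$")


def remove_device_command(format_name):
    """Build the luxadm remove_device command for an MPxIO disk name."""
    if not FORMAT_NAME.match(format_name):
        raise ValueError("expected an MPxIO name as shown by format: %r"
                         % format_name)
    # Slice 2 is appended, as in the example command in this document.
    return "luxadm remove_device /dev/rdsk/%ss2" % format_name


print(remove_device_command("c9t500000E014036270d0"))
```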

 Refer to the following document for more information on physical disk replacement procedures:

 Removing and Replacing the Sun Fire[TM] 280R, Sun Fire[TM] V480, Sun Fire[TM] V490, Sun Fire[TM] V880, Sun Fire[TM] V880z or Sun Fire[TM] V890 Hot-Pluggable Internal Disk Drives (Doc ID 1007367.1)

For additional information on the Fibre Channel drive implementation on workgroup servers, see:

Sun Fire[TM] Servers (V480, V490, V880, V890):Troubleshooting Fibre Channel Drives (Doc ID 1383660.1)

References

<NOTE:1005907.1> - SPARC Platforms: Matrix of Recognized Device Paths
<NOTE:1383660.1> - Sun Fire[TM] Servers (V480, V490, V880, V890):Troubleshooting Fibre Channel Drives

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.