Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1911993.1
Update Date: 2014-07-30
Keywords:

Solution Type  Problem Resolution Sure

Solution  1911993.1 :   Sun Storage 7000 Unified Storage System: All 16 data drives cannot be seen in BUI / CLI after drive replacement (7110 Only)  


Related Items
  • Sun Storage 7110 Unified Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-9350605191>

Applies to:

Sun Storage 7110 Unified Storage System - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

A data drive, HDD 2, failed in a Sun Storage 7110 Unified Storage System.

When the drive was replaced, all of the drives went into a removed state in the BUI and CLI. The hardware listing reports every disk as absent:

chassis-000  zfsls30    ok        Sun Microsystems, Inc.  Sun Storage 7110                                         

cpu-000      CPU 0      ok        AMD                     Quad-Core AMD Opteron(tm) Processor 2347 HE               unknown
cpu-001      CPU 1      absent    -                       -                                                         -
disk-000     HDD 0      absent    -                       -                                                         -                   --
disk-001     HDD 1      absent    -                       -                                                         -                   --
disk-002     HDD 2      absent    -                       -                                                         -                   --
disk-003     HDD 3      absent    -                       -                                                         -                   --
disk-004     HDD 4      absent    -                       -                                                         -                   --
disk-005     HDD 5      absent    -                       -                                                         -                   --
disk-006     HDD 6      absent    -                       -                                                         -                   --
disk-007     HDD 7      absent    -                       -                                                         -                   --
disk-008     HDD 8      absent    -                       -                                                         -                   --
disk-009     HDD 9      absent    -                       -                                                         -                   --
disk-010     HDD 10     absent    -                       -                                                         -                   --
disk-011     HDD 11     absent    -                       -                                                         -                   --
disk-012     HDD 12     absent    -                       -                                                         -                   --
disk-013     HDD 13     absent    -                       -                                                         -                   --
disk-014     HDD 14     absent    -                       -                                                         -                   --
disk-015     HDD 15     absent    -                       -                                                         -                   --
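
When working from a saved copy of this hardware listing, it can help to confirm mechanically that every disk row reports absent, rather than a single bay. The following is a minimal Python sketch, not an appliance tool. It assumes the listing has been saved as text in the column layout shown above; the names LISTING, disk_states and all_disks_absent are hypothetical and chosen only for illustration.

```python
import re

# A few rows copied from the hardware listing above (trailing columns trimmed).
LISTING = """\
cpu-000      CPU 0      ok        AMD
cpu-001      CPU 1      absent    -
disk-000     HDD 0      absent    -
disk-001     HDD 1      absent    -
disk-015     HDD 15     absent    -
"""

# Matches rows such as "disk-002     HDD 2      absent"; CPU rows are ignored.
ROW = re.compile(r"^(disk-\d+)\s+(HDD \d+)\s+(\S+)")


def disk_states(text):
    """Return {label: state} for every disk-NNN row in the listing."""
    states = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            states[match.group(2)] = match.group(3)
    return states


def all_disks_absent(text):
    """True only if at least one disk row exists and all are 'absent'."""
    states = disk_states(text)
    return bool(states) and all(s == "absent" for s in states.values())
```

If all_disks_absent returns True while format still lists the drives, the listing matches the symptom described in this document.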


The akdiskmap command gave no output, but format showed all the drives correctly.


Other commands show the drive being successfully replaced:

pool-0.history:2014-07-16.02:23:47 [internal vdev attach txg:11374131] replace vdev=/dev/dsk/c0t5000CCA00A356E00d0s0 for vdev=/dev/dsk/c0t5000C5000F77A807d0s0 [user root on cmsls30]


cfgadm-alv.txt:c1::a,0                        connected    configured   unknown    Client Device: /dev/dsk/c0t5000CCA00A356E00d0s0(sd18)


However, the alert log shows all the drives going into a removed state:

Wed Jul 16 02:36:23 2014
nvlist version: 0
       class = alert.ak.xmlrpc.hardware.disk.removed
       source = svc:/appliance/kit/akd:default
       chassis_uuid = 3f0cf36e-9f1f-6a54-e542-e946b4ba1d67
       chassis_label = zfsls30
       fru = hc://:server-id=zfsls30:chassis-id=1234567/chassis=0/bay=15/disk=0
       fru_label = HDD 15
       uuid = 7a1e906e-4cd5-4b6f-b5e1-a55ce293ca67

Wed Jul 16 02:36:23 2014
nvlist version: 0
       class = alert.ak.xmlrpc.hardware.disk.removed
       source = svc:/appliance/kit/akd:default
       chassis_uuid = 3f0cf36e-9f1f-6a54-e542-e946b4ba1d67
       chassis_label = zfsls30
       fru = hc://:server-id=cmsls30:chassis-id=1234567/chassis=0/bay=14/disk=0
       fru_label = HDD 14
       uuid = a083c84b-c614-ca43-9281-b19378b53976


The alert log contains the same disk.removed alert for every drive, from HDD 15 down to HDD 0.
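
The removed-drive alerts can be tallied from a saved copy of the alert log to confirm that every bay is affected. This is a rough Python sketch under stated assumptions: records are separated by blank lines and carry "key = value" lines as in the excerpt above. The function name removed_disks and the SAMPLE text are hypothetical.

```python
# Two records in the same shape as the alert log excerpt above (fields trimmed).
SAMPLE = """\
Wed Jul 16 02:36:23 2014
nvlist version: 0
       class = alert.ak.xmlrpc.hardware.disk.removed
       fru_label = HDD 15

Wed Jul 16 02:36:23 2014
nvlist version: 0
       class = alert.ak.xmlrpc.hardware.disk.removed
       fru_label = HDD 14
"""


def removed_disks(alert_text):
    """Return the fru_label of each record whose class is a disk.removed alert."""
    labels = []
    record = {}
    # The extra empty string flushes the final record if the text
    # does not end with a blank line.
    for line in alert_text.splitlines() + [""]:
        line = line.strip()
        if " = " in line:
            key, value = line.split(" = ", 1)
            record[key] = value
        elif not line and record:
            if record.get("class", "").endswith("hardware.disk.removed"):
                labels.append(record.get("fru_label", "unknown"))
            record = {}
    return labels
```

On a 7110 with this fault, the returned list would be expected to contain all 16 labels, HDD 0 through HDD 15.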

 

Changes

The issue appears to have been triggered by the replacement of drive HDD 2.

The replacement is a Hitachi H103014SC drive rather than a Seagate ST914602, but it was recognised by ZFS:

2014-07-16 02:32:53.516 diskname = c0t5000CCA00A356E00d0
2014-07-16 02:32:53.516 devid = id1,sd@n5000cca00a356e00/a
2014-07-16 02:33:02.223 successfully onlined device          <<<<  New disk online.


The new drive replaced the failed drive and started resilvering normally:

 spare-11                   DEGRADED     0     0     0
   replacing-0              DEGRADED     0     0     0
     c0t5000C5000F77A807d0  FAULTED      0     0     0  external device fault  <<<<  Old HDD15
     c0t5000CCA00A356E00d0  ONLINE       0     0     0  (resilvering)          <<<<  New Drive
     c0t5000C5000F715827d0  ONLINE       0     0     0                         <<<<  spare

  

Cause

When the drive was replaced, the following warning was logged in debug.sys:

Jul 16 02:24:38 cmsls30 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,375@f/pci1000,3150@0 (mpt0):
Jul 16 02:24:38 cmsls30         Disconnected command timeout for target. Vendor='LSILOGIC' Product='SASX28 A.0' Serial='unknown' Revision='5021' SAS=500605b00002453d Command=0x12 'inquiry' (pkt_time=60 abort_count=0 target=24
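
To check whether another system shows the same signature, the log can be searched for this mpt timeout message. The sketch below is illustrative only. It assumes debug.sys is available as plain text with the message on one line, as in the excerpt above; the names TIMEOUT, SAMPLE and mpt_timeouts are hypothetical.

```python
import re

# The warning text from the excerpt above, as a single log line.
SAMPLE = (
    "Jul 16 02:24:38 cmsls30         Disconnected command timeout for target. "
    "Vendor='LSILOGIC' Product='SASX28 A.0' Serial='unknown' Revision='5021' "
    "SAS=500605b00002453d Command=0x12 'inquiry' (pkt_time=60 abort_count=0 target=24"
)

# Captures the Product string and the SAS address from each timeout message.
# '.' does not match newlines, so each match stays within one log line.
TIMEOUT = re.compile(
    r"Disconnected command timeout for target\."
    r".*?Product='([^']*)'.*?SAS=([0-9a-fA-F]+)"
)


def mpt_timeouts(log_text):
    """Return (product, sas_address) for each command-timeout message found."""
    return [(m.group(1), m.group(2)) for m in TIMEOUT.finditer(log_text)]
```

A non-empty result around the time of the drive replacement matches the pattern described here, but it does not confirm an HBA fault on its own.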



See also SR 3-6768433201, which reported the same issue.


Solution

Replace the internal SAS HBA (Dual 4x3Gb Internal SAS HBA) - PN 371-3255



For the replacement procedure, see KM 1386224.1, "How to replace a PCIe-card in a Sun ZFS Unified Storage Appliance head".

 

References

<NOTE:1019887.1> - Sun Storage 7000 Unified Storage System: How to Collect a Support Bundle using the BUI or CLI
<NOTE:1386224.1> - How to replace a PCIe-card in a Sun ZFS Unified Storage Appliance
<NOTE:1416406.1> - Sun ZFS Storage Appliances Troubleshooting Resource Center

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.