Information in this document applies to any platform.
Date of Workaround release: 16-Mar-2012
Date of Resolved Release: 03-Apr-2012
***Checked for relevance 05-Sep-2013***
___________________________________
Description
After updating 7210, 7310 or 7410 Storage Appliances to the 2011.1.1.0 or 2011.1.1.1 Storage Appliance Software releases, systems with SAS-1 HBAs and J4400 or J4500 disk shelves may experience multiple false disk failures after an initial real disk fault. The issue is triggered by manual or automatic (phone home) support bundle creation related to diagnosing the initial disk fault. It can degrade storage pool redundancy and cause the Storage Appliance Software BUI and CLI to become unresponsive.
Occurrence
This issue can occur on the following:
Sun ZFS 7000 Storage Appliance platforms:
- Sun ZFS 7210 Storage Appliance
- Sun ZFS 7310 Storage Appliance
- Sun ZFS 7410 Storage Appliance
For the above platforms:
- with SAS-1 HBAs (includes revisions B3 and C0)
- with Sun Storage J4400 or J4500 SAS disk shelves
- with ZFS Storage Appliance Software 2011.1.1.0 or 2011.1.1.1
Notes:
1. Sun ZFS platforms 7110, 7120, 7320, and 7420 are not affected by this issue.
2. To determine the current Storage Appliance Software revision, run the following command:
7000:> maintenance system updates list
UPDATE                        DATE                 STATUS
ak-nas@2011.04.24.1.0,1-1.8   2011-12-21 22:32:50  current
or:
Do the following from the Browser User Interface (BUI) to access "info" about the release name:
a) Navigate to: Maintenance -> System
b) Click on the "i" next to the "Current System Software" entry in the table of available releases.
A pop-up will show the release, for example: "2010.Q3.4.2"
3. The issue occurs only when the SAS-1 HBA is attached to a J4400 or J4500 disk shelf, so only the disk shelf model needs to be checked. Before a software update, the following command can be run from the appliance CLI to determine whether the system has a J4400 or J4500 disk shelf. For example:
7000:> maintenance hardware select chassis-001 show
Properties:
name = 0845QAK004
faulted = false
manufacturer = Sun Microsystems, Inc.
model = J4400
serial = 0845QAK004
revision = 3R53
type = storage
rpm = 7200
path = 1
locate = false
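The two checks in the notes above can be combined in a small script. The following Python sketch is illustrative only and is not an Oracle-supplied tool: it parses text captured from the two CLI commands shown above and reports whether a system matches the affected configuration. The function names are hypothetical, and the assumption that package versions 2011.04.24.1.0 and 2011.04.24.1.1 correspond to releases 2011.1.1.0 and 2011.1.1.1 is inferred from the examples in this document. In line with note 3, it checks only the disk shelf model and does not inspect the HBA or controller platform.

```python
import re

# ASSUMPTION: these package versions correspond to the affected releases
# 2011.1.1.0 and 2011.1.1.1 (inferred from the examples in this document).
AFFECTED_PACKAGES = {"2011.04.24.1.0", "2011.04.24.1.1"}
AFFECTED_SHELVES = {"J4400", "J4500"}


def current_package(updates_listing):
    """Return the version of the update marked 'current', or None."""
    for line in updates_listing.splitlines():
        match = re.match(r"\s*ak-nas@([\d.]+),\S+\s+.*\bcurrent\s*$", line)
        if match:
            return match.group(1)
    return None


def shelf_model(chassis_properties):
    """Return the 'model' property from a chassis 'show' listing, or None."""
    match = re.search(r"^\s*model\s*=\s*(\S+)\s*$", chassis_properties,
                      re.MULTILINE)
    return match.group(1) if match else None


def is_exposed(updates_listing, chassis_listings):
    """True if the current release and at least one shelf model are affected."""
    if current_package(updates_listing) not in AFFECTED_PACKAGES:
        return False
    return any(shelf_model(text) in AFFECTED_SHELVES
               for text in chassis_listings)
```

To use it, save the output of "maintenance system updates list" and of "maintenance hardware select chassis-NNN show" for each chassis as text, then pass them to is_exposed().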
Symptoms
Storage pool redundancy can be degraded by one or more disk faults. Typically, several false disk faults follow an initial real disk failure. The "Configuration::Storage" screen can be used to determine whether a pool is degraded, and the "Maintenance::Hardware" screen can be used to view any faulted disk drives. In addition, the Storage Appliance Software BUI and CLI usually become unresponsive when this issue occurs.
Workaround
This issue is addressed in the following release:
- ZFS Storage Appliance Software 2011.1.2.1 or later
If the systems are already running Storage Appliance Software release 2011.1.1.0 or 2011.1.1.1 but have NOT experienced any symptoms, it is recommended that the systems be updated to the 2011.1.2.1 release immediately using standard update procedures. This issue is triggered by automatic (phone home) and manual support bundles, so no support bundles should be created and the phone home service should be disabled until the update is complete.
For customers that ARE experiencing the issue, the following procedure should be used to update the systems to the AK 2011.1.2.1 release. These steps should be done during a maintenance window without any client activity. This issue is triggered by automatic (phone home) and manual support bundles, so no support bundles should be created until the update is complete.
1. Power off the storage appliance controller from the SP console (both heads in a cluster configuration).
For example:
-> stop /SYS -f
Are you sure you want to immediately stop /SYS (y/n)?y
2. Physically power off all disk shelves. Wait 30 seconds. Power on all disk shelves.
3. Power on the storage appliance controller from the SP console (just one head in a cluster configuration).
For example:
-> start /SYS
4. Turn off the phone home service, and cancel any active support bundles.
For example:
7000:> configuration services scrk disable
7000:> maintenance system bundles select 23eb4cc8-edd2-6a26-f2a4-b1cdf54a68e cancel
5. Update to the AK 2011.1.2.1 release. If the update health checks find any single path or other issues, repeat the procedure starting at Step 1. If update health checks cannot be resolved, contact Oracle Support.
For example:
7000:> maintenance system updates select ak-nas@2011.04.24.2.1,1-1.15 upgrade
6. After the update is complete, go to "Maintenance::Problems" and mark any disk or HBA issues repaired. Typically only one of the reported issues is real, and it will be re-detected automatically if it occurs again.
For example:
7000:> maintenance problems select problem-000 markrepaired
7. In a clustered configuration, perform steps 3 through 6 on the other controller head.
After the update is complete, the phone home service may be re-enabled, support bundles may be created as needed, and the storage appliance may be used as normal.
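The ordering of the procedure above in a clustered configuration can be summarized with a short Python sketch. It is an illustration under stated assumptions, not an automation tool: the function name and the head labels are hypothetical, the step descriptions paraphrase the numbered steps, and nothing is executed against an appliance.

```python
def update_plan(heads):
    """Return the ordered (target, action) steps for the given controller heads."""
    # Step 1: power off every controller head before touching the shelves.
    plan = [(head, "power off controller from SP console") for head in heads]
    # Step 2: power cycle all disk shelves once, while all heads are off.
    plan.append(("disk shelves", "power off, wait 30 seconds, power on"))
    # Steps 3 to 6 run on one head; step 7 repeats them on the other head.
    for head in heads:
        plan += [
            (head, "power on controller from SP console"),
            (head, "disable phone home and cancel active support bundles"),
            (head, "update to the 2011.1.2.1 release"),
            (head, "mark disk and HBA problems repaired"),
        ]
    return plan


if __name__ == "__main__":
    # "head-A" and "head-B" are placeholder labels for the two cluster heads.
    for number, (target, action) in enumerate(update_plan(["head-A", "head-B"]), 1):
        print(f"{number:2d}. {target}: {action}")
```

The point of the sketch is the sequencing: both heads are powered off before the shelves are power cycled, and the second head is not powered on until the first head has been updated.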
If you are not able to update the software on your own, contact Oracle Support for assistance.
For a listing of ZFS Storage Appliance Software Releases and version information, see <Document:2021771.1>
[Figure: example screen capture of the ZFS Storage Appliance (ZFSSA) Software GUI]
History
16-Mar-2012: Date of Workaround release
03-Apr-2012: Updated Description, Occurrence, Symptoms, and Workaround/Resolution - issue is Resolved
05-Sep-2013: Checked for currency/relevance; no change in content
19-Sep-2013: Formatting correction for graphic; no change in content
The zpool status command can be used to view the status of the pool and determine whether several disks are faulted.
7410# zpool status
  pool: pool-1
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Jan 26 04:58:37 2012
        1023G scanned out of 5.59T at 650M/s, 2h3m to go
        173G resilvered, 17.87% done
config:
NAME                                         STATE    READ WRITE CKSUM
pool-1                                       DEGRADED    0     0     0
  mirror-0                                   ONLINE      0     0     0
    c4t5000C50015BD3146d0                    ONLINE      0     0     0
    c4t5000C50015A1F4FAd0                    ONLINE      0     0     0
    c4t5000C50015A29579d0                    ONLINE      0     0     0  (resilvering)
  mirror-1                                   ONLINE      0     0     0
    c4t5000C50015B054FCd0                    ONLINE      0     0     0
    c4t5000C50015B4D829d0                    ONLINE      0     0     0
    c4t5000C50015A2A714d0                    ONLINE      0     0     0  (resilvering)
  mirror-2                                   ONLINE      0     0     0
    c4t5000C50015C85D98d0                    ONLINE      0     0     0
    c4t5000C50015AEA493d0                    ONLINE      0     0     0  (resilvering)
    c4t5000CCA396DFA143d0                    ONLINE      0     0     0
  mirror-3                                   DEGRADED    0     0     0
    c4t5000C50015BACD5Bd0                    ONLINE      0     0     0
    c4t5000C500268C6398d0                    ONLINE      0     0     0  (resilvering)
    replacing-2                              DEGRADED    0     0     0
      c4t5000C50015A34E59d0                  FAULTED     0     0     0  too many errors
      c4t5000C50015BB3195d0                  ONLINE      0     0     0  (resilvering)
  mirror-4                                   ONLINE      0     0     0
    c4t5000C50015C06936d0                    ONLINE      0     0     3  (resilvering)
    c4t5000C50015BA853Cd0                    ONLINE      0     0     0
    c4t5000C50015BAD1B4d0                    ONLINE      0     0     0
  mirror-5                                   DEGRADED    0     0     0
    c4t5000C50015B07195d0                    FAULTED     0     0     0  too many errors
    c4t5000C50015BA8814d0                    ONLINE      0     0     0
    c4t5000C50015A34F57d0                    FAULTED     0     0     0  too many errors
  mirror-6                                   ONLINE      0     0     0
    c4t5000C5001951DFB4d0                    ONLINE      0     0     2  (resilvering)
    c4t5000C50015B06692d0                    ONLINE      0     0     0
    c4t5000C50015B08592d0                    ONLINE      0     0     2  (resilvering)
  mirror-7                                   DEGRADED    0     0     0
    c4t5000C50015C6D612d0                    ONLINE      0     0     0
    c4t5000C50015C5EA09d0                    ONLINE      0     0     0
    spare-2                                  DEGRADED    0     0     0
      c4t5000C50015A329FCd0                  FAULTED     0     0     0  too many errors
      c4t5000C50019512713d0                  ONLINE      0     0     0  (resilvering)
  mirror-8                                   ONLINE      0     0     0
    c4t5000C50019511BC8d0                    ONLINE      0     0     0
    replacing-1                              ONLINE      0     0     0
      c4t5000C50015BA98DBd0                  ONLINE      0     0     0
      c4t5000C50015BB1B66d0                  ONLINE      0     0     0  (resilvering)
    c4t5000C50015CE8A47d0                    ONLINE      0     0     0
  mirror-9                                   ONLINE      0     0     0
    c4t5000C50015BB8730d0                    ONLINE      0     0     0
    c4t5000C50015BA838Ad0                    ONLINE      0     0     0
    c4t5000C50015A654C7d0                    ONLINE      0     0     0
  mirror-10                                  ONLINE      0     0     0
    c4t5000C50019511DADd0                    ONLINE      0     0     0
    c4t5000C50015CF74DCd0                    ONLINE      0     0     0
    c4t5000C50015CE0BA8d0                    ONLINE      0     0     0
  mirror-11                                  ONLINE      0     0     0
    c4t5000C50015BAA4C3d0                    ONLINE      0     0     0
    c4t5000C5001957A58Bd0                    ONLINE      0     0     0
    c4t5000C50015AD5D11d0                    ONLINE      0     0     0
  mirror-12                                  DEGRADED    0     0     0
    c4t5000C50019513E61d0                    ONLINE      0     0     0
    spare-1                                  UNAVAIL     0     0     0  insufficient replicas
      c4t5000C5001950F3D3d0                  FAULTED     0     0     0  too many errors
      c4t5000C50015BB3195d0                  FAULTED     0     0     0  corrupted data
    c4t5000C50015BAC62Bd0                    ONLINE      0     0     0
  mirror-13                                  DEGRADED    0     0     0
    c4t5000C50019513CBFd0                    ONLINE      0     0     0
    c4t5000C50015ADB62Dd0                    FAULTED     0     0     0  too many errors
  mirror-14                                  DEGRADED    0     0     0
    c4t5000C50019511B25d0                    ONLINE      0     0     0
    c4t5000C5002693BCA2d0                    FAULTED     0     0     0  too many errors
    c4t5000C50015BACED5d0                    FAULTED     0     0     0  too many errors
  mirror-15                                  ONLINE      0     0     0
    c4t5000C50019517FB8d0                    ONLINE      0     0     0
    c4t5000C50015BAA88Bd0                    ONLINE      0     0     0  (resilvering)
    c4t5000C500195117D1d0                    ONLINE      0     0     0
  mirror-16                                  ONLINE      0     0     0
    c4t5000C500195143DEd0                    ONLINE      0     0     0
    c4t5000C5001A732C28d0                    ONLINE      0     0     0
    c4t5000C50015BAE31Bd0                    ONLINE      0     0     0  (resilvering)
  mirror-17                                  DEGRADED    0     0     0
    c4t5000C500195187A0d0                    ONLINE      0     0     0
    spare-1                                  UNAVAIL     0     0     0  insufficient replicas
      c4t5000C50015BAC4CEd0                  FAULTED     0     0     0  too many errors
      c4t5000C50015BB1B66d0                  FAULTED     0     0     0  corrupted data
    c4t5000C50019512471d0                    ONLINE      0     0     0
logs
  c4tATASTECZEUSIOPS018GBYTESSTM0000C3AEAd0  ONLINE      0     0     0
  c4tATASTECZEUSIOPS018GBYTESSTM0000D0CE9d0  ONLINE      0     0     0
cache
  c0t0d0                                     ONLINE      0     0     0
spares
  c2t5000C50015BB1B66d0                      FAULTED  corrupted data
  c2t5000C50019512713d0                      INUSE    currently in use
  c2t5000C50015BB3195d0                      FAULTED  corrupted data
  c4t5000C50015BACD4Ed0                      AVAIL
errors: No known data errors
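For a pool of this size, scanning the listing by eye is error-prone. The following Python sketch shows one way to pull the faulted device entries out of saved "zpool status" text. It is a minimal illustration, not a supported tool: the function name is hypothetical, and it assumes each device line has the device name in the first column and the state in the second, as in the output above. Entries in the spares section are included, so a disk listed under both a mirror and the spares (with different controller prefixes) appears twice.

```python
def faulted_devices(zpool_status):
    """Return device names whose state column is FAULTED, in order, no repeats."""
    names = []
    for line in zpool_status.splitlines():
        fields = line.split()
        # Device lines look like: <name> <state> [read write cksum] [note]
        if len(fields) >= 2 and fields[1] == "FAULTED" and fields[0] not in names:
            names.append(fields[0])
    return names


if __name__ == "__main__":
    import sys

    # Usage (assumed workflow): save the zpool status output to a file and
    # pipe it in, e.g.  python faulted.py < zpool_status.txt
    for name in faulted_devices(sys.stdin.read()):
        print(name)
```

Because the function splits on whitespace, it works whether or not the listing has kept its original indentation.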
See CRs 7132238 and 7146187 for more information.
Please send technical questions to:
sunalertpublication_us_grp@oracle.com
and copy the Responsible Engineer/Contributor listed
Internal Contributor/Submitter: Christina.Coons@oracle.com
Internal Eng Responsible Engineer: Christian.Rasmussen@oracle.com
Internal Services Knowledge Engineer: David.Mariotto@oracle.com
Internal Eng Business Unit Group: ZFS Storage Appliance
Attachments
This solution has no attachment