Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1951504.1
Update Date: 2017-08-16
Keywords:

Solution Type  Problem Resolution Sure

Solution  1951504.1 :   DAT-72 - ASC: 0x0 ASCQ: 0x2 : Volume_Overflow, End-of-Media Detected, End of Partition/Medium Detected  


Related Items
  • Sun Storage DAT 72 Tape Drive
  • Sun SPARC Enterprise T5220 Server
  • Sun SPARC Enterprise T2000 Server
Related Categories
  • PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: OEM Drive and Library


This document describes how to handle media errors on DAT 72 tape drives when the following symptoms appear in host OS logs:
- ASC/ASCQ: 0x0/0x2
- Volume_Overflow
- End-of-Media Detected
- End of partition/medium detected

In this Document
Symptoms
Changes
Cause
Solution


Applies to:

Sun Storage DAT 72 Tape Drive - Version Not Applicable and later
Sun SPARC Enterprise T5220 Server - Version All Versions and later
Sun SPARC Enterprise T2000 Server - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The host OS will report the following error in the /var/adm/messages file:

/var/adm/messages:
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@0/pci@0/pci@8/pci@0/pci@8/pci@0/scsi@8/st@0,0 (st3):
Dec  3 13:55:10 xxxxxxxx     Error for Command: write                   Error Level: Fatal
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     Requested Block: 1588995                   Error Block: 1588995
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     Vendor: HP                                 Serial Number:    9   $DR-1
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     Sense Key: Volume_Overflow
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     ASC: 0x0 (end of partition/medium detected), ASCQ: 0x2, FRU: 0x0
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     End-of-Media Detected
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.warning] WARNING: /pci@0/pci@0/pci@8/pci@0/pci@8/pci@0/scsi@8/st@0,0 (st3):
Dec  3 13:55:26 xxxxxxxx     Error for Command: write_file_mark         Error Level: Fatal
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.notice]     Requested Block: 1588995                   Error Block: 1588995
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.notice]     Vendor: HP                                 Serial Number:    9   $DR-1
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.notice]     Sense Key: Volume_Overflow
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.notice]     ASC: 0x0 (end of partition/medium detected), ASCQ: 0x2, FRU: 0x0
Dec  3 13:55:26 xxxxxxxx scsi: [ID 107833 kern.notice]     End-of-Media Detected

Note that the "Requested Block" and "Error Block" values are identical in both messages and that the block number is very high. If the two values do not match, the end-of-media report is unlikely to be accurate.
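The check described above can be sketched in a few lines of Python. This is only an illustration, not an Oracle tool: the function name looks_like_end_of_media and the regular expressions are assumptions made for this example, and the sample text is copied from the log excerpt above. The sketch inspects only the first error record it finds.

```python
import re

# Lines copied from the /var/adm/messages excerpt above (host name masked as in the source).
SAMPLE = """\
Dec  3 13:55:10 xxxxxxxx     Error for Command: write                   Error Level: Fatal
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     Requested Block: 1588995                   Error Block: 1588995
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     Sense Key: Volume_Overflow
Dec  3 13:55:10 xxxxxxxx scsi: [ID 107833 kern.notice]     ASC: 0x0 (end of partition/medium detected), ASCQ: 0x2, FRU: 0x0
"""

# Hypothetical patterns written for this example; adjust them if your log layout differs.
BLOCKS = re.compile(r"Requested Block:\s*(\d+)\s+Error Block:\s*(\d+)")
SENSE = re.compile(r"Sense Key:\s*(\S+)")
ASC = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+).*?ASCQ:\s*(0x[0-9a-fA-F]+)")


def looks_like_end_of_media(text):
    """Return True if the first error record in text matches the symptoms in this document."""
    blocks = BLOCKS.search(text)
    sense = SENSE.search(text)
    asc = ASC.search(text)
    if not (blocks and sense and asc):
        return False
    requested, error = int(blocks.group(1)), int(blocks.group(2))
    return (
        requested == error                        # both block numbers must match
        and sense.group(1) == "Volume_Overflow"   # sense key from the log
        and int(asc.group(1), 16) == 0x0          # ASC 0x0
        and int(asc.group(2), 16) == 0x2          # ASCQ 0x2
    )


print(looks_like_end_of_media(SAMPLE))
```

A True result only means the log text matches the pattern described here; it does not replace a review of the backup set size.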

There will be a number of soft errors accumulated on this tape drive:

# iostat -E
[...]
st3       Soft Errors: 6 Hard Errors: 0 Transport Errors: 0
Vendor: HP       Product: C7438A           Revision: ZP8B Serial No:    9

Depending on the backup method used, different error messages will appear, and they may look alarming. For example:

Filesystem backup started @ Wednesday, December  3, 2014 02:19:05 PM EAT
Backing up rpool file system
Backing up rpool/ROOT file system
Backing up rpool/ROOT/SDP5 file system
Backing up rpool/var file system
Backing up rpool/var/opt file system
Backing up rpool/var/opt/fds file system
warning: cannot send 'rpool/var/opt/fds@backup20141203x1341': I/O error
Filesystem backup ended @ Wednesday, December  3, 2014 05:41:56 PM EAT

As the example above shows, the drive had been writing successfully for about 3 hours and 20 minutes when the error appeared. At a write throughput of 5 MB/s, a DAT 72 drive can write approximately 18 GB of data in one hour. A drive that has written continuously for this long without an exception is unlikely to have a read/write problem.
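The arithmetic behind this estimate can be reproduced as follows. This is a rough sketch under two assumptions: the "backup ended" timestamp is taken as the approximate time of the failure, and the drive is assumed to stream at the 5 MB/s figure quoted above for the whole run. Decimal units (1 GB = 1000 MB) are used.

```python
from datetime import datetime

# Timestamps taken from the backup log above (EAT, 3 December 2014).
start = datetime(2014, 12, 3, 14, 19, 5)
end = datetime(2014, 12, 3, 17, 41, 56)

WRITE_MB_PER_S = 5  # throughput figure quoted in this document

elapsed_s = (end - start).total_seconds()
gb_per_hour = WRITE_MB_PER_S * 3600 / 1000      # data written in one hour
written_gb = WRITE_MB_PER_S * elapsed_s / 1000  # upper bound for the whole run

print(f"elapsed: {elapsed_s / 3600:.2f} h")
print(f"throughput: {gb_per_hour:.0f} GB per hour")
print(f"written at most: {written_gb:.1f} GB")
```

Under these assumptions the drive could have written roughly 60 GB before the error, which is well beyond the 36 GB native capacity of a DAT 72 cartridge.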

Changes

A large amount of data may have been imported to the server recently, or the server may have been upgraded, with the fallback installation taking up disk capacity.

However, even if there are no apparent changes, the server may have been accumulating data and increasing the backup set until it grew beyond tape capacity.

Cause

The cause of the issue is that the backup set has grown too large to fit on a single cartridge.

A DAT 72 cartridge can fit 36 GB of data natively (without compression) or up to 72 GB of data with 2:1 compression. This compression ratio is possible, but due to the small size of the drive's buffer, it is difficult to achieve. In the example presented above, the file systems being backed up were 64.3 GB in size:

# df -kl
Filesystem            kbytes    used   avail capacity  Mounted on
[...]
rpool/ROOT/SDP5      140894208 49825270 59212177    46%    /
rpool                140894208      97 59212177     1%    /rpool
rpool/var/opt/fds    140894208 12965342 59212177    18%    /var/opt/fds

In addition to this data, other data may already have been written to tape in this session. Even on its own, however, 64 GB of data will barely fit on a single cartridge.
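The 64.3 GB figure can be checked by summing the "used" column of the df -kl output. The sketch below is illustrative only: the helper name total_used_kb is hypothetical, the sample rows are copied from the output above, and it assumes that df -k reports units of 1024 bytes and that tape capacities are quoted in decimal GB.

```python
# Rows copied from the df -kl output above; the elided rows ([...]) are not reconstructed.
DF_OUTPUT = """\
rpool/ROOT/SDP5      140894208 49825270 59212177    46%    /
rpool                140894208      97 59212177     1%    /rpool
rpool/var/opt/fds    140894208 12965342 59212177    18%    /var/opt/fds
"""

NATIVE_GB = 36       # DAT 72 native capacity, from this document
COMPRESSED_GB = 72   # capacity at 2:1 compression, from this document


def total_used_kb(df_text):
    """Sum the third column (used KB) of every six-field df data row."""
    total = 0
    for line in df_text.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[2].isdigit():
            total += int(fields[2])
    return total


used_gb = total_used_kb(DF_OUTPUT) * 1024 / 1e9   # KB (1024 bytes) to decimal GB
needed_ratio = used_gb / NATIVE_GB                # compression needed to fit one cartridge

print(f"backup set: {used_gb:.1f} GB")
print(f"fits natively: {used_gb <= NATIVE_GB}")
print(f"compression ratio needed: {needed_ratio:.2f}:1")
```

For this example the data would need to compress at about 1.79:1 to fit on one cartridge, which is close to the 2:1 ratio that the document describes as difficult to achieve.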

Solution

Workaround:

If the backup set size can be reduced to below the tape cartridge's native capacity, the critical backup at this time will complete. This can be achieved by splitting the backup across several cartridges (if your application or script allows the backup to be split), by selectively backing up different data to different cartridges, or by erasing unnecessary files on your disk storage.
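One way to plan a split across cartridges is sketched below. This is an assumption-laden illustration, not a feature of any backup application: the function plan_cartridges is hypothetical, it uses a simple first-fit-decreasing heuristic, it plans against the 36 GB native capacity only, and the sizes are taken from the df -kl output in the Cause section.

```python
NATIVE_CAPACITY_GB = 36  # DAT 72 native capacity, from this document

# Used space per file system in KB, copied from the df -kl output in this document.
USED_KB = {
    "rpool/ROOT/SDP5": 49825270,
    "rpool": 97,
    "rpool/var/opt/fds": 12965342,
}


def plan_cartridges(sizes_gb, capacity_gb=NATIVE_CAPACITY_GB):
    """First-fit decreasing: return (cartridges, oversize).

    cartridges is a list of lists of file system names, one list per cartridge.
    oversize lists file systems that exceed one cartridge on their own.
    """
    cartridges, free, oversize = [], [], []
    for name, size in sorted(sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity_gb:
            oversize.append(name)          # cannot fit whole on any single cartridge
            continue
        for i, room in enumerate(free):
            if size <= room:               # reuse the first cartridge with enough room
                cartridges[i].append(name)
                free[i] -= size
                break
        else:
            cartridges.append([name])      # start a new cartridge
            free.append(capacity_gb - size)
    return cartridges, oversize


sizes_gb = {name: kb * 1024 / 1e9 for name, kb in USED_KB.items()}
cartridges, oversize = plan_cartridges(sizes_gb)
print("cartridges:", cartridges)
print("oversize:", oversize)
```

In this example the root file system alone is larger than one cartridge's native capacity, so splitting by file system is not enough; it would have to be split at a finer level or reduced in size.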

NOTE: Erasing data is risky. Please take extra care that you do not erase critical data by accident, especially if you do not have recent backups! As this is a system administration task, we cannot advise what data may be safely erased. If you cannot determine this on your own, please engage internal or external consultancy services to advise what data may be deleted.

Test the workaround by repeating the backup job. If the backup set size is less than 36 GB and the job still fails, engage Oracle Tape Support to resolve this issue.

If the backup succeeds, note that this is only a workaround and will not resolve the issue permanently. As the backup set grows, you will inevitably reach a hard limit where you cannot erase any more data without compromising your system. At that point you run the risk of being unable to complete critical backups.

Solution:

As a proactive measure, please contact your Oracle Sales representative and inquire about backup solutions better suited to your environment. If it is impossible to reduce the backup set size to less than 36 GB without compromising your system, this is the only solution to the issue you are facing.

In addition to offering vastly more capacity, new tape technology increases throughput by an order of magnitude. A half-height LTO-5 tape drive in a 1U rackmount takes up the same amount of space as a typical DAT 72 tape drive, but it allows you to write up to 1500 GB of data natively to a compatible LTO-5 cartridge, or up to 3000 GB with compression, for a roughly 40-fold increase over DAT 72. Additionally, the maximum native throughput of an LTO-5 drive is 140 MB/s, a 28-fold improvement over DAT 72. This throughput is maintained on restore, allowing much faster disaster recovery.
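The ratios quoted above can be reproduced from the figures in this document. The sketch below is a back-of-the-envelope calculation only: it assumes both drives stream at their quoted throughput for the whole job, which is a best case, and it uses the 64.3 GB backup set from the Cause section as the example workload.

```python
# Figures taken from this document.
DAT72_NATIVE_GB = 36
DAT72_MB_PER_S = 5
LTO5_NATIVE_GB = 1500
LTO5_MB_PER_S = 140
BACKUP_SET_GB = 64.3  # example backup set from the Cause section

capacity_ratio = LTO5_NATIVE_GB / DAT72_NATIVE_GB
throughput_ratio = LTO5_MB_PER_S / DAT72_MB_PER_S

# Best-case streaming times for the example backup set (1 GB = 1000 MB).
dat72_hours = BACKUP_SET_GB * 1000 / DAT72_MB_PER_S / 3600
lto5_minutes = BACKUP_SET_GB * 1000 / LTO5_MB_PER_S / 60

print(f"capacity ratio: {capacity_ratio:.1f}x")
print(f"throughput ratio: {throughput_ratio:.0f}x")
print(f"DAT 72 write time: {dat72_hours:.2f} h")
print(f"LTO-5 write time: {lto5_minutes:.2f} min")
```

Real jobs rarely sustain the maximum rate, so these times should be read as lower bounds rather than expected durations.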

Please bear in mind that replacing hardware will not resolve this issue. Attempting hardware replacement exposes you to unnecessary operational risk, as you may remain without backups for a considerable period of time.

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.