Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1575225.1
Update Date:2017-06-07

Solution Type  Problem Resolution Sure

Solution  1575225.1 :   T10000 - (EOD/EOF) Issue : ASC: 0x44, ASCQ: 0xb6;sam_cancel_call: SC_fscancel error  


Related Items
  • Sun StorageTek T10000 Tape Drive
Related Categories
  • PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: STK T-Series Drive




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Sun StorageTek T10000 Tape Drive - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

3-7508145131
Kernel levels and patch bundles installed at the customer's two sites:

Library that shows more issues:
Solaris 10 Update Level: Update 9
147440-27 Kernel Level
HBA: Qlogic 2462
Recommended OS Patchset Solaris 10 SPARC (2013.01.29)
SUNW,T5440

Library that shows fewer issues (issues still appear):
Solaris 10 Update Level: Update 9
148888-04 Kernel Level
HBA: Qlogic 2462
Recommended OS Patchset Solaris 10 SPARC (2013.06.18)
SUNW,T5440

Note:
The library at the site with the newer kernel works better than the one at the secure site, whose kernel is older.
Both nevertheless appear to be up to date with the recommended patch sets.
-----------------------------------------------------------------------------------------------------------------------------
The customer verified the FC switches and did not find any issues with them.

Symptoms

Writing EOD/EOF (End of Data, End of File) marks fails on multiple drives.

The Solaris messages file shows entries like the following on multiple drives.
 
 scsi: [ID 107833 kern.warning] WARNING: /pci@700/pci@0/pci@c/SUNW,qlc@0,1/fp@0,0/st@w500104f000ae56f4,0 (st6):
 Error for Command: write_file_mark         Error Level: Fatal
 scsi: [ID 107833 kern.notice]    Requested Block: 68                        Error Block: 68
 scsi: [ID 107833 kern.notice]    Vendor: STK                                Serial Number: .205
 scsi: [ID 107833 kern.notice]    Sense Key: Hardware_Error
 scsi: [ID 107833 kern.notice]    ASC: 0x44 (<vendor unique code 0x44>), ASCQ: 0xb6, FRU: 0x0

Note:
ASC: 0x44, ASCQ: 0xb6 is an Oracle vendor-specific FSC related to data path issues.
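
To see how widely this sense code is spread across drives, the messages file can be scanned for it. The following Python sketch is only an illustration under stated assumptions: it assumes the log layout shown in the excerpt above (a WARNING header naming the st instance, followed by an ASC/ASCQ line), and the names count_fsc_errors and SAMPLE_MESSAGES are hypothetical, not part of any Oracle tool.

```python
import re
from collections import Counter

# Matches the st instance at the end of a WARNING header line, e.g. "(st6):"
ST_INSTANCE = re.compile(r"\((st\d+)\):")
# Matches the sense code line, e.g. "ASC: 0x44 (...), ASCQ: 0xb6, FRU: 0x0"
SENSE_CODE = re.compile(r"ASC:\s*(0x[0-9a-fA-F]+).*?ASCQ:\s*(0x[0-9a-fA-F]+)")


def count_fsc_errors(lines, asc=0x44, ascq=0xB6):
    """Count occurrences of the given ASC/ASCQ pair per st instance."""
    counts = Counter()
    current = None  # st instance named by the most recent WARNING header
    for line in lines:
        header = ST_INSTANCE.search(line)
        if header:
            current = header.group(1)
            continue
        sense = SENSE_CODE.search(line)
        if sense and current is not None:
            if int(sense.group(1), 16) == asc and int(sense.group(2), 16) == ascq:
                counts[current] += 1
            current = None  # the error block for this header is finished
    return counts


# Sample taken from the excerpt above (assumed layout of /var/adm/messages).
SAMPLE_MESSAGES = """\
scsi: [ID 107833 kern.warning] WARNING: /pci@700/pci@0/pci@c/SUNW,qlc@0,1/fp@0,0/st@w500104f000ae56f4,0 (st6):
Error for Command: write_file_mark         Error Level: Fatal
scsi: [ID 107833 kern.notice]    Sense Key: Hardware_Error
scsi: [ID 107833 kern.notice]    ASC: 0x44 (<vendor unique code 0x44>), ASCQ: 0xb6, FRU: 0x0
"""

print(count_fsc_errors(SAMPLE_MESSAGES.splitlines()))
```

If the counts are spread over many st instances rather than concentrated on one, that supports looking at a shared component instead of an individual drive.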
-----------------------------------------------------------------------------------------------------------------------------
The Solaris messages also show a couple of FC disconnections that were cause for concern.

 fctl: [ID 517869 kern.warning] WARNING: fp(1)::N_x Port with D_ID=110900, PWWN=500104f000af366e reappeared in fabric
 fctl: [ID 517869 kern.warning] WARNING: fp(2)::GPN_ID for D_ID=110a00 failed
 fctl: [ID 517869 kern.warning] WARNING: fp(2)::N_x Port with D_ID=110a00, PWWN=500104f000af3667 disappeared from fabric
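
The fabric events above can be grouped by port WWN to check whether the disconnections follow one drive or many. This is a minimal sketch that assumes the fctl message wording shown in the excerpt; the names fabric_events and SAMPLE_FCTL are hypothetical.

```python
import re
from collections import defaultdict

# Matches e.g. "D_ID=110900, PWWN=500104f000af366e reappeared in fabric"
PORT_EVENT = re.compile(
    r"D_ID=([0-9a-fA-F]+),\s*PWWN=([0-9a-fA-F]{16})\s+"
    r"(disappeared from|reappeared in) fabric"
)


def fabric_events(lines):
    """Return {pwwn: [event, ...]} with events in log order."""
    events = defaultdict(list)
    for line in lines:
        match = PORT_EVENT.search(line)
        if match:
            # Keep only the verb: "disappeared" or "reappeared"
            events[match.group(2)].append(match.group(3).split()[0])
    return dict(events)


# Sample taken from the excerpt above.
SAMPLE_FCTL = """\
fctl: [ID 517869 kern.warning] WARNING: fp(1)::N_x Port with D_ID=110900, PWWN=500104f000af366e reappeared in fabric
fctl: [ID 517869 kern.warning] WARNING: fp(2)::GPN_ID for D_ID=110a00 failed
fctl: [ID 517869 kern.warning] WARNING: fp(2)::N_x Port with D_ID=110a00, PWWN=500104f000af3667 disappeared from fabric
"""

for pwwn, seen in fabric_events(SAMPLE_FCTL.splitlines()).items():
    print(pwwn, seen)
```

Lines that carry no PWWN, such as the GPN_ID failure, are ignored by this sketch.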

-----------------------------------------------------------------------------------------------------------------------------
SamFS dev logs show the following.
2013/07/18 10:01:16 1059 [14249:6359] dir_io.c:306 detail Ready for data transfer
2013/07/18 10:01:33 1066 [14249:6359] dir_io.c:463 time Wrote 65011712 bytes, time 17 seconds
2013/07/18 10:01:33 3205 [14249:6359] tape.c:149   resource ptr: 0xfeaecee0, process_eox: 0, process_wtm: 0, wrote_tm: 0
2013/07/18 10:01:33 3220 [14249:6359] tape.c:158 detail Creating EOF/EOV label
2013/07/18 10:01:33 3001 [14249:6359] tape.c:209 Error: mt_erreg = 0x4 [Drive Hardware Error]


Analysis of the pointers and addresses shows that the EOF is being written at the correct address, but the drive fails the write with a hardware error (0x4 [Drive Hardware Error]).
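
The pattern in the dev log excerpt above (an EOF/EOV label attempt immediately followed by an mt_erreg error) can be searched for mechanically. The sketch below assumes the dev log line layout shown in the excerpt; the names eof_write_failures and SAMPLE_DEVLOG are hypothetical.

```python
import re
from datetime import datetime

# Matches e.g. "2013/07/18 10:01:33 ... mt_erreg = 0x4 [Drive Hardware Error]"
ERREG = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*"
    r"mt_erreg = (0x[0-9a-fA-F]+) \[([^\]]+)\]"
)


def eof_write_failures(lines):
    """Return (timestamp, code, text) for each mt_erreg error that
    directly follows a "Creating EOF/EOV label" line."""
    failures = []
    label_pending = False
    for line in lines:
        if "Creating EOF/EOV label" in line:
            label_pending = True
            continue
        match = ERREG.match(line)
        if match and label_pending:
            when = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
            failures.append((when, int(match.group(2), 16), match.group(3)))
        label_pending = False  # only the line right after the label counts
    return failures


# Sample taken from the excerpt above.
SAMPLE_DEVLOG = """\
2013/07/18 10:01:16 1059 [14249:6359] dir_io.c:306 detail Ready for data transfer
2013/07/18 10:01:33 1066 [14249:6359] dir_io.c:463 time Wrote 65011712 bytes, time 17 seconds
2013/07/18 10:01:33 3205 [14249:6359] tape.c:149   resource ptr: 0xfeaecee0, process_eox: 0, process_wtm: 0, wrote_tm: 0
2013/07/18 10:01:33 3220 [14249:6359] tape.c:158 detail Creating EOF/EOV label
2013/07/18 10:01:33 3001 [14249:6359] tape.c:209 Error: mt_erreg = 0x4 [Drive Hardware Error]
"""

for failure in eof_write_failures(SAMPLE_DEVLOG.splitlines()):
    print(failure)
```

Comparing the timestamps returned here with the write_file_mark errors in the Solaris messages helps confirm that both logs describe the same event.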

 

Changes

- Multiple drives have been replaced on this account over time.

- The customer sees failures on random drives when writing the EOD to tape.

- This issue has been going on for at least a month.

- The customer has two sites: one remote (not secure) and one local (secure).

- The remote (not secure) site shows fewer issues; the local site shows many.

- SAMFS marks the affected media as bad media.

- No Explorer output is available because the site is secure.

Cause

The cause has been tracked to the HBA QLC driver or firmware level.

It appears that the short command that writes the EOD to the drive gets stuck from time to time, which produces errors on the drive side.

 -----------------------------------------------------------------------------------------------------------------------------

A KB article for an older SAMFS version relates similar cases to the QLC drivers: SAM-QFS: Intermittent errors interrupt staging operation "Direct I/O timed out" (Doc ID 1018937.1)
 [ID 706137 kern.notice] NOTICE: SAM-QFS: samfs_np4arch: sam_cancel_call: SC_fscancel error: rdev: 0 rm_pid: 0 fh_pid: 16962

Note: Not all of the symptoms referenced in that KB article were found in the customer's logs.

Solution

After all the data was gathered, multiple pieces of information pointed to an FC problem, but nothing pointed to a specific device as the source. The customer ruled out switch issues by reviewing the switch logs.

Because all drives show the same issue, the problem can be narrowed down to the server side.

SAMFS reports a timeout on the HBA ("sam_cancel_call: SC_fscancel error: rdev").

Because normal backups were working fine and the issue is specific to the EOD/EOF command, the problem appears to lie with short commands in the HBA.

The customer checked the QLC patch and found it was not up to date. Current patch when this KB was written: 149175-02 SunOS 5.10: qlc

After the patch was installed, all issues disappeared and the case was resolved.

 

NOTE:

QLC patches are not included in the Solaris patch bundles; customers must install them manually.
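
Because the QLC patch has to be tracked separately, a quick check of the installed revision against the one named above can be useful. The following Python sketch is a hedged illustration: it assumes the patch list is available as text in the style of "showrev -p" output, with each installed patch on a line starting with "Patch: <id>-<rev>". The function names and the sample lines are hypothetical, and the 149175-01 entry is made-up sample data, not the customer's actual level.

```python
import re

# Assumed line format: "Patch: 149175-01 Obsoletes: ... Packages: ..."
# Only the leading "Patch:" field is read, so IDs listed under
# Obsoletes/Requires are not mistaken for installed patches.
INSTALLED = re.compile(r"^Patch:\s*(\d{6})-(\d{2})\b", re.MULTILINE)


def installed_revision(patch_list, base):
    """Return the highest installed revision of patch `base`, or None."""
    revisions = [int(rev) for pid, rev in INSTALLED.findall(patch_list) if pid == base]
    return max(revisions) if revisions else None


def needs_update(patch_list, required="149175-02"):
    """True if the required patch is missing or installed at a lower revision."""
    base, rev = required.split("-")
    have = installed_revision(patch_list, base)
    return have is None or have < int(rev)


# Made-up sample input for illustration only.
SAMPLE_PATCHES = """\
Patch: 149175-01 Obsoletes:  Requires:  Incompatibles:  Packages: SUNWqlc
"""

print(needs_update(SAMPLE_PATCHES))
```

This sketch does not handle the case where a patch has been superseded by one with a different patch ID, so a result of True should be confirmed against the current patch README before acting on it.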

References

<NOTE:1018937.1> - SAM-QFS: Intermittent errors interrupt staging operation "Direct I/O timed out"

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.