Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
   Home | Current Systems | Former STK Products | EOL Systems | Components | General Info | Search | Feedback

Asset ID: 1-72-1579696.1
Update Date:2016-11-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1579696.1 :   ACSLS - Encryption-enabled drives show "in Use" and volsers show up only after backup has failed or timed out  


Related Items
  • Oracle Key Manager
  •  
  • Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS)
  •  
Related Categories
  • PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: Tape Library Control Software
  •  




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-7714854624>

Applies to:

Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 7.1 to 8.2 [Release 7.0 to 8.0]
Oracle Key Manager - Version 2.2 to 2.5.2 [Release 2.0]
Information in this document applies to any platform.

Symptoms

ACSLS 'query drive all' shows drives "in use" but without any volser for a very long time.
When the volser is finally displayed, the backup job either has failed or has timed out.

The drives are encrypted-enabled drives and enrolled with the Oracle Key Management software.

Netbackup shows status 85 Input/Output errors during Mount operation:
---------------------------------
Note:  The mount process takes a long time to complete. When the tape is finally loaded, the backup application gets an I/O error and eventually aborts the job.


08/23/2013 18:30:00 - Info bptm (pid=1977) media id D10212 mounted on drive index 8, drivepath /dev/rmt/7cbn, drivename STK.T10000C.001, copy 2
08/23/2013 18:33:02 - Error bptm (pid=1977) read error on media id D10212, drive index 8, reading header block, I/O error
08/23/2013 18:33:03 - Info bptm (pid=1977) EXITING with status 85 <----------

08/23/2013 18:26:20 - Info bptm (pid=20259) media id D10426 mounted on drive index 11, drivepath /dev/nst1, drivename STK.T10000C.000, copy 2
08/23/2013 18:29:22 - Error bptm (pid=20259) read error on media id D10426, drive index 11, reading header block, Input/output error
08/23/2013 18:29:23 - Info bptm (pid=20259) EXITING with status 85 <----------

08/23/2013 18:21:08 - Waiting for scan drive stop StkT10kd6, Media server: mserver
08/23/2013 18:21:10 - granted resource DC7562
08/23/2013 18:21:10 - granted resource StkT10kd6
08/23/2013 18:21:10 - granted resource mserver-hcart3-robot-acs-4
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backup id stanli_1377153409 read failed, media manager killed by signal (82).
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backupid stanli_1377153409 write failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Duplicate of backupid stanli_1377153409 failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Status = no images were successfully processed.


The library error warn info event log shows "Drive is loaded" and "Drive cartridge is present" during Load time:
--------------------------
2013-08-23T17:23:09.667 error 1.1.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16601
2013-08-23T17:23:13.637 error 1.3.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16801

2013-08-23T17:35:41.593 error 1.1.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10329T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 17801
2013-08-23T17:35:43.493 error 1.3.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10212T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 18001



OKM KMA list report shows KMA's not responding:
------------------------------------
3 KMA's are not responding (see RespondingService column):

KMAID                         Name       ... FailedLogin.. KMAVersion Responding RespondingService RespTime  RepLagSize KeysReadyCnt KeysReadyInBackupCnt KeysGenCnt Locked Enrolled HSMStatus
-255598260467144508 DCkma01 ....    0              build1348   true            Responding                0                   0             1002              1002                       0                false    true        Hardware
-287668531459807356 DCkma02 ....    0              build1348   true            Not Responding      85                  0             1009              519                         0                false    true       Hardware
-4395022526292076502 GBkma01 ....   0             build1348    true           Not Responding      156                 0             1002              1002                       0                false    true       Hardware
4249295192349485770 GBkma02 ....    0             build1348    true           Not Responding      114                 0             1005              1005                       0                false    true       Hardware

The dladm list shows:
$ cat dladm.txt
igb0 type: non-vlan mtu: 1500 device: igb0
igb1 type: non-vlan mtu: 1500 device: igb1
igb2 type: non-vlan mtu: 1500 device: igb2
igb3 type: non-vlan mtu: 1500 device: igb3
aggr1 type: non-vlan mtu: 1500 aggregation: key 1
key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
  device address speed duplex link state
  igb2 0:21:28:fb:67:16 0 Mbps half down standby <<<<< 
  igb3 0:21:28:fb:67:17 0 Mbps half down standby <<<<<<
  key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
  LACP mode: off LACP timer: short
  device activity timeout aggregatable sync coll dist defaulted expired
  igb2 passive short yes no no no no no
  igb3 passive short yes no no no no no

Changes

No known change. 
Customer noted a couple of KMA's in the KMA list are highlighted in pink

Cause

The service network is used by the drives to communicate to the KMAs to get keys
If this link is down. the drives will not be able to get keys and the application will not be able to use the drives
 

Solution

1. Recreate the problem by re-running a failed backup job
2. Gather the following logs:
  - OKM system dump
  - ACSLS output tarfile
  - Library event log
  - Backup application event log
3. Determine that the cause of the reported drive issue is not a hardware issue specific to the drive
4. Determine that the KMAs are responding on both the management network and the service network
5. Determine that there are no ACSLS / Library communication problems
6. If the problem is drive hardware specific, dispatch an FE to fix the drive problem
7. If the problem is with ACSLS / Library communication,
     see <Document: 1020068.1> - Validating ACSLS Host Server Communication with the Sun StorageTek Libraries
8. If the KMAs are not responding on the service network,
  - ping the service network from other nodes in the network
  - login to the switch and check status of connected devices
  - if needed, reset or replace the switch. 
 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.
 Feedback