Asset ID: 1-72-1579696.1
Update Date: 2016-11-17
Keywords:
Solution Type: Problem Resolution Sure Solution
1579696.1: ACSLS - Encryption-enabled drives show "in use" and volsers show up only after backup has failed or timed out
Related Items
- Oracle Key Manager
- Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS)
Related Categories
- PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: Tape Library Control Software
Created from <SR 3-7714854624>
Applies to:
Sun StorageTek Auto Cartridge Sys Lib SW (ACSLS) - Version 7.1 to 8.2 [Release 7.0 to 8.0]
Oracle Key Manager - Version 2.2 to 2.5.2 [Release 2.0]
Information in this document applies to any platform.
Symptoms
ACSLS 'query drive all' shows drives "in use" but without any volser for a very long time.
When the volser is finally displayed, the backup job has either failed or timed out.
The drives are encryption-enabled and enrolled with Oracle Key Manager (OKM).
NetBackup reports status 85 with input/output errors during the mount operation:
---------------------------------
Note: The mount process takes a long time to complete. When the tape is finally loaded, the backup application gets an I/O error and eventually aborts the job.
08/23/2013 18:30:00 - Info bptm (pid=1977) media id D10212 mounted on drive index 8, drivepath /dev/rmt/7cbn, drivename STK.T10000C.001, copy 2
08/23/2013 18:33:02 - Error bptm (pid=1977) read error on media id D10212, drive index 8, reading header block, I/O error
08/23/2013 18:33:03 - Info bptm (pid=1977) EXITING with status 85 <----------
08/23/2013 18:26:20 - Info bptm (pid=20259) media id D10426 mounted on drive index 11, drivepath /dev/nst1, drivename STK.T10000C.000, copy 2
08/23/2013 18:29:22 - Error bptm (pid=20259) read error on media id D10426, drive index 11, reading header block, Input/output error
08/23/2013 18:29:23 - Info bptm (pid=20259) EXITING with status 85 <----------
08/23/2013 18:21:08 - Waiting for scan drive stop StkT10kd6, Media server: mserver
08/23/2013 18:21:10 - granted resource DC7562
08/23/2013 18:21:10 - granted resource StkT10kd6
08/23/2013 18:21:10 - granted resource mserver-hcart3-robot-acs-4
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backup id stanli_1377153409 read failed, media manager killed by signal (82).
08/23/2013 18:37:30 - Error bpduplicate (pid=1197) host mserver backupid stanli_1377153409 write failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Duplicate of backupid stanli_1377153409 failed, media read error (85).
08/23/2013 18:37:31 - Error bpduplicate (pid=1197) Status = no images were successfully processed.
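To make the timing in the bptm lines above easier to see, the following is a minimal Python sketch that measures the gap between each "mounted" message and the read error that follows for the same media id. The function names are illustrative only and are not part of NetBackup; the sketch assumes the timestamp and message layout shown in the excerpt.

```python
from datetime import datetime

# bptm lines copied from the log excerpt above.
LOG = """\
08/23/2013 18:30:00 - Info bptm (pid=1977) media id D10212 mounted on drive index 8, drivepath /dev/rmt/7cbn, drivename STK.T10000C.001, copy 2
08/23/2013 18:33:02 - Error bptm (pid=1977) read error on media id D10212, drive index 8, reading header block, I/O error
08/23/2013 18:26:20 - Info bptm (pid=20259) media id D10426 mounted on drive index 11, drivepath /dev/nst1, drivename STK.T10000C.000, copy 2
08/23/2013 18:29:22 - Error bptm (pid=20259) read error on media id D10426, drive index 11, reading header block, Input/output error
"""


def parse_timestamp(line):
    """Return the leading 'MM/DD/YYYY HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%m/%d/%Y %H:%M:%S")


def mount_to_error_seconds(log_text):
    """Map media id -> seconds between 'mounted' and the following 'read error'."""
    mounted = {}
    delays = {}
    for line in log_text.splitlines():
        if "media id" not in line:
            continue
        # The media id is the first token after the words "media id".
        media = line.split("media id", 1)[1].split()[0].rstrip(",")
        if " mounted on drive " in line:
            mounted[media] = parse_timestamp(line)
        elif "read error" in line and media in mounted:
            delta = parse_timestamp(line) - mounted[media]
            delays[media] = delta.total_seconds()
    return delays


print(mount_to_error_seconds(LOG))
```

In both excerpts the read error follows the mount message by about three minutes.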
The library event log (error/warn/info) shows "Drive is loaded" and "Drive cartridge is present" errors at load time:
--------------------------
2013-08-23T17:23:09.667 error 1.1.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16601
2013-08-23T17:23:13.637 error 1.3.-1.1.1 3955 "Error from device Code: 601 - Drive is loaded", Data=rewindUnloaderror6441601 ,driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6441 move root hli0 16801
2013-08-23T17:35:41.593 error 1.1.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10329T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 17801
2013-08-23T17:35:43.493 error 1.3.-1.1.1 3955 "Error from device Code: 603 - Drive cartridge is present", Data=loaderror6440603000000 D10212T2(driveType=T10000c-Enc,driveFirmware=RL53.315-5.30 Rc:6440 move root hli0 18001
The OKM KMA list report shows KMAs not responding:
------------------------------------
3 KMAs are not responding (see the RespondingService column):
KMAID                 Name     ...   FailedLogin..  KMAVersion  Responding  RespondingService  RespTime  RepLagSize  KeysReadyCnt  KeysReadyInBackupCnt  KeysGenCnt  Locked  Enrolled  HSMStatus
-255598260467144508   DCkma01  ....  0              build1348   true        Responding         0         0           1002          1002                  0           false   true      Hardware
-287668531459807356   DCkma02  ....  0              build1348   true        Not Responding     85        0           1009          519                   0           false   true      Hardware
-4395022526292076502  GBkma01  ....  0              build1348   true        Not Responding     156       0           1002          1002                  0           false   true      Hardware
4249295192349485770   GBkma02  ....  0              build1348   true        Not Responding     114       0           1005          1005                  0           false   true      Hardware
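When the KMA list is long, the non-responding entries can be picked out with a short script. The following is a minimal Python sketch, assuming the report has been saved as whitespace-separated text in the layout shown above; the function name is hypothetical and is not an OKM tool.

```python
# Data rows copied from the KMA list report above.
REPORT = """\
-255598260467144508 DCkma01 .... 0 build1348 true Responding 0 0 1002 1002 0 false true Hardware
-287668531459807356 DCkma02 .... 0 build1348 true Not Responding 85 0 1009 519 0 false true Hardware
-4395022526292076502 GBkma01 .... 0 build1348 true Not Responding 156 0 1002 1002 0 false true Hardware
4249295192349485770 GBkma02 .... 0 build1348 true Not Responding 114 0 1005 1005 0 false true Hardware
"""


def not_responding_kmas(report_text):
    """Return names of KMAs whose row contains 'Not Responding'."""
    names = []
    for line in report_text.splitlines():
        fields = line.split()
        # Data rows start with a (possibly negative) numeric KMA ID;
        # header and blank lines are skipped.
        if len(fields) < 2 or not fields[0].lstrip("-").isdigit():
            continue
        # Re-join with single spaces so column padding does not matter.
        if "Not Responding" in " ".join(fields):
            names.append(fields[1])
    return names


print(not_responding_kmas(REPORT))
```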
The dladm output shows both links in aggregation aggr1 (igb2 and igb3) down:
$ cat dladm.txt
igb0 type: non-vlan mtu: 1500 device: igb0
igb1 type: non-vlan mtu: 1500 device: igb1
igb2 type: non-vlan mtu: 1500 device: igb2
igb3 type: non-vlan mtu: 1500 device: igb3
aggr1 type: non-vlan mtu: 1500 aggregation: key 1
key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
device address speed duplex link state
igb2 0:21:28:fb:67:16 0 Mbps half down standby <<<<<
igb3 0:21:28:fb:67:17 0 Mbps half down standby <<<<<<
key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
LACP mode: off LACP timer: short
device activity timeout aggregatable sync coll dist defaulted expired
igb2 passive short yes no no no no no
igb3 passive short yes no no no no no
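The link state of the aggregated ports can also be extracted from a saved copy of this output. The following is a minimal Python sketch, assuming the per-device rows keep the "device address speed duplex link state" layout shown above; the function name is hypothetical.

```python
# Aggregation section copied from the dladm output above.
DLADM = """\
aggr1 type: non-vlan mtu: 1500 aggregation: key 1
key: 1 (0x0001) policy: L4 address: 0:21:28:fb:67:16 (auto)
device address speed duplex link state
igb2 0:21:28:fb:67:16 0 Mbps half down standby <<<<<
igb3 0:21:28:fb:67:17 0 Mbps half down standby <<<<<<
"""


def aggregation_link_states(dladm_text):
    """Map device name -> speed, duplex, link and state from the device rows."""
    states = {}
    for line in dladm_text.splitlines():
        fields = line.split()
        # Device rows look like: <dev> <mac> <speed> Mbps <duplex> <link> <state>
        if len(fields) >= 7 and fields[3] == "Mbps":
            states[fields[0]] = {
                "speed_mbps": int(fields[2]),
                "duplex": fields[4],
                "link": fields[5],
                "state": fields[6],
            }
    return states


down = [dev for dev, s in aggregation_link_states(DLADM).items() if s["link"] == "down"]
print(down)
```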
Changes
No known change.
The customer noted that a couple of KMAs in the KMA list were highlighted in pink.
Cause
The drives use the service network to communicate with the KMAs and obtain keys.
If this link is down, the drives cannot obtain keys and the application cannot use the drives.
Solution
1. Recreate the problem by re-running a failed backup job
2. Gather the following logs:
- OKM system dump
- ACSLS output tarfile
- Library event log
- Backup application event log
3. Determine that the cause of the reported drive issue is not a hardware issue specific to the drive
4. Determine that the KMAs are responding on both the management network and the service network
5. Determine that there are no ACSLS / Library communication problems
6. If the problem is drive hardware specific, dispatch a field engineer (FE) to fix the drive problem
7. If the problem is with ACSLS / Library communication,
see <Document: 1020068.1> - Validating ACSLS Host Server Communication with the Sun StorageTek Libraries
8. If the KMAs are not responding on the service network,
- ping the service network from other nodes in the network
- log in to the switch and check the status of connected devices
- if needed, reset or replace the switch.
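The ping check in step 8 can be scripted from another node on the service network. The following is a minimal Python sketch under stated assumptions: it uses Linux-style ping flags (-c for count, -W for timeout), which differ on Solaris and other platforms, and the addresses in the usage comment are placeholders from a documentation range, not real KMA addresses.

```python
import subprocess


def ping_once(address, runner=subprocess.run):
    """Return True if a single ICMP echo to address succeeds.

    Assumes a Linux-style ping command; adjust the flags for other platforms.
    """
    result = runner(
        ["ping", "-c", "1", "-W", "2", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(addresses, runner=subprocess.run):
    """Return the addresses that did not answer a single ping."""
    return [a for a in addresses if not ping_once(a, runner)]


# Example usage (placeholder service-network addresses, replace with your own):
# print(unreachable(["192.0.2.11", "192.0.2.12"]))
```

The runner parameter exists only so the check can be exercised without sending real traffic; in normal use it is left at its default.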
Attachments
This solution has no attachment