CTR Replication On CLINK Hung or Slow Migrations

Asset ID:	1-72-1940570.1
Update Date:	2017-10-09
Keywords:

Solution Type:	Problem Resolution Sure

Solution 1940570.1: CTR Replication On CLINK Hung or Slow Migrations

Applies to:

Sun StorageTek Virtual Tape Control Software (VTCS) - Version 7.0 and later
Sun StorageTek Enterprise Library Software (ELS) - Version 7.0 and later
Sun StorageTek VSM5 System - Version All Versions and later
Sun StorageTek VSM System - Version 5 and later
IBM z/OS on System z

Symptoms

The customer observed slow replications and slow migrations (not occurring at the same time) on a VSM5.

VTCS issues message SLS6946E when a task has waited for the resource being replicated or migrated longer than the typical 10-minute timer.

SLS6946E HOST hostid (PROCESS ID id#, A SS TASK) HAS WAITED xx MINUTES FOR VTV volser HELD BY HOST hostid

The two hostids in the SLS6946E message (the waiting host and the host holding the VTV) may be the same host.
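As an illustration, the SLS6946E fields can be pulled out of console output with a regular expression. This is a minimal Python sketch, assuming the message text appears exactly as in the template above; the pattern, the `LockWait` class, and `parse_sls6946e` are hypothetical names introduced here, not part of VTCS. It also reports whether the waiting and holding hostids are the same host.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern built from the SLS6946E template in this document.
# re.search is used so that any console prefix before "SLS6946E" is skipped.
SLS6946E_RE = re.compile(
    r"SLS6946E HOST (?P<waiter>\S+) "
    r"\(PROCESS ID (?P<pid>\S+), A (?P<task>.+?) TASK\) "
    r"HAS WAITED (?P<minutes>\d+) MINUTES FOR VTV (?P<volser>\S+) "
    r"HELD BY HOST (?P<holder>\S+)"
)


@dataclass
class LockWait:
    waiting_host: str
    process_id: str
    task_type: str      # text between "A" and "TASK" (SS in the template), kept verbatim
    minutes: int
    volser: str
    holding_host: str

    @property
    def same_host(self) -> bool:
        # Both hostids in SLS6946E may name the same host.
        return self.waiting_host == self.holding_host


def parse_sls6946e(line: str) -> Optional[LockWait]:
    """Return the fields of an SLS6946E message, or None if the line is not one."""
    m = SLS6946E_RE.search(line)
    if m is None:
        return None
    return LockWait(
        waiting_host=m["waiter"],
        process_id=m["pid"],
        task_type=m["task"],
        minutes=int(m["minutes"]),
        volser=m["volser"],
        holding_host=m["holder"],
    )
```

For example, the hypothetical line `SLS6946E HOST MVS1 (PROCESS ID 123, A SS TASK) HAS WAITED 12 MINUTES FOR VTV V00001 HELD BY HOST MVS1` yields a record with `minutes == 12` and `same_host == True`.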

Cause

One potential cause is a VSM5 microcode bug that can, at times, cause very slow replications or migrations.

Solution

To determine whether the problem is caused by this microcode fault, take a manual IUP statesave for analysis by Support. After the statesave files are uploaded, Support examines their content to verify whether the microcode fault described in this document was encountered.

The VSM5 microcode bug is fixed in D02.19 firmware, which is scheduled for release in mid-March 2015.

Two workarounds are available:

1. Use VTCS to cancel the process ID that holds the lock on the resource in question. Use the following VTCS commands to determine which process ID to cancel:

D LOCK
D TASK
D ACT DET
D Q DET

2. Initiate a disruptive statesave on the VTSS.

** Internal **

Instructions for VSM Support:

How to identify the “Stuck IUP Unavailable” bug

Step 1: Obtain an IUP statesave, upload it to the Engineering server (not SPLAT), and run 'fmttrce' against it. This produces an rxxxxxxx.trc trace file for that statesave.

Step 2: grep the trace file for 'Om_iup_unavail_cond: 0000000A' and redirect the output to a file.

Example: grep 'Om_iup_unavail_cond: 0000000A' r0129a04.trc > stuck.out

Step 3: Review the matching trace entries for a repeating occurrence on the same IUP. The IUP number appears immediately to the right of the timestamp.

A stuck IUP repeats the trace entry every 0.2 seconds (200 ms).

Example trace of a stuck condition; it shows that IUP 6, working with port C018 (vcf12/0), is stuck:

07:06:19.327485 6: T000 DF2E0018-90010210-00000000-0000000A = NTRC_MONITOR_ISR, C018 …
07:06:19.527509 6: T000 DF2E0018-9102010C-F8000000-0000000A = NTRC_MONITOR_ISR, C018 …
07:06:19.727539 6: T000 DF2E0018-90010210-00000000-0000000A = NTRC_MONITOR_ISR, C018 …
07:06:19.927564 6: T000 DF2E0018-9102010C-F8000000-0000000A = NTRC_MONITOR_ISR, C018 …
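The Step 3 review can also be approximated in Python. The sketch below assumes its input is the Step 2 output (for example stuck.out), with each line starting with the timestamp and IUP number as in the trace above; it groups entries by IUP and flags an IUP when consecutive entries arrive about 200 ms apart. The tolerance and minimum-repeat threshold are illustrative assumptions, not values from this document, and `find_stuck_iups` is a hypothetical helper, not an existing support tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches "HH:MM:SS.ffffff <iup>:" at the start of a formatted trace line,
# e.g. "07:06:19.327485 6: T000 ...". Layout taken from the example trace.
TRACE_RE = re.compile(r"^\s*(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)\s+(\d+):")

STUCK_PERIOD = 0.2   # seconds between repeats for a stuck IUP (per this document)
TOLERANCE = 0.05     # assumed jitter allowance, not from this document
MIN_REPEATS = 4      # assumed number of evenly spaced entries needed to flag an IUP


def seconds_since_midnight(h: str, m: str, s: str) -> float:
    """Convert trace timestamp fields to seconds since midnight."""
    return int(h) * 3600 + int(m) * 60 + float(s)


def find_stuck_iups(lines: Iterable[str]) -> List[int]:
    """Return IUP numbers whose matching entries repeat about every 200 ms."""
    times: Dict[int, List[float]] = defaultdict(list)
    for line in lines:
        m = TRACE_RE.match(line)
        if m:
            times[int(m.group(4))].append(seconds_since_midnight(*m.group(1, 2, 3)))

    stuck = []
    for iup, stamps in sorted(times.items()):
        run = 1   # length of the current run of ~200 ms-spaced entries
        best = 1  # longest such run seen for this IUP
        for prev, cur in zip(stamps, stamps[1:]):
            delta = cur - prev
            if delta < 0:  # trace wrapped past midnight
                delta += 86400
            if abs(delta - STUCK_PERIOD) <= TOLERANCE:
                run += 1
                best = max(best, run)
            else:
                run = 1
        if best >= MIN_REPEATS:
            stuck.append(iup)
    return stuck
```

For input consisting of the four example trace lines above, `find_stuck_iups` returns `[6]`; against a real file it would be called as `with open("stuck.out") as f: print(find_stuck_iups(f))`. A flagged IUP is only a pointer for the manual review in Step 3, not a confirmation of the bug.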

** end of Internal **

Attachments

This solution has no attachment