Potential FSC3BBB Warm Boots after upgrading from D/H02.17 code.

Asset ID:	1-72-1662904.1
Update Date:	2016-06-30
Keywords:

Solution Type Problem Resolution Sure

Solution 1662904.1 : Potential FSC3BBB Warm Boots after upgrading from D/H02.17 code.

Applies to:

Sun StorageTek VSM5 System - Version All Versions to All Versions [Release All Releases]
Sun StorageTek VSM4 System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

FSC 3BBB warm boots.

We have seen a few FSC 3BBB warm boots after upgrading from D/H02.17 to higher levels of microcode.

Changes

Code is upgraded from D/H02.17 to a higher level of microcode.

Cause

This is due to metadata corruption left over from D/H02.17 code.
D/H02.17 code did not detect the corruption because it did not use the 'Accelerated Skip' function.
D/H02.18 and higher code levels use 'Accelerated Skip', which reads this metadata. When it encounters the corrupted metadata it fails, causing the FSC 3BBB warm boot.

The problem is NOT with D/H02.18.11.00 or higher code; that code merely finds the bad metadata left behind by D/H02.17 code.
Note: the metadata is used only in spacing operations with the Accelerated Skip function. NO customer data is affected.

Solution

The short-term solution to prevent the FSC 3BBB warm boots when upgrading from D02.17 code is to have the customer perform a migrate to zero while still on D02.17 code, before upgrading to D/H02.18 or higher code.
After the code upgrade, the customer can recall VTVs as required. The metadata is reconstructed correctly on recall with D/H02.18 and higher code.
This metadata corruption is an artifact of running D/H02.17 code, so upgrading from any code level earlier than D02.17 should not cause this problem.

If you have already upgraded from D/H02.17 to a higher level of code, the recovery plan is to migrate to zero while still on the new code.
Once the VSM is migrated to zero, the customer can recall VTVs as required. The metadata is reconstructed correctly on recall with D/H02.18 and higher code.

MOS Knowledge Document 1015547.1 documents how to migrate the VSM to zero.

If you are not familiar with using the statesave simulator then open a MOS collaboration case to support and provide the FSC 3BBB statesave files for analysis.

If you are familiar with using the statesave simulator, then upload the FSC 3BBB statesave files to the engineering server and continue.

If the FSC 3BBB is incurred, use the following procedure to confirm that the cause is the bad metadata:
step 1: run the simulator on the 3bbb statesave from the UNIX command line with 'simxgst' (most TSC engineers are familiar with this)

step 2: go up the stack to the 3bbb SNO line to identify the failing VTV info
(gdb) up
#1 0x1bf570 in LocateBlock (unit=0x3, vtdev=0x3) at ../ct_vtloc.c:717
717           if ( VTU->chunkRestarts++ > 30 ) { should_not_occur(0x3bbb); };
(gdb) iss curunit
Type unsigned long is not a structure or union type.
Unit 3 dev(0x03) stdCart Vtvid A07765 has 270 pages (19 cyls) with status:
     Mounted, LocActive, read, forward, Frontend Blocked (0x18042)
            , AllocPageAvail, RunCCR (0x30)
            , SkipMarkActive, FastSkipAvail, FastSkipActive (0x10300000)
Cur Tape Seg 1, Tape Block 16(0x10), VTV Page 1(0x01), blocked(10:36:39.978337), unblocked(10:36:40.011531)

Pages avail: allocPage
Page      VTA        vdir        ndir         tcb   tcb state
0 0x387000A9 0x80000000 0x80000000 0x1427B5F8        0x00 Unblock-stage page
1 0x38700062 0x80000000 0x80000000 0x1427B6A8        0x00
2 0x38700063 0x80000000 0x80000000 0x1427B758        0x00
3 0x38700064 0x80000000 0x80000000 0x1427B808        0x00
A 0x38700060 0x80000000 0x80000000 0x1427B548        0x00 Unblock-stage page
Allocated cylinder list(19):
0x38700060 0x38700070 0x38700080 0x38700090 0x387000A0 0x387000B0 0x387000C0 0x387000D0
0x387000E0 0x387000F0 0x38700100 0x38700110 0x38700120 0x38700130 0x38700140 0x38700150
0x38700160 0x38700170 0x38700180
Virtual Unit cache transfer addresses:
page0(0x142D49D8:0xFFFFFFFF)
Tcbs: ( 0x872AD4:0xFFFFFFFF) ( 0x872AD4:0xFFFFFFFF) ( 0x872AD4:0xFFFFFFFF) ( 0x872AD4:0xFFFFFFFF)
Bytes(actual:comp) 400:255 Write(actual:comp) 0:0
Blocks(r:w) 5:0 pages (r:w) 0:0

Found 1 units mounted

step 3: the status line "SkipMarkActive, FastSkipAvail, FastSkipActive" is the key indicator that the bad metadata is the cause.
step 4: have the customer delete and recall that VTV (A07765 in the example above) to correct the problem.
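
If the simulator output from step 2 has been saved to a text file, the check in step 3 can be scripted. The following is only a minimal sketch, assuming the output has the same layout as the example above. The function name find_suspect_vtvs and the SAMPLE excerpt are hypothetical helpers for illustration and are not part of the simulator or of any Oracle tool.

```python
import re

# Status flags that step 3 names as the key indicator.
INDICATOR_FLAGS = ("SkipMarkActive", "FastSkipAvail", "FastSkipActive")

# Matches a unit header line such as:
#   Unit 3 dev(0x03) stdCart Vtvid A07765 has 270 pages (19 cyls) with status:
UNIT_HEADER = re.compile(r"^Unit\s+(\d+)\s+dev\((0x[0-9A-Fa-f]+)\).*?Vtvid\s+(\S+)")


def find_suspect_vtvs(text):
    """Return (unit, vtvid) pairs whose status block shows all three flags.

    Assumption: the three flags appear together on one status line, as in
    the example output in this document.
    """
    suspects = []
    current = None  # (unit, vtvid) of the unit block currently being read
    for line in text.splitlines():
        header = UNIT_HEADER.match(line.strip())
        if header:
            current = (int(header.group(1)), header.group(3))
            continue
        if current and all(flag in line for flag in INDICATOR_FLAGS):
            if current not in suspects:
                suspects.append(current)
    return suspects


# Excerpt copied from the example output above.
SAMPLE = """\
Unit 3 dev(0x03) stdCart Vtvid A07765 has 270 pages (19 cyls) with status:
     Mounted, LocActive, read, forward, Frontend Blocked (0x18042)
            , AllocPageAvail, RunCCR (0x30)
            , SkipMarkActive, FastSkipAvail, FastSkipActive (0x10300000)
Found 1 units mounted
"""

if __name__ == "__main__":
    for unit, vtvid in find_suspect_vtvs(SAMPLE):
        print(f"Unit {unit}: VTV {vtvid} shows the Accelerated Skip indicator")
```

A match only flags the VTV as a candidate for the delete and recall in step 4. It does not replace reviewing the statesave itself.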

Attachments

This solution has no attachment