Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2180239.1
Update Date:2018-04-25
Keywords:

Solution Type  Technical Instruction Sure

Solution  2180239.1 :   Pillar Axiom: How to Perform a Manual CopyAway  


Related Items
  • Pillar Axiom 600 Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Axiom>SN-DK: Ax600




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: The information in this document is only for Oracle employees. Customers must not run these commands.

Applies to:

Pillar Axiom 600 Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

Proactively replace a drive without degrading the RAID group. 

Solution

The RAID CopyAway feature is intended to automatically copy the full content of a drive that has been showing signs of end of life to the HotSpare before failing it. Once the disk has been faulted by the RAID firmware, the drive can be safely replaced using the Guided Maintenance. The newly inserted drive will go into CopyBack (SATA & SATA V2 bricks) or it will become the new HotSpare (FC, FC Expansion & FC V2 bricks).


The feature can also be run manually in order to proactively replace drives that have been showing a lot of sector repairs and disk errors in the logs or for other reasons. Proactive drive replacements need to be approved by RAID Engineering who can advise a CopyAway or a Rebuild depending on the situation (the CopyAway of a struggling drive has potential issues).

NOTE: There is no need to do a CopyAway if the failing drive is set as a HotSpare (not in use - no data). The Guided Maintenance can be used on that drive.
For SATA and SATA V2 bricks, the HotSpare must be Drive 12 (contact Support if that is not the case). 

The brick must be in a Normal state in order to do a CopyAway.

For R5, it is recommended to use the CopyAway_R5.exe tool (attached to this knowledge article), as it performs all the checks and prevents the mistakes that are easy to make when following the manual procedure below.
Most importantly, the tool saves a lot of time for the TSE assigned to the proactive drive replacement (the file can be attached to the Service Request with clear instructions, which avoids the need for a WebEx session).

NOTE: Customers should use the 'track' option to avoid additional CallHome events.
Using the 'track' option prevents unnecessary Service Requests from being opened and additional drives from being dispatched by Automation.
The tool slightly modifies the CallHome matrix at the end of the CopyAway. This suppresses the creation of a log bundle when the BRICK_CRU_STATE_CHANGE event is created.
The original CallHome matrix is automatically restored when CopyAway_R5.exe finishes.

Always upgrade the Axiom to the latest code and make sure there are no pending CopyAway fixes on that branch before using CopyAway_R5.exe.
The Axiom must be on 05.04.18 or higher for R5, or 04.06.18 or higher for R4, because the CopyAway feature is prone to software errors in earlier builds (if the customer cannot upgrade, Engineering would advise replacing the drive without using CopyAway).
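
The minimum-release rule above can be written as a simple comparison. The sketch below is illustrative only: the function names are hypothetical, and it assumes that a release is always written as three dot-separated numeric fields (as in 05.04.18). It is not part of CopyAway_R5.exe or of any Axiom tool.

```python
# Minimum releases quoted in this article for running a CopyAway.
MIN_RELEASE = {5: (5, 4, 18), 4: (4, 6, 18)}


def parse_release(release):
    """Turn a string such as '05.04.18' into a tuple such as (5, 4, 18)."""
    parts = release.strip().split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected release format: {release!r}")
    return tuple(int(p) for p in parts)


def copyaway_supported(release):
    """Return True if the release meets the minimum for its branch (R4 or R5)."""
    version = parse_release(release)
    minimum = MIN_RELEASE.get(version[0])
    if minimum is None:
        # Branches other than R4 and R5 are not covered by this article.
        return False
    return version >= minimum
```

A release that fails this check should be upgraded first, or the case discussed with RAID Engineering.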

C:\>CopyAway_R5.exe /?
CopyAway_R5.exe is a utility for Release 05.03.00 and onwards to proactively replace hard drives without degrading the RAID array.
You can combine this tool with the Windows Task Scheduler to start a CopyAway before the office hours.
Note: the tool will not run a CopyAway if the Brick is not healthy.

Syntax:
CopyAway_R5.exe Axiom_IP_Address BrickName DiskNumber [track]

DiskNumber: accepted values are between 0 and 11
track: optional keyword to have CopyAway to track the task to prevent sending a CallHome.

Example: CopyAway_R5.exe 10.10.30.100 Brick002 5 track

You can also double click on CopyAway_R5.exe and follow the wizard instead of using the command line. 
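
When the command line is generated by a script (for example, before handing it to the Windows Task Scheduler), the arguments can be validated first. The sketch below only builds the argument list from the syntax shown above. The function name is hypothetical, and the checks are limited to what the help text states (DiskNumber between 0 and 11, optional track keyword) plus a basic IP address check.

```python
import ipaddress


def build_copyaway_command(axiom_ip, brick_name, disk_number, track=False):
    """Return the argument list for CopyAway_R5.exe after basic validation."""
    # ip_address() raises ValueError if the string is not a valid IP address.
    ipaddress.ip_address(axiom_ip)
    if not brick_name:
        raise ValueError("brick name must not be empty")
    if not 0 <= disk_number <= 11:
        raise ValueError("DiskNumber must be between 0 and 11")
    args = ["CopyAway_R5.exe", axiom_ip, brick_name, str(disk_number)]
    if track:
        # Optional keyword documented in the tool's help output.
        args.append("track")
    return args
```

For the example above, build_copyaway_command("10.10.30.100", "Brick002", 5, track=True) yields the same arguments as the documented command line.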

Using the Windows scheduler, customers can set CopyAway to start outside office hours so the disk will be ready to be replaced on the next day.

It is also possible to run multiple CopyAways at the same time but it is recommended to run no more than 2 on the same Axiom.

This is an example using the track option:
[Screenshot: Multiple CopyAway]

The drive can be replaced using the Guided Maintenance after the CopyAway task has been completed.

NOTE: A Decode tool is also attached to this document in case the CopyAway_R5 tool reports an error (an encrypted text file is generated after each run). Ask the customer to upload the report to the Service Request (do not share the Decode tool with the customer), then decode the file and seek help from RAID Engineering.

 

 

Manual procedure (for R4 and R5):

Always upgrade the Axiom to the latest code and make sure there are no pending CopyAway fixes on that branch before using this procedure.

  1. Prepare a laptop with a Brick console cable: Document 1389622.1 Pillar Axiom: How to establish a serial connection to a Brick Console

NOTE: If the Brick is an FC Brick Expansion (JBOD), use the Topology tool (see Document 1609365.1 Pillar Axiom: How to generate Axiom topology) to find out the parent FC Brick.
The CopyAway command needs to be started from the parent FC brick. 

The CopyAway is triggered by a single command called usp (UseSpare). Its parameters must be given in the following order:

- the brick target number (0 for SATA, SATA V2, FC, FC V2 or 1 for FC Expansion)
- the COD LUN number (the COD LUN that has a partition on the drive to replace, identified in a later step)
- the target slot number (the slot number of the HotSpare: 12 for SATA & SATA V2, floating for FC, FC V2 & FC Expansion)
- the source slot number (the slot number of the drive that needs to be proactively replaced)
- the Max Operations number (we use 64)
- the Max Errors number (we use 255)
- the Maintenance Operation flag number (0x20 for CopyAway)

All this information needs to be retrieved before running the usp command.

 

  2. Using the Axiom GUI, open the Brick details to identify the HotSpare (it should be Disk 12 for SATA).

In this example, a CopyAway will be launched to replace Disk 11 on an FC brick. The GUI shows that the HotSpare is Disk 2.
[Screenshot: HotSpare location]

 

  3. Connect to the RAID Controller that is expected to own the drive to CopyAway.

For FC & FC V2 bricks, connect to RAID Controller 1.
For SATA & SATA V2 bricks (SATA & SSD drives):
- connect to RAID Controller 0 if the drive to replace is in one of the following slots: 0 - 5
- connect to RAID Controller 1 if the drive to replace is in one of the following slots: 6 - 11
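
The controller selection rule above can be summarized as a small lookup. This is a minimal sketch with a hypothetical function name. It only returns the controller to try first, since ownership still has to be confirmed on the console. FC Expansion bricks are deliberately left out because the command is started from the parent FC brick.

```python
def default_raid_controller(brick_type, slot):
    """Return the RAID Controller (0 or 1) to connect to first for a drive slot."""
    if not 0 <= slot <= 11:
        raise ValueError("slot must be between 0 and 11")
    kind = brick_type.upper()
    if kind in ("FC", "FC V2"):
        # FC and FC V2 bricks: always start with RAID Controller 1.
        return 1
    if kind in ("SATA", "SATA V2"):
        # SATA and SATA V2 bricks: slots 0-5 on RC 0, slots 6-11 on RC 1.
        return 0 if slot <= 5 else 1
    raise ValueError(f"brick type not covered by this rule: {brick_type!r}")
```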

 

  4. Run chk at the SCSI> prompt to verify that the controller owns the drive to CopyAway:

SCSI> chk
...
TARGET 0000-0000 , WWN :- 20 0C 00 0B 08 3A 5E 95 <-- verify that the WWN is matching the brick where the CopyAway will be done (scroll down to Target 0000-0001 if the drive to replace is on the FC Expansion)
--------------------------------------------

Firmware Version: 01.21.05

Board Revision: 0.A
Assembly Number: 1450-00076-30
Board Serial #: GE0650021263110551

FW Major Ver 0001.
FW Minor Ver 0021.
FW BUILD NUM 0005.

Midplane Brix Type was : FC (03)
12 FC Gen2 650
eventFlag miscFlag fwUpgdProg bioMask Pnet_C0 Pnet_C1
00 01 00 0000-0000 00 00
--------------------------------------------
totalLUNs 0000-0003
LUN raidType LUNState LUNActivity BGnd
0000 0002 Online 02 Clean 01 Idle           <-- verify that all the LUN States are Online and the LUN Activities are either Clean, Dirty or Idle
0001 0002 Online 02 Clean 01 Idle
0002 0003 Online 02 Clean 01 Idle
--------------------------------------------
totalDrives 0000-000C
Drv Owner State
00 01 03_Ready
01 01 03_Ready
02 01 03_Ready
03 01 03_Ready
04 01 03_Ready
05 01 03_Ready
06 01 03_Ready
07 01 03_Ready
08 01 03_Ready
09 01 03_Ready
0A 01 03_Ready
0B 01 03_Ready                               <-- the drive is owned by RC1 ("01" on the Owner column)
--------------------------------------------
total Controllers 0000-0002
ID Status
0000-0200 0000-0002 FC_ONLINE
0000-0201 0000-0002 FC_ONLINE
--------------------------------------------
Drive 00 04_Online 06_InUse
Drive 01 04_Online 06_InUse
Drive 02 04_Online 04_Fresh * HotSpare *     <-- Fresh + HotSpare = that confirms that Disk 2 is the HotSpare
Drive 03 04_Online 06_InUse
Drive 04 04_Online 06_InUse
Drive 05 04_Online 06_InUse
Drive 06 04_Online 06_InUse
Drive 07 04_Online 06_InUse
Drive 08 04_Online 06_InUse * HotSpare *     <-- that drive was acting as HotSpare in the past
Drive 09 04_Online 06_InUse
Drive 0A 04_Online 06_InUse
Drive 0B 04_Online 06_InUse * HotSpare *     <-- that drive was acting as HotSpare in the past
-------------------------------------------- 

Connect to the other RAID controller if the drive is not owned by the default controller.
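
If the chk output has been captured to a text file, the two checks above (drive owner and current HotSpare) can be cross-checked with a short script. The sketch below assumes the output has exactly the layout shown in this example. The function names and regular expressions are illustrative and have not been validated against other firmware versions.

```python
import re

# Row of the "Drv Owner State" table, for example "0B 01 03_Ready".
OWNER_ROW = re.compile(r"^([0-9A-F]{2}) ([0-9A-F]{2}) \d{2}_\w+")
# Row of the drive status list, for example
# "Drive 02 04_Online 04_Fresh * HotSpare *".
DRIVE_ROW = re.compile(r"^Drive ([0-9A-F]{2}) \S+ (\S+)(.*)$")


def drive_owner(chk_output, slot):
    """Return the owning RAID Controller number for a slot, or None."""
    for line in chk_output.splitlines():
        match = OWNER_ROW.match(line.strip())
        if match and int(match.group(1), 16) == slot:
            return int(match.group(2), 16)
    return None


def current_hotspare(chk_output):
    """Return the slot flagged both Fresh and HotSpare, or None."""
    for line in chk_output.splitlines():
        match = DRIVE_ROW.match(line.strip())
        if match and "Fresh" in match.group(2) and "HotSpare" in match.group(3):
            return int(match.group(1), 16)
    return None
```

Drive numbers are printed in hexadecimal on the console (0B is Disk 11), which is why the values are converted with base 16.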

 

  5. The next step is to identify which COD LUN has a partition for Disk 11.

NOTE:
About FC COD LUN:
Drives 0 - 5 have COD LUN 0 on 6 drives
Drives 6 - 10 have COD LUN 1 on 5 drives
Drives 0 - 10 have Data LUN 2 (11 drives)
Floating HotSpare is on Drive 11.
Over time, LUNs end up on different drives as drives fail and are replaced.

About SATA COD LUN:
Drives 0 - 5 have COD LUN 0 on 6 drives
Drives 6 - 11 have COD LUN 1 on 6 drives
Drives 0 - 5 have Data LUN 2 on 6 drives
Drives 6 - 11 have Data LUN 3 on 6 drives
HotSpare is on Drive 12.
There is a small chance that LUNs have ended up on different drives, so it is important to perform this check on SATA & SATA V2 bricks as well.
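
The default layouts described in this note can be used to decide which COD LUN to inspect first. The sketch below (hypothetical function name) encodes only those defaults. Because LUNs can move after drive replacements, the result is a starting point for the shru check and not a substitute for it.

```python
def default_cod_lun(brick_type, drive):
    """Return the COD LUN a drive holds in the default layout, or None."""
    kind = brick_type.upper()
    if kind.startswith("FC"):
        if 0 <= drive <= 5:
            return 0
        if 6 <= drive <= 10:
            return 1
        return None  # Drive 11 is the default location of the floating HotSpare.
    if kind.startswith("SATA"):
        if 0 <= drive <= 5:
            return 0
        if 6 <= drive <= 11:
            return 1
        return None  # Drive 12 is the HotSpare.
    raise ValueError(f"unknown brick type: {brick_type!r}")
```

In the FC example used in this article, Disk 11 turns out to hold a partition of COD LUN 0, which shows why the default layout cannot be trusted on its own.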

Go to the RAID menu:

SCSI> raid
RAID>

Use the shru (Show RAID Unit) command to display the details of COD LUN 1 on target 0 (FC)

RAID> shru 0 1 1   (0 is for brick target 0, 1st '1' is for LUN 1, 2nd '1' is for verbose)

Look for the "psParts[" sections and search for "portNum 11." (the drive to CopyAway):

psParts[ 0.] 058D-7D68
PartInfo 058D-7D68
psDI 058D-7D9C
psRU 058D-8DC0
partID 0.
portNum 6.        <-- Disk 6
offset 0.
blocks 524288.
flags 0000_0000_0000_0000
updatecnt 0.
statusBlk 1.
rev 0.

 


NOTE: For FC Expansion, use 1 for the brick target number and add 12 to the Disk number in order to get the relevant port number.
Example: If Disk 11 on an FC Expansion needs to have a CopyAway, look for port number 23. 
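
The port number arithmetic in this note is simple but easy to get wrong under pressure. A minimal sketch, with hypothetical function names, of the values to use with shru:

```python
def shru_brick_target(fc_expansion=False):
    """Return the brick target number: 1 for an FC Expansion, otherwise 0."""
    return 1 if fc_expansion else 0


def shru_port_number(disk_number, fc_expansion=False):
    """Return the portNum to look for in the shru output for a given disk."""
    if not 0 <= disk_number <= 11:
        raise ValueError("disk number must be between 0 and 11")
    # Disks on an FC Expansion are offset by 12 in the shru output.
    return disk_number + 12 if fc_expansion else disk_number
```

This offset applies to the shru output only. The usp command uses the plain drive numbers.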

In the above example, there was no reference to portNum 11 so the next step is to check the partitions of COD LUN 0:

RAID> shru 0 0 1   (show RAID Unit for brick target 0, LUN 0, verbose)
...
psParts[ 2.] 058D-8F08
PartInfo 058D-8F08
psDI 058D-8F3C
psRU 058D-ADD0
partID 0.
portNum 11.                        <-- Disk 11
offset 0.
blocks 524288.
flags 0000_0000_0000_0010 HotSpare <-- "scsi> chk" showed that the disk used to be a HotSpare
updatecnt 0.
statusBlk 1.
rev 0.

So Disk 11 holds a partition of COD LUN 0.
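
The verbose shru output is long, so searching a captured copy for the port number is less error-prone than reading it on screen. The sketch below assumes the "portNum N." line format shown above. The function name is hypothetical.

```python
import re

# Matches lines such as "portNum 11." in the psParts sections.
PORT_LINE = re.compile(r"^portNum\s+(\d+)\.")


def ports_in_shru_output(shru_output):
    """Return the set of port numbers listed in a captured shru output."""
    ports = set()
    for line in shru_output.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            ports.add(int(match.group(1)))
    return ports
```

If the port number of the drive to replace is not in the returned set for one COD LUN, repeat the check with the output of the other COD LUN.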

 

  6. Last step: run the usp command

Go to the config menu:

RAID> config
Config> 

Run the usp command using the following information:

In this example, the brick target number is 0, the COD LUN that has a partition on the drive to replace is 0, the HotSpare (copy target) is Disk 2, the drive to replace (copy source) is Disk 11, and the last 3 parameters use the default values (64 255 0x20):

Config> usp 0 0 2 11 64 255 0x20

 

NOTE: For FC Expansion, do NOT add 12 to the drive numbers on the usp command (different console menu, different rules) and remember to use 1 for the brick target number.
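
To avoid transposing arguments, the command string can be assembled from named values before it is typed on the console. The sketch below follows the argument order of the example command above (brick target, COD LUN, HotSpare slot, source slot, Max Operations, Max Errors, Maintenance Operation flag). The function name is hypothetical and the function only formats text. It does not talk to the brick.

```python
def build_usp_command(brick_target, cod_lun, spare_slot, source_slot,
                      max_ops=64, max_errs=255, mo_flag=0x20):
    """Return the usp command line for a CopyAway as a string."""
    if brick_target not in (0, 1):
        raise ValueError("brick target must be 0, or 1 for an FC Expansion")
    if spare_slot == source_slot:
        raise ValueError("HotSpare slot and source slot must differ")
    # Drive numbers are used as-is here, with no +12 offset for FC Expansion.
    return (f"usp {brick_target} {cod_lun} {spare_slot} {source_slot} "
            f"{max_ops} {max_errs} 0x{mo_flag:02X}")
```

Calling build_usp_command(0, 0, 2, 11) reproduces the command used in this example.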

Watch out for this potential error on the console output:

usp 0 0 2 11 64 255 0x20
WQE 0555-9DE8
SR 0537-4570
pBuf AEE3-0000
This RC does not own LUN 0.    <-- we have to try the same command from the other controller
SerUSCB
00DD = 05:04:03:02 :: RC does not own this LUN

If that is the case, type the logout command, move the brick console cable to the other controller (RC 0 in this example), go to the config menu and run "usp 0 0 2 11 64 255 0x20" again.
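
When the console session is being logged, the ownership error can be detected automatically. The sketch below looks only for the two message strings shown in the output above. The function names are hypothetical, and other firmware versions may word the message differently.

```python
# Error strings copied from the console output shown in this article.
OWNERSHIP_ERRORS = ("This RC does not own LUN", "RC does not own this LUN")


def rc_does_not_own_lun(console_output):
    """Return True if the usp output reports that this RC does not own the LUN."""
    return any(text in console_output for text in OWNERSHIP_ERRORS)


def other_controller(rc_number):
    """Return the number of the other RAID Controller in the brick (0 or 1)."""
    if rc_number not in (0, 1):
        raise ValueError("RAID Controller number must be 0 or 1")
    return 1 - rc_number
```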

If all goes well, the output will look similar to the following:

-- RMD::ADDPART tgtNum 0.
dui model ST3300657FC ........................
dui sernum 6SJ44RZF............
AE::Send (RMDConfig) psAE 0590-4DD0, event 03, flags 0081 local Busy
AE::Send_CB received psAE 0590-4DD0 back
AE::Send (RMDConfig) psAE 0590-4DD0, event 0D, flags 0081 local Busy
AE::Send_CB received psAE 0590-4DD0 back
WrCFG_All target 0.

WrRMD Done.
AE::Send (RMDConfig) psAE 0590-4DD0, event 09, flags 0081 local Busy
AE::Send_CB received psAE 0590-4DD0 back
US_WR_CB
bbl_index invalid. No BBL.
M Sent: 9. 8.
Maintenance Operation To Be Started:

MO_Req (058B-42A4):
mo 0D [0D] CopyAway
target RU 058D-ADD0, tgtNum 0. lun 0.     <-- starting by doing a CopyAway of the COD LUN partition
maxOps 64.
maxErrs 65.
flags 0000-0000

mauDone 0.
mauNeeded 1.
opsIssued 0.
opsDone 0.
opsNeeded 0.
percent of (done*100/needed) 0. <-- percentage of the maintenance operation, "maint > shq" will give the latest status
errorsFound 0.
errorsFixed 0.
begTime 0.
endTime 0.
savedWatermark 0.
startMAU 0.
phase 0.

2. Weeks, 2. Days, 1. Hour, 9. Mins, 12. Secs, 65. Ticks
psMOR 058B-42A4
CopyAway Errors
Consectutive 0.
Total 0.
tgtPUI                    <-- target drive chapter
PUI 058B-430C
partID 0000-0000
DUI 058B-430C
brandPort 02              <-- Disk 2
rsvd 00
brandCount 5.
model ST3300657FC ........................
serNum 6SJ44RZF............

srcPUI                    <-- source drive chapter
PUI 058B-4350
partID 0000-0000
DUI 058B-4350
brandPort 0B              <-- Disk 11
rsvd 00
brandCount 9.
model ST3146854FC ........................
serNum 3KN1LBZ2............ 
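
Before logging out, it is worth confirming that the operation started is a CopyAway and that the target and source drives are the intended ones. The sketch below extracts those three values from a captured copy of the output. It assumes the layout shown above (an "mo" line, then tgtPUI and srcPUI sections that each contain a brandPort line in hexadecimal). The function name is hypothetical.

```python
import re

# Matches lines such as "mo 0D [0D] CopyAway".
MO_LINE = re.compile(r"^mo\s+\S+\s+\[\S+\]\s+(\S+)")


def summarize_copyaway_start(console_output):
    """Extract the operation name and the target/source disks from usp output."""
    summary = {"operation": None, "target_disk": None, "source_disk": None}
    section = None
    for raw in console_output.splitlines():
        line = raw.strip()
        mo_match = MO_LINE.match(line)
        if mo_match:
            summary["operation"] = mo_match.group(1)
        elif line.startswith("tgtPUI"):
            section = "target_disk"
        elif line.startswith("srcPUI"):
            section = "source_disk"
        elif line.startswith("brandPort") and section:
            # brandPort is printed in hexadecimal (0B is Disk 11).
            summary[section] = int(line.split()[1], 16)
    return summary
```

For the output above, the expected summary is a CopyAway with Disk 2 as the target and Disk 11 as the source.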

Run the logout command and disconnect the brick console cable.

Check the Axiom GUI for the CopyAway background task.

The drive can be replaced using the Guided Maintenance after the CopyAway task has been completed.

References

<NOTE:1389490.1> - Pillar Axiom: How to Collect SMART Data from SATA, Fiber Channel, and SSD Bricks using a Brick Console
<NOTE:1946213.1> - Pillar Axiom: How to Perform an Axiom Healthcheck on an Ax600

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.