Asset ID: 1-71-1267767.1
Update Date: 2017-10-20
Keywords:
Solution Type: Technical Instruction Sure
Solution 1267767.1: How to Remove and Replace a VLE JBOD Hard Disk Drive:ATR:2205:4
Related Items
- Sun Virtual Library Extension (VLE)
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: TAPE-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP
Applies to:
Sun Virtual Library Extension (VLE) - Version 1.0 and later
Information in this document applies to any platform.
Goal
Field replacement of a failed or suspect HDD in the VLE JBOD.
Solution
WHAT SKILLS DOES THE ENGINEER NEED: (IS A SITE ENGINEER AVAILABLE?)
1. Training - J4400 and DE2-24C JBODs
2. Knowledge of how to connect to the VLE and the use of the command line in a terminal window.
3. Knowledge of the drive identifier, JBOD type and drive bay within the JBOD and the ZFS fault event id number. This information should be provided in the problem case.
DELIVERY REQUIREMENT:
None. This is a non-disruptive action.
TASK COMPLEXITY: 4
Time Estimate: 150 minutes
FIELD ENGINEER INSTRUCTIONS:
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? : Online
Take ESD Precautions.
The replacement disk drive must be the same capacity and same type as the disk drive it is replacing.
On a J4410 JBOD never remove a disk drive unless the ID/Status LED is blue.
On a DE2-24C JBOD never remove a disk drive unless the orange LED is lit.
The disks are hot-swappable, so you do not need to disconnect power from the system.
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
NOTE 1: Replacing a VLE drive requires the use of an actual replacement drive. Putting the same drive back into the machine is not supported. Obtain a replacement drive from stock prior to starting this action.
Note 2: On old versions of VLE code, the commands must be run with root privileges and prefaced with the 'pfexec' command.
Connect to the VLE and open a terminal window. This can be done via direct connect to the service port (igb3) using IP address 10.0.0.10.
Note 3: Disable ASR during maintenance activities. When maintenance activities are finished, enable ASR.
1. Verify you are connected to the correct machine. Run the following command and verify the serial number of the machine you are connected to matches the serial number in the SR/task being worked.
Enter the command.
$ sudo bash
The following should be entered as a single command.
# /opt/SUNWvle/scripts/getVleInfo | grep Serial
Example:
vleadmin@engvle10-brm07:~$ /opt/SUNWvle/scripts/getVleInfo | grep Serial
fruSerialNumber: AK10000008
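The serial number comparison in step 1 can also be scripted so that a mismatch is hard to overlook. The following is a minimal sketch only: serial_matches is a hypothetical helper name (not part of the VLE software), and it assumes the output contains a line of the form "fruSerialNumber: <value>" as in the example above.

```shell
# Hypothetical helper, not part of the VLE software.
# Reads getVleInfo-style text on stdin and succeeds only if the
# "fruSerialNumber:" line carries exactly the expected serial number.
serial_matches() {
    local expected="$1" actual
    # Take the second field of the first fruSerialNumber line.
    actual=$(awk '$1 == "fruSerialNumber:" { print $2; exit }')
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage on the VLE (AK10000008 stands in for the serial number in the SR/task):
#   /opt/SUNWvle/scripts/getVleInfo | serial_matches AK10000008 \
#       && echo "Serial number matches" \
#       || echo "Serial number does NOT match - stop"
```

The helper fails when no serial number line is found at all, so an empty or unexpected getVleInfo output is treated the same as a mismatch.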
2. During maintenance activities disable ASRs using the following procedure.
Note: The system must be registered for ASRs before ASR can be disabled.
a. Login as vleadmin and run sudo bash to gain root privilege.
b. Run the asr_sfb.sh script.
/var/opt/StatusService/scripts/asr_sfb.sh
c. Select the following menu items.
2) Configure ASR Settings ONLY
15) ASR disable
17) ASR query (enabled/disabled) - verify ASR is disabled
q) Quit
3. Issue the following command to run the replace_drive_service script:
# /opt/SUNWvle/scripts/replace_drive_service
The script will search the VLE for drives that are not online or spare. The script will respond in one of two ways.
a) If the script does not find a suspect drive it will prompt you to reply with 'c' to continue or 'q' to quit.
Example:
vleadmin@ENVLE11A:/opt/SUNWvle/scripts/replace_drive_service
Bad drives...
NO FAULTED, FAILED, DEGRADED, UNAVAILABLE, OR OFFLINE DRIVES WERE FOUND
Enter 'c' to continue or 'q' to quit:
If you reply 'c' the script will then prompt you with:
Enter the bad drive identifier (e.g c0t5000C50020C42077d0):
b) If it does find a suspect drive or drives it will list them and prompt you to respond with the drive identifier of the drive you wish to replace.
Example:
vleadmin@SVCVLE10:/opt/SUNWvle/scripts/replace_drive_service
Bad drives...
c0t5000C500212B677Bd0 UNAVAIL 0 0 0 cannot open
Enter the bad drive identifier (e.g c0t5000C50020C42077d0):
Note: the drive to be replaced can be a drive the system has listed or can be another drive (online or spare).
c) Unless you replied 'q' to quit, the script is now prompting you for the bad drive identifier. Respond with the identifier of the drive to be replaced.
The drive identifier is the WWN of the drive. It is 21 characters long and starts with c0t5. Example:
c0t5000C500212B677Bd0
d) If the script cannot find the drive in question, it may also prompt for the JBOD and bay (slot) numbers of the drive to be replaced. If these prompts appear, respond appropriately to continue. The JBOD and bay (slot) numbers should be given in the task provided by support.
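Because a mistyped identifier could select the wrong drive, the identifier from the problem case can be sanity-checked before it is entered at the prompt. This is a sketch only: is_drive_id is a hypothetical helper, not part of the VLE scripts. The 21-character length and the c0t5 prefix come from the text above; the hexadecimal middle and the d0 suffix are assumptions inferred from the examples in this document.

```shell
# Hypothetical helper, not part of the VLE scripts.
# Succeeds if the argument looks like a VLE drive identifier:
#   "c0t5" + 15 hexadecimal characters + "d0" = 21 characters in total.
# (Hex middle and "d0" suffix are assumptions based on the examples.)
is_drive_id() {
    printf '%s\n' "$1" | grep -Eq '^c0t5[0-9A-Fa-f]{15}d0$'
}

# Usage:
#   is_drive_id c0t5000C500212B677Bd0 && echo "Identifier format looks right"
```

A passing check only confirms the format. It does not confirm that the identifier belongs to the failed drive, so the value should still be compared with the problem case.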
If you chose to replace a drive that is online, the script will respond with a warning that the drive is online. Example:
Drive found in vsbpool: c0t5000C5001023C767d0
c0t5000C5001023C767d0 ONLINE 0 0 0
raidz3-3, JBOD2, 22, c0t5000C5001023C767d0, 5000c5001023c765, ONLINE
c0t5000C5001023C767d0 status is: ONLINE
WARNING: SELECTED DRIVE IS ONLINE!!!
Enter 'c' to continue or 'q' to quit:
If the script has correctly identified the online drive and it is the drive you wish to replace, respond with 'c' and press Enter to continue. Otherwise respond with 'q' and press Enter to quit.
From this point on, follow the directions given by the script. It will light both the JBOD and drive identifier LEDs to aid in replacement of the correct drive, and then instruct you to replace the drive and respond 'c' to continue or 'q' to quit.
CAUTION: Make sure you remove the correct (failed) drive. Drive release latch and drive LEDs are on opposite sides of the failed drive. See this diagram for further clarification on this caution as it pertains to the DE2-24C JBODs.
After you have replaced the suspect drive and replied 'c' to continue, the script will do the work of deleting the old drive and adding the new drive to the VLE. This process may take up to 15 minutes and may encounter many retries (which are not considered abnormal).
4. Now that the drive replacement script has finished, complete the drive replacement process by performing the following steps.
a) Issue the following command:
# devfsadm -C
(removes stale symbolic links for unconfigured devices)
b) Verify fmd service is running in the expected "online" state, and ensure fmd service is not hung by issuing the following command:
# fmstat
If fmd service is running, the above command will return a table of information.
If fmd service is hung, the above command will return an error message similar to the following (or will hang and not respond; a 'Ctrl-C' will then terminate the hung command):
fmstat: failed to connect to fmd: RPC: Program not registered
c) If fmd service is hung, toggle fmd service even if fmd service is still in an online state. Toggle fmd service by issuing the following commands:
# svcadm disable fmd
Monitor fmd state to verify it went into the "disabled" state by issuing the following command:
# svcs fmd
Then issue the following command:
# svcadm enable fmd
d) Reset the ZFS diagnostic counters by issuing the following commands:
# fmadm reset zfs-diagnosis
# fmadm reset zfs-retire
Run the following command to get the UUID of the disk FMA event, then acquit that event using its UUID.
# fmadm faulty
# fmadm acquit [UUID]
e) Check that the drive replacement has not raised any new fault by running the following command:
# fmadm faulty
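Two parts of step 4 lend themselves to small helpers: waiting for fmd to reach the "disabled" state before it is re-enabled, and picking the event UUIDs out of the fmadm faulty output. The sketch below is illustrative only. Both function names are hypothetical, and the second helper assumes that each event ID is printed as a standard 36-character lowercase UUID and that the available grep supports the -E and -o options.

```shell
# Hypothetical helpers for step 4; neither is part of the VLE software.

# Poll an SMF service until svcs reports the wanted state.
# Arguments: service name, wanted state, number of attempts (default 10).
# Waits one second after each unsuccessful attempt.
wait_for_smf_state() {
    local svc="$1" want="$2" tries="${3:-10}" state
    while [ "$tries" -gt 0 ]; do
        # -H drops the header line, -o state prints only the STATE column.
        state=$(svcs -H -o state "$svc" 2>/dev/null | awk '{ print $1 }')
        if [ "$state" = "$want" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Print every UUID-shaped token found on stdin, one per line, no duplicates.
# Assumption: event IDs appear as standard lowercase UUIDs.
list_fault_uuids() {
    grep -Eo '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' | sort -u
}

# Usage on the VLE (as root):
#   svcadm disable fmd && wait_for_smf_state fmd disabled && svcadm enable fmd
#   fmadm faulty | list_fault_uuids
```

The UUID list should be reviewed against the full fmadm faulty output before any event is acquitted, so that only the fault belonging to the replaced disk is cleared.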
5. Display JBOD and Disk firmware (FW) levels.
Display the current JBOD firmware levels. This will display the 'Current' and 'Baseline' versions of the JBODs and the drives. This will be used to determine if any components are down level and need to be updated, or to verify that the firmware update has successfully updated the firmware levels.
Note - Run the following command to verify the FW levels but do not upgrade the FW as it is a disruptive procedure. The VLE node needs to be offline and a maintenance window is required.
# /var/opt/StatusService/scripts/asr_sfb.sh
- Main Menu: Select item 4 (JBOD Utilities)
- JBOD Utilities Menu: Select item 1 (Display JBOD and Disk firmware levels)
##########################################################
JBOD Utilities Menu:
##########################################################
# SubSystem Name: TIKKA02
# Host Name: tikka02
# System Serial Number: AK10000002
# Server Chassis Serial Number: tikka02_csn
# Host IP Address: 10.80.143.21
# DNS Name: tikka02-01.us.oracle.com
#
# JBOD Utilities (VLE)
# Enter your selection:
#
# 1) Display JBOD and Disk firmware levels
# 2) Update JBOD SIM/Canister expander firmware
# 3) Update JBOD Disk firmware
# 4) Display JBOD names (list registered arrays)
# 5) Turn on JBOD locate LED
# 6) Turn off JBOD locate LED
# 7) Unregister JBOD arrays Engineering only, see Help
# 8) Delete all devices Engineering only, see Help
# 9) Discover all devices Engineering only, see Help
# 10) Rediscover all devices Engineering only, see Help
# h) Help - JBOD Utilities
# b)Back
# q)Quit
##########################################################
After selecting #1 in the menu above, the Baseline and Current levels of the JBOD and Disk firmware are given:
Analyzing array 1312BRM009,(tikka02),50800200014224be
Disk: All FRUs at baseline
Name Model Current Baseline
Disk.00 H7240AS60SUN4.0T A1CA A1CA
Disk.01 H7240AS60SUN4.0T A1CA A1CA
Expander: All FRUs at baseline
Name Model Current Baseline
Primary.EXP DE2-24C 0018 0018
Secondary.EXP DE2-24C 0018 0018
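On systems with many disks, comparing the Current and Baseline columns by eye is error-prone. Assuming the report has been saved to a text file, the comparison can be sketched as below. The helper name fw_mismatches and the file name fw_levels.txt are hypothetical, and the sketch assumes that every data row has exactly four whitespace-separated fields (Name, Model, Current, Baseline) as in the sample output above.

```shell
# Hypothetical helper, not part of the VLE software.
# Reads the firmware report on stdin and prints "name current baseline"
# for every row whose Current level differs from its Baseline level.
# Prints nothing when all components are at baseline.
fw_mismatches() {
    # NF == 4 selects data rows; the "Name ..." header row is skipped.
    # Appending "" forces a string comparison, so 0018 and 018 differ.
    awk 'NF == 4 && $1 != "Name" && ($3 "") != ($4 "") { print $1, $3, $4 }'
}

# Usage (fw_levels.txt is a saved copy of the report):
#   fw_mismatches < fw_levels.txt
```

This only reports differences. As noted in step 5, the firmware must not be updated as part of this procedure.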
Note - After maintenance activities are complete, ASRs must be re-enabled.
6. After maintenance activities are complete, enable ASRs using the following procedure.
a. Login as vleadmin and run sudo bash to gain root privilege.
b. Run the asr_sfb.sh script.
/var/opt/StatusService/scripts/asr_sfb.sh
c. Select the following menu items.
2) Configure ASR Settings ONLY
15) ASR enable
17) ASR query (enabled/disabled) - verify ASR is enabled
q) Quit
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
None.
PARTS NOTE:
REFERENCE INFORMATION:
http://download-adc.oracle.com/archive/cd_ns/E23581_01/
Attachments
This solution has no attachment