Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2082845.1
Update Date: 2017-01-31
Keywords:

Solution Type  Technical Instruction Sure

Solution  2082845.1 :   How to Replace a Big Data Appliance X5-2/X6-2 RAID HBA  


Related Items
  • Big Data Appliance X5-2 Starter Rack
  • Big Data Appliance X5-2 Full Rack
  • Big Data Appliance X5-2 Hardware
  • Big Data Appliance X6-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: CAP document

Applies to:

Big Data Appliance X6-2 Hardware - Version All Versions and later
Big Data Appliance X5-2 Starter Rack - Version All Versions and later
Big Data Appliance X5-2 Hardware - Version All Versions and later
Big Data Appliance X5-2 Full Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Remove and Replace a RAID HBA in a Big Data Appliance X5-2/X6-2 node

Solution

DISPATCH INSTRUCTIONS:
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED: BDA Trained
- TIME ESTIMATE: 60 Minutes
- TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:
- PROBLEM OVERVIEW: A faulty RAID HBA in a Big Data Appliance server node has been diagnosed as needing replacement.
- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The instructions below assume the Customer system administrator is available and working with the field engineer onsite to manage the host OS and BDA services.
All steps are listed here so that the FE has them available when onsite. The FE may perform them if the customer's system administrator requests or permits it, or needs help with them.

Before the replacement, the server that contains the faulty RAID HBA card should have its services taken offline and be powered off.

Preparation for replacement:

1. If the OS is still running and accessible, switch all RAID disk volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the HBA is replaced.
    Set the cache policy of all logical volumes to WriteThrough:

# /opt/MegaRAID/storcli/storcli64 -ldsetprop wt -lall -a0

Verify that the current cache policy for all logical volumes is now WriteThrough:

# /opt/MegaRAID/storcli/storcli64 -ldpdinfo -a0 | grep BBU

2. The customer's system administrator should shut down the BDA services and the server node, following the shutdown instructions for Big Data Appliance in MOS Note 2099858.1.
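
The cache policy verification in step 1 can also be checked mechanically rather than by eye, before the node is shut down. The following is a minimal sketch and not part of the BDA image: the function name check_cache_policy is hypothetical, and it assumes the -ldpdinfo output contains one "Current Cache Policy: <mode>, ..." line per logical volume, as in MegaCli-style output.

```shell
#!/bin/sh
# check_cache_policy MODE: reads `-ldpdinfo` output on stdin and succeeds only
# if every "Current Cache Policy" line lists MODE (e.g. WriteThrough) first.
# Assumption: one "Current Cache Policy: <mode>, ..." line per logical volume.
check_cache_policy() {
    awk -v mode="$1" '
        /Current Cache Policy/ {
            total++
            sub(/^.*Current Cache Policy *: */, "")   # keep only the policy list
            split($0, fields, ",")
            if (fields[1] != mode) bad++
        }
        END {
            printf "%d volume(s) checked, %d not in %s mode\n", total, bad, mode
            exit ((total == 0 || bad > 0) ? 1 : 0)
        }'
}

# Usage on the node (not run here):
#   /opt/MegaRAID/storcli/storcli64 -ldpdinfo -a0 | check_cache_policy WriteThrough
```

The function fails when no policy line is found at all, so an empty or unexpected output is not mistaken for success. Called with WriteBack instead, the same function could serve for the WriteBack verification after the replacement.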

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

Physical RAID HBA replacement:

1. Slide out the server for maintenance.
    Do not remove any cables before sliding the server forward; otherwise the loose cable ends will jam in the cable management arm. Make sure the cables and the Cable Management Arm (CMA) move freely.
    Refer to Note 1444683.1 for CMA handling training.

2. Disconnect the AC power cords.

3. Unlatch and slide off the top cover of the server.

4. Remove the old RAID HBA. (These steps are relevant to BDA nodes based on Oracle Server X5-2L.)
  a) Swivel the air baffle into the upright position to allow access to the PCIe cards.
  b) Rotate the PCIe card slot 6 locking mechanism latch out to disengage the RAID HBA card that has failed.
  c) Lift up and remove the RAID HBA card from the server.
  d) Disconnect the SAS cables from the RAID HBA card, noting which port each cable is connected to so that it can go back into the same port.
  e) Disconnect the Super Capacitor Cable from the RAID HBA card.
  f) Place the removed RAID HBA on an anti-static mat.

5. Install the new RAID HBA PCI Card into PCIe Slot 6.
    Reverse the removal instructions in Step 4, taking care to reconnect the SAS cables to the same ports they were removed from. If the cables are swapped, the disk slot mappings may be affected.

6. Install the top cover.

7. Install the AC power cords.

8. Slide the server back into the rack.

OBTAIN CUSTOMER ACCEPTANCE

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

1. Once the ILOM has booted, the green LED for the server blinks slowly. Press the power button on the front of the server to power on the unit.

2. During boot, monitor the graphics console through either the ILOM Java console or the local KVM. While loading its BIOS ROM, the new RAID controller detects the RAID configuration on the disks and reports that it has a Foreign configuration. This is expected. At the prompt, press "F" to accept the foreign configuration or "C" to enter the controller BIOS utility. If you press any other key to continue, the controller will not import the RAID configuration and will fail to find a bootable disk. If this occurs, it is safe to press "Ctrl-Alt-Del" to reset the server and get the "F" or "C" prompt again.

3. When the utility loads, only one adapter should be listed. Select the "Start" button.

4. The foreign configuration screen is shown. Select "Configuration" from the drop-down and select the "Preview" button.

5. Verify that the configuration on the "Virtual Drives" side looks correct and, if it does, select the "Import" button. The correct configuration is 12 RAID 0 virtual drives, one per disk.

6. This returns you to the Logical View screen, where the virtual drives should be listed on the right side. Select the "Exit" link from the left-side menu.

7. This will bring you to the "Please Reboot" screen. Press "Ctrl-Alt-Del" to reboot the machine and boot the OS.

8. After the OS has booted, log in to the OS with ‘root’ privilege.

9. Run the following to update the RAID HBA to the correct supported firmware for the image:

# /opt/oracle/bda/bin/bdaupdatefw

After the firmware update completes, the server reboots again. The disk volumes should remain intact, and the server should boot to the OS again.

10. After the OS is up, log in as root and validate that the physical and logical volumes are seen properly by the OS through the new RAID HBA and that the battery is seen.
The following command should show 12 disks:

# lsscsi | grep -i LSI
[0:0:20:0] enclosu LSILOGIC SASX28 A.1 502E -
[0:2:0:0] disk LSI MR9261-8i 2.90 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.90 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.90 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.90 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.90 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.90 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.90 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.90 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.90 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.90 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.90 /dev/sdk
[0:2:11:0] disk LSI MR9261-8i 2.90 /dev/sdl
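
The disk count in the listing above can be taken with a short filter instead of counting lines by hand. This is a minimal sketch: the function name count_lsi_disks is hypothetical, and it assumes the lsscsi columns are laid out as in the sample (device type in the second column, vendor string "LSI" in the third). A controller that reports a different vendor string would need the match adjusted.

```shell
#!/bin/sh
# count_lsi_disks: reads `lsscsi` output on stdin and prints the number of
# lines whose device type (column 2) is "disk" and whose vendor (column 3)
# is "LSI". The enclosure line is skipped because its type is "enclosu".
count_lsi_disks() {
    awk '$2 == "disk" && $3 == "LSI" { n++ } END { print n + 0 }'
}

# Usage on the node (not run here):
#   [ "$(lsscsi | count_lsi_disks)" -eq 12 ] && echo "all 12 disks present"
```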

If the device count is not correct, also check that the LSI controller has the correct Virtual Drives configured and in Optimal state, with the physical disks Online and spun up and no Foreign configuration. There should be Virtual Drives 0 to 11, and each of the physical slots 0 to 11 should be allocated to exactly one of them (not necessarily with the same 0:0, 1:1, etc. mapping).

# /opt/MegaRAID/storcli/storcli64 -LdPdInfo -a0 | grep "Virtual Drive\|State\|Slot\|Firmware state"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 3 (Target Id: 3)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 4 (Target Id: 4)
State : Optimal
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 5 (Target Id: 5)
State : Optimal
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 6 (Target Id: 6)
State : Optimal
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 7 (Target Id: 7)
State : Optimal
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 8 (Target Id: 8)
State : Optimal
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 9 (Target Id: 9)
State : Optimal
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 10 (Target Id: 10)
State : Optimal
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 11 (Target Id: 11)
State : Optimal
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None
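
With 12 virtual drives the filtered listing above runs to 60 lines, so a summary check may be easier to read. The following is a minimal sketch under the assumption that the line labels match the sample output exactly ("Virtual Drive:", "State", "Firmware state:", "Foreign State:"); the function name check_virtual_drives is hypothetical.

```shell
#!/bin/sh
# check_virtual_drives EXPECTED: reads `-LdPdInfo` output on stdin and
# succeeds only if EXPECTED virtual drives are listed, every "State" line is
# Optimal, every "Firmware state" line is "Online, Spun Up", and every
# "Foreign State" line is None.
check_virtual_drives() {
    awk -v want="$1" '
        /^Virtual Drive:/                           { vd++ }
        /^State *:/        && !/Optimal/            { bad++ }
        /^Firmware state:/ && !/Online, Spun Up/    { bad++ }
        /^Foreign State:/  && !/None/               { bad++ }
        END {
            printf "%d virtual drive(s), %d problem line(s)\n", vd, bad
            exit ((vd == want && bad == 0) ? 0 : 1)
        }'
}

# Usage on the node (not run here):
#   /opt/MegaRAID/storcli/storcli64 -LdPdInfo -a0 | check_virtual_drives 12
```

The check does not validate the slot-to-drive mapping, since the mapping is not required to be 0:0, 1:1, and so on.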

 

Check that the battery is seen and that its state is Optimal:

# /opt/MegaRAID/storcli/storcli64 -AdpBbuCmd -a0
BBU status for Adapter: 0

BatteryType: CVPM02
Voltage: 9450 mV
Current: 0 mA
Temperature: 29 C
Battery State: Optimal
BBU Firmware Status:

...Output truncated...

If any of this output is not correct, there is a problem with the disk volumes that may need additional assistance to correct. Re-open the server and check that the device connections and boards are secure and well seated BEFORE issuing the following commands.
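
The battery check in step 10 can be reduced to a single value before WriteBack mode is re-enabled. This is a minimal sketch, assuming the -AdpBbuCmd output contains a "Battery State:" line as in the sample above; the function name battery_state is hypothetical.

```shell
#!/bin/sh
# battery_state: reads `-AdpBbuCmd` output on stdin and prints the value of
# the first "Battery State" line (prints nothing if the line is missing).
battery_state() {
    sed -n 's/^Battery State *: *//p' | head -n 1
}

# Usage on the node (not run here):
#   state=$(/opt/MegaRAID/storcli/storcli64 -AdpBbuCmd -a0 | battery_state)
#   [ "$state" = "Optimal" ] || echo "battery state is '$state', re-check connections"
```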

11. Set the cache policy of all logical drives to WriteBack:

# /opt/MegaRAID/storcli/storcli64 -ldsetprop wb -lall -a0

Verify that the current cache policy for all logical drives is now WriteBack:

# /opt/MegaRAID/storcli/storcli64 -ldpdinfo -a0 | grep BBU

12. Once the hardware is verified as up and running, the customer's system administrator will need to verify that the BDA services are up, following the startup procedures for Big Data Appliance in MOS Note 2099858.1.

PARTS NOTE:
7085209 8-Port 12Gbps SAS3 Internal RAID HBA

References

<NOTE:2099858.1> - Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance

  Copyright © 2018 Oracle, Inc.  All rights reserved.