Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2230380.1
Update Date: 2017-02-16
Keywords:

Solution Type: Technical Instruction

Solution  2230380.1 :   How to Replace a Big Data Appliance (Original V1, X3-2 or X4-2) Faulty RAID HBA  


Related Items
  • Big Data Appliance X3-2 Hardware
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance X4-2 Hardware
  • Big Data Appliance X4-2 Full Rack
  • Big Data Appliance Hardware
  • Big Data Appliance X4-2 Starter Rack
  • Big Data Appliance X4-2 In-Rack Expansion
  • Big Data Appliance X3-2 Starter Rack
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP document

Applies to:

Big Data Appliance X4-2 Hardware - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Big Data Appliance X3-2 Starter Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

 How to Replace a Big Data Appliance (Original V1, X3-2 or X4-2) Faulty RAID HBA

Solution

DISPATCH INSTRUCTIONS:
- WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED: BDA Trained
- TIME ESTIMATE: 90 Minutes
- TASK COMPLEXITY: 3

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

- PROBLEM OVERVIEW: A faulty RAID HBA in a Big Data Appliance server node has been diagnosed as needing replacement.

- WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The instructions below assume the Customer system administrator is available and working with the field engineer onsite to manage the host OS and BDA services.
They are provided here so that the FE has all the necessary steps available when onsite; the FE can perform them if the customer's system administrator wants, allows, or needs help with these steps.
The server that contains the faulty RAID HBA card should have its services offlined and system powered off.

Preparation for replacement:

1. If the OS is still operating and available, revert all RAID disk volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the HBA is replaced. Set the cache policy of all logical volumes to WriteThrough mode:

# /opt/MegaRAID/megacli/MegaCli64 -ldsetprop wt -lall -a0

Verify the current cache policy for all logical volumes is now WriteThrough:

# /opt/MegaRAID/megacli/MegaCli64 -ldpdinfo -a0 | grep BBU
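As a quick sanity check, the cache-policy lines can be counted mechanically. The following is a minimal sketch, assuming the standard `Current Cache Policy:` line format that MegaCli prints per virtual drive; the here-doc holds illustrative sample output, not output captured from a real node.

```shell
# Count logical drives still reporting WriteBack; 0 means all drives are
# now in WriteThrough mode. On a live node, replace the here-doc with the
# real query, e.g.:
#   /opt/MegaRAID/megacli/MegaCli64 -LDInfo -Lall -a0
count_writeback() {
  grep -c 'Current Cache Policy: WriteBack' || true
}

sample_output=$(cat <<'EOF'
Virtual Drive: 0 (Target Id: 0)
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Virtual Drive: 1 (Target Id: 1)
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
EOF
)

# 0 remaining WriteBack drives means the setting took effect everywhere.
printf '%s\n' "$sample_output" | count_writeback
```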

2. The Customer’s system administrator should shut down the server node and BDA services, following the shutdown instructions for Big Data Appliance detailed in MOS Note 2099858.1.

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:

Physical RAID HBA replacement:

1. Slide out the server for maintenance.

Do not remove any cables prior to sliding the server forward, or the loose cable ends will jam in the cable management arm. Take care to ensure the cables and Cable Management Arm are moving properly. Refer to Note 1444683.1 for CMA handling training.

2. Disconnect the AC power cords.

3. Unlatch and slide off the top cover of the server.

4. Remove the old RAID HBA.

BDA (V1) Server Nodes:
These steps are relevant to BDA nodes based on the Sun Fire X4270 M2 Server.
   a) Remove the IB cables from the IB card in slot 3 below the HBA making a note of which port each cable goes into so they can go back into the same port.
   b) Remove back panel PCI cross bar
      i) Loosen the two captive Phillips screws on each end of the crossbar
      ii) Lift the PCI crossbar up and back to remove it from the chassis
   c) Remove the PCIe Riser 2 containing the RAID HBA card to be serviced
      i) Loosen the captive screw holding the riser to the motherboard
      ii) Lift up the riser and the PCIe cards that are attached to it as a unit.
   d) Disconnect the SAS cables from the RAID HBA card making a note of which port each cable goes into so they can go back into the same port.
   e) Extract the RAID HBA card from the PCIe Riser assembly, and place on an anti-static mat.

BDA X3-2 and X4-2 Server Nodes:
These steps are relevant to BDA nodes based on Sun Server X3-2L and Sun Server X4-2L.
   a) Rotate the PCIe card slot 6 locking mechanism latch out to disengage the RAID HBA card that has failed.
   b) Lift up and remove the RAID HBA card from the server.
   c) Disconnect the SAS cables from the RAID HBA card making a note of which port each cable goes into so they can go back into the same port.
   d) Place the removed RAID HBA on an anti-static mat.

5. Remove the RAID HBA's battery from the old RAID HBA

BDA (V1) Server Nodes:
These steps are relevant to BDA nodes based on the Sun Fire X4270 M2 Server.
   a) Use a No. 1 Phillips screwdriver to remove the 3 retaining screws that secure the battery to the HBA from the underside of the card. Do not attempt to remove any screws from the top side of the HBA.
   b) Detach the battery pack including circuit board from the HBA by gently lifting it from its circuit board connector on the top side of the HBA

BDA X3-2 and X4-2 Server Nodes:
If the node is Sun Server X3-2L and does not have the remote-mounted battery kit installed, then follow the steps for BDA (V1) Server Nodes.
These steps are relevant to BDA nodes based on Sun Server X3-2L and Sun Server X4-2L with the remote-mounted battery kit installed.
   a) Use a No. 1 Phillips screwdriver to remove the 3 retaining screws that secure the interface board to the HBA from the underside of the card.
   b) Detach the HBA Interface board from the HBA by gently lifting it from its circuit board connector on the top side of the HBA.
   c) Keep the cable and cable routing guide connected to the HBA interface board, and place on an anti-static mat.

6. Reinstall the HBA's battery or remote-mounted battery HBA Interface Board onto the new HBA. Reverse the removal instructions in step 5.
Note: If the system has a remote-mounted battery, take care to route the cable correctly. See MOS Note 1561949.1 for more guidance if necessary, including photos and videos.

7. Install the new RAID HBA PCI Card.
    On BDA (V1) Server nodes based on Sun Fire X4270M2, this will be into the empty slot on PCIe Riser 2.
    On BDA X3-2 and X4-2 Server nodes this will be into PCIe Slot 6.

Reverse the removal instructions in Step 4, taking care to reconnect the SAS cables to the same ports they were removed from. If the cables are swapped, disk slot mappings may be affected.
Note: On BDA (V1) Server nodes based on the Sun Fire X4270 M2 server, take care to return the IB cables to their original ports in the correct orientation. IB cables are factory labeled with the port identification, where port 2 is the port nearest the PCI connector and port 1 is the port near the top side of the card. The cables should be inserted with the latch release tab on the down side, so they fully seat and latch. If inserted upside down, they will not fully seat or latch.

8. Install the top cover.

9. Install the AC power cords.

10. Slide the server back into the rack.

OBTAIN CUSTOMER ACCEPTANCE

- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

1. Once the ILOM has booted, the green LED on the server will blink slowly. Press the power button on the front of the server to power on the unit.

2. During boot, monitor the graphics console through either the ILOM Java console or the local KVM. When loading its BIOS ROM, the new RAID controller will detect the RAID configuration on the disks and report that it has a Foreign configuration. This is expected. At the prompt, press "F" or "C" to accept the foreign configuration or enter the controller BIOS utility; at the subsequent prompts, press "C" and then "Y". If any other key is pressed to continue, the controller will not import the RAID and will fail to find a bootable disk. If this occurs, it is safe to press Ctrl-Alt-Del to reset the server and reach the "F" or "C" prompt again.

3. When the utility loads, there should be only 1 adapter. Select the "Start" button.

4. The foreign configuration screen is shown. Select "Configuration" from the drop-down, and select the "Preview" button.

5. Verify the configuration looks correct on the "Virtual Drives" side and, if it does, select the "Import" button. The correct configuration is 12 RAID0 volumes, 1 per disk.

6. This returns you to the Logical View screen, where the virtual drives should be listed on the right side. Select the "Exit" link from the left side menu. This brings you to the "Please Reboot" screen. Press Ctrl-Alt-Del to reboot the machine and boot the OS.

7. After the OS has booted, log in to the OS with ‘root’ privilege.

8. Run the following to update the RAID HBA to the correct supported firmware for the image:

# /opt/oracle/bda/bin/bdaupdatefw

After the firmware updates, the server will reboot again. The disk volumes should remain intact and boot up to the OS again.

9. After the OS is up, log in as root and validate that the physical and logical volumes are seen properly from the new RAID HBA in the OS and that the battery is seen.
The following command should show 12 disks:

# lsscsi | grep -i LSI
[0:0:20:0] enclosu LSILOGIC SASX28 A.1 502E -
[0:2:0:0] disk LSI MR9261-8i 2.90 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.90 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.90 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.90 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.90 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.90 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.90 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.90 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.90 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.90 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.90 /dev/sdk
[0:2:11:0] disk LSI MR9261-8i 2.90 /dev/sdl
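The disk count can also be checked in one pipeline. This is a minimal sketch assuming the lsscsi output format shown above; it runs against embedded sample text rather than a live node.

```shell
# Count SCSI entries of type "disk" behind the LSI controller; a healthy
# BDA node should report exactly 12. On a live node, replace the here-doc
# with: lsscsi | grep -i LSI
sample=$(cat <<'EOF'
[0:0:20:0] enclosu LSILOGIC SASX28 A.1 502E -
[0:2:0:0] disk LSI MR9261-8i 2.90 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.90 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.90 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.90 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.90 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.90 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.90 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.90 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.90 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.90 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.90 /dev/sdk
[0:2:11:0] disk LSI MR9261-8i 2.90 /dev/sdl
EOF
)

disk_count=$(printf '%s\n' "$sample" | grep -c ' disk ')
echo "$disk_count"   # expected: 12
```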

If the device count is not correct, also check that the LSI controller has the correct Virtual Drives configured and in Optimal state, with the physical disks Online and spun up and no Foreign configuration. There should be Virtual Drives 0 to 11, and each of physical slots 0 to 11 should be allocated to one of them (not necessarily the same 0:0, 1:1, etc. mapping).

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep "Virtual Drive\|State\|Slot\|Firmware state"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 3 (Target Id: 3)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 4 (Target Id: 4)
State : Optimal
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 5 (Target Id: 5)
State : Optimal
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 6 (Target Id: 6)
State : Optimal
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 7 (Target Id: 7)
State : Optimal
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 8 (Target Id: 8)
State : Optimal
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 9 (Target Id: 9)
State : Optimal
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 10 (Target Id: 10)
State : Optimal
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 11 (Target Id: 11)
State : Optimal
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None
# /opt/MegaRAID/megacli/MegaCli64 -AdpBbuCmd -a0
BBU status for Adapter: 0
BatteryType: iBBU08
...Output truncated...

If this is not correct, then there is a problem with the disk volumes that may need additional assistance to correct. Before the following commands are issued, the server should be re-opened and the device connections and boards checked to be sure they are secure and well seated.
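The expected state can be tallied mechanically. This is a hedged sketch assuming the `-LdPdInfo` line formats shown above; on a healthy node the optimal and online counts should both be 12 and the foreign count 0. The here-doc holds a two-drive excerpt of sample output rather than a live query.

```shell
# Tally the MegaCli -LdPdInfo summary. On a live node, replace the here-doc
# with: /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0
tally() {
  awk '/State : Optimal/                      {opt++}
       /Firmware state: Online, Spun Up/      {online++}
       /Foreign State:/ && !/Foreign State: None/ {foreign++}
       END {printf "optimal=%d online=%d foreign=%d\n", opt, online, foreign}'
}

sample=$(cat <<'EOF'
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
EOF
)

printf '%s\n' "$sample" | tally   # two sample drives -> optimal=2 online=2 foreign=0
```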

10. Set the cache policy of all logical drives to WriteBack mode:

# /opt/MegaRAID/megacli/MegaCli64 -ldsetprop wb -lall -a0

Verify the current cache policy for all logical drives is now using WriteBack cache mode:

# /opt/MegaRAID/megacli/MegaCli64 -ldpdinfo -a0 | grep BBU

11. On BDA (V1) systems based on the Sun Fire X4270 M2 server, also verify the InfiniBand links are up at 40 Gbps, since the cables were disconnected:

# /usr/sbin/ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:70bb
base lid: 0x50
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:70bc
base lid: 0x51
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
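The link check can be reduced to a small filter. This is a minimal sketch assuming the ibstatus output format shown above; both ports should be ACTIVE, LinkUp, and at 40 Gb/sec, so each count should be 2. The here-doc is sample text, not live output.

```shell
# Count IB ports that report ACTIVE state, LinkUp physical state, and a
# 40 Gb/sec rate. On a live node, replace the here-doc with: /usr/sbin/ibstatus
check_ib() {
  awk '/state:.*ACTIVE/      {active++}
       /phys state:.*LinkUp/ {linkup++}
       /rate:.*40 Gb\/sec/   {at40g++}
       END {printf "active=%d linkup=%d at40g=%d\n", active, linkup, at40g}'
}

sample=$(cat <<'EOF'
Infiniband device 'mlx4_0' port 1 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
EOF
)

printf '%s\n' "$sample" | check_ib   # expected: active=2 linkup=2 at40g=2
```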

12. Once the hardware is verified as up and running, the Customer's system administrator will need to verify the BDA services are up, following the startup procedures for Big Data Appliance detailed in MOS Note 2099858.1.

PARTS NOTE:
Part numbers 375-3701 and 7047503 are interchangeable; the only difference is RoHS-2013 compliance.
8-Port 6 Gbps SAS-2 RAID PCI Express HBA, B4 ASIC

References

<NOTE:2099858.1> - Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance
LSI MegaRAID User's Guide - https://www.broadcom.com/support/oem/oracle/6gb/sg_x_sas6-r-int-z

  Copyright © 2018 Oracle, Inc.  All rights reserved.