Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2230270.1
Update Date:2017-02-16

Solution Type  Technical Instruction Sure

Solution  2230270.1 :   How to Replace a Big Data Appliance (Original V1) Faulty RAID HBA BBU  


Related Items
  • Big Data Appliance Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP

Applies to:

Big Data Appliance Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace a Big Data Appliance Faulty RAID HBA BBU
(the original V1 based on x4270 M2 servers)

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: BDA trained
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW:

The Battery Backup Unit (BBU) on the RAID HBA in a Big Data Appliance server node has been diagnosed as needing replacement.

Videos of the physical replacement procedure are attached to Note 1527626.1, which covers Exadata X2-2 Storage Cells.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The instructions below assume the Customer system administrator is available and working with the field engineer (FE) onsite to manage the host OS and BDA services. They are provided here so that the FE has all the steps needed when onsite, and the FE can perform them if the customer system administrator permits it or asks for help.

The server that contains the faulty RAID HBA BBU should have its services taken offline and should be powered off.

Preparation for replacement:

1. If the OS is still operating and available, revert all the RAID disk volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the BBU is replaced. Set the cache policy of all logical volumes to WriteThrough:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wt -lall -a0

Verify the current cache policy for all logical volumes is now WriteThrough:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU

2. The Customer’s system administrator should shut down the server node and BDA services, following the shutdown instructions for Big Data Appliance detailed in MOS Note 2099858.1.
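
The cache policy check in step 1 can be scripted instead of read by eye. The following is a minimal sketch, not part of any Oracle tooling. It assumes MegaCli prints one "Current Cache Policy:" line per logical drive, with the mode name (WriteThrough or WriteBack) as the first comma-separated value. The function name all_cache_policy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads MegaCli -ldpdinfo output on stdin and succeeds
# only if every "Current Cache Policy" line reports the expected mode.
# Assumption: each logical drive prints a line of the form
#   Current Cache Policy: WriteThrough, ReadAheadNone, Direct, ...
all_cache_policy() {
    expected="$1"
    awk -v want="$expected" '
        /Current Cache Policy/ {
            total++
            # The mode is the first comma-separated value after the colon.
            split($0, parts, ":")
            split(parts[2], vals, ",")
            gsub(/^[ \t]+|[ \t]+$/, "", vals[1])
            if (vals[1] == want) ok++
        }
        # Fail when no policy line was seen at all, or when any drive differs.
        END { exit ((total > 0 && ok == total) ? 0 : 1) }
    '
}
```

Example use before shutdown: pipe the output of "MegaCli64 -ldpdinfo -a0" into "all_cache_policy WriteThrough" and proceed only if it returns success. The same function could be called with WriteBack for the check made after the replacement.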

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Physical RAID HBA BBU replacement:

These steps are relevant to BDA nodes based on Sun Fire x4270 M2 Server.

1. Slide out the server for maintenance.

NOTE:
Do not remove any cables prior to sliding the server forward, or the loose cable ends will jam in the cable management arms.
Take care to ensure the cables and Cable Management Arm (CMA) are moving properly. Refer to Note 1444683.1 for CMA handling training.

2. Disconnect the AC power cords.

3. Unlatch and slide off the top cover of the server.

4. Remove the HBA PCI card:
   a) Remove the IB cables from the IB card in slot 3 above the HBA making a note of which port each cable goes into so they can go back into the same port.
   b) Remove the back panel PCI crossbar:
      i) Loosen the two captive Phillips screws on each end of the crossbar
      ii) Lift the PCI crossbar up and back to remove it from the chassis
   c) Remove the PCIe Riser 2 containing the RAID HBA card to be serviced
      i) Loosen the captive screw holding the riser to the motherboard
      ii) Lift up the riser and the PCIe cards that are attached to it as a unit.
   d) Disconnect the SAS cables from the RAID HBA card making a note of which port each cable goes into so they can go back into the same port.
   e) Extract the RAID HBA card from the PCIe Riser assembly, and place on an anti-static mat.

5. Remove the old BBU from the HBA:
   a) Use a No. 1 Phillips screwdriver to remove the 3 retaining screws that secure the battery to the HBA from the underside of the card.

NOTE: Do NOT attempt to remove any screws from the top side of the HBA and battery pack – those screws hold the standoffs that provide the bottom screw holes and should remain with the battery pack.

   b) Detach the battery pack including circuit board from the HBA by gently lifting it from its circuit board connector on the top side of the HBA.

6. Install the new BBU on the HBA:
   a) Attach the battery pack circuit board connector to mate with the HBA’s connector on the top side of the HBA.
   b) Use a No. 1 Phillips screwdriver to install the 3 retaining screws, to secure the battery to the HBA from the underside of the card. If the BBU comes with a package of new screws, then use those new screws - do not re-use the screws from the old BBU attachment.

7. Install the RAID HBA PCI Card into the empty slot on PCIe Riser 2, then re-install the PCIe riser into the server in PCIe Slot 2.

Reverse the removal instructions in Step 4, taking care to reconnect the SAS cables to the same ports they were removed from. Swapping the SAS cables may affect disk slot mappings.
Take care to also put the IB cables back into their original ports, in the correct orientation. IB cables are factory labeled with the port identification, where port 2 is the port nearest the PCI connector and port 1 is the port near the top side of the card. Insert the cables with the latch release tab on the down side so that they fully seat and latch. If inserted upside down, they will not fully seat or latch.

8. Install the top cover.

9. Reconnect the AC power cords.

10. Slide the server back into the rack.


OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

These steps should be done in co-operation with the customer’s administrator, so that the procedure and verification are complete before the field engineer leaves the customer site.

1. Once the ILOM has booted you will see a slow blink on the green LED for the server. Press the power button on the front of the server to power on the unit.
2. During boot, monitor the graphics console through either ILOM javaconsole or the local KVM.
   To connect to the console through ILOM:
      a. From the ILOM Web browser (preferred):
        Access the “Remote Control → Redirection” tab and then click on the “Launch Remote Console” button.
        (On ILOM 3.1.x systems, the console can be launched from the initial Summary Information screen).
      b. From the ILOM CLI:

          -> start /SP/console

      c. From the local KVM: open the Keyboard/Monitor tray, select the appropriate BDA Server Node hostname from the “Target Devices” list, and then select the “Console” button.

Watch the LSI controller BIOS in particular while it is loading. If it displays a warning about drives with preserved cache, choose “D” to discard the cache and continue. This is not an issue, as HDFS will re-sync the disk after boot. If it displays a warning that drives are in write-through mode due to a low battery, choose to continue.

The boot should then continue normally up to the login prompt.

Note:
If using the ILOM serial console to monitor the boot, there may be a long pause during subsequent boot steps before the login prompt displays, as the default console is the graphics, and portions of the boot messages will only go to the graphics screen and not display on the serial console.

3. Once the boot has fully completed, log in as the ‘root’ user and verify that the new battery is seen and is charging:

# /opt/MegaRAID/MegaCli/MegaCli64 -adpbbucmd -a0

4. Set all logical drives cache policy to WriteBack cache mode:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldsetprop wb -lall -a0

5. Verify the current cache policy for all logical drives is now using WriteBack cache mode:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep BBU

6. Because the IB cables were disconnected during the replacement, verify the InfiniBand links are up at 40Gbps:

# /usr/sbin/ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:70bb
base lid: 0x50
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:013e:70bc
base lid: 0x51
sm lid: 0x1
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)

7. Once the hardware is verified as up and running, the Customer's system administrator will need to verify the BDA services are up by following the startup procedures for Big Data Appliance detailed in MOS Note 2099858.1.
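
The battery and InfiniBand checks in steps 3 and 6 can also be scripted. The following is a minimal sketch under stated assumptions, not an Oracle-supplied tool. The function names bbu_field and ib_links_ok are hypothetical. ib_links_ok relies only on the ibstatus output format shown in step 6. bbu_field assumes the MegaCli BBU output contains "Name : value" lines; the field name "Charging Status" used in the example is an assumption and should be checked against the actual output on the node.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of one "Name : value" field read from
# MegaCli -adpbbucmd output on stdin. Field names are an assumption here.
bbu_field() {
    awk -F':' -v name="$1" '
        {
            key = $1
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            if (key == name) {
                val = $2
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                print val
                exit
            }
        }
    '
}

# Hypothetical helper: reads ibstatus output on stdin and succeeds only if at
# least one port is listed and every port reports ACTIVE, LinkUp and 40 Gb/sec.
ib_links_ok() {
    awk '
        /^Infiniband device/                   { ports++ }
        /state:/ && !/phys state:/ && /ACTIVE/ { active++ }
        /phys state:/ && /LinkUp/              { linkup++ }
        /rate:/ && /40 Gb\/sec/                { rate40++ }
        END {
            ok = (ports > 0 && active == ports && linkup == ports && rate40 == ports)
            exit (ok ? 0 : 1)
        }
    '
}
```

Example use: pipe "/usr/sbin/ibstatus" into ib_links_ok and treat a non-zero return as a cable that needs reseating. Pipe the output of "MegaCli64 -adpbbucmd -a0" into bbu_field "Charging Status" to print the assumed charging field on its own.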

PARTS NOTE:
371-4982 6Gigabit SAS RAID PCI Battery Module (LION), BBU-08
7050794 6Gigabit SAS RAID PCI Battery Module (LION), BBU-08, RoHS2013.

References

Oracle ILOM 3.0 documentation library - http://docs.oracle.com/cd/E19860-01/index.html
<NOTE:2099858.1> - Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance
https://www.broadcom.com/support/oem/oracle/6gb/sg_x_sas6-r-int-z

  Copyright © 2018 Oracle, Inc.  All rights reserved.