Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2230374.1
Update Date:2018-03-07
Keywords:

Solution Type  Technical Instruction Sure

Solution  2230374.1 :   How to Replace a Big Data Appliance X3-2 or X4-2 Faulty RAID HBA BBU  


Related Items
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X3-2 Hardware
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance X4-2 Hardware
  • Big Data Appliance X4-2 Full Rack
  • Big Data Appliance X4-2 Starter Rack
  • Big Data Appliance X4-2 In-Rack Expansion
  • Big Data Appliance X3-2 Starter Rack
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: FRU CAP

Applies to:

Big Data Appliance X3-2 Starter Rack - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X4-2 Full Rack - Version All Versions and later
Big Data Appliance X4-2 In-Rack Expansion - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace a Big Data Appliance X3-2 or X4-2 Faulty RAID HBA BBU

Solution

DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: BDA trained
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW:

The Battery Backup Unit (BBU) on the RAID HBA in a Big Data Appliance X3-2 or X4-2 Server node has been diagnosed as needing replacement.
Videos for the physical replacement procedures are attached to Note 1561949.1.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The instructions below assume that the customer's system administrator is available and working with the field engineer (FE) onsite to manage the host OS and BDA services. The administrator steps are included so that the FE has every step needed when onsite; the FE can perform them if the customer's system administrator requests, permits, or needs help with them.

The server that contains the faulty RAID HBA BBU should have its services taken offline and be powered off. Although a remote-mounted BBU can be reached without a shutdown, with certain firmware revisions used in BDA X3-2 and X4-2 the server operating system must be shut down to avoid the possibility of an unplanned outage.

Preparation for replacement:
1. If the OS is still running and available, switch all RAID logical volumes to WriteThrough mode so that all data in the RAID cache memory is flushed to disk and is not lost when the BBU is replaced.

Set the cache policy of all logical volumes to WriteThrough, a cache mode that does not use the battery:

# /opt/oracle/bda/bin/MegaCli64 -ldsetprop wt -lall -a0

2. Verify the current cache policy for all logical volumes is now WriteThrough:

# /opt/oracle/bda/bin/MegaCli64 -ldpdinfo -a0 | grep BBU

Repeat the verify command until the current cache policy reports WriteThrough; this may take several minutes.

3. The Customer’s system administrator should shut down the BDA services and the server node, following the shutdown instructions for Big Data Appliance detailed in MOS Note 2099858.1.
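Preparation steps 1 and 2 can be scripted so that the verify command is repeated automatically until the flush has finished. The following is a minimal sketch, assuming bash on the node, the MegaCli64 path used in this note, and that each line kept by the grep BBU filter has the form "Current Cache Policy: WriteThrough, ..."; the function names, retry count and polling interval are hypothetical and not part of any Oracle tool.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for preparation steps 1-2; not an Oracle tool.
# MegaCli64 path as used in this note; override with MEGACLI=... if needed.
MEGACLI=${MEGACLI:-/opt/oracle/bda/bin/MegaCli64}

# Reads "-ldpdinfo" output on stdin. Succeeds only if at least one
# "Current Cache Policy" line is present and each one starts with mode $1.
all_current_policy() {
  awk -F': *' -v want="$1" '
    /Current Cache Policy/ { n++; split($2, f, ","); if (f[1] != want) bad++ }
    END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Step 1, then the step 2 check repeated until every volume is WriteThrough.
# The defaults (30 checks, 20 s apart) are assumptions, not Oracle guidance.
flush_to_writethrough() {
  local tries=${1:-30} interval=${2:-20} i=0
  "$MEGACLI" -ldsetprop wt -lall -a0
  while [ "$i" -lt "$tries" ]; do
    if "$MEGACLI" -ldpdinfo -a0 | grep BBU | all_current_policy WriteThrough; then
      echo "All logical volumes report WriteThrough"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "Logical volumes still not WriteThrough after $tries checks" >&2
  return 1
}
```

Run as root before the shutdown in step 3, for example flush_to_writethrough 30 20. Only the "Current Cache Policy" lines are checked, because the "Default Cache Policy" lines can legitimately still show WriteBack.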

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Physical RAID HBA BBU replacement:

BDA X3-2 and X4-2 Server Nodes with the Remote Battery:

These steps are relevant to BDA Server Nodes based on Sun Server X3-2L and Sun Server X4-2L with the remote battery assembly (part 7057184) installed.

1. Locate the battery slot marked with an orange and white BBU label.

      This is the right-hand slot on the rear of the chassis, above PS1, labeled BBU (previously designated "REAR HDD 1").

2. Unlatch and carefully slide out the old BBU carrier.

3. Insert and carefully slide in the new BBU carrier, then latch it closed.

BDA X3-2L/X4-2L Server Nodes without the Remote Battery:

Replace the existing HBA BBU with a remote-mounted battery kit (part 7060020), following the CAP detailed in MOS Note 1561949.1.
The kit includes a remote battery assembly (part 7057184).

OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

These steps should be performed in cooperation with the customer’s administrator, so that the procedure and its verification are complete before the field engineer leaves the customer site.

1. Press the power button on the front of the server to power on the unit.

2. During boot, monitor the graphics console through either the ILOM Java remote console or the local KVM.

  To connect to the console through ILOM:
    a. From the ILOM Web browser (preferred):
        Access the “Remote Control → Redirection” tab and then click on the “Launch Remote Console” button.
        (On ILOM 3.1.x systems, the console button can be launched from the initial Summary Information screen).

    b. From the ILOM CLI:

        -> start /SP/console

In particular, watch the LSI controller BIOS while it loads. If it warns about drives with preserved cache, choose “D” to discard the cache and continue.
This is not an issue, because the disk will be re-synced by HDFS after boot. If it warns that drives are in write-through mode due to a low battery, choose to continue.

The boot should then continue normally up to the login prompt.

Note: If you use the ILOM serial console to monitor the boot, there may be a long pause during later boot steps before the login prompt appears. The default console is the graphics console, so portions of the boot messages go only to the graphics screen and are not displayed on the serial console.

3. Once the boot completes, log in as the ‘root’ user and verify that the new battery is detected and charging:

# /opt/oracle/bda/bin/MegaCli64 -adpbbucmd -a0

Note: It may take up to 24 hours for the new BBU to charge sufficiently to be detected by the HBA; until then it may be reported as missing or absent. The next steps can be done while waiting. If there is concern that the replacement battery is bad, repeat the command until the HBA reports the battery, which may take up to 24 hours.

4. Set the cache policy of all logical drives to WriteBack:

# /opt/oracle/bda/bin/MegaCli64 -ldsetprop wb -lall -a0

5. Verify the current cache policy for all logical drives is now using WriteBack cache mode:

# /opt/oracle/bda/bin/MegaCli64 -ldpdinfo -a0 | grep BBU

Note: If the current cache policy is WriteThrough rather than WriteBack, check the status of the battery again:

# /opt/oracle/bda/bin/MegaCli64 -adpbbucmd -getbbustatus -a0|grep Battery
...
BatteryType: iBBU08
Battery State : Operational
Battery Pack Missing : No
Battery Replacement required : No

If the "Battery State" is anything other than "Operational" or "Optimal" (the exact term depends on the image version), investigate and correct the problem before continuing. The current policy will revert to WriteBack once the battery relearn and charge process has reached sufficient charge. As noted in step 3, if the battery is reported as missing because its charge is still insufficient, wait before verifying again.

6. Inform the customer that, until charging has completed, bdacheckhw will report the following battery state ERROR:

   SUCCESS: Correct disk controller battery type : iBBU
   ERROR: Wrong disk controller battery state : Degraded(Need Attention)
   INFO: Expected disk controller battery state : Optimal or Operational


The customer can monitor the charging status with:

# /opt/oracle/bda/bin/MegaCli64 -AdpBbuCmd -a0 | grep -B2 "Charging Status"

  BBU Firmware Status:
    Charging Status : Charging

When charging has completed, the status shows:

# /opt/oracle/bda/bin/MegaCli64 -AdpBbuCmd -a0 | grep -B2 "Charging Status"

  BBU Firmware Status:
    Charging Status : None

And bdacheckhw will no longer report the error:

   SUCCESS: Correct disk controller battery type : iBBU
   SUCCESS: Correct disk controller battery state : Optimal


7. Once the hardware is verified as up and running, the Customer's system administrator needs to verify that the BDA services are up, following the startup procedures for Big Data Appliance detailed in MOS Note 2099858.1.
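The waiting in steps 3, 5 and 6 can likewise be wrapped in a small polling helper while the new battery charges. This is a minimal sketch, assuming bash, the MegaCli64 path used in this note, output lines of the form "Key : Value" as shown above, and that bdacheckhw is on root's PATH; every function name and the polling schedules are hypothetical and not part of any Oracle tool.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for steps 3, 5 and 6; not an Oracle tool.
# MegaCli64 path as used in this note; override with MEGACLI=... if needed.
MEGACLI=${MEGACLI:-/opt/oracle/bda/bin/MegaCli64}

# Prints the value of the first "Key : Value" line on stdin whose text
# matches the pattern $1, with trailing blanks and CR removed.
field_value() {
  awk -F' *: *' -v key="$1" '$0 ~ key {
    v = $2; gsub(/[ \t\r]+$/, "", v); print v; exit }'
}

# Step 3: the HBA sees the battery once "Battery Pack Missing" reads "No".
battery_detected() { [ "$(field_value 'Battery Pack Missing')" = "No" ]; }

# Step 5 note: either term is accepted, as it depends on the image version.
battery_state_ok() {
  case "$(field_value 'Battery State')" in
    Operational|Optimal) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only if every "Current Cache Policy" line starts with mode $1.
all_current_policy() {
  awk -F': *' -v want="$1" '
    /Current Cache Policy/ { n++; split($2, f, ","); if (f[1] != want) bad++ }
    END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# Step 6: bdacheckhw output must show SUCCESS and no ERROR for battery state.
battery_check_passed() {
  awk '
    /SUCCESS: Correct disk controller battery state/ { ok = 1 }
    /ERROR: Wrong disk controller battery state/ { bad = 1 }
    END { if (ok && !bad) exit 0; exit 1 }'
}

# Runs "$@" up to $1 times, $2 seconds apart, until it succeeds.
poll() {
  local tries=$1 interval=$2 i=0
  shift 2
  while [ "$i" -lt "$tries" ]; do
    "$@" && return 0
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Thin wrappers around the commands used in this note.
bbu_seen()      { "$MEGACLI" -adpbbucmd -getbbustatus -a0 | battery_detected; }
all_writeback() { "$MEGACLI" -ldpdinfo -a0 | grep BBU | all_current_policy WriteBack; }
bbu_state_ok()  { "$MEGACLI" -adpbbucmd -getbbustatus -a0 | grep Battery | battery_state_ok; }
charge_done()   { [ "$("$MEGACLI" -AdpBbuCmd -a0 | field_value 'Charging Status')" = "None" ]; }

# Example use as root (schedules are assumptions, not Oracle guidance):
#   poll 48 1800 bbu_seen          # step 3: every 30 min for up to 24 h
#   all_writeback || bbu_state_ok  # step 5: if not WriteBack, check the battery
#   poll 144 600 charge_done && bdacheckhw | battery_check_passed   # step 6
```

The parsing functions read text on stdin rather than calling MegaCli64 themselves, so they can be checked against captured output before they are relied on at a customer site.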

PARTS NOTE:
371-4982 6Gigabit SAS RAID PCI Battery - Module (LION), BBU-08. Battery only, use for direct attachment to HBA.
7050794 6Gigabit SAS RAID PCI Battery - Module (LION), BBU-08, RoHS2013. Battery only, use for direct attachment to HBA.
7057184 6Gigabit SAS RAID PCI Battery - Remote Mount Assembly (LION, BBU-08). Battery mounted on disk tray sled, use for remote battery replacement, includes battery 7050794.
7060020 Remote Mount Battery Assembly Kit. Use to upgrade a compatible system from direct attachment to remote battery. The kit includes all parts for 1U and 2U systems, and includes 1x 7057184 battery assembly. For BDA, the 1U cable parts can be discarded.

References

NOTE:2099858.1 - Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance
Oracle ILOM 3.1 documentation library - http://docs.oracle.com/cd/E24707_01/index.html
LSI MegaRAID User's Guide - https://www.broadcom.com/support/oem/oracle/6gb/sg_x_sas6-r-int-z

  Copyright © 2018 Oracle, Inc.  All rights reserved.