Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1524308.1
Update Date:2018-04-10
Keywords:

Solution Type  Technical Instruction Sure

Solution 1524308.1: How to Replace the internal USB Recovery Drive on an Exadata Storage Server (prior to X5-2L)


Related Items
  • Exadata X4-2 Hardware
  • Exadata X3-2 Hardware
  • SPARC SuperCluster T4-4 Full Rack
  • Oracle SuperCluster T5-8 Full Rack
  • Exadata Database Machine X2-2 Qtr Rack
  • Exadata X4-2 Quarter Rack
  • Exadata X3-2 Half Rack
  • Oracle SuperCluster T5-8 Half Rack
  • Exadata Database Machine X2-8
  • Exadata X4-2 Half Rack
  • Exadata Database Machine X2-2 Full Rack
  • Exadata X3-2 Full Rack
  • Exadata X4-8 Hardware
  • Exadata Database Machine X2-2 Half Rack
  • Exadata X3-8 Hardware
  • Zero Data Loss Recovery Appliance X4 Hardware
  • Exadata X4-2 Full Rack
  • Exadata Database Machine X2-2 Hardware
  • SPARC SuperCluster T4-4 Half Rack
  • Exadata X3-8b Hardware
  • Exadata X4-2 Eighth Rack
  • Exadata X3-2 Quarter Rack
  • Exadata Database Machine V2
  • Oracle SuperCluster T5-8 Hardware
  • Oracle SuperCluster M6-32 Hardware
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP


How to Replace a Faulty Internal USB Recovery Drive on an Exadata Storage Server (prior to X5-2L).

In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Exadata internal only & HW support partners

Applies to:

Exadata X3-2 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata X3-8 Hardware - Version All Versions and later
Information in this document applies to any platform.
This document applies to Storage Servers prior to X5-2L. For the X5-2L (and later) procedure, refer to Doc ID 2011874.1.

Goal

 An internal USB recovery thumb drive in an Exadata Storage Server needs replacement.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?: Exadata trained
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW:

An internal USB recovery thumb drive in an Exadata Storage Server needs replacement.

Note: This document applies to Storage Servers prior to X5-2L. For the X5-2L (and later) procedure, refer to Doc ID 2011874.1.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

The instructions below assume the customer DBA is available and working with the field engineer onsite to manage the host OS and DB/ASM services. They are included so that the field engineer (FE) has all the necessary steps available when onsite; the FE may perform them if the customer DBA requests it, allows it, or needs help with these steps.

1. Locate the server being serviced in the rack. Exadata Storage Servers are numbered 1 through 18: Storage Server 1 is the lowest in the rack, installed in RU2, and the numbering counts up toward the top of the rack.

Turn on the locate indicator light to make the server being repaired easier to identify. If the server number has already been identified, the Locate button on the front panel may be pressed instead. To turn it on remotely, use any of the following methods:

From a CellCLI session:

CellCLI> alter cell led on

From a login to the server’s ILOM:

-> set /SYS/LOCATE value=Fast_Blink
Set 'value' to 'Fast_Blink'

From a login to the server’s ‘root’ account:

# ipmitool sunoem cli 'set /SYS/LOCATE value=Fast_Blink'
Connected. Use ^D to exit.
-> set /SYS/LOCATE value=Fast_Blink
Set 'value' to 'Fast_Blink'

-> Session closed
Disconnected

 

2. Shut down the Storage Server whose USB stick requires replacement.

a. For extended information on this section, see MOS Note ID 1188080.1, "Steps to shut down or reboot an Exadata storage cell without affecting ASM".

This is also documented in the Exadata Maintenance Guide section titled "Maintaining Exadata Storage Servers" subsection "Shutting Down Exadata Storage Server" available on the customer's cell server image in the /opt/oracle/cell/doc directory and online here:

https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-storage-servers.htm#DBMMN21129

In the following examples, the SQL commands should be run by the customer's DBA before the hardware replacement. The field engineer should run them only if the customer directs them to or is unable to do so. The cellcli commands must be run as root.

Note the following when powering off Exadata Storage Servers:

  • Verify that no other storage server has disk faults. Shutting down a storage server while a disk on another server has failed may cause running database processes and Oracle ASM to crash if ASM loses both disks of a partner pair when this server’s disks go offline.

  • Powering off one Exadata Storage Server with no disk faults in the rest of the rack will not affect running database processes or Oracle ASM.

  • All database and Oracle Clusterware processes should be shut down prior to shutting down more than one Exadata Storage Server. Refer to the Exadata Owner’s Guide for details if this is necessary.

b. ASM drops offline disks that are not restored before the ASM disk repair timer expires, so powering off or restarting an Exadata Storage Server can impact database performance if the server stays offline longer than that timer. The default DISK_REPAIR_TIME attribute value of 3.6 hours should be adequate for replacing components, but the customer may have changed it. To check this parameter, have the customer log in to ASM and run the following query:

SQL> select dg.name,a.value from v$asm_attribute a, v$asm_diskgroup dg where a.name = 'disk_repair_time' and a.group_number = dg.group_number;

As long as the value leaves comfortable time to replace the components, there is no need to change it.
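To judge whether the query result leaves enough time, the value can be converted to minutes and compared with the planned outage. The following is a minimal sketch, not an Oracle-supplied tool: the function names repair_time_minutes and repair_time_covers are hypothetical, only values with an h/H (hours) or m/M (minutes) suffix such as the default 3.6h are handled, and the outage length you pass in (for example the 60-minute time estimate above plus a margin) is your own assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Exadata): convert an ASM DISK_REPAIR_TIME
# value such as "3.6h" or "200m" into whole minutes, printed on stdout.
# Only values with an h/H or m/M suffix are handled; anything else fails.
repair_time_minutes() {
  printf '%s\n' "$1" | awk '{
    v = tolower($1)                            # first field, lower-cased
    unit = substr(v, length(v), 1)             # last character is the unit
    num = substr(v, 1, length(v) - 1)          # everything before the unit
    if (num !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject malformed numbers
    if (unit == "h") printf "%.0f\n", num * 60
    else if (unit == "m") printf "%.0f\n", num
    else exit 1                                # unit missing or not handled here
  }'
}

# Hypothetical check: succeeds only if the repair timer ($1) is longer than
# the planned outage ($2, in minutes).
repair_time_covers() {
  minutes=$(repair_time_minutes "$1") || return 1
  [ "$minutes" -gt "$2" ]
}
```

For example, repair_time_covers 3.6h 90 succeeds because the default timer is 216 minutes, while repair_time_covers 1h 90 fails.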

c. Check whether ASM can tolerate the grid disks going OFFLINE:

# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
...sample ...
DATA_CD_09_cel01 ONLINE Yes
DATA_CD_10_cel01 ONLINE Yes
DATA_CD_11_cel01 ONLINE Yes
RECO_CD_00_cel01 ONLINE Yes
RECO_CD_01_cel01 ONLINE Yes
...repeated for all griddisks....

If one or more disks return asmdeactivationoutcome='No', wait for some time and repeat step c. Once all disks return asmdeactivationoutcome='Yes', proceed to the next step.
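The check in step c can also be repeated from a script rather than by hand. This is a minimal sketch, assuming the cellcli output keeps the whitespace-separated name, asmmodestatus and asmdeactivationoutcome columns shown above; the function names all_deactivation_yes and wait_for_safe_deactivation are hypothetical, and the 60-second retry interval is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: reads "name asmmodestatus asmdeactivationoutcome" rows
# on stdin and succeeds only if there is at least one row and every row
# ends in "Yes". Blank lines are ignored.
all_deactivation_yes() {
  awk 'NF { rows++; if ($NF != "Yes") bad = 1 } END { exit (rows == 0 || bad) }'
}

# Hypothetical wrapper: re-run the step c query until ASM reports that every
# grid disk can be taken offline safely. Requires cellcli (run on the cell as root).
wait_for_safe_deactivation() {
  until cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome |
        all_deactivation_yes; do
    echo "Some grid disks are not yet safe to deactivate; retrying in 60 seconds..."
    sleep 60
  done
}
```

Note that wait_for_safe_deactivation retries indefinitely (including when cellcli itself fails and produces no rows), so interrupt it with Ctrl-C if the outcome does not change.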


d. Run the following cellcli command to inactivate all grid disks on the cell being powered down for maintenance (this can take 10 minutes or longer):

# cellcli
...sample ...
CellCLI> ALTER GRIDDISK ALL INACTIVE
GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
...repeated for all griddisks...

e. Run the command below. Once the disks are offline and inactive in ASM, the output should show asmmodestatus='UNUSED' or 'OFFLINE' and asmdeactivationoutcome=Yes for all grid disks.

CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
DATA_CD_00_dmorlx8cel01 inactive OFFLINE Yes
DATA_CD_01_dmorlx8cel01 inactive OFFLINE Yes
DATA_CD_02_dmorlx8cel01 inactive OFFLINE Yes
RECO_CD_00_dmorlx8cel01 inactive OFFLINE Yes
RECO_CD_01_dmorlx8cel01 inactive OFFLINE Yes
RECO_CD_02_dmorlx8cel01 inactive OFFLINE Yes
...repeated for all griddisks...
 

f. Once all disks are offline and inactive, the customer may shut down the cell using the following command:

# shutdown -hP now

 When powering off Exadata Storage Servers, all storage services are automatically stopped.
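Steps e and f can be chained so that the shutdown only happens after every grid disk is offline and inactive. The sketch below is an illustration under stated assumptions, not part of the documented procedure: all_griddisks_offline is a hypothetical helper that parses the four-column output of the step e command and accepts either OFFLINE or UNUSED as asmmodestatus.

```shell
#!/bin/sh
# Hypothetical helper: reads "name status asmmodestatus asmdeactivationoutcome"
# rows on stdin and succeeds only if every grid disk is inactive, OFFLINE or
# UNUSED in ASM, and reports asmdeactivationoutcome=Yes.
# Rows that do not meet the condition are printed; empty input counts as failure.
all_griddisks_offline() {
  awk 'NF {
    rows++
    if ($2 != "inactive" || ($3 != "OFFLINE" && $3 != "UNUSED") || $4 != "Yes") {
      print "not ready: " $0
      bad = 1
    }
  }
  END { exit (rows == 0 || bad) }'
}
```

On the cell, as root, the gate would look like: cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome | all_griddisks_offline && shutdown -hP now. If any grid disk is not ready, its row is printed and the server is not shut down.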

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

  1. Slide out the server for maintenance. Do not remove any cables before sliding the server forward, or the loose cable ends will jam in the cable management arm. Take care to ensure that the cables and the Cable Management Arm (CMA) move properly. Refer to Note 1444683.1 for CMA handling training.

  2. Remove the AC power cords prior to removing the server’s top cover.

  3. Remove the USB thumb drive from the internal USB port and insert the replacement.

    On Exadata Storage Servers based on Sun Fire X4275 and Sun Fire X4270M2, this is located between the coin cell battery holder and the PCI Riser 2 slot, underneath the rear of the Flash F20 cards attached to PCI Riser 2. PCI Riser 2 should not need removal but may be removed for easier hand access.

    On Exadata Storage Servers based on Sun Server X3-2L and Sun Server X4-2L, the internal USB ports are located on the system board assembly's Rear I/O Board between PCIe slots 3 and 4, mounted vertically close to the rear of the chassis. Insert the USB stick in the left-most internal USB slot (as viewed from the rear).

  4. Replace the server’s top cover and re-attach the AC power cords. ILOM will take up to 2 minutes to boot.

  5. Slide the server back into the rack.

  6. After ILOM has booted, power on the server by pressing the power button, and then connect to the server’s console.
    To connect to the console through ILOM:

      1. From the ILOM Web browser (preferred):
        Access the “Remote Control → Redirection” tab and then click the “Launch Remote Console” button. (On ILOM 3.1.x systems, the remote console can also be launched from the initial Summary Information screen.)

      2. From the ILOM CLI:

        -> start /SP/console
      3. On Exadata V2 and X2-2 Storage Servers with a local KVM and Keyboard/Monitor tray, open the tray and select the appropriate Exadata Storage Server hostname from the “Target Devices” list, and then select the “Console” button.

  7. From the console, monitor the system booting. The server should boot from the primary hard disk, as indicated on the Exadata splash screen.

    Note: There may be a long pause between screen outputs on the ILOM serial console during subsequent boot steps, because the default console is the graphics console and the Exadata boot splash screen does not display on the serial console. Once the boot has completed, you should be able to log in there.

  8. After the Storage Server has booted, log in as the ‘root’ user.

  9. Run the following to copy the recovery image and configuration data to the new USB stick:

    # cd /opt/oracle.SupportTools
    # ./make_cellboot_usb -verbose -force


    Ignore messages such as the following; they do not prevent the command from completing:
    WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.

    It may be necessary to stop the MS (Management Server) service before running this command:
    # cellcli -e alter cell shutdown services MS

    Remember to restart it once make_cellboot_usb has completed:
    # cellcli -e alter cell startup services MS

  10. Set the next boot to forcibly stop at the BIOS setup menu:

    # ipmitool chassis bootdev bios
    Set Boot Device to bios
  11. Reboot the server with the following command:

    # shutdown -r now
  12. Monitor the system booting again. The system should go automatically into the BIOS Setup screen.

  13. Once the BIOS Setup screen is displayed on the console, use the arrow keys to navigate to the Boot screen and then to the ‘Boot Order Device Priority’ screen. Set the USB stick (Unigen) first in the order, followed by “RAID HBA Adapter” and then the onboard network PXE devices. Press “Esc” to exit the ‘Boot Order Device Priority’ screen.

  14. Press “F10” or “Ctrl-S” to save and exit, or navigate to the Exit screen and select “Save Changes and Exit”.

  15. The server will reset and boot again. This time it should load the Exadata splash screen (grub) from the USB stick and indicate that it has done so.

    a. While the splash screen is displayed (3 seconds), press any key to stop grub, and verify that the menu contains the normal hard disk boot options, with the USB stick rescue options as the bottom entries.

    b. Select the first hard disk entry, and press enter to boot it. The server should boot normally up to the login prompt.

    Note: If you are using the ILOM serial console, the Exadata splash screen (grub) does not display and cannot be stopped and verified. The server automatically starts booting the hard disk image after the 3-second timeout; the display then pauses for a period while Linux directs its output to the graphics console, and the login prompt returns after a few minutes.
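When MS has to be stopped for step 9, it is easy to forget to restart it if make_cellboot_usb fails part-way. The following sketch wraps the same three commands from step 9 so that MS is restarted regardless of the outcome. The function name rebuild_cellboot_usb and the CELLCLI and SUPPORT_TOOLS_DIR override variables are hypothetical (they exist only so the logic can be exercised without a cell), and the sketch always stops MS even though step 9 says this is only sometimes required.

```shell
#!/bin/sh
# Hypothetical wrapper around step 9: stop MS, rebuild the USB recovery stick,
# and always restart MS afterwards, even if make_cellboot_usb fails.
# CELLCLI and SUPPORT_TOOLS_DIR are overridable only for testing; on a real
# cell the defaults match the commands shown in step 9.
rebuild_cellboot_usb() {
  cli=${CELLCLI:-cellcli}
  dir=${SUPPORT_TOOLS_DIR:-/opt/oracle.SupportTools}
  rc=0

  # Abort before touching the USB stick if MS cannot be stopped.
  "$cli" -e alter cell shutdown services MS || return 1

  # Run from the SupportTools directory, as in the manual step, inside a
  # subshell so the caller's working directory is unchanged.
  ( cd "$dir" && ./make_cellboot_usb -verbose -force ) || rc=$?

  # Restart MS unconditionally; keep a non-zero status if anything failed.
  if ! "$cli" -e alter cell startup services MS; then
    echo "WARNING: MS did not restart; run 'cellcli -e alter cell startup services MS' manually" >&2
    [ "$rc" -ne 0 ] || rc=1
  fi
  return "$rc"
}
```

Run it as root on the cell. It returns make_cellboot_usb's exit status, or 1 if MS could not be stopped, or could not be restarted after a successful rebuild. The GPT warning mentioned in step 9 may still appear and can be ignored.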


OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

Steps 8 through 15 above should be done in cooperation with the customer’s administrator, so that the procedure and its verification are complete before the field engineer leaves the customer site. The customer's administrator should then perform the following steps to return the disks to service:

  1. Activate the grid disks:

    # cellcli
        …    
    CellCLI> alter griddisk all active
    GridDisk DATA_CD_00_dmorlx8cel01 successfully altered
    GridDisk DATA_CD_01_dmorlx8cel01 successfully altered
    GridDisk DATA_CD_02_dmorlx8cel01 successfully altered
    GridDisk RECO_CD_00_dmorlx8cel01 successfully altered
    GridDisk RECO_CD_01_dmorlx8cel01 successfully altered
    GridDisk RECO_CD_02_dmorlx8cel01 successfully altered
    ...etc...
  2. Issue the command below and all disks should show 'active':    

    CellCLI> list griddisk
    DATA_CD_00_dmorlx8cel01         active
    DATA_CD_01_dmorlx8cel01         active
    DATA_CD_02_dmorlx8cel01         active
    RECO_CD_00_dmorlx8cel01         active
    RECO_CD_01_dmorlx8cel01         active
    RECO_CD_02_dmorlx8cel01         active
    ...etc...
  3. Verify all grid disks have been successfully put online using the following command. Wait until 'asmmodestatus' is in status 'ONLINE' for all grid disks. The following is an example of the output early in the activation process.

    CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
    DATA_CD_00_dmorlx8cel01 active ONLINE Yes
    DATA_CD_01_dmorlx8cel01 active ONLINE Yes
    DATA_CD_02_dmorlx8cel01 active ONLINE Yes
    RECO_CD_00_dmorlx8cel01 active SYNCING Yes
    RECO_CD_01_dmorlx8cel01 active ONLINE Yes
    ...etc...


    Notice in the above example that 'RECO_CD_00_dmorlx8cel01' is still in the 'SYNCING' process. Oracle ASM synchronization is only complete when ALL grid disks show ‘asmmodestatus=ONLINE’. This process can take some time, depending on how busy the machine is and how busy it was while this server was down for repair.
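The wait in step 3 can likewise be polled. This is a minimal sketch, assuming the same four-column output as the list command in step 3; all_griddisks_online and wait_for_griddisks_online are hypothetical names, and the 60-second interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: reads "name status asmmodestatus asmdeactivationoutcome"
# rows on stdin and succeeds only if every grid disk is active and ONLINE in ASM.
# Rows still SYNCING (or in any other state) are printed so progress is visible.
all_griddisks_online() {
  awk 'NF {
    rows++
    if ($2 != "active" || $3 != "ONLINE") { print "waiting: " $1 " " $2 " " $3; bad = 1 }
  }
  END { exit (rows == 0 || bad) }'
}

# Hypothetical polling loop for step 3 (run on the cell as root).
wait_for_griddisks_online() {
  until cellcli -e list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome |
        all_griddisks_online; do
    sleep 60
  done
  echo "All grid disks are active and ONLINE; ASM resynchronization is complete."
}
```

While ASM is still resynchronizing, each poll prints the grid disks that are not yet ONLINE, such as RECO_CD_00_dmorlx8cel01 in the SYNCING example in step 3.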

PARTS NOTE:
371-4743 4GB USB 2.0 Flash Drive – Exadata Storage Servers based on Sun Fire X4275 and Sun Fire X4270M2.

371-5002 4GB USB 2.0 Flash Drive – Exadata Storage Servers based on Sun Server X3-2L.

371-5002 4GB USB 2.0 Flash Drive or 7090170 / 7318217 8GB USB 2.0 Flash Drive – Exadata Storage Servers based on Sun Server X4-2L.

REFERENCE INFORMATION:

Oracle ILOM 3.0 documentation library - https://docs.oracle.com/cd/E19860-01/index.html

Oracle ILOM 3.1 documentation library - https://docs.oracle.com/cd/E24707_01/index.html

Exadata Maintenance Guide is available here:

https://docs.oracle.com/cd/E80920_01/DBMMN/toc.htm

 

References

<NOTE:2011874.1> - How to Replace an Exadata X5-2/X6-2 Storage Server Internal USB drive

  Copyright © 2018 Oracle, Inc.  All rights reserved.