Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2359496.1
Update Date:2018-05-30

Solution Type  Technical Instruction Sure

Solution  2359496.1 :   How to Replace an Exadata X7-2 Storage Cell Server Disk Backplane  


Related Items
  • Exadata X7-2 Hardware
  • Exadata X7-8 Hardware
  • Oracle SuperCluster M8 Hardware
  • Zero Data Loss Recovery Appliance X7 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Exadata internal only for Oracle support engineers use and approved HW partners

Applies to:

Exadata X7-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X7-8 Hardware - Version All Versions to All Versions [Release All Releases]
Zero Data Loss Recovery Appliance X7 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster M8 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to Replace an Exadata X7-2 Storage Cell Server Disk Backplane.

Solution

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:  Exadata X7-2 Training.

TIME ESTIMATE: 120 minutes

TASK COMPLEXITY: 3



FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Exadata X7-2 Storage Cell Server Disk Backplane needs replacement

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

IMPORTANT NOTE TO TSC ENGINEER:  CUT & PASTE the “CUSTOMER ACTIVITY” sections of the Pre-Replacement and Post-Replacement steps into an SR Note, and make sure the customer knows to perform these steps before, during, and after the scheduled field engineer activity.

CUSTOMER ACTIVITY:

Shutdown of the storage cell is required prior to the part replacement:

Complete Steps 1 to 6 of Note ID 1188080.1 “Steps to shut down or reboot an Exadata storage cell without affecting ASM”.

Where noted, the SQL, CellCLI, and ‘root’ commands should be run by the Customer's DBA, unless the Customer provides login access to the Field Engineer.

These steps are also provided in the documentation:

   https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-storage-servers.htm#DBMMN-GUID-5903DF08-A052-4F82-9A87-23A3D6F02DA2

 

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Prepare the Server for Service

The customer should have already prepared the server and powered it off.  If not, provide them the instructions in the previous section.

1. Log in to the ILOM, check the fruid container values, and sync them if needed. Mismatched fruid values can cause a failure after a disk backplane replacement, so confirm that the containers hold matching data first. The Disk Backplane is the primary container, so Backup1 (MB) and Backup2 (PS0) must hold valid, identical values for the replacement disk backplane to be updated to the correct values automatically.

Go into ILOM restricted mode and use the showpsnc command to check this:

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exa1celadm01-ilom:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7338449             7338449             7338449
PSN                 1736XC202N          1736XC202N          1736XC202N
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
[(restricted_shell) exa1celadm01-ilom:~]#

The above example shows a system with all three containers properly in sync. If the output from the system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset will attempt to auto-populate the matching values if one container is out of sync.

-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Performing reset on /SP

2. After an ILOM reset if the Backup1 and Backup2 containers match then proceed with the following steps to replace the disk backplane. If these two containers do not match then DO NOT proceed with the replacement yet. Contact TSC for further assistance.

If the containers do not match, you will need to use the "copypsnc" command from service or escalation mode to copy the data from the good container so that the Backup1 and Backup2 containers match (Primary is the disk backplane, which is about to be replaced, so it is less important at this step). If you are unfamiliar with this process and require assistance, refer to the steps for using "copypsnc" to fix the serial number in "How to update product serial number on systems which implement TLI functionality (Doc ID 1280913.1)" and "How to access service mode and escalation mode on ILOM 3.x and later platforms (Doc ID 1019946.1)". After the fruid data in the Backup1 and Backup2 containers has been confirmed to match, proceed with the following steps.
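The go/no-go check in steps 1 and 2 can be sketched as a small script that compares the columns of captured showpsnc output. This is only an illustration: it assumes the output has the same layout as the example above, the function names are hypothetical, and the Product Name row is skipped because its values contain spaces.

```python
# Rows whose values are single tokens in the example showpsnc output.
# "Product Name" is deliberately skipped: its values contain spaces.
SINGLE_TOKEN_ELEMENTS = ("PPN", "PSN", "Profile", "RFID SN")


def parse_showpsnc(text):
    """Return {element: (primary, backup1, backup2)} from showpsnc output."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        for element in SINGLE_TOKEN_ELEMENTS:
            if stripped.startswith(element + " "):
                tokens = stripped[len(element):].split()
                if len(tokens) == 3:
                    values[element] = tuple(tokens)
    return values


def backups_match(values):
    """True when Backup1 and Backup2 agree on every parsed row."""
    return bool(values) and all(b1 == b2 for _, b1, b2 in values.values())


def all_match(values):
    """True when Primary, Backup1 and Backup2 agree on every parsed row."""
    return bool(values) and all(p == b1 == b2 for p, b1, b2 in values.values())


SAMPLE = """\
PPN                 7338449             7338449             7338449
PSN                 1736XC202N          1736XC202N          1736XC202N
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
"""

if __name__ == "__main__":
    parsed = parse_showpsnc(SAMPLE)
    print("Backup1 and Backup2 match:", backups_match(parsed))
    print("All three containers match:", all_match(parsed))
```

Before the replacement, only the Backup1/Backup2 comparison gates the work; the all-container comparison is the one that matters once the new backplane is installed.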

3. Extend the server to the maintenance position.

4. Disconnect the power cords from the power supplies.

5. Attach an anti-static wrist strap to your wrist and to a metal area on the chassis or the rack.

6. Remove the server top cover. Use a Torx T10 screwdriver to unlock the release button latch.

Caution - Ensure that all power is removed from the server before removing or installing the disk backplane. You must disconnect the power cables from the system before performing these procedures.

 

Caution - These procedures require that you handle components that are sensitive to electrostatic discharge. This sensitivity can cause the components to fail. To avoid damage, ensure that you follow anti-static practices as described in Electrostatic Discharge Safety.

Removing the Disk Backplane

1. Open the latch on the SuperCap tray and swing it up from the air baffle.

2. Lift up and remove the air baffle.

3. Remove the Fan Modules and Fan Tray:

  1. Using your forefinger and thumb, lift each fan module straight up and out of the chassis and set it aside on an antistatic mat.
  2. With all fan modules removed, using a Torx T25 screwdriver, loosen the three spring-mounted screws that secure the fan tray to the server chassis.
  3. Lift the fan tray from the server and set aside.

4. Remove the Disk Backplane from the server.

  1. Pull all storage drives out far enough to disengage them from the disk backplane.

    Note - It is not necessary to completely remove the storage drives from the server; simply pull them out far enough to disengage them from the disk backplane. If you do remove the storage drives from the server, record their locations so that you can re-install them in the same locations.
      
  2. Disconnect the power cable from the disk backplane.
  3. Disconnect the data cables from the disk backplane by pressing the latch on the cable connector and then pulling out the connector. There will be 3 SAS cable ends. Note the cable connection locations so that the 3 cables may be reconnected to their original locations.
  4. Disconnect the temperature sensor cable from the disk backplane.
  5. Disconnect the auxiliary signal cable from the disk backplane.
  6. Using a Torx T15 screwdriver, loosen the right-side and left-side spring-mounted screws that secure the disk backplane to the chassis.
  7. Using a Torx T25 screwdriver, loosen the spring-mounted screws that secure the backplane bracket to the chassis, then lift the bracket from the server.
  8. Lift the disk backplane up to release it from the standoff hooks and out of the chassis.
  9. Place the disk backplane on an antistatic mat.

 

Installing the Disk Backplane

1. Install the Disk Backplane.

  1. Unpack the replacement disk backplane.
  2. Lower the disk backplane into the server, and position it to engage the standoff hooks.
  3. Lower the backplane bracket into the server, then using a Torx T25 screwdriver, tighten the spring-mounted screws to secure the bracket to the chassis.
  4. Using a Torx T15 screwdriver, tighten the right-side and left-side spring-mounted screws to secure the disk backplane to the chassis.
  5. Reconnect the auxiliary signal cable to the disk backplane.
  6. Reconnect the temperature sensor cable to the disk backplane.
  7. Reconnect the SAS data cables to the disk backplane. There will be 3 SAS data cable ends. Make sure the cables are reconnected to their original locations:
     - Cable #1 is the lowest connector corresponding to disk slots 0-3
     - Cable #2 is the middle connector corresponding to disk slots 4-7
     - Cable #3 is the upper connector corresponding to disk slots 8-11
  8. Re-install all of the storage drives into the server making sure that they are installed into their original locations, and fully seated and latched.
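The cable-to-slot mapping in step 7 is regular (four slots per cable), so it can be expressed as a one-line lookup. The sketch below is illustrative only; the function names are hypothetical, and it assumes the 12-slot layout described above. It can help point at a mis-seated SAS cable if a contiguous group of four disks is missing after the replacement.

```python
def sas_cable_for_slot(slot):
    """Return the SAS cable number (1-3) serving a disk slot (0-11)."""
    if not 0 <= slot <= 11:
        raise ValueError("slot must be between 0 and 11")
    # Cable #1 -> slots 0-3, cable #2 -> slots 4-7, cable #3 -> slots 8-11
    return slot // 4 + 1


def cables_for_slots(slots):
    """Return the set of SAS cables serving the given disk slots."""
    return {sas_cable_for_slot(slot) for slot in slots}


if __name__ == "__main__":
    # Example: slots 4-7 all missing points at cable #2 (middle connector).
    print(sorted(cables_for_slots([4, 5, 6, 7])))
```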

2. Replace the Fan Modules and Fan Tray:

  1. Lower the fan tray into the server.
  2. Using a Torx T25 screwdriver, tighten the three spring-mounted screws that secure the fan tray to the server chassis.
  3. Install each of the fan modules into the server. The fan modules are keyed to ensure that they are installed in the correct orientation.
  4. Press down on each fan module to fully seat the fan module.

3. Lower the air baffle back into place over the DIMMs and CPUs.

4. Lower and clip the SuperCap tray back into the air baffle.

 

Return the Server to Operation

1. Install the server top cover. Use a Torx T10 screwdriver to lock the release button latch.
2. Reconnect the power cords to the server power supply and connect any other cables to their original locations.
3. Return the server to the normal rack position.
4. Once the power cords have been re-attached and the ILOM has booted you will see a slow blink on the green LED for the server. Power on the server by pressing the power button on the front of the unit.
5. Connect to the server console via the ILOM and monitor the boot.
      By default the ILOM serial console displays the primary console output.
      In the event of unexpected boot behavior, it is advisable to connect to both ILOM serial and ILOM graphics consoles at the same time and monitor.

6. Check and set the system serial number/fruid data:

a. Log in to the ILOM as root and then enter the restricted shell to check the psnc values. Follow the example below to enter the restricted shell and use the "showpsnc" command:

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exa1celadm01-ilom:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7338449             7338449             7338449
PSN                 0000000000          1736XC202N          1736XC202N
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
[(restricted_shell) exa1celadm01-ilom:~]#

 

 

b. The above example shows a system with the Primary container not in sync after disk backplane replacement. If the output from the system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset will attempt to auto-populate the matching values if one container is out of sync.

-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Performing reset on /SP

If all three entries match after the ILOM reset, this step is done. If they do not match, the containers will need to be manually programmed; contact the TSC for further assistance.

 

Manual serial number identity programming steps:

  1. If the containers don't match, you must enter escalation or service mode to fix them.
  2. Contact the TSC to request an escalation password. Service mode is sufficient if only the copypsnc command is needed; escalation mode is required if the setpsnc command is needed. setpsnc is not covered in this procedure.
  3. Provide your TSC contact with the output from the following ILOM commands: "version", "show /SYS product_serial_number", and "show /SP/clock". If the product_serial_number output is not valid, also provide the showpsnc output seen in step b above.
  4. The TSC will provide an escalation password that is made up of 32 short words. Follow the example below to create a new user with the 'Service' role assigned. The Service role is required to access service or escalation modes. In the following example we will create a user named 'escuser' with the service role.
    -> cd /SP/users
    /SP/users
    -> create escuser
    Creating user...
    Enter new password: ********
    Enter new password again: ********
    Created /SP/users/escuser
    -> set escuser role=aucros
    Set 'role' to 'aucros'
    -> show escuser
    /SP/users/escuser
    Targets:
    ssh
    Properties:
    role = aucros
    password = *****
  5. Set check_physical_presence to false and then exit from the ILOM so that you can log in as the newly created user.
    -> set /SP check_physical_presence=false
    Set 'check_physical_presence' to 'false'
    -> show /SP check_physical_presence
    /SP
    Properties:
    check_physical_presence = false

    -> exit
  6. Log in as escuser and enter escalation mode using the password that was provided by the TSC.
    exa1celadm01-ilom login: escuser
    Password:

    Oracle(R) Integrated Lights Out Manager

    Version 4.0.0.20 r120817

    Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

    Warning: HTTPS certificate is set to factory default.

    Hostname: exa1celadm01-ilom

    -> cd /SP/users/escuser/escalation
    -> set SESSION mode=escalation                            
    Password:**** **** **** **** **** *** *** **** **** **** **** **** **** **** **** **** *** *** **** *** **** **** **** *** **** **** *** **** *** *
    Short form password is:  BILL CRAG HIP

    [(escalation_mode) exa1celadm01-ilom:~]#
  7. Use the showpsnc command to confirm the current container values. Confirm that one of the backup containers has a serial number (the value on the PSN line) that matches the system serial number. The system serial number can be checked against the serial number RFID tag on the front left-hand side of the server. After confirming that there is a valid fruid backup, use the copypsnc command to write the good data from the backup container to the primary container on the disk backplane. The following example shows copying from Backup1 to the Primary, but you could also copy from Backup2 if needed.
    [(escalation_mode) exa1celadm01-ilom:~]# showpsnc

    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338449             7338449             7338449
    PSN                 0000000000          1736XC202N          1736XC202N
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
    RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
    [(escalation_mode) exa1celadm01-ilom:~]#
    [(escalation_mode) exa1celadm01-ilom:~]# copypsnc Backup1 Primary
    [(escalation_mode) exa1celadm01-ilom:~]# showpsnc

    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338449             7338449             7338449
    PSN                 1736XC202N          1736XC202N          1736XC202N
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
    RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
    [(escalation_mode) exa1celadm01-ilom:~]# exit


  8. At this point, if all of the fruid containers match and have the correct serial number data, this step is done. If more than one of the fruid containers had invalid entries, use the copypsnc command to copy the valid data to each remaining invalid container (e.g., "copypsnc Backup1 Primary" to copy Backup1 to Primary). After confirming all fruid data is correct, reset the ILOM to confirm that the fruid data persists through a reboot, and remove the escalation user if needed.
    -> reset /SP                     
    Are you sure you want to reset /SP (y/n)? y
    Performing reset on /SP
    ..........

    ***login as the root user again and check the fruid data***

    -> set SESSION mode=restricted

    WARNING: The "Restricted Shell" account is provided solely
    to allow Services to perform diagnostic tasks.

    [(restricted_shell) exa1celadm01-ilom:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338449             7338449             7338449
    PSN                 1736XC202N          1736XC202N          1736XC202N
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2L ORACLE SERVER X7-2L ORACLE SERVER X7-2L
    RFID SN             341A583DE5800000000233C6 341A583DE5800000000233C6 341A583DE5800000000233C6
    [(restricted_shell) exa1celadm01-ilom:~]# exit


    -> cd /SP/users
    /SP/users
    -> delete escuser
    Are you sure you want to delete /SP/users/escuser (y/n)? y
    Deleted /SP/users/escuser
  9. If trouble is encountered during any of the steps of accessing escalation mode and fixing the fruid containers please contact the TSC for assistance.
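The source-selection logic in steps 7 and 8 can be sketched as follows. This is a hypothetical helper, not an ILOM tool: it takes the PSN values read from showpsnc and the serial number read from the chassis RFID tag, and lists the copypsnc commands that would bring the containers into line. Only the "copypsnc Backup1 Primary" form appears in this procedure; the other source/target combinations are assumed to follow the same pattern.

```python
CONTAINERS = ("Primary", "Backup1", "Backup2")


def plan_copypsnc(psn, chassis_serial):
    """Return the copypsnc commands needed, or None if no valid source exists.

    psn maps each container name to the PSN value shown by showpsnc.
    chassis_serial is the serial number read from the chassis RFID tag.
    """
    # Prefer a backup container as the source, as in the procedure above.
    valid = [name for name in ("Backup1", "Backup2", "Primary")
             if psn[name] == chassis_serial]
    if not valid:
        # No container holds the correct serial: copypsnc cannot help
        # (setpsnc territory, not covered here). Contact the TSC.
        return None
    source = valid[0]
    # Assumption: argument order is "copypsnc <source> <target>", as in the
    # documented "copypsnc Backup1 Primary" example.
    return [f"copypsnc {source} {name}"
            for name in CONTAINERS if psn[name] != chassis_serial]


if __name__ == "__main__":
    after_swap = {"Primary": "0000000000",
                  "Backup1": "1736XC202N",
                  "Backup2": "1736XC202N"}
    for command in plan_copypsnc(after_swap, "1736XC202N"):
        print(command)
```

An empty list means all containers already hold the correct serial number, and None means there is no valid source to copy from.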

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

FIELD SERVICE ENGINEER and CUSTOMER ACTIVITY:

1. Verify all expected hardware is visible to the server and the fault is cleared. Assistance from the customer for server login access will be required.

2. Verify there are no outstanding faults in ILOM:

# ipmitool sunoem cli 'show faulty'
Connected. Use ^D to exit.
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------

-> Session closed
Disconnected
#
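When this check is scripted, a clean result is simply a fault table with no rows between the separator line and the "-> Session closed" line. The sketch below parses captured output that way. The function name is hypothetical, the clean sample is copied from the example above, and the fault row in the second sample is illustrative rather than taken from a real system.

```python
def fault_rows(output):
    """Return the non-empty table rows listed by 'show faulty'."""
    rows = []
    in_table = False
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("---") and "+" in stripped:
            in_table = True  # separator line: fault rows start after this
            continue
        if in_table:
            if stripped.startswith("->"):
                break  # next ILOM prompt ends the table
            if stripped:
                rows.append(stripped)
    return rows


CLEAN = """\
Connected. Use ^D to exit.
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------

-> Session closed
Disconnected
"""

# Illustrative only: a made-up fault row to exercise the parser.
FAULTY = """\
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0 | fru | /SYS/DBP

-> Session closed
"""

if __name__ == "__main__":
    print("Outstanding fault rows:", len(fault_rows(CLEAN)))
```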

3. Verify all the expected disk devices are present. For 1/8th rack Storage Cells there will be 6 disks; for all others there will be 12 disks:

# lsscsi | grep MR
[8:2:0:0] disk AVAGO MR9361-16i 4.72 /dev/sdc
[8:2:1:0] disk AVAGO MR9361-16i 4.72 /dev/sdd
[8:2:2:0] disk AVAGO MR9361-16i 4.72 /dev/sde
[8:2:3:0] disk AVAGO MR9361-16i 4.72 /dev/sdf
[8:2:4:0] disk AVAGO MR9361-16i 4.72 /dev/sdg
[8:2:5:0] disk AVAGO MR9361-16i 4.72 /dev/sdh
[8:2:6:0] disk AVAGO MR9361-16i 4.72 /dev/sdi
[8:2:7:0] disk AVAGO MR9361-16i 4.72 /dev/sdj
[8:2:8:0] disk AVAGO MR9361-16i 4.72 /dev/sdk
[8:2:9:0] disk AVAGO MR9361-16i 4.72 /dev/sdl
[8:2:10:0] disk AVAGO MR9361-16i 4.72 /dev/sdm
[8:2:11:0] disk AVAGO MR9361-16i 4.72 /dev/sdn
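The disk count can also be checked mechanically. The following sketch counts the lines that "grep MR" would keep and compares the total with the expected 6 or 12; the function names are hypothetical, and the sample text is generated to mirror the listing above rather than captured from a live cell.

```python
def count_megaraid_disks(lsscsi_output):
    """Count lines containing 'MR', mirroring 'lsscsi | grep MR'."""
    return sum(1 for line in lsscsi_output.splitlines() if "MR" in line)


def expected_disks(eighth_rack):
    """6 disks for a 1/8th rack Storage Cell, 12 for all others."""
    return 6 if eighth_rack else 12


def disks_ok(lsscsi_output, eighth_rack=False):
    """True when the number of MegaRAID devices matches the expected count."""
    return count_megaraid_disks(lsscsi_output) == expected_disks(eighth_rack)


# Sample modelled on the listing above: targets 0-11 on /dev/sdc../dev/sdn.
SAMPLE = "\n".join(
    f"[8:2:{i}:0] disk AVAGO MR9361-16i 4.72 /dev/sd{chr(ord('c') + i)}"
    for i in range(12)
)

if __name__ == "__main__":
    print("Disks found:", count_megaraid_disks(SAMPLE))
    print("Count as expected:", disks_ok(SAMPLE))
```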

4. Verify all expected logical drives are present and in state 'Optl' (Optimal). For 1/8th rack Storage Cells there will be 6; for all others there will be 12:

# /opt/MegaRAID/storcli/storcli64 /c0/vall show

5. Verify there are no outstanding alerts in the Cell:

# cellcli -e list alerthistory

6. Re-activate the Storage Cell grid disks. Follow Steps 7 to 10 of Note ID 1188080.1 “Steps to shut down or reboot an Exadata storage cell without affecting ASM”.

These steps are also provided in the documentation:
 https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-storage-servers.htm#DBMMN21128

 


PARTS NOTE:

7341141 [F] 12-Slot Disk Backplane

 

REFERENCE INFORMATION:

Oracle Exadata Database Machine Maintenance Guide: https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-storage-servers.htm#DBMMN21128 

Oracle Server X7-2L Documentation: https://docs.oracle.com/cd/E72463_01/index.html

Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.