Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2359366.1
Update Date:2018-05-10
Keywords:

Solution Type  Technical Instruction Sure

Solution  2359366.1 :   How to Replace an Exadata X7-2 Compute Node Server Power Supply  


Related Items
  • Exadata X7-2 Hardware
  • Zero Data Loss Recovery Appliance X7 Hardware
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Exadata internal only for Oracle support engineers use and approved HW partners

Applies to:

Exadata X7-2 Hardware - Version All Versions and later
Zero Data Loss Recovery Appliance X7 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace an Exadata X7-2 Compute Node Server Power Supply.

Solution


DISPATCH INSTRUCTIONS:

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
Exadata X7-2 Training

TIME ESTIMATE: 30 minutes

TASK COMPLEXITY: 2

 

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

PROBLEM OVERVIEW: An Exadata X7-2 Compute Node Server Power Supply needs replacement.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

IMPORTANT NOTE TO TSC ENGINEER: CUT & PASTE the “CUSTOMER ACTIVITY” sections of the Pre-Replacement and Post-Replacement steps into an SR Note, and ensure the customer knows to perform these steps before the scheduled field engineer activity, and during and after the replacement activity.

CUSTOMER ACTIVITY:

The power supply is hot-swappable and can be replaced when the power is on.

The location of the failed power supply can be verified with the following commands:

# dbmcli -e list alerthistory

# dbmcli -e list alerthistory <alert ID> detail
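As a rough illustration of the two commands above, the following Python sketch builds the argument lists for the summary query and the per-alert detail query. The helper name alerthistory_command and the alert ID "1_1" are hypothetical; only the dbmcli invocations themselves come from this procedure.

```python
def alerthistory_command(alert_id=None):
    """Build the argv for `dbmcli -e list alerthistory [<alert ID> detail]`."""
    argv = ["dbmcli", "-e", "list", "alerthistory"]
    if alert_id is not None:
        # Mirrors the second command above: show one alert in detail.
        argv += [str(alert_id), "detail"]
    return argv


# "1_1" is a placeholder alert ID used only for illustration.
print(" ".join(alerthistory_command()))
print(" ".join(alerthistory_command("1_1")))
```

On the compute node, the returned list could be passed to subprocess.run(argv, capture_output=True, text=True) to capture the alert text for the SR.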

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

Prepare the Server for Service

1. Log in to the ILOM, check the fruid container values, and sync them if needed.

Mismatched fruid values can cause a failure after a power supply replacement. To avoid this, confirm that at least the Primary (DBP) and Backup1 (MB) containers hold matching data, so that the new power supply has its container updated automatically after replacement. Enter ILOM restricted mode and use the "showpsnc" command to check this.

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exa1dbadm01-ilom:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7338405             7338405             7338405
PSN                 1735XC3004          1735XC3004          1735XC3004
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
[(restricted_shell) exa1dbadm01-ilom:~]# exit

The above example shows a system with all three containers properly in sync. If the output from your system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset attempts to auto-populate the matching values if one container is out of sync.

-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Performing reset on /SP

2. After the ILOM reset, if the Primary and Backup1 containers match, proceed with the following steps to replace the power supply. If these two containers do not match, DO NOT proceed with the replacement yet. Contact TSC for further assistance.

If the containers do not match, you will need to use the "copypsnc" command from service or escalation mode to copy the data from the good container so that the Primary and Backup1 containers match. (Backup2 is Power Supply 0, which is about to be replaced, so it is less important at this step.) If you are unfamiliar with this process and require assistance, refer to the steps for using "copypsnc" to fix the serial number in "How to update product serial number on systems which implement TLI functionality (Doc ID 1280913.1)" and "How to access service mode and escalation mode on ILOM 3.x and later platforms (Doc ID 1019946.1)". After the fruid data in the Primary and Backup1 containers has been confirmed to match, proceed with the following steps.
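The container comparison described above can be sketched in Python for cases where the showpsnc output has been captured to a log. This is a minimal sketch, not an Oracle tool: the function names are hypothetical, and the parsing assumes the column layout seen in the example output in this procedure (a known row label followed by three values).

```python
import re

# Row labels and container order as they appear in the showpsnc example.
ROW_LABELS = ("PPN", "PSN", "Profile", "Product Name", "RFID SN")
CONTAINERS = ("Primary", "Backup1", "Backup2")

SAMPLE = """\
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7338405             7338405             7338405
PSN                 1735XC3004          1735XC3004          1735XC3004
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
"""


def parse_showpsnc(text):
    """Return {label: {container: value}} parsed from showpsnc output."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        for label in ROW_LABELS:
            if not line.startswith(label + " "):
                continue
            rest = line[len(label):].strip()
            # Columns are normally separated by two or more spaces; wide
            # values (such as the RFID SN) are separated by a single space.
            values = re.split(r"\s{2,}", rest)
            if len(values) != len(CONTAINERS):
                values = rest.split()
            if len(values) == len(CONTAINERS):
                table[label] = dict(zip(CONTAINERS, values))
            break
    return table


def mismatched_rows(table, containers=("Primary", "Backup1")):
    """List the row labels whose values differ across the given containers."""
    return [
        label
        for label, row in table.items()
        if len({row[c] for c in containers}) > 1
    ]


table = parse_showpsnc(SAMPLE)
if mismatched_rows(table):
    print("Primary and Backup1 differ: do not proceed")
else:
    print("Primary and Backup1 match")
```

By default only Primary and Backup1 are compared, which is the pre-replacement requirement; passing CONTAINERS as the second argument compares all three.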

 

Removing the Power Supply

Power supplies are hot-swappable and do not require you to power off the server.

Caution - These procedures require that you handle components that are sensitive to electrostatic discharge. This sensitivity can cause the component to fail. To avoid damage, ensure that you follow safe anti-static practices.

1. Locate the compute node that has the white LED lit that requires maintenance.

2. Locate the power supply with the amber fault LED on.

3. If the service is to be performed while the system is up and running, confirm that the second PSU is online and working properly.

4. Attach an anti-static wrist strap to your wrist and to a metal area on the chassis or the rack.

5. Unlatch the sliding attachment of the cable management arm (CMA) on the power supply side by pressing the green release tab, and slide it out from the server's outer slide rail.

6. Unlatch the lacer bar attachment of the CMA from the server's inner slide rail on the power supply side, and slide it to the right into itself to open the space for the power supply removal.

Caution - When disconnecting the CMA left-side connectors, support the CMA so that it does not hang down under its own weight and stress the right-side connectors; otherwise, the CMA might be damaged. You must continue to support the CMA until you have reconnected both of the left-side connectors.

7. Disconnect the power cord from the faulty power supply. Unwrap the velcro tie wrap from around the power cord.

Note - The fans of a failed power supply might still be spinning when the system is powered on. You can remove a power supply while the fans are still spinning.  

8. Grasp the power supply handle, push the power supply latch to the left, and pull the power supply out of the chassis. If the power supply is being replaced hot while the system is up, take care that AC power to the second power supply is not interrupted during the removal of the failed unit.

 

Installing a Power Supply

Caution - Always replace a faulty power supply with a power supply of the same type (model).

1. Remove the replacement power supply from its packaging and place it on an anti-static mat.

2. Align the replacement power supply with the empty power supply slot.

3. Slide the power supply into the bay until it is fully seated. You will hear an audible click when the power supply fully seats and the latch engages.

4. Reconnect the power cord to the power supply. Secure it with the attached velcro wrap.

5. Verify that the amber Fault-Service Required LED on the replaced power supply and the Fault-Service Required LEDs on the front and back panels of the server are no longer lit.

 

Return the Server to Operation

Note - After you have replaced Power Supply 0, you must reset the Oracle ILOM service processor (SP) to propagate the fruid identity data to the new power supply. Power Supply 1 does not contain fruid identity data, and therefore does not require an SP reset after replacement.

1. Slide out the lacer bar attachment of the CMA and latch it to the server's inner slide rail on the power supply side.

2. Re-insert the sliding attachment of the CMA into the server's outer slide rail on the power supply side. Make sure the attachment is in the slide rail grooves cleanly and can move freely in the rail when pushed, and the green release tab is sitting behind it.

3. Return the server to the normal rack position.

4. For Power Supply 0 replacements, check and set the system serial number/fruid data:

a. Log in to the ILOM as root and then enter the restricted shell to check the psnc values. Follow the example below to enter the restricted shell and use the showpsnc command:

-> set SESSION mode=restricted

WARNING: The "Restricted Shell" account is provided solely
to allow Services to perform diagnostic tasks.

[(restricted_shell) exa1dbadm01-ilom:~]# showpsnc
Primary: fruid:///SYS/DBP
Backup 1: fruid:///SYS/MB
Backup 2: fruid:///SYS/PS0

Element           | Primary           | Backup1           | Backup2
------------------+-------------------+-------------------+-------------------
PPN                 7338405             7338405             7338405
PSN                 1735XC3004          1735XC3004          0000000000
Profile             0x00010000          0x00010000          0x00010000
Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
[(restricted_shell) exa1dbadm01-ilom:~]# exit

 

b. The above example shows a system with the Backup2 container out of sync after PSU replacement. If the output from your system does not show matching values in all of the containers, reset the SP and then check the values again. An ILOM reset attempts to auto-populate the matching values if one container is out of sync. Power Supply 1 does not contain fruid data, and therefore does not require an SP reset after replacement.

-> reset /SP
Are you sure you want to reset /SP (y/n)? y
Performing reset on /SP

If all three entries match after the ILOM reset, this step is done. If they do not match, the containers will need to be manually programmed; contact the TSC for further assistance.

 

Manual serial number identity programming steps:

  1. If the containers don't match, you must enter escalation or service mode to fix them.
  2. Contact the TSC to request an escalation password. Service mode is also sufficient if only the copypsnc command is needed; if the setpsnc command is needed, escalation mode is required. setpsnc is not covered in this procedure.
  3. Provide your TSC contact with the output from the following ILOM commands: "version", "show /SYS product_serial_number", and "show /SP/clock". If the product_serial_number output is not valid, also provide the showpsnc output seen in step b above.
  4. The TSC will provide an escalation password that is made up of 32 short words. Follow the example below to create a new user with the 'Service' role assigned. The Service role is required to access service or escalation modes. The following example creates a user named 'escuser' with the Service role.
    -> cd /SP/users
    /SP/users
    -> create escuser
    Creating user...
    Enter new password: ********
    Enter new password again: ********
    Created /SP/users/escuser
    -> set escuser role=aucros
    Set 'role' to 'aucros'
    -> show escuser
    /SP/users/escuser
    Targets:
    ssh
    Properties:
    role = aucros
    password = *****
  5. Set check_physical_presence to false, and then exit from the ILOM so that you can log in as the newly created user.
    -> set /SP check_physical_presence=false
    Set 'check_physical_presence' to 'false'
    -> show /SP check_physical_presence
    /SP
    Properties:
    check_physical_presence = false

    -> exit
  6. Log in as escuser and enter escalation mode using the password that was provided by the TSC.
    exa1dbadm01-ilom login: escuser
    Password:

    Oracle(R) Integrated Lights Out Manager

    Version 4.0.0.22 r120818

    Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

    Warning: HTTPS certificate is set to factory default.

    Hostname: exa1dbadm01-ilom

    -> cd /SP/users/escuser/escalation
    -> set SESSION mode=escalation                            
    Password:**** **** **** **** **** *** *** **** **** **** **** **** **** **** **** **** *** *** **** *** **** **** **** *** **** **** *** **** *** *
    Short form password is:  NOSE HAAG MED

    [(escalation_mode) exa1dbadm01-ilom:~]#
  7. Use the showpsnc command to confirm the current container values. Confirm that one of the backup containers has a serial number (the value on the PSN line) that matches the system serial number. The system serial number can be checked against the serial number RFID tag on the front left-hand side of the server. After confirming that there is a valid fruid backup, use the copypsnc command to write the good data from the Primary container to the Backup2 container on PS0. The following example copies from Primary to Backup2, but you could also copy from Backup1 if needed.
    [(escalation_mode) exa1dbadm01-ilom:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338405             7338405             7338405
    PSN                 1735XC3004          1735XC3004          0000000000
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
    [(escalation_mode) exa1dbadm01-ilom:~]# copypsnc Primary Backup2

    [(escalation_mode) exa1dbadm01-ilom:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338405             7338405             7338405
    PSN                 1735XC3004          1735XC3004          1735XC3004
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
    [(escalation_mode) exa1dbadm01-ilom:~]# exit


  8. At this point, if all of the fruid containers match and have the correct serial number data, this step is done. If more than one of the fruid containers had invalid entries, use the copypsnc command to copy the valid data to the other invalid container (for example, "copypsnc Primary Backup1" copies Primary to Backup1). After confirming that all fruid data is correct, reset the ILOM to confirm that the fruid data persists through a reboot, and remove the escalation user if needed.
    -> reset /SP                     
    Are you sure you want to reset /SP (y/n)? y
    Performing reset on /SP
    ..........

    ***login as the root user again and check the fruid data***

    -> set SESSION mode=restricted

    WARNING: The "Restricted Shell" account is provided solely
    to allow Services to perform diagnostic tasks.

    [(restricted_shell) exa1dbadm01-ilom:~]# showpsnc
    Primary: fruid:///SYS/DBP
    Backup 1: fruid:///SYS/MB
    Backup 2: fruid:///SYS/PS0

    Element           | Primary           | Backup1           | Backup2
    ------------------+-------------------+-------------------+-------------------
    PPN                 7338405             7338405             7338405
    PSN                 1735XC3004          1735XC3004          1735XC3004
    Profile             0x00010000          0x00010000          0x00010000
    Product Name        ORACLE SERVER X7-2  ORACLE SERVER X7-2  ORACLE SERVER X7-2
    RFID SN             341A583DE5800000000232F8 341A583DE5800000000232F8 341A583DE5800000000232F8
    [(restricted_shell) exa1dbadm01-ilom:~]# exit


    -> cd /SP/users
    /SP/users
    -> delete escuser
    Are you sure you want to delete /SP/users/escuser (y/n)? y
    Deleted /SP/users/escuser
  9. If you encounter trouble while accessing escalation mode or fixing the fruid containers, contact the TSC for assistance.
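The copy decision in steps 7 and 8 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not part of the ILOM tooling: the function name plan_copypsnc is hypothetical, it compares only the PSN value, and the expected serial number is assumed to have been read from the RFID tag on the server as described in step 7.

```python
def plan_copypsnc(psn_by_container, system_serial):
    """Suggest copypsnc commands that overwrite containers holding a bad PSN.

    Returns None when no container holds the expected serial number, in
    which case there is no good source to copy from.
    """
    good = [name for name, psn in psn_by_container.items() if psn == system_serial]
    if not good:
        return None
    source = good[0]  # first container with the expected serial number
    return [
        f"copypsnc {source} {name}"
        for name, psn in psn_by_container.items()
        if psn != system_serial
    ]


# PSN values from the post-replacement showpsnc example in step 7.
psn = {"Primary": "1735XC3004", "Backup1": "1735XC3004", "Backup2": "0000000000"}
print(plan_copypsnc(psn, "1735XC3004"))  # ['copypsnc Primary Backup2']
```

An empty list means every container already holds the expected serial number. A None result means no valid copy exists, which is outside what copypsnc can repair and should go to the TSC.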

 

OBTAIN CUSTOMER ACCEPTANCE:

WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE?:

FIELD SERVICE ENGINEER and CUSTOMER ACTIVITY:

1. Verify all expected hardware is visible to the server and the fault is cleared. Assistance from the customer for server login access will be required.

2. Verify there are no outstanding faults in ILOM:

# ipmitool sunoem cli 'show faulty'
Connected. Use ^D to exit.
-> show faulty
Target             | Property              | Value
-------------------+-----------------------+-----------------------------------

-> Session closed
Disconnected
#

3. Verify there are no outstanding alerts in the Database Node:

# dbmcli -e list alerthistory
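When the output of the "show faulty" check in step 2 is captured to a log, the "no outstanding faults" condition can be verified with a small parser. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes that fault rows, when present, follow the dashed separator line as three "|"-separated cells.

```python
def parse_show_faulty(text):
    """Return (target, property, value) tuples listed by `show faulty`."""
    rows = []
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_table:
            # The dashed separator marks the start of the table body.
            in_table = stripped.startswith("---") and "+" in stripped
            continue
        if not stripped or stripped.startswith("->"):
            break  # a blank line or the next prompt ends the table
        cells = [cell.strip() for cell in stripped.split("|")]
        if len(cells) == 3:
            rows.append(tuple(cells))
    return rows


# Output copied from the example in step 2 (no faults present).
CLEAN = """\
Connected. Use ^D to exit.
-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------

-> Session closed
Disconnected
"""

faults = parse_show_faulty(CLEAN)
print("no outstanding faults" if not faults else faults)
```

An empty result corresponds to the clean output shown in step 2; any returned rows should be reviewed before the system is handed back to the customer.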

 

PARTS NOTE:

7333459 [F] A266 800/1200 Watt AC Input Power Supply - India ISI Mark
7350780 [F] A266 800/1200 Watt AC Input Power Supply

 

REFERENCE INFORMATION:

Oracle Exadata Database Machine Maintenance Guide: https://docs.oracle.com/cd/E80920_01/DBMMN/maintaining-exadata-database-servers.htm#DBMMN22020

Oracle Server X7-2 Documentation: https://docs.oracle.com/cd/E72435_01/index.html

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.