Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1610099.1
Update Date:2016-03-03
Keywords:

Solution Type  Technical Instruction Sure

Solution  1610099.1 :   How to Determine if Power Supply Sensors Indicate the Need for PSU Replacement from Oracle Big Data Appliance "bdadiag snapshot"


Related Items
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X3-2 Hardware
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance X4-2 Hardware
  • Big Data Appliance X4-2 Full Rack
  • Big Data Appliance X4-2 Starter Rack
  • Big Data Appliance Hardware
  • Big Data Appliance X4-2 In-Rack Expansion
  • Big Data Appliance X3-2 Starter Rack
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Goal
Solution
References


Created from <SR 3-8243819071>

Applies to:

Big Data Appliance X4-2 Full Rack - Version All Versions and later
Big Data Appliance X4-2 Hardware - Version All Versions and later
Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X4-2 Starter Rack - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Linux x86-64

Goal

If user-level ILOM data gathered with the ipmitool utility (the "ipmitool sdr" command, which queries the BMC for sensor data records (SDR)) indicates that the power supply sensors and indicators on one of the BDA servers might have a problem, the output from "bdadiag snapshot" can be used to determine whether a Power Supply Unit (PSU) needs to be replaced.

For example, the "ipmitool sdr" output on a server may indicate that the sensors and indicators on one power supply, e.g. power supply 0 (PS0), are faulty, as in the output below:

# ipmitool sdr | grep PS0
...
   47 | PS0/I_IN       | disabled          | ns
   48 | PS0/V_OUT      | 1.12 Volts        | nr
   49 | PS0/I_OUT      | disabled          | ns
   4a | PS0/IN_POWER   | 10 Watts          | ok
   4b | PS0/OUT_POWER  | disabled          | ns


The other power supply, in this case power supply 1 (PS1), shows sensors and indicators that look ok:

# ipmitool sdr | grep PS1
...
   53 | PS1/I_IN       | 1.25 Amps         | ok
   54 | PS1/V_OUT      | 12.08 Volts       | ok
   55 | PS1/I_OUT      | 19 Amps           | ok
   56 | PS1/IN_POWER   | 270 Watts         | ok
   57 | PS1/OUT_POWER  | 230 Watts         | ok
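
The comparison of the two power supplies above can also be done mechanically. The following is a minimal Python sketch, not part of bdadiag or ipmitool, that lists the sensors of one PSU whose status column is anything other than "ok". The function names parse_sdr_line and suspect_sensors are hypothetical, and treating every non-"ok" status (for example "ns" or "nr") as suspect is an assumption made for illustration.

```python
def parse_sdr_line(line):
    """Return (name, reading, status) for one sensor line, or None."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 3:
        return None  # prompt lines, "..." and blank lines
    # Use the last three columns so that lines with or without a
    # leading sensor number are handled the same way.
    name, reading, status = fields[-3:]
    return name, reading, status


def suspect_sensors(sdr_text, psu="PS0"):
    """List (name, reading, status) for sensors of one PSU that are not 'ok'."""
    suspects = []
    for line in sdr_text.splitlines():
        parsed = parse_sdr_line(line)
        if parsed is None:
            continue
        name, reading, status = parsed
        # Assumption: any status other than "ok" deserves a closer look.
        if name.startswith(psu + "/") and status != "ok":
            suspects.append(parsed)
    return suspects


# Sample lines copied from the ipmitool sdr output shown above.
sample = """\
   47 |PS0/I_IN              | disabled          | ns
   48 |PS0/V_OUT           | 1.12 Volts       | nr
   49 |PS0/I_OUT           | disabled          | ns
   4a |PS0/IN_POWER     | 10 Watts         | ok
   4b |PS0/OUT_POWER  | disabled          | ns
   54 |PS1/V_OUT         | 12.08 Volts        | ok
"""

for name, reading, status in suspect_sensors(sample, "PS0"):
    print(f"{name}: {reading} ({status})")
```

A non-empty result is only a hint that the PSU needs a closer look. The "bdadiag snapshot" checks described in this document are still needed to confirm a replacement.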


To confirm whether the PSU requires replacement, collect the ILOM snapshot output by running "bdadiag snapshot" as root on the server where ipmitool identifies a potential PSU problem. See Oracle Big Data Appliance Diagnostic Information Collection with bdadiag V2.* (Doc ID 1516469.1).

Solution


Unpack the diagnostic output and check the files below for additional information on the PSU sensors and indicators.

1. Check the file: ipmi/@usr@local@bin@ipmiint_sensor_list.out to see if the PSU indicating a problem is fully powered and functional.

The output below indicates that PS0 is not fully powered or functional.

Where:
  0x1 indicates false.
  0x2 indicates true.
  Power OK (POK) is the status reported when the PSU is fully powered and functional.


Output from:

$ grep PS0 ipmi/@usr@local@bin@ipmiint_sensor_list.out 
  
PS0/PWROK      | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na       <<<<< NO POK
PS0/CUR_FAULT  | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS0/VOLT_FAULT | 0x2     | discrete | 0x0002 | na     | na     | na | na | na      | na       <<<<< Fault flag
PS0/FAN_FAULT  | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS0/TEMP_FAULT | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS0/V_IN       | 210.000 | Volts    | ok     | 70.000 | 80.000 | na | na | 270.000 | 280.000
PS0/I_IN       | na      | Amps     | na     | na     | na     | na | na | na      | na       <<<<< NO Voltage
PS0/V_OUT      | 1.120   | Volts    | nr     | 8.000  | 8.960  | na | na | 14.960  | 16.000
PS0/I_OUT      | na      | Amps     | na     | na     | na     | na | na | na      | na       <<<<< NO Voltage
PS0/IN_POWER   | 10.000  | Watts    | ok     | na     | na     | na | na | na      | na
PS0/OUT_POWER  | na      | Watts    | na     | na     | na     | na | na | na      | na       <<<<< NO Voltage


Compared to output from:

$ grep PS1 ipmi/@usr@local@bin@ipmiint_sensor_list.out
  
PS1/PWROK      | 0x2     | discrete | 0x0002 | na     | na     | na | na | na      | na       <<<<< POK
PS1/CUR_FAULT  | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS1/VOLT_FAULT | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na       <<<<< No Fault
PS1/FAN_FAULT  | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS1/TEMP_FAULT | 0x1     | discrete | 0x0001 | na     | na     | na | na | na      | na
PS1/V_IN       | 210.000 | Volts    | ok     | 70.000 | 80.000 | na | na | 270.000 | 280.000
PS1/I_IN       | 1.875   | Amps     | ok     | na     | na     | na | na | na      | na       <<<<< Voltage
PS1/V_OUT      | 12.080  | Volts    | ok     | 8.000  | 8.960  | na | na | 14.960  | 16.000
PS1/I_OUT      | 27.200  | Amps     | ok     | na     | na     | na | na | na      | na       <<<<< Voltage
PS1/IN_POWER   | 380.000 | Watts    | ok     | na     | na     | na | na | na      | na
PS1/OUT_POWER  | 330.000 | Watts    | ok     | na     | na     | na | na | na      | na       <<<<< Voltage
 

 
Additional details on individual sensors can be found in the "Oracle Integrated Lights Out Manager (ILOM) 3.0 Supplement for Sun Fire X4170 M2 and X4270 M2 Servers" documentation in Table 2-8 Power Supply Sensors and Indicators.
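
The check in step 1 can be sketched in Python as follows. This is an illustrative sketch only: the helper names read_sensor_values and psu_findings are hypothetical, and the logic relies solely on the convention stated above that 0x1 means false and 0x2 means true. The sample lines are taken from the sensor list output in this document. In practice the text would be read from ipmi/@usr@local@bin@ipmiint_sensor_list.out in the unpacked diagnostic output.

```python
FAULT_FLAGS = ("CUR_FAULT", "VOLT_FAULT", "FAN_FAULT", "TEMP_FAULT")


def read_sensor_values(text):
    """Map sensor name -> value column from sensor-list style lines."""
    values = {}
    for line in text.splitlines():
        line = line.split("<<<<<")[0]  # drop the annotations used in this document
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 4:
            values[fields[0]] = fields[1]
    return values


def psu_findings(values, psu):
    """Return a list of problems found for one PSU (empty if none)."""
    findings = []
    pwrok = values.get(f"{psu}/PWROK")
    if pwrok is None:
        findings.append("PWROK sensor not found")
    elif pwrok != "0x2":  # 0x2 indicates true, 0x1 indicates false
        findings.append("PWROK is not true (no POK)")
    for flag in FAULT_FLAGS:
        if values.get(f"{psu}/{flag}") == "0x2":
            findings.append(f"{flag} is set")
    return findings


sample = """\
PS0/PWROK         | 0x1        | discrete    | 0x0001| na        | na        | na        | na        | na          | na <<<<< NO POK
PS0/VOLT_FAULT   | 0x2        | discrete   | 0x0002| na        | na        | na        | na        | na          | na <<<<< Fault flag
PS0/V_OUT           | 1.120     | Volts        | nr       | 8.000   | 8.960    | na        | na        | 14.960    | 16.000
PS1/PWROK          | 0x2        | discrete   | 0x0002| na        | na        | na        | na        | na        | na  <<<<< POK
PS1/VOLT_FAULT   | 0x1        | discrete   | 0x0001| na        | na        | na        | na        | na        | na  <<<<< No Fault
PS1/V_OUT           | 12.080    | Volts       | ok       | 8.000   | 8.960   | na         | na        | 14.960  | 16.000
"""

values = read_sensor_values(sample)
for psu in ("PS0", "PS1"):
    print(psu, psu_findings(values, psu) or "no findings")
```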

2. Check the file: ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out for voltage faults.

The output below confirms a voltage fault on PS0.


Output from:

$ grep PS0 ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out
  
       Inventory has been updated starting at node '/SYS/PS0'
       Voltage : /SYS/PS0/V_OUT : Lower Non-recoverable going low  : reading 1.1
       Voltage : /SYS/PS0/V_OUT : Lower Critical going low  : reading 1.12 <= th
       Power Supply : /SYS/PS0/VOLT_FAULT : State Asserted
       Power Supply : /SYS/PS0/PWROK : State Deasserted

Compared to output from:

$ grep PS1 ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out
  
       Inventory has been updated starting at node '/SYS/PS1'
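
The event log scan in step 2 can be sketched as below. This is a hedged illustration rather than an Oracle tool: the names psu_events and fault_indications are hypothetical, and the patterns matched ("State Asserted" on VOLT_FAULT, "State Deasserted" on PWROK, and threshold events containing "going low") are limited to the wording seen in the sample output above. Other event wordings may exist and are not covered.

```python
import re

# Matches "<type> : /SYS/PSn/<sensor> : <detail>" as seen in the event list.
EVENT_RE = re.compile(r"/SYS/(PS\d+)/(\w+)\s*:\s*(.+)")


def psu_events(log_text):
    """Yield (psu, sensor, detail) for event lines that name a PSU sensor."""
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            yield match.group(1), match.group(2), match.group(3).strip()


def fault_indications(log_text):
    """Return (psu, description) pairs for events that suggest a PSU fault."""
    hits = []
    for psu, sensor, detail in psu_events(log_text):
        if sensor == "VOLT_FAULT" and "State Asserted" in detail:
            hits.append((psu, "voltage fault asserted"))
        elif sensor == "PWROK" and "State Deasserted" in detail:
            hits.append((psu, "power OK deasserted"))
        elif "going low" in detail:  # assumption: only the wording from the sample
            hits.append((psu, f"{sensor} threshold crossed"))
    return hits


sample = """\
       Inventory has been updated starting at node '/SYS/PS0'
       Voltage : /SYS/PS0/V_OUT : Lower Non-recoverable going low  : reading 1.1
       Voltage : /SYS/PS0/V_OUT : Lower Critical going low  : reading 1.12 <= th
       Power Supply : /SYS/PS0/VOLT_FAULT : State Asserted
       Power Supply : /SYS/PS0/PWROK : State Deasserted
       Inventory has been updated starting at node '/SYS/PS1'
"""

for psu, description in fault_indications(sample):
    print(psu, description)
```

Inventory update lines do not name a sensor below /SYS/PSn, so they are ignored. This matches the PS1 output above, which shows no fault events.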



3. Check the file: ilom/@usr@local@bin@spshexec_show_-d_properties_-level_all_@.out for more details on asserted states if required. For example, check the details on PWROK and VOLT_FAULT.

The example below also confirms that the PWROK sensor for PS0 is not ok (value "State Deasserted") while VOLT_FAULT is set (value "State Asserted").

...
/SYS/PS0/PWROK
   Properties:
       type = Power Supply
       ipmi_name = PS0/PWROK
       class = Discrete Sensor
       value = State Deasserted
       alarm_status = major
...
/SYS/PS0/VOLT_FAULT
   Properties:
       type = Power Supply
       ipmi_name = PS0/VOLT_FAULT
       class = Discrete Sensor
       value = State Asserted
       alarm_status = major
...

Compared to the output for PS1:

...
/SYS/PS1/PWROK
   Properties:
       type = Power Supply
       ipmi_name = PS1/PWROK
       class = Discrete Sensor
       value = State Asserted
       alarm_status = cleared
...
/SYS/PS1/VOLT_FAULT
   Properties:
       type = Power Supply
       ipmi_name = PS1/VOLT_FAULT
       class = Discrete Sensor
       value = State Deasserted
       alarm_status = cleared
...
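
The properties listing in step 3 has a simple "target, then key = value" layout, so it can be read into a dictionary for comparison. The sketch below is illustrative only. The names parse_properties and psu_state are hypothetical, and it assumes that every target line starts with "/" and every property line contains "=", as in the excerpts above.

```python
def parse_properties(text):
    """Map target path -> {property: value} from the properties listing."""
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            current = targets.setdefault(line, {})
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return targets


def psu_state(targets, psu):
    """Summarise PWROK and VOLT_FAULT for one PSU."""
    pwrok = targets.get(f"/SYS/{psu}/PWROK", {})
    volt = targets.get(f"/SYS/{psu}/VOLT_FAULT", {})
    return {
        "power_ok": pwrok.get("value") == "State Asserted",
        "volt_fault": volt.get("value") == "State Asserted",
        "alarms": sorted({pwrok.get("alarm_status"), volt.get("alarm_status")} - {None}),
    }


sample = """\
/SYS/PS0/PWROK
   Properties:
       type = Power Supply
       value = State Deasserted
       alarm_status = major
/SYS/PS0/VOLT_FAULT
   Properties:
       type = Power Supply
       value = State Asserted
       alarm_status = major
/SYS/PS1/PWROK
   Properties:
       type = Power Supply
       value = State Asserted
       alarm_status = cleared
/SYS/PS1/VOLT_FAULT
   Properties:
       type = Power Supply
       value = State Deasserted
       alarm_status = cleared
"""

targets = parse_properties(sample)
for psu in ("PS0", "PS1"):
    print(psu, psu_state(targets, psu))
```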


4. The "bdadiag snapshot" output in the example above therefore confirms that PS0 is not fully powered and functional. File an SR with Oracle Support to replace the faulty PSU.

 

Note: If the output shows "Power Supply AC lost", this indicates that external power to the power supply was lost. It does not indicate a faulty power supply. "Power Supply AC lost" can be the result of a reboot.


  Copyright © 2018 Oracle, Inc.  All rights reserved.