Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2093580.1
Update Date:2018-01-16

Solution Type  Technical Instruction Sure

Solution  2093580.1 :   FS System: How to Clear the Controller Failure History  


Related Items
  • Oracle FS1-2 Flash Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Flash Storage>SN-EStor: FSx


Applies to:

Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

The Oracle FS System excludes a Controller from the cluster when the number of Controller failures reaches a predefined threshold. While diagnostics are running, the failure counter can record an excessive number of Controller failures that do not correspond to actual failures, and as a result the Controller fails to boot.

In such scenarios, the first recovery step is to clear the failure history.

The failure thresholds are listed below.

  1. Three warmstarts in one hour. The next warmstart is converted to a PODR (Power On Data Recovery).
  2. Three PODRs in a 24-hour period, or four PODRs in 7 days, disable the Controller.
  3. A single warmstart that takes longer than 30 seconds to complete results in a PODR.
  4. Any failure during coldstart results in the Controller node being excluded from the cluster.
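
The thresholds above can be sketched in a few lines of Python to make the counting windows concrete. This is only an illustration, not the logic the FS System actually runs. The function names are hypothetical, and two readings are assumptions: that rule 1 means a warmstart is escalated when three earlier warmstarts fall within the preceding hour, and that the PODR windows in rule 2 include the PODR being evaluated. Rule 4 is unconditional and is not modelled.

```python
from datetime import datetime, timedelta


def warmstart_becomes_podr(prior_warmstarts, now, duration_s):
    """Return True if a warmstart at `now` would be escalated to a PODR.

    prior_warmstarts: datetimes of earlier warmstarts (assumed input format).
    duration_s: how long this warmstart took, in seconds.
    """
    # Rule 3: a warmstart longer than 30 seconds results in a PODR.
    if duration_s > 30:
        return True
    # Rule 1 (assumed reading): three warmstarts already in the last hour.
    one_hour = timedelta(hours=1)
    recent = [t for t in prior_warmstarts if timedelta(0) <= now - t <= one_hour]
    return len(recent) >= 3


def podr_disables_controller(podrs, now):
    """Return True if the PODR history (including the current PODR) hits rule 2."""
    in_24h = sum(1 for t in podrs if timedelta(0) <= now - t <= timedelta(hours=24))
    in_7d = sum(1 for t in podrs if timedelta(0) <= now - t <= timedelta(days=7))
    # Rule 2: three PODRs in 24 hours, or four PODRs in 7 days.
    return in_24h >= 3 or in_7d >= 4
```

For example, under these assumptions three diagnostic warmstarts at 10, 30, and 50 minutes before a fourth one would cause the fourth to be treated as a PODR, even if each completed quickly.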

Solution

Prerequisites:

  • You need the "fscli" tool to run the commands below. Refer to <Document 1991938.1> FS System: How to Obtain and Install the fscli Tool Software.
  • You must log in as user "pillar", or as any other user with the support role, to run these commands.

For the default passwords associated with FS1-2, refer to <Document 2046703.1> FS System: Passwords Associated with the Oracle FS1-2 Flash Storage System.

During diagnostics, either a single Controller or both Controllers can be excluded from the cluster.

  1. Log in to the Oracle FS System with "fscli":

    # fscli login -u <username> -oracleFs <fs-system IP>
      

  2. Re-enable the Controller:

    # fscli controller -reenable -controller <Controller FQN or unique identifier(ID)>
      

    Example when only CONTROLLER-01 has failed:
    # fscli controller -reenable -controller /CONTROLLER-01
      

    Example when both Controllers are in a failed/disabled state:
    # fscli controller -reenable
      

The failure history and the list of excluded Controllers are kept in /var/lib/pillar/pcp/node-info.xml on the Active Pilot.

After issuing "fscli controller -reenable", no node in this file should have a value of "true" for either excluded or disabled, and failure_count should be 0 for every node.

Example:

[root@pilot2 ~]# more /var/lib/pillar/pcp/node-info.xml

<?xml version="1.0" encoding="UTF-8" ?>
<node_list>
<cold_start_failure_count>0</cold_start_failure_count>
<node>
<wuName>WNxxxx0</wuName>
<excluded>false</excluded>            
<disabled>false</disabled>             
<failure_count>0</failure_count>
</node>
<node>
<wuName>WNxxxx1</wuName>
<excluded>false</excluded>
<disabled>false</disabled>
<failure_count>0</failure_count>
</node>
</node_list>
[root@pilot2 ~]#
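
The check described above can also be scripted rather than read by eye. The following is a minimal sketch, assuming the file has the same layout as the example output and that its contents have been copied to a host with Python available. The function name find_uncleared_nodes is hypothetical and is not part of any Oracle tool.

```python
import xml.etree.ElementTree as ET


def find_uncleared_nodes(xml_text):
    """Return the wuName of every node that is still excluded, disabled,
    or has a non-zero failure_count (element names taken from the example)."""
    root = ET.fromstring(xml_text.strip())
    uncleared = []
    for node in root.findall("node"):
        name = (node.findtext("wuName") or "").strip()
        excluded = (node.findtext("excluded") or "").strip().lower() == "true"
        disabled = (node.findtext("disabled") or "").strip().lower() == "true"
        # A missing failure_count is treated as 0 here; that is an assumption.
        failures = int((node.findtext("failure_count") or "0").strip())
        if excluded or disabled or failures != 0:
            uncleared.append(name)
    return uncleared


# Usage: pass in the text of node-info.xml. An empty list means the
# failure history was cleared for every node.
# with open("node-info.xml") as f:
#     print(find_uncleared_nodes(f.read()))
```

An empty result corresponds to the clean state shown in the example. Any names returned identify nodes whose history was not cleared.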

 

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.