Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2216385.1
Update Date: 2018-01-03
Keywords:

Solution Type  Problem Resolution Sure

Solution  2216385.1 :   FS System: System Status READ ONLY or SYSTEM_FAILURE_CONSERVATIVE Due to Pilot Resource Failure  


Related Items
  • Oracle FS1-2 Flash Storage System
Related Categories
  • PLA-Support>Sun Systems>DISK>Flash Storage>SN-EStor: FSx




In this Document
Symptoms
Cause
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Requires log investigation to root cause prior to attempting recovery.
Created from <SR 3-13878975111>

Applies to:

Oracle FS1-2 Flash Storage System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Overall System Status: Read Only / SYSTEM_FAILURE_CONSERVATIVE

Firmware is 6.2.9 or lower

Log bundle indicates the following observations:

server.log.*:  INFO DATE&TIME com.pillardata.server.systemstate.SystemStateMonitor.update(SystemStateMonitor.java:253) PMICommandProcessor - Transitioning system state from NORMAL to SYSTEM_FAILURE_CONSERVATIVE because of PCP_NOT_ACTIVE

server-jni.log.*:  DATE&TIME pilot1 java: 210122682 16505 MemAlloc::getConnection() 0x7f5b7807c010 0x7f5b7807c050 empty connection pool
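
The two signatures above can be checked with a short script once the log bundle has been extracted. The following is a minimal sketch, not a supported tool: the helper name find_pilot_resource_signatures is hypothetical, and the patterns are taken only from the excerpts quoted in this document.

```python
import re

# Patterns copied from the log excerpts quoted in this document.
PCP_TRANSITION = re.compile(
    r"Transitioning system state from NORMAL to SYSTEM_FAILURE_CONSERVATIVE "
    r"because of PCP_NOT_ACTIVE"
)
EMPTY_POOL = re.compile(r"MemAlloc::getConnection\(\).*empty connection pool")


def find_pilot_resource_signatures(server_log_lines, jni_log_lines):
    """Report which of the two signatures appear (hypothetical helper)."""
    return {
        "pcp_not_active": any(PCP_TRANSITION.search(line) for line in server_log_lines),
        "empty_connection_pool": any(EMPTY_POOL.search(line) for line in jni_log_lines),
    }


# Sample lines taken from the excerpts above.
server_sample = [
    "INFO DATE&TIME com.pillardata.server.systemstate.SystemStateMonitor.update"
    "(SystemStateMonitor.java:253) PMICommandProcessor - Transitioning system state "
    "from NORMAL to SYSTEM_FAILURE_CONSERVATIVE because of PCP_NOT_ACTIVE"
]
jni_sample = [
    "DATE&TIME pilot1 java: 210122682 16505 MemAlloc::getConnection() "
    "0x7f5b7807c010 0x7f5b7807c050 empty connection pool"
]

result = find_pilot_resource_signatures(server_sample, jni_sample)
print(result)
```

In practice the two arguments would be the lines read from the server.log.* and server-jni.log.* files in the bundle.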

 

The event log should NOT contain any indication that PERSISTENCE access was lost, but it WILL contain multiple OPERATION_FAILED events with the error UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR, such as:

<EventType>OPERATION_FAILED</EventType>
<Severity>WARNING</Severity>
<Category>AUDIT</Category>
<Time>2016-12-19T15:11:38.225</Time>
<ComponentIdentity>
<Guid>414B303032363932A13F17232B4FA59C</Guid>
</ComponentIdentity>
<ComponentName>/InitiatorDiscoveryOperation/2021429/{{USER_2}}</ComponentName>
<EventParameterList>
<ParameterName>EventParameters.TaskFailed.csiError_1</ParameterName>
<ParameterValue>UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR</ParameterValue>
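
Counting these events by hand is tedious in a large event log. The sketch below counts OPERATION_FAILED records that carry the UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR value. It uses plain string matching rather than an XML parser because only a partial record is shown above and the full event schema is not given here. The function name and the HYPOTHETICAL_OTHER_EVENT type in the sample are illustrative assumptions.

```python
ERROR_TAG = (
    "<ParameterValue>UNSATISFIED_REQUEST_PMI_COMMUNICATION_ERROR</ParameterValue>"
)


def count_pmi_comm_failures(event_log_text):
    """Count OPERATION_FAILED records containing the PMI communication error.

    Assumes every record starts with an <EventType> element, as in the
    excerpt above; this is an assumption about the log layout.
    """
    records = event_log_text.split("<EventType>")[1:]
    return sum(
        1
        for record in records
        if record.startswith("OPERATION_FAILED</EventType>") and ERROR_TAG in record
    )


# Two matching records and one unrelated record (hypothetical event type).
sample = "\n".join(
    [
        "<EventType>OPERATION_FAILED</EventType>",
        "<Severity>WARNING</Severity>",
        "<ParameterName>EventParameters.TaskFailed.csiError_1</ParameterName>",
        ERROR_TAG,
        "<EventType>OPERATION_FAILED</EventType>",
        "<Severity>WARNING</Severity>",
        ERROR_TAG,
        "<EventType>HYPOTHETICAL_OTHER_EVENT</EventType>",
        "<Severity>WARNING</Severity>",
    ]
)

failure_count = count_pmi_comm_failures(sample)
print(failure_count)  # 2
```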

 

Further review of the server.log.* files around the time of the condition will show the error "UNSPECIFIED_BLT_ERROR ErrorNumber: -8", such as:

INFO 2016-12-19 15:11:38,174 com.pillardata.pmi.net.InfoLogger.logError(InfoLogger.java:105) PMICommandProcessor - MessageHeader[
revision=0
messageID=0x2961962955
transactionID=0
sourceNodeId=2008000101000000
sourceComponent="PDS_COMP_PACMAN(0x1f)"
destNodeId=2008000101000001
destComponent="PDS_COMP_CM(0x1c)"
flags=392
type="PDS_MSG_TYPE_SUPER_CMD(0x5)"
command="CM_MSG_GET_SAN_INITIATOR(0x1c0d43)"
result="PMI_EOK(0x0)"
operationCount=1
operationSize=8
reserve2=0
time=Timeval[
seconds=1482160298
microseconds=170000
]
bodySize=8
] ErrorCode: UNSPECIFIED_BLT_ERROR ErrorNumber: -8
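
To line this entry up with the OPERATION_FAILED events, it helps to pull the timestamp, command and error out of the message header. The sketch below does that for the fields shown above. It assumes the Timeval seconds value is a Unix epoch time, which is consistent with the 15:11:38 log timestamp in this example; the function name parse_pmi_error is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_pmi_error(entry):
    """Extract time, command and error details from one logged MessageHeader."""
    # \b keeps "seconds=" from matching inside "microseconds=".
    seconds = int(re.search(r"\bseconds=(\d+)", entry).group(1))
    micros = int(re.search(r"\bmicroseconds=(\d+)", entry).group(1))
    command = re.search(r'command="([^"]+)"', entry).group(1)
    error = re.search(r"ErrorCode: (\S+) ErrorNumber: (-?\d+)", entry)
    # Assumption: seconds is a Unix epoch value, interpreted here as UTC.
    when = datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(microseconds=micros)
    return {
        "time_utc": when.isoformat(),
        "command": command,
        "error_code": error.group(1),
        "error_number": int(error.group(2)),
    }


# Abbreviated copy of the entry shown above.
sample_entry = """MessageHeader[
command="CM_MSG_GET_SAN_INITIATOR(0x1c0d43)"
result="PMI_EOK(0x0)"
time=Timeval[
seconds=1482160298
microseconds=170000
]
bodySize=8
] ErrorCode: UNSPECIFIED_BLT_ERROR ErrorNumber: -8"""

parsed = parse_pmi_error(sample_entry)
print(parsed["time_utc"])      # 2016-12-19T15:11:38.170000+00:00
print(parsed["error_code"], parsed["error_number"])
```

The decoded time matches the Time field of the OPERATION_FAILED event shown earlier to within a fraction of a second, which is the correlation to look for.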

 

If ALL of these observations match and persistence access was not lost, then the likely cause of SYSTEM_FAILURE_CONSERVATIVE is the pilot running out of resources (Bug 24314003).

The most common cause of READ ONLY / SYSTEM_FAILURE_CONSERVATIVE is loss of access to persistence, so it is important to verify the underlying cause from log review. If access to persistence was lost, failing over the pilots per this KM doc will not resolve the condition.
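
The decision described above can be written down as a simple check. This is a sketch of the logic only; each input is the result of a manual log review, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogReview:
    """Findings from the log bundle review (all fields filled in by hand)."""
    firmware_6_2_9_or_lower: bool
    pcp_not_active_transition: bool   # server.log.* signature
    empty_connection_pool: bool       # server-jni.log.* signature
    pmi_comm_failures: int            # OPERATION_FAILED events with the PMI error
    blt_error_minus_8: bool           # UNSPECIFIED_BLT_ERROR ErrorNumber: -8
    persistence_access_lost: bool


def likely_pilot_resource_failure(review):
    """True only if every symptom matches and persistence was not lost."""
    if review.persistence_access_lost:
        # Loss of persistence is a different cause; a pilot failover will not help.
        return False
    return (
        review.firmware_6_2_9_or_lower
        and review.pcp_not_active_transition
        and review.empty_connection_pool
        and review.pmi_comm_failures > 1  # the symptom is "multiple" events
        and review.blt_error_minus_8
    )


matching = LogReview(True, True, True, 5, True, False)
persistence_case = LogReview(True, True, True, 5, True, True)
print(likely_pilot_resource_failure(matching))          # True
print(likely_pilot_resource_failure(persistence_case))  # False
```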

Cause

A defect in the pilot software (Bug 24314003) causes PCP to fail, and the active pilot transitions to SYSTEM_FAILURE_CONSERVATIVE.

Solution

Ensure the standby pilot is in a normal status. One way to confirm this is to ssh to the standby pilot and review /var/log/pcp.log.

Fail the active pilot over to the standby.

# fscli login -u pillar FS1-IP_ADDRESS

# fscli pilot -forceFailover

 

Due to the nature of the pilot resource exhaustion, the fscli pilot -forceFailover command may not result in a pilot failover. An alternative is to ssh to the active pilot and issue "service pilotcfg restart" to perform the failover.

 

Defect 24314003 is resolved in 6.2.10 and higher.
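
When checking whether a system already carries the fix, compare the release numerically and not as text: as plain strings, "6.2.9" sorts after "6.2.10". A minimal sketch, assuming the release is reported as a purely numeric dotted string; the function names are hypothetical.

```python
FIXED_IN = (6, 2, 10)  # release in which Defect 24314003 is resolved


def parse_release(release):
    """Turn a dotted release such as "6.2.9" into a tuple of integers."""
    return tuple(int(part) for part in release.split("."))


def has_fix(release):
    """True if the release is 6.2.10 or higher."""
    return parse_release(release) >= FIXED_IN


print(has_fix("6.2.9"))      # False: still exposed to the defect
print(has_fix("6.2.10"))     # True
print("6.2.9" > "6.2.10")    # True: why a plain string comparison is misleading
```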

References

<BUG:24314003> - COAXM099 BCD ALLOCATION ISSUES PREVENTING JNI REPLY TO STATUS, LEADING TO SXL_ST

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.