Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2291969.1
Update Date:2018-03-29

Solution Type  Technical Instruction Sure

Solution  2291969.1 :   Standby Node Becomes Out of Service After Memory Peak Alarm In Active Node  


Related Items
  • Acme Packet 6300
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Session Delivery Network>SN-SND: Acme Service Provider




In this Document
Goal
Solution
References


Created from <SR 3-15276863631>

Applies to:

Acme Packet 6300 - Version S-Cz7.3.0 and later
Information in this document applies to any platform.

Goal

How the standby node becomes Out of Service after a memory peak alarm on the active node of the SBC.
 

Solution

This issue relates to the HA Media Interface Keepalive feature, described on page 913 of the Oracle Communications Session Border Controller Configuration Guide.

Because a large amount of information is replicated between the active and standby nodes, memory that is allocated on the active node and never freed often shows up on the standby node as well.
In those cases, the only way out is to reboot both nodes at the same time to free the memory that some process is not releasing.

The system log on the standby node shows:
Jul 5 15:10:48.282 [MAJOR] (0) Peer <ActiveNodename> timed out in state Active, my state is Standby
Jul 5 15:10:48.282 [WARNING] (0) BERPProcess::setPeerAddress() - old = XXX.YY.Z.W, new = XXX.YY.Z.W
Jul 5 15:10:48.282 [WARNING] (0) BERPProcess::decisionStandby() - active peer <ActiveNodename> has unacceptable health (100) or has timed out
Jul 5 15:10:48.282 [WARNING] (0) BerpProcess::decisionStandby() - taking 500 ms to check for peer over media i/f
Jul 5 15:10:48.287 [WARNING] (0) BERPProcess::decisionStandby() - received arp reply from active peer, going out of service
Jul 5 15:10:48.287 [CRITICAL] (0) Switchover, Standby to OutOfService, active peer <ActiveNodeName> has timed out, but active replied to arp within 500ms
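
To find this event in a saved log file, the lines above can be scanned for the CRITICAL switchover message. The following is a minimal Python sketch, not an Oracle tool: the function name find_oos_switchovers and the regular expression are illustrative assumptions based only on the line format shown in this excerpt.

```python
import re

# Assumed line format, taken from the excerpt above:
# "Mon D HH:MM:SS.mmm [LEVEL] (n) message"
LOG_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<level>[A-Z]+)\]\s+\(\d+\)\s+(?P<msg>.*)$"
)


def find_oos_switchovers(lines):
    """Return (timestamp, message) for each Standby to OutOfService switchover."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not follow the assumed format
        if (match.group("level") == "CRITICAL"
                and "Standby to OutOfService" in match.group("msg")):
            hits.append((match.group("ts"), match.group("msg")))
    return hits


# Two lines copied from the log excerpt in this document.
sample = [
    "Jul 5 15:10:48.282 [MAJOR] (0) Peer <ActiveNodename> timed out in state Active, my state is Standby",
    "Jul 5 15:10:48.287 [CRITICAL] (0) Switchover, Standby to OutOfService, active peer <ActiveNodeName> has timed out, but active replied to arp within 500ms",
]

for ts, msg in find_oos_switchovers(sample):
    print(ts, "|", msg)
```

Matching on both the log level and the "Standby to OutOfService" text keeps the scan from flagging the earlier MAJOR and WARNING lines that lead up to the switchover.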

The configuration file shows the following:


redundancy-config
  state enabled
  log-level INFO
  health-threshold 75
  emergency-threshold 50
  port XXXX
  :

<snip>

  :
  gateway-heartbeat-timeout 1
  gateway-heartbeat-health 0
  media-if-peercheck-time 500 ------------> It is enabled
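
Whether the media interface peer check is enabled can also be read from a saved copy of the configuration. The Python sketch below is illustrative only: parse_redundancy_config and media_peercheck_enabled are hypothetical helper names, and it assumes that a value of 0 (or a missing parameter) means the feature is disabled, which should be confirmed against the Configuration Guide.

```python
def parse_redundancy_config(text):
    """Parse 'name value' lines of a redundancy-config listing into a dict."""
    params = {}
    for raw in text.splitlines():
        parts = raw.split()
        # Skip the element name, ':' continuation markers and '<snip>' lines,
        # which carry no 'name value' pair.
        if len(parts) < 2:
            continue
        params[parts[0]] = parts[1]
    return params


def media_peercheck_enabled(params):
    # Assumption: 0 or an absent parameter means the check is disabled.
    return int(params.get("media-if-peercheck-time", "0")) > 0


# Listing as shown in this document (port value masked in the source).
listing = """
redundancy-config
  state enabled
  health-threshold 75
  emergency-threshold 50
  port XXXX
  gateway-heartbeat-timeout 1
  gateway-heartbeat-health 0
  media-if-peercheck-time 500 ------------> It is enabled
"""

params = parse_redundancy_config(listing)
print("media-if-peercheck-time:", params["media-if-peercheck-time"])
print("peer check enabled:", media_peercheck_enabled(params))
```

Only the first two whitespace-separated tokens of each line are used, so a trailing annotation such as the arrow in the listing does not affect the parsed value.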

In this case, the standby stops receiving responses to the checkpoint messages it sends to the active SBC, so it declares that the active has timed out. However, the media interface peer check (media-if-peercheck-time) is enabled on this HA pair.
With this feature enabled, once the standby stops receiving responses from the active through the HA ports, it sends an ARP request to the media interfaces of the active SBC. Because it received a reply to that ARP request within the configured 500 ms, it took itself Out of Service (OOS). This is how the feature is designed and is normal behavior.
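
The decision described above can be summarized as a small state function. This Python sketch is a simplified model, not the SBC implementation: the names State and decide_standby are hypothetical, and only the Out of Service branch is confirmed by the log in this document. The other branches (remaining standby when there is no timeout, and taking over as active when the check is disabled or no ARP reply arrives in time) are assumptions about the intended behavior.

```python
from enum import Enum


class State(Enum):
    STANDBY = "Standby"
    ACTIVE = "Active"
    OUT_OF_SERVICE = "OutOfService"


def decide_standby(peer_timed_out, peercheck_time_ms, arp_reply_delay_ms):
    """Model the standby's decision after the active stops answering on the HA ports.

    peer_timed_out      -- True if checkpoint responses from the active stopped
    peercheck_time_ms   -- configured media-if-peercheck-time (0 assumed to disable)
    arp_reply_delay_ms  -- delay of the ARP reply from the active's media
                           interface, or None if no reply was received
    """
    if not peer_timed_out:
        return State.STANDBY  # assumption: nothing changes without a timeout
    if peercheck_time_ms <= 0:
        return State.ACTIVE  # assumption: no peer check, standby takes over
    if arp_reply_delay_ms is not None and arp_reply_delay_ms <= peercheck_time_ms:
        # Confirmed by the log: the active still answers ARP on its media
        # interface, so the standby goes Out of Service instead of taking over.
        return State.OUT_OF_SERVICE
    return State.ACTIVE  # assumption: active is really gone, standby takes over


# Case from this document: timeout on the HA ports, 500 ms peer check,
# ARP reply received about 5 ms later (15:10:48.282 to 15:10:48.287).
print(decide_standby(True, 500, 5))
```

The point of the model is that an ARP reply from the active's media interface tells the standby that the active is still present on the network, so taking over would risk two active nodes.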

Actions to take:
1. Reboot the active node to clear the memory.
2. Reboot both nodes to clear the memory and restore HA.
Either way, both the active node and the standby node that is Out of Service need to be rebooted.
Both of the above steps can be done in the same maintenance window.

 

References

<NOTE:1591900.1> - High CPU Spiking Checklist

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.