Asset ID: 1-71-2291969.1
Update Date: 2018-03-29
Solution Type: Technical Instruction Sure Solution
2291969.1: Standby Node Becomes Out of Service After Memory Peak Alarm In Active Node
Related Categories:
- PLA-Support>Sun Systems>CommsGBU>Session Delivery Network>SN-SND: Acme Service Provider
Created from <SR 3-15276863631>
Applies to:
Acme Packet 6300 - Version S-Cz7.3.0 and later
Information in this document applies to any platform.
Goal
Explain how the standby node of an SBC high-availability (HA) pair becomes Out of Service after a memory peak alarm on the active node.
Solution
This issue relates to the "HA Media Interface Keepalive" section (page 913) of the Oracle Communications Session Border Controller Configuration Guide.
In many cases, because much of the information on the active node is replicated to the standby, memory that has been allocated on the active and is never freed will show up on the standby as well.
In those cases, the only way out is to reboot both nodes at the same time to free the memory that some process is not releasing.
The system log shows:
Jul 5 15:10:48.282 [MAJOR] (0) Peer <ActiveNodename> timed out in state Active, my state is Standby
Jul 5 15:10:48.282 [WARNING] (0) BERPProcess::setPeerAddress() - old = XXX.YY.Z.W, new = XXX.YY.Z.W
Jul 5 15:10:48.282 [WARNING] (0) BERPProcess::decisionStandby() - active peer <ActiveNodename> has unacceptable health (100) or has timed out
Jul 5 15:10:48.282 [WARNING] (0) BerpProcess::decisionStandby() - taking 500 ms to check for peer over media i/f
Jul 5 15:10:48.287 [WARNING] (0) BERPProcess::decisionStandby() - received arp reply from active peer, going out of service
Jul 5 15:10:48.287 [CRITICAL] (0) Switchover, Standby to OutOfService, active peer <ActiveNodeName> has timed out, but active replied to arp within 500ms
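To recognize this scenario in other log captures, it can help to scan for the CRITICAL switchover message shown above. The following is a minimal Python sketch, assuming the log lines follow the format of this excerpt; `find_peercheck_oos` and its matching pattern are hypothetical and are not an Oracle tool, and they match only the message text that appears in this article.

```python
import re
from typing import Optional

# Excerpt of the log lines shown in this article (node names are placeholders).
SAMPLE_LOG = """\
Jul 5 15:10:48.282 [MAJOR] (0) Peer <ActiveNodename> timed out in state Active, my state is Standby
Jul 5 15:10:48.282 [WARNING] (0) BERPProcess::decisionStandby() - active peer <ActiveNodename> has unacceptable health (100) or has timed out
Jul 5 15:10:48.282 [WARNING] (0) BerpProcess::decisionStandby() - taking 500 ms to check for peer over media i/f
Jul 5 15:10:48.287 [WARNING] (0) BERPProcess::decisionStandby() - received arp reply from active peer, going out of service
Jul 5 15:10:48.287 [CRITICAL] (0) Switchover, Standby to OutOfService, active peer <ActiveNodeName> has timed out, but active replied to arp within 500ms
"""

# Hypothetical pattern for "<Mon> <day> <time> [<LEVEL>] (<n>) <message>", as in the sample.
LINE_RE = re.compile(r"^(?P<ts>\w{3} +\d+ [\d:.]+) \[(?P<level>\w+)\] \(\d+\) (?P<msg>.*)$")


def find_peercheck_oos(log_text: str) -> Optional[str]:
    """Return the timestamp of the first Standby-to-OutOfService switchover
    that the log attributes to an ARP reply from the active peer, else None."""
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # line is not in the expected log format
        msg = match.group("msg")
        if (match.group("level") == "CRITICAL"
                and "Standby to OutOfService" in msg
                and "replied to arp" in msg):
            return match.group("ts")
    return None


if __name__ == "__main__":
    print(find_peercheck_oos(SAMPLE_LOG))  # Jul 5 15:10:48.287
```

Matching on the CRITICAL line alone is enough here because it already states both the cause (peer timeout) and the reason for going OOS (ARP reply).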
The configuration shows the following:
redundancy-config
state enabled
log-level INFO
health-threshold 75
emergency-threshold 50
port XXXX
:
<snip>
:
gateway-heartbeat-timeout 1
gateway-heartbeat-health 0
media-if-peercheck-time 500 ------------> media interface peer check is enabled (500 ms)
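The relevant setting can also be checked programmatically. The following Python sketch assumes the configuration has been exported as plain `parameter value` lines like the excerpt above. The helpers `parse_redundancy_config` and `peercheck_enabled` are hypothetical, and treating a value of `0` as "disabled" is an assumption that should be confirmed against the Configuration Guide.

```python
from typing import Dict

# Excerpt of the redundancy-config shown in this article, including the elision markers.
CONFIG_TEXT = """\
redundancy-config
state enabled
log-level INFO
health-threshold 75
emergency-threshold 50
port XXXX
:
<snip>
:
gateway-heartbeat-timeout 1
gateway-heartbeat-health 0
media-if-peercheck-time 500
"""


def parse_redundancy_config(text: str) -> Dict[str, str]:
    """Collect 'parameter value' pairs; lines with fewer than two tokens are skipped."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        parts = raw.split(None, 1)
        if len(parts) != 2:
            continue  # element name ("redundancy-config"), ":" and "<snip>"
        params[parts[0]] = parts[1].strip()
    return params


def peercheck_enabled(params: Dict[str, str]) -> bool:
    """Assumption: media-if-peercheck-time is in ms and 0 (or absent) means disabled."""
    return int(params.get("media-if-peercheck-time", "0")) > 0


if __name__ == "__main__":
    params = parse_redundancy_config(CONFIG_TEXT)
    print(params["health-threshold"])  # 75
    print(peercheck_enabled(params))   # True
```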
In this case, the standby stopped receiving responses to the checkpoint messages it sends to the active SBC, so it declared the active as timed out. However, the media interface peer check (media-if-peercheck-time) is enabled on this HA pair.
With this feature enabled, once the standby stops receiving responses from the active through the HA ports, it sends an ARP request to the active SBC's media interfaces. Because the active answered that ARP request within the configured 500 ms, the standby took itself Out of Service (OOS) instead of taking over. This is how the feature is designed to work and is normal behavior.
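The standby's decision can be modelled roughly as follows. This is a simplified Python sketch, not Acme Packet's implementation: `decide_standby` and `HaState` are hypothetical names, health thresholds are ignored, and it assumes that without a timely ARP reply the standby proceeds to become active. With the values from this case (peer timed out, `media-if-peercheck-time 500`, and an ARP reply about 5 ms after the check started, going by the 15:10:48.282 and 15:10:48.287 log timestamps), it returns `OUT_OF_SERVICE`.

```python
from enum import Enum
from typing import Optional


class HaState(Enum):
    STANDBY = "Standby"
    ACTIVE = "Active"
    OUT_OF_SERVICE = "OutOfService"


def decide_standby(peer_timed_out: bool,
                   peercheck_time_ms: int,
                   arp_reply_ms: Optional[int]) -> HaState:
    """Simplified, hypothetical model of the standby's decision.

    peer_timed_out:    the active stopped answering checkpoint messages over the HA ports.
    peercheck_time_ms: media-if-peercheck-time; 0 is assumed to mean the check is disabled.
    arp_reply_ms:      delay of the active's ARP reply on a media interface, or None if no reply.
    """
    if not peer_timed_out:
        return HaState.STANDBY  # active still answers over the HA ports; stay standby
    if (peercheck_time_ms > 0
            and arp_reply_ms is not None
            and arp_reply_ms <= peercheck_time_ms):
        # The active still answers on the media side, so the standby goes OOS
        # instead of also becoming active.
        return HaState.OUT_OF_SERVICE
    # Assumption: with no peer check, or no ARP reply in time, the standby takes over.
    return HaState.ACTIVE


if __name__ == "__main__":
    print(decide_standby(True, 500, 5))  # HaState.OUT_OF_SERVICE
```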
Actions to take:
1. Reboot the active node to clear the memory.
2. Reboot both nodes to clear the memory and restore HA.
Either way, both the active node and the OOS standby node need to be rebooted.
Both steps can be performed in the same maintenance window.