Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Technical Instruction Sure Solution 2288315.1 : DSR Multihoming SCTP Association Terminates Unexpectedly
Created from <SR 3-15265217391>

Applies to:
Oracle Communications Diameter Signaling Router (DSR) - Version DSR 7.0.1 and later
Tekelec

Goal
A multihoming SCTP association is established between the DSR and an HSS. The HSS sends a cross-path HEARTBEAT packet, which the IPFE mistakenly load-balances to MP2. MP2 responds with an SCTP ABORT. The HSS then considers the SCTP association terminated and responds with an ABORT to subsequent DSR messages.

Solution
Various configuration changes had been made on the DSR. In addition, the following alarms were seen:

0704:095902.385 STK-V sync error, did not receive ping for 1000 milliseconds [8824/IpfeStateSync.C:1292]
0704:095904.005 STK-V timeout trying to connect to 10.a.b.24 for sync [8824/IpfeStateSync.C:1433]
mm/dd/yyyy 12:59:02 (IPFE-1: IPFE-A: data read error) ipfe#5003{IPFE state sync run error}
mm/dd/yyyy 12:59:04 (IPFE-1: IPFE-A: connect error) ipfe#5003{IPFE state sync run error}
mm/dd/yyyy 12:59:04 (IPFE-1: eth2) ipfe#5012{Signaling interface heartbeat timeout}
mm/dd/yyyy 12:59:04 (IPFE-1: eth3) ipfe#5012{Signaling interface heartbeat timeout}
mm/dd/yyyy 12:59:02 (IPFE-1: IPFE-A: data read error) ipfe#5003{IPFE state sync run error} *C GN_DOWN/WRN ^^ [8824:IpfeStateSync.C:1293]
mm/dd/yyyy 12:59:03 (IPFE-1: 10.255.94.13) ipfe#5001{IPFE Backend Unavailable} -* GN_DOWN/WRN ^^ [8824:IpfeBackendMonitor.C:194]
mm/dd/yyyy 12:59:03 (IPFE-1: 10.255.94.77) ipfe#5001{IPFE Backend Unavailable} -* GN_DOWN/WRN ^^ [8824:IpfeBackendMonitor.C:194]

Possible reason for the wrong dispatch of the HB message:
The heartbeat is sent from IP Address 1 (aa.aa.aaa.aa) to TSA IP Address 2 (bb.bbb.bb.bb). No correct corresponding association information exists for TSA IP Address 2 (bb.bbb.bb.bb), so the heartbeat packet is dispatched to the wrong destination MP.

1. The peer sent an SCTP INIT (source port 3868) via the secondary path to DSR port 49135 (DSR acts as responder), and a new responder association to MPx was created.

Explanation
When the IPFE creates a new SCTP association with a peer, the new association is added to the association list, which is stored under the xt_recent directory. The association list is synchronized between the mate IPFEs. If the heartbeat is sent to an IPFE that does not have the corresponding association information, that IPFE may process the heartbeat (from IP Address 1 (aa.aa.aaa.aa) to TSA IP Address 2 (bb.bbb.bb.bb)) as a new incoming message and dispatch it to the wrong destination MP. This suggests that IPFE1 and IPFE2 may not have been fully synchronized when the error occurred. The IPFE sync alarms listed above, which were raised at the same time, support this conclusion.
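The dispatch behaviour described in the explanation can be illustrated with a small toy model. This is only a sketch under stated assumptions and does not reflect the actual IPFE implementation: the `Association` record, the `ToyIpfe` class, the round-robin fallback, and the placeholder addresses are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Association:
    """Hypothetical record holding all addresses of one multihomed association."""
    peer_ips: frozenset
    tsa_ips: frozenset
    mp: str  # MP that owns the association


class ToyIpfe:
    """Toy stand-in for one IPFE; not the real IPFE logic."""

    def __init__(self, name, mps):
        self.name = name
        self.records = set()        # association list known to this IPFE
        self._next_mp = cycle(mps)  # assumed round-robin load balancing

    def sync_from(self, mate):
        """Copy the mate's association list (models a successful state sync)."""
        self.records |= mate.records

    def dispatch(self, src_ip, dst_ip):
        """Return (mp, matched) for a packet from src_ip to TSA address dst_ip."""
        for rec in self.records:
            if src_ip in rec.peer_ips and dst_ip in rec.tsa_ips:
                return rec.mp, True  # known association: stays on its MP
        # No record found: the packet looks like new traffic and is balanced.
        return next(self._next_mp), False


def demo():
    # One multihomed association owned by MP1, known only to IPFE1 at first.
    assoc = Association(frozenset({"peer-ip-1", "peer-ip-2"}),
                        frozenset({"tsa-ip-1", "tsa-ip-2"}), mp="MP1")
    ipfe1 = ToyIpfe("IPFE1", ["MP1", "MP2"])
    ipfe2 = ToyIpfe("IPFE2", ["MP2", "MP1"])  # balancer happens to pick MP2 next
    ipfe1.records.add(assoc)

    # Cross-path heartbeat arrives at the IPFE that missed the sync.
    unsynced = ipfe2.dispatch("peer-ip-1", "tsa-ip-2")
    # After a successful sync the same heartbeat reaches the owning MP.
    ipfe2.sync_from(ipfe1)
    synced = ipfe2.dispatch("peer-ip-1", "tsa-ip-2")
    return unsynced, synced
```

In this model the unsynced IPFE hands the heartbeat to whichever MP its balancer selects next, which need not be the MP that owns the association. Once the association list has been copied from the mate, the lookup succeeds and the heartbeat stays on the owning MP.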
This issue occurred when the customer VMs were shut down unexpectedly. If a similar situation occurs, start an SCTP PCAP capture and an IPFE association dump with the script at least 600 seconds (the delete age time) before starting recovery procedures (such as enabling the connections). Note that once the unexpected association record exists, it is pointless to analyze a PCAP of the DSR initiating the SCTP connection and the SCTP ABORT being sent.

If the issue reappears, the following steps should be taken.

Normally the peer side should reconnect to the DSR listening port (DSR acts as responder) instead of initiating a connection.
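The 600-second lead time mentioned above can be checked with simple date arithmetic. The following is a minimal sketch, assuming the default delete age time of 600 seconds quoted in this note; the function names are hypothetical and are not part of any DSR tooling.

```python
from datetime import datetime, timedelta

# Default delete age time quoted in this note; adjust if configured differently.
DELETE_AGE_SECONDS = 600


def earliest_recovery_time(capture_start, delete_age_seconds=DELETE_AGE_SECONDS):
    """Earliest time at which recovery procedures should begin."""
    return capture_start + timedelta(seconds=delete_age_seconds)


def may_start_recovery(capture_start, now, delete_age_seconds=DELETE_AGE_SECONDS):
    """True once the capture has been running for at least the delete age time."""
    return now >= earliest_recovery_time(capture_start, delete_age_seconds)
```

For example, if the capture was started at 13:00:00, `earliest_recovery_time` returns 13:10:00 on the same day, and `may_start_recovery` stays false until that moment.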
To Recover the Condition
Disable the problematic (fluctuating) connection, wait for more than 600 seconds (the default delete age time) to make sure that all stale records have been deleted, then enable the connection.

References
<NOTE:2106578.1> - Connection Between Diameter Routing Agent (DRA) and Client is Going Down Frequently, DRA is Sending Transmission Control protocol (TCP) Reset Message to Bring Down the Connection

Attachments
This solution has no attachment