Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2384922.1
Update Date:2018-04-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  2384922.1 :   Eagle5 STP - E5ENETB IPSG Card rebooted with Obit: Module ath_vxw.c Line 3314 Class 0001  


Related Items
  • Oracle Communications EAGLE (Hardware)
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Global Signaling Solutions>SN-SND: Tekelec Eagle 5




In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-17099951941>

Applies to:

Oracle Communications EAGLE (Hardware) - Version EAGLE 46.5 and later
Information in this document applies to any platform.

Symptoms

E5-ENETB IPSG Card rebooted. After the reload the card was fully functional.

    ****18-03-25  02:28:37****
    0223.0096    CARD 1203 IPSG          Card has been reloaded

    ****18-03-25  02:27:31****
    0218.0014    CARD 1203 IPSG          Card is present
                 ASSY SN:  10212325147

    ****18-03-25  02:22:30****
    0131.0013 ** CARD 1203 IPSG          Card is isolated from the system
                 ASSY SN:  10212325147
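
The duration of the outage can be read off the timestamps in the event log. The following is a minimal Python sketch of that calculation, not an Oracle tool. It assumes the banner format `****YY-MM-DD  HH:MM:SS****` seen in this excerpt, and the names `parse_events`, `outage_seconds` and `SAMPLE_LOG` are hypothetical.

```python
import re
from datetime import datetime

# Timestamp banners in this excerpt look like "****18-03-25  02:28:37****".
STAMP = re.compile(r"\*{4}(\d{2}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\*{4}")
# Event lines in this excerpt start with a four-digit pair such as "0223.0096".
EVENT = re.compile(r"\d{4}\.\d{4}\s")


def parse_events(log_text):
    """Return (timestamp, message) pairs sorted oldest first."""
    events = []
    current = None
    for line in log_text.splitlines():
        stamp = STAMP.search(line)
        if stamp:
            current = datetime.strptime(" ".join(stamp.groups()),
                                        "%y-%m-%d %H:%M:%S")
            continue
        text = " ".join(line.split())  # collapse column padding
        if current is not None and EVENT.match(text):
            events.append((current, text))
    return sorted(events, key=lambda event: event[0])


def outage_seconds(events):
    """Seconds between the first isolation event and the last reload event."""
    isolated = next(ts for ts, msg in events if "isolated" in msg)
    reloaded = next(ts for ts, msg in reversed(events) if "reloaded" in msg)
    return (reloaded - isolated).total_seconds()


# The three events reported in this note, newest first as printed.
SAMPLE_LOG = """
    ****18-03-25  02:28:37****
    0223.0096    CARD 1203 IPSG          Card has been reloaded
    ****18-03-25  02:27:31****
    0218.0014    CARD 1203 IPSG          Card is present
                 ASSY SN:  10212325147
    ****18-03-25  02:22:30****
    0131.0013 ** CARD 1203 IPSG          Card is isolated from the system
                 ASSY SN:  10212325147
"""

if __name__ == "__main__":
    print(outage_seconds(parse_events(SAMPLE_LOG)))  # 367.0
```

For the log in this note, the card was out of service for 367 seconds (6 minutes 7 seconds) between isolation and reload.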

Upon reload the card generated Obit ath_vxw.c  Line  3314  Class 0001 on the active MASP: 

  STH: Received a BOOT APPL-Obituary reply for restart
        Card 1203   Module ath_vxw.c  Line  3314  Class 0001
        Register Dump :
            EFL=00000000    CS =0000        EIP=00000000    SS =0000
            EAX=00000000    ECX=00000000    EDX=00000000    EBX=00000000
            ESP=00000000    EBP=00000000    ESI=00000000    EDI=00000000
            DS =0000        ES =0000        FS =0000        GS =0000

        Stack Dump :
        [SP+1E]=0000    [SP+16]=0000    [SP+0E]=0000    [SP+06]=0000
        [SP+1C]=0000    [SP+14]=0000    [SP+0C]=0000    [SP+04]=0000
        [SP+1A]=0000    [SP+12]=0000    [SP+0A]=0000    [SP+02]=0000
        [SP+18]=0000    [SP+10]=0000    [SP+08]=0000    [SP+00]=0000

        User Data Dump :
         30 78 66 66 66 66 66 66 66 66 20 41 50 50 4c 20      0xffffffff.APPL.
         57 61 74 63 68 64 6f 67 20 74 69 6d 65 6f 75 74      Watchdog.timeout
         20 72 65 73 65 74                                    .reset

    Report Date:18-03-25  Time:02:27:31
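
The right-hand column of the User Data Dump shows spaces as dots, so the exact string is easier to recover from the hex bytes. The following is a minimal Python sketch, assuming each dump line starts with two-digit hex byte tokens followed by the printable rendering, as in this obituary. The function name `decode_user_data` is hypothetical.

```python
import re

HEX_BYTE = re.compile(r"[0-9A-Fa-f]{2}")


def decode_user_data(dump_lines):
    """Decode the hex byte column of a User Data Dump into ASCII text."""
    data = bytearray()
    for line in dump_lines:
        for token in line.split():
            # The first token that is not a two-digit hex byte starts the
            # printable column; ignore the rest of the line.
            if not HEX_BYTE.fullmatch(token):
                break
            data.append(int(token, 16))
    return data.decode("ascii", errors="replace")


# The three dump lines from the obituary in this note.
USER_DATA_DUMP = [
    "30 78 66 66 66 66 66 66 66 66 20 41 50 50 4c 20      0xffffffff.APPL.",
    "57 61 74 63 68 64 6f 67 20 74 69 6d 65 6f 75 74      Watchdog.timeout",
    "20 72 65 73 65 74                                    .reset",
]

if __name__ == "__main__":
    print(decode_user_data(USER_DATA_DUMP))
    # 0xffffffff APPL Watchdog timeout reset
```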



Changes

 

Cause

Start by analyzing the logs and searching for other symptoms on the node.

A single malfunction can have several causes. Internal causes include, for example, bouncing DPCs. External causes include, for example, a fault on a switch port that made the card reboot as a recovery mechanism, or a router that is causing heavy retransmissions.

OBIT ath_vxw.c Class 0001 is raised by the Application Trouble Handler. If it keeps repeating, it indicates a hardware fault. The User Data Dump section of the obituary reads 0xffffffff.APPL.Watchdog.timeout.reset.

The E5 cards use three types of watchdog mechanism: hardware watchdogs, low-priority starvation, and sanity.

In this case the evidence points to the hardware watchdog.

Because the hardware resets the card without any software involvement, no post-failure data is available, which is consistent with the all-zero register and stack dumps in the obituary. Obits of this kind are typically difficult to debug for lack of post-mortem data. Proceed by gathering more data.

 

To check whether the card is discarding any messages:

rept-stat-mfc:mode=stats:service=vsccp:sample=tot24h
rept-stat-mfc:mode=stats:service=mtp3:sample=tot24h

rtrv-trbl:loc=<active MASP 1113 or 1115>
rtrv-obit:loc=<active MASP 1113 or 1115>
rtrv-log:mode=full:dir=bkwd:num=500:outgrp=sys:slog=act
rtrv-log:mode=full:dir=bkwd:num=500:outgrp=card:slog=act
rept-stat-rtd
rept-stat-imt:mode=full
rept-imt-lvl1:sloc=1201:eloc=1115:r=summary
rept-stat-mux
rept-stat-db:display=all
rept-stat-ddb:display=all

rept-stat-card:loc=<card location>:mode=full
rtrv-card:loc=<card location>
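
When the same data has to be collected for several cards, the command list can be generated with the placeholders filled in. The following is a minimal Python sketch that only builds the command strings listed in this note. It does not connect to an EAGLE terminal. The function name `build_collection_commands` and its parameters are hypothetical, and the `sloc`/`eloc` defaults are simply the values printed in this note.

```python
def build_collection_commands(card_loc, masp_loc,
                              imt_sloc="1201", imt_eloc="1115"):
    """Return the data-collection commands from this note, in order.

    card_loc  -- location of the card that rebooted, for example "1203"
    masp_loc  -- location of the active MASP, "1113" or "1115"
    imt_sloc, imt_eloc -- range for rept-imt-lvl1 (defaults as in this note)
    """
    if masp_loc not in ("1113", "1115"):
        raise ValueError("active MASP location must be 1113 or 1115")
    return [
        # Message discard statistics for the last 24 hours.
        "rept-stat-mfc:mode=stats:service=vsccp:sample=tot24h",
        "rept-stat-mfc:mode=stats:service=mtp3:sample=tot24h",
        # Trouble, obituary and event logs from the active MASP.
        f"rtrv-trbl:loc={masp_loc}",
        f"rtrv-obit:loc={masp_loc}",
        "rtrv-log:mode=full:dir=bkwd:num=500:outgrp=sys:slog=act",
        "rtrv-log:mode=full:dir=bkwd:num=500:outgrp=card:slog=act",
        # System-wide status reports.
        "rept-stat-rtd",
        "rept-stat-imt:mode=full",
        f"rept-imt-lvl1:sloc={imt_sloc}:eloc={imt_eloc}:r=summary",
        "rept-stat-mux",
        "rept-stat-db:display=all",
        "rept-stat-ddb:display=all",
        # Status and provisioning of the affected card.
        f"rept-stat-card:loc={card_loc}:mode=full",
        f"rtrv-card:loc={card_loc}",
    ]


if __name__ == "__main__":
    for command in build_collection_commands("1203", "1113"):
        print(command)
```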

Solution

While gathering the logs, continue to monitor the card. If all the other logs are clear, keep monitoring for an extended period agreed with the customer.

If a second reboot takes place, consider a hard reset by reseating the card.

If a third reboot takes place, replace the board with a spare as soon as possible. After the replacement, monitor the card to confirm normal operation.
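
The escalation steps above can be summarized as a small lookup keyed on the number of reboots observed for the same card. The following is a minimal Python sketch of that policy as stated in this note. The function name `recommended_action` and the wording of the returned strings are illustrative only.

```python
def recommended_action(reboot_count):
    """Map the number of reboots seen on one card to the step in this note."""
    if reboot_count < 1:
        raise ValueError("reboot_count must be at least 1")
    if reboot_count == 1:
        return "gather logs and monitor for a period agreed with the customer"
    if reboot_count == 2:
        return "consider a hard reset by reseating the card"
    # Third reboot or later.
    return "replace the board with a spare and monitor after replacement"


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, "->", recommended_action(count))
```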

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.