SP Filesystem Full, SPT-8000-ED reported

Asset ID:	1-72-1549537.1
Update Date:	2017-10-05
Keywords:

Solution Type:	Problem Resolution Sure

Solution 1549537.1 : SP Filesystem Full, SPT-8000-ED reported

Applies to:

SPARC T3-2 - Version All Versions to All Versions [Release All Releases]
SPARC T3-4 - Version All Versions to All Versions [Release All Releases]
SPARC T3-1 - Version All Versions to All Versions [Release All Releases]
SPARC T5-4
Information in this document applies to any platform.

Symptoms

The SP reports an SPT-8000-ED FMA event (File system full).

The SP-related event is reported to the internal SP FMA.

Cause

/var/log, /var, or another SP filesystem becomes full after many events are logged in a short period of time. The affected filesystem is usually one of the temporary filesystems in the internal SP system. These are cleared by an SP reset, and their logs are rotated periodically. However, if many events are logged in a short period of time, the filesystem eventually fills up and no further events can be logged.

From a snapshot we can see:

# more spos_info/@bin@df_-k.out
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   11392     11392         0 100% /
/dev/root                11392     11392         0 100% /
/dev/mtdblock2           11392     11392         0 100% /
tmpfs                     1024       412       612 40% /var
tmpfs                     1024      1024         0 100% /var/log <==================================== in this case /var/log filesystem usage is 100%
tmpfs                    46104       672     45432   1% /dev/shm
modules                   1024        36       988   4% /lib/modules/2.6.27.43
/dev/mtdblock2           11392     11392         0 100% /lib/modules/2.6.27.43/misc
/dev/mtdblock5           16384      2196     14188 13% /persist
/dev/mtdblock4            4096       524      3572 13% /conf
/dev/mtdblock6           12288       996     11292   8% /coredump
/dev/loop0                 192       192         0 100% /vbscdir
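
The check above can also be scripted when many snapshots need to be reviewed. The following is a minimal sketch, not an Oracle tool: it assumes the six-column `df -k` layout shown above, and the function name `full_tmpfs_mounts` and the `SAMPLE_DF` excerpt are illustrative only. It looks only at tmpfs rows, since the temporary filesystems are the ones this document is concerned with.

```python
# Excerpt of the df -k output from the snapshot (annotation arrows removed).
SAMPLE_DF = """\
Filesystem           1k-blocks      Used Available Use% Mounted on
rootfs                   11392     11392         0 100% /
tmpfs                     1024       412       612 40% /var
tmpfs                     1024      1024         0 100% /var/log
tmpfs                    46104       672     45432   1% /dev/shm
"""


def full_tmpfs_mounts(df_text, threshold=100):
    """Return (mount_point, use_percent) for tmpfs rows at or above threshold."""
    hits = []
    for line in df_text.splitlines():
        fields = line.split()
        # Expected columns: filesystem, 1k-blocks, used, available, use%, mount point.
        # Extra trailing fields (for example a hand-written annotation) are ignored.
        if len(fields) < 6 or fields[0] != "tmpfs":
            continue
        if not fields[4].endswith("%"):
            continue
        use_percent = int(fields[4].rstrip("%"))
        if use_percent >= threshold:
            hits.append((fields[5], use_percent))
    return hits


for mount_point, use_percent in full_tmpfs_mounts(SAMPLE_DF):
    print(f"{mount_point} is at {use_percent}%")
```

To run it against a real snapshot, read the contents of spos_info/@bin@df_-k.out into a string and pass that string in place of `SAMPLE_DF`.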

The spos_logs folder in the snapshot then shows the sizes of the log files under /var/log:

$ ls -lhgo spos_logs/
total 1020K
-rw-rw-r-- 1   74 May 6 14:44 @var@log@bbrd.log
-rw-rw-r-- 1 9.1K May 6 14:44 @var@log@boot.messages
-rw-rw-r-- 1 4.6K May 8 2013 @var@log@capidirectd.log
-rw-rw-r-- 1 2.1K May 6 14:44 @var@log@cdfe.log
-rw-rw-r-- 1   82 Jan 1 1980 @var@log@daemon_critical.log
-rw-rw-r-- 1 113 May 6 14:45 @var@log@dasboot.log
-rw-rw-r-- 1 900K May 8 2013 @var@log@dmesg <================================= the dmesg file is 900K while the filesystem is only 1MB, so this is probably the root cause
-rw-rw-r-- 1   82 May 6 14:44 @var@log@ealertd.log
-rw-rw-r-- 1 341 May 6 14:44 @var@log@etcd.log
-rw-rw-r-- 1 2.4K May 8 2013 @var@log@fdd.log
-rw-rw-r-- 1    0 May 6 14:44 @var@log@fishwrapd.log
-rw-rw-r-- 1    0 Jan 1 1980 @var@log@hc.log
-rw-rw-r-- 1 524 May 6 14:44 @var@log@htsignon
-rw-rw-r-- 1   71 May 6 14:44 @var@log@ifc.log
-rw-rw-r-- 1 5.1K May 8 2013 @var@log@ipmi.log
-rw-rw-r-- 1    0 May 6 14:44 @var@log@libfishwrap.log
-rw-rw-r-- 1 364 May 8 2013 @var@log@logmgr.debug
-rw-rw-r-- 1 2.7K May 8 2013 @var@log@lumain.log
-rw-rw-r-- 1 4.0K May 8 2013 @var@log@messages
-rw-rw-r-- 1    0 May 6 14:44 @var@log@ncsi.log
-rw-rw-r-- 1 313 May 6 14:44 @var@log@nwcfg
-rw-rw-r-- 1 370 May 8 2013 @var@log@plhwsvc.log
-rw-rw-r-- 1   43 May 8 2013 @var@log@samba.err
-rw-rw-r-- 1   52 May 6 14:44 @var@log@scc.log
-rw-rw-r-- 1 216 May 6 14:44 @var@log@set_sensor.log
-rw-rw-r-- 1 711 May 6 14:45 @var@log@sfcb.log
-rw-rw-r-- 1 4.0K May 8 2013 @var@log@snmpd.err.log
-rw-rw-r-- 1 3.9K May 8 2013 @var@log@snmpd.log
-rw-rw-r-- 1 474 May 6 14:44 @var@log@usrmgt.log
-rw-rw-r-- 1 226 May 6 14:45 @var@log@webgo.log
-rw-rw-r-- 1    0 May 6 14:45 @var@log@wsmand.log
-rw-rw-r-- 1 105 May 6 14:44 @var@log@wsman.log

In the above example, review the contents of the spos_logs/@var@log@dmesg file from the snapshot to determine why it grew so large, and diagnose further if required.
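
Finding the largest file in the listing can be automated in the same way. The sketch below is an illustration only: it assumes the `ls -lhgo` column layout shown above (mode, link count, size, month, day, time or year, name), and the names `parse_size`, `largest_logs` and `SAMPLE_LS` are hypothetical helpers, not part of any snapshot tooling.

```python
# Multipliers for the size suffixes printed by `ls -h`.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Excerpt of the spos_logs listing from the snapshot.
SAMPLE_LS = """\
total 1020K
-rw-rw-r-- 1 9.1K May 6 14:44 @var@log@boot.messages
-rw-rw-r-- 1 900K May 8 2013 @var@log@dmesg
-rw-rw-r-- 1 4.0K May 8 2013 @var@log@messages
-rw-rw-r-- 1   82 May 6 14:44 @var@log@ealertd.log
"""


def parse_size(text):
    """Convert an `ls -h` size such as '900K', '9.1K' or '74' to bytes."""
    if text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def largest_logs(ls_text, top=3):
    """Return up to `top` (name, size_in_bytes) pairs, largest first."""
    entries = []
    for line in ls_text.splitlines():
        fields = line.split()
        # Skip the "total" line and anything that is not a regular file entry.
        if len(fields) < 7 or not fields[0].startswith("-"):
            continue
        entries.append((fields[6], parse_size(fields[2])))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)[:top]


for name, size in largest_logs(SAMPLE_LS):
    print(f"{size:>8} bytes  {name}")
```

Sizes printed with the `-h` option are rounded, so the byte counts are approximate. That is sufficient to spot a single file that accounts for most of a 1MB filesystem.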

Solution

This solution cleans the temporary filesystems, which are usually the ones that fill up because they hold the daily logs.
It will not stop the error from happening again. If the error recurs, you will need to research what is causing so many log entries.

1) Log in to the SP/ILOM

2) Clear FMA:

-> start -script /SP/faultmgmt/shell
faultmgmtsp> fmadm repair fc4d9e74-78b2-4c1d-f96c-dd2499cd <================== this is an example UUID; the real UUID from show faulty must be used.
faultmgmtsp> fmadm rotate errlog
faultmgmtsp> exit

3) Reset the SP

-> reset -script /SP
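
When several faults have to be cleared, it can help to generate the command sequence for review before typing it at the SP prompt. The sketch below only builds the list of commands taken from the steps above. It does not connect to the SP, and the function name `sp_cleanup_commands` is a hypothetical helper. The UUID in the usage lines is the example value from step 2, not a real fault.

```python
def sp_cleanup_commands(fault_uuids):
    """Build the ILOM command sequence from steps 2 and 3 for the given fault UUIDs."""
    if not fault_uuids:
        raise ValueError("at least one fault UUID from 'show faulty' is required")
    # Step 2: enter the fault management shell and repair each fault.
    commands = ["start -script /SP/faultmgmt/shell"]
    commands += [f"fmadm repair {uuid}" for uuid in fault_uuids]
    # Rotate the error log, leave the shell, then reset the SP (step 3).
    commands += ["fmadm rotate errlog", "exit", "reset -script /SP"]
    return commands


# Example UUID copied from step 2; replace it with the UUID reported on the system.
for command in sp_cleanup_commands(["fc4d9e74-78b2-4c1d-f96c-dd2499cd"]):
    print(command)
```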

References

<NOTE:1159601.1> - SPT-8000-ED - File system full
