Sun Storage 7000 Unified Storage System: svc.configd crashes, /var/run is full

Asset ID:	1-72-1481997.1
Update Date:	2017-10-05
Keywords:

Solution Type Problem Resolution Sure


Applies to:

Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7210 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7110 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

1. Available memory on the appliance steadily decreases, degrading performance.

The memory reported as 'Other' grows steadily over several days until it consumes a large share of physical memory:

> status memory show

Memory:
   Cache       2.19G bytes
   Unused      32.5G bytes
   Mgmt         104M bytes
   Other       27.7G bytes <<<<
   Kernel      1.49G bytes

2. You may also be unable to log in to the appliance normally; instead, the session drops to a restricted Solaris shell:

Last login: Tue Mar 22 18:28:52 2011 from 10.159.55.160
svcprop: Could not connect to configuration repository: repository server unavailable. <===

3. In the worst case, a support bundle fails to generate because no space is available.

4. A Technical Support Engineer (TSE) examining the system will notice the following:

::memstat shows the page cache consuming a large share of memory:

# echo "::memstat" | mdb -k
Page Summary                Pages                MB %Tot
------------     ---------------- ---------------- ----
Kernel                    1053565              4115    6%
ZFS File Data             7087575              2158    4%
Anon                        93794               366    1%
Exec and libs                2845                11    0%
Page cache                  14991             27685   42% <<<<
Free (cachelist)           235642               920    1%
Free (freelist)           8286558             30269   49%

Total                    16774970             65527
Physical                 16774968             65527

5. Service commands fail because the repository server is unavailable:

# svcs -xv
svcs: Could not bind to repository server: repository server unavailable. Exiting.

6. debug.sys may report that the swap space limit has been exceeded:

Mar 13 09:56:18 sus7110-010 tmpfs: [ID 518458 kern.warning] WARNING: /etc/svc/volatile: File system full, swap space limit exceeded

7. /var/run, a swap-backed file system, consumes most of the swap space:

7410node0# df -h | grep swap
swap 902M 480K 902M 1% /etc/svc/volatile
swap 906M 4.0M 902M 1% /tmp
swap 898M 0K 898M 0% /var/ak/rm
swap 47G 46G 898M 99% /var/run <<<<<<
swap 909M 12M 898M 2% /export

8. The svc.configd process grows much larger than usual, possibly up to about 190 MB.

Cause

Swap space is exhausted by accumulated session entries.

On closer inspection, this space is being consumed by /var/run/ak/deaddrop:
7410node0# cd /var/run/ak/deaddrop

7410node0# ls -ltr
total 421048
drwx--x--x 2 root root 45211431 Jul 17 21:09 aBMxVhvMugtVmHHJIcWLUTVhYNlQkNK
drwx--x--x 2 root root 17973246 Jul 18 11:10 ZSrDKDuYiIIBvCVwcymNmZaHvTSpVMd
drwx--x--x 2 root root 11914890 Jul 19 06:03 ewmZkJwPiiIAZwXFYSzrSeqkNRqJgVv
drwx--x--x 2 root root 49486533 Jul 21 06:32 rrxARCVExPwtgvwaqRufzvZacCPhudN
drwx--x--x 3 root root 24457243 Jul 23 04:06 HXlnivEKTxEeQNBWFbmzsJfHPAGySwv
drwx--x--x 2 root root 29649308 Jul 27 13:03 ZaGGRYliiEUBeaemgiwjugHrnyNsaUL
drwx--x--x 2 root root 30787936 Jul 31 17:31 uGBvzsvpFVHeHHQmHAfoKoRdNZmXUqJ
drwx--x--x 2 root root 12954 Aug 1 05:09 0
drwx--x--x 2 root root 6068523 Aug 1 13:13 cVSrllXRrhwiIqrYGaLkmEAGKYkSgvy

"/var/run/ak/deaddrop" contains session entries which are not cleaned up.
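To see which deaddrop entries hold the most data, the following minimal Python sketch sums the sizes of the regular files beneath each top-level entry, rather than relying on the directory size column shown by ls -ltr. It assumes a Python 3 interpreter can reach the directory, which this article does not state; entry_usage is a hypothetical helper, and it only reads, never deletes.

```python
import os


def entry_usage(path):
    """Return [(name, bytes)] for each top-level entry under path, largest first.

    Sizes are the sum of regular-file sizes beneath each entry; symbolic
    links are skipped and not followed.
    """
    usage = []
    with os.scandir(path) as it:
        for entry in it:
            total = 0
            if entry.is_dir(follow_symlinks=False):
                # os.walk does not follow directory symlinks by default.
                for root, _dirs, files in os.walk(entry.path):
                    for name in files:
                        fp = os.path.join(root, name)
                        if not os.path.islink(fp):
                            total += os.path.getsize(fp)
            elif entry.is_file(follow_symlinks=False):
                total = entry.stat(follow_symlinks=False).st_size
            usage.append((entry.name, total))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage


if __name__ == "__main__":
    # Path taken from this article; run only where it is accessible.
    for name, size in entry_usage("/var/run/ak/deaddrop"):
        print(f"{size:>12}  {name}")
```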

Solution

Upgrade to 2011.1.x or, preferably, the latest release to mitigate the issue.
Note that the issue has also been observed on 2011.1.9.3. On that release, the simplest workaround is to restart the appliance management system (akd), as follows:

cli> maintenance system restart
This will restart the management system. Are you sure? (Y/N) Y

Workaround:

1. Stop akd.
2. Remove the contents of /var/run/ak/deaddrop.
3. Start akd.


In the worst case, when svc commands do not work and akd cannot be stopped, you may need to kill akd, clean up the deaddrop contents, and then start akd again.
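The clean-up part of the workaround (emptying /var/run/ak/deaddrop while keeping the directory itself) can be sketched in Python as follows. This assumes a Python 3 interpreter is available wherever the directory can be reached, which this article does not state, and that akd has already been stopped or killed; the sketch does not stop or start akd. clear_directory and its dry_run flag are hypothetical names.

```python
import os
import shutil


def clear_directory(path, dry_run=True):
    """Remove every entry inside path, keeping path itself.

    Returns the sorted names of entries that were removed, or that would be
    removed when dry_run is True. Run only while akd is not running.
    """
    # Snapshot the listing first so nothing is deleted mid-iteration.
    with os.scandir(path) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not dry_run:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.unlink(entry.path)  # regular file or symbolic link
        removed.append(entry.name)
    return sorted(removed)
```

Defaulting to a dry run lets you review what would be deleted before calling clear_directory("/var/run/ak/deaddrop", dry_run=False).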

References

<BUG:15704579> - SUNBT7030143 AK_HARDWARE_CONFIG FILLS DEADDROP WITH PRESERVED FILES
