Solaris 10: Swap Space, /tmp and SMF

By Alasdair Lumsden on 8 Dec 2008

fork: Not enough space

Solaris 10 places /tmp on swap by default. This is good for speed, but not so good on a general-purpose box where some applications may fill up /tmp. If you fill /tmp, you essentially reduce the amount of available swap to 0. Once the box then runs out of physical RAM, there is nowhere left to allocate from, and new processes may not start. You get lovely fork() errors in the shell, and interesting messages in dmesg:

# ps -ef
-bash: fork: Not enough space
# free
-bash: fork: Not enough space
# prstat
-bash: fork: Not enough space
...
# dmesg
...
Dec  7 02:56:27 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 8193 (munin-node)
Dec  7 02:56:51 w01.someserver.everycity.co.uk tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
Dec  7 02:56:57 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 8223 (exim)
Dec  7 02:57:26 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 563 (httpd)
...
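
To catch this before fork() starts failing, it helps to keep an eye on how much virtual swap is left. The sketch below parses the one-line summary printed by swap -s and reports the percentage in use. The function name swap_pct_used is my own invention, and on Solaris 10 nawk or /usr/xpg4/bin/awk is probably a safer choice than the legacy /usr/bin/awk:

```shell
# swap_pct_used: read a `swap -s` summary line on stdin and print the
# percentage of virtual swap in use, truncated to a whole number.
# Expected input shape:
#   total: <N>k bytes allocated + <N>k reserved = <N>k used, <N>k available
swap_pct_used() {
    awk '
    {
        have_used = 0; have_avail = 0
        for (i = 2; i <= NF; i++) {
            # The figure sits in the field just before each keyword.
            # Adding 0 converts "1200k" to the number 1200 (kilobytes).
            if ($i ~ /^used/)      { used  = $(i - 1) + 0; have_used  = 1 }
            if ($i ~ /^available/) { avail = $(i - 1) + 0; have_avail = 1 }
        }
        # Bail out if the line does not look like swap -s output.
        if (!have_used || !have_avail || used + avail == 0) exit 1
        printf "%d\n", (used * 100) / (used + avail)
    }'
}
```

On a live system this would be called as swap -s | swap_pct_used, and df -k /tmp shows how much of that swap is held by files in /tmp.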

The easiest way to fix this is to immediately disable any services that eat RAM using svcadm disable, and clear out /tmp. You can then move /tmp to a physical partition by editing /etc/vfstab, increase the amount of swap, or, my favourite, limit the amount of swap /tmp can use by adding a mount option to /etc/vfstab:

# grep /tmp /etc/vfstab
swap    -       /tmp    tmpfs   -       yes     size=2048m

Unfortunately the new limit only takes effect after a reboot, which wasn’t an option on the machine in question. So I added a bunch more swap for the time being.
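
Adding swap to a running box can be done with a swap file. What follows is a minimal sketch, assuming a UFS filesystem with enough free space. The function name add_swap_file, the path and the size in the usage note are hypothetical, and the function only prints the commands unless RUN=1 is set:

```shell
# add_swap_file PATH SIZE: create a swap file and activate it.
# PATH must be absolute; SIZE takes mkfile suffixes such as 512m or 2g.
# By default the commands are only printed (dry run); set RUN=1 to run them.
add_swap_file() {
    file=$1
    size=$2
    if [ -z "$file" ] || [ -z "$size" ]; then
        echo "usage: add_swap_file PATH SIZE" >&2
        return 1
    fi
    # mkfile allocates the file, swap -a adds it, swap -l lists swap areas.
    for cmd in "mkfile $size $file" "swap -a $file" "swap -l"; do
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}
```

For example, add_swap_file /export/swapfile 2g prints the three commands for review. Swap added this way does not survive a reboot unless the file also gets its own line in /etc/vfstab.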

SMF Unhappy after running out of swap space

However, I then encountered a rather bizarre issue, which can only be described as a bug. Services I had stopped using svcadm disable wouldn’t re-enable with svcadm enable:

# svcs http
STATE          STIME    FMRI
disabled       23:26:00 svc:/network/http:apache22-csk
# svcadm -v enable http
svc:/network/http:apache22-csk enabled.
# svcs http
STATE          STIME    FMRI
disabled       23:26:00 svc:/network/http:apache22-csk

What’s going on here? The log in /var/svc/log didn’t record the enable command either. After investigating, I came to the conclusion that SMF must have broken when the box ran out of memory. SMF is managed by two processes, svc.startd and svc.configd, and thankfully you can restart them. Simply kill them both and they come straight back:

# ps -ef | grep svc
    root 7     1   0 Dec 01 ?           0:01 /lib/svc/bin/svc.startd
    root 9     1   0 Dec 01 ?           0:00 /lib/svc/bin/svc.configd
# pkill -9 svc.configd
# pkill -9 svc.startd
# ps -ef | grep svc
    root 12803     1   0 23:47:07 ?           0:01 /lib/svc/bin/svc.configd
    root 12841     1   0 23:47:09 ?           0:00 /lib/svc/bin/svc.startd

Then enabling the service actually works this time:

# svcs http
STATE          STIME    FMRI
disabled       23:26:00 svc:/network/http:apache22-csk
# svcadm -v enable http
svc:/network/http:apache22-csk enabled.
# svcs http
STATE          STIME    FMRI
online         23:49:00 svc:/network/http:apache22-csk

Problem solved! However, I dislike it when things silently break in this way. You have to wonder: if SMF broke, what else may be having issues?
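
One way to make this kind of silent failure louder is to check the service state after every enable instead of trusting the "enabled." message. Here is a rough sketch for bash, which is the shell used in the transcripts above. The function name wait_for_online and the 30 second default are my own assumptions:

```shell
# wait_for_online FMRI [TIMEOUT_SECONDS]
# Poll svcs about once a second until the service reports "online".
# Returns 0 on success, 1 if it is still not online after the timeout
# (default 30 seconds).
wait_for_online() {
    fmri=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -le "$timeout" ]; do
        # -H drops the header line, -o state prints only the STATE column.
        state=$(svcs -H -o state "$fmri" 2>/dev/null)
        if [ "$state" = "online" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$fmri still '$state' after ${timeout}s" >&2
    return 1
}
```

Used as svcadm enable http && wait_for_online svc:/network/http:apache22-csk, this would have returned non-zero in the stuck case above, where svcs kept reporting disabled.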