As a follow-up to the last post about /var filling up, here’s another one that’s equally crazy.
In an effort to fix /var once and for all I scheduled some emergency downtime last night. The aim was to make the /var partition bigger. This got off to a pretty bad start when, in true Solaris fashion, I attempted to drop noisy down to single user by typing init 0. In Solaris this drops a SPARC system to the OBP (like the BIOS), and just reboots x86 machines. In FreeBSD, runlevel 0 is equivalent to Solaris runlevel 5: it shuts the system down and then powers it off.
So, at about 22:15 last night I switched the primary CompSoc server off. Hardly the fix I was looking for.
After a number of calls to Andy and Inti I had somebody switch it back on (bear in mind that the system is in Manchester and I’m near London). In an attempt to minimise the downtime, I decided to do the fix as soon as it came back up. This time I got the right runlevel, init 1, but (as I realise now) all sorts of crazy stuff happens to FreeBSD’s serial redirection in single-user mode: it appears to knock off serial support and output only to the video console. Again, not a lot of good for me.
Later on in the day somebody else rebooted it and, after a number of attempts to get things working, I decided that I’d do the fix in full multi-user mode. This involved disabling logins, stopping almost all services, etc. I used lsof again to determine which processes were using /var; these were stopped, and when there were no open filehandles left I umount -f’d /var.
I dumped the contents of /var to a different disk and set about updating the disklabel.
# /dev/ad0s1:
8 partitions:
          size    offset  fstype [fsize bsize bps/cpg]
  a:   1048576         0  4.2BSD  2048 16384     8
  b:   8388608   1048576    swap
  c: 156296322         0  unused     0     0       # "raw" part, don't edit
  d:   1048576   9437184  4.2BSD  2048 16384     8
  e:  52428800  10485760  4.2BSD  2048 16384 28552
  f:  93381762  62914560  4.2BSD  2048 16384 28552
Above is the disklabel before the change… a is /, d is /var, e is /usr and f is /backup2. What I needed to do was grow d (currently just 512MB) by using some of the space from f, which was an unused backup directory. The obvious problem here is that /usr was in the way: e starts at the very sector where d ends, so d couldn’t simply be extended into f. My solution was to grow swap by 512MB (swallowing the space the old d occupied), totally remove the d line, shrink f to around 8GB and rename it to d. This sounds a little complicated… it took me a while to get my head around it.
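For anybody who wants to check the sums, here is a rough Python sketch of the arithmetic behind that plan. It assumes 512-byte sectors (the label doesn't say, but that's the usual bsdlabel unit), skips the c partition because it just spans the whole slice, and the helper names (mib, plan) are made up for this post rather than being part of any FreeBSD tool.

```python
SECTOR_BYTES = 512  # assumption: bsdlabel sizes and offsets are 512-byte sectors

# (size, offset) in sectors, copied from the "before" label
before = {
    "a": (1048576, 0),
    "b": (8388608, 1048576),
    "d": (1048576, 9437184),
    "e": (52428800, 10485760),
    "f": (93381762, 62914560),
}


def mib(sectors):
    """Convert a sector count to MiB."""
    return sectors * SECTOR_BYTES / 2**20


def plan(label, new_var_sectors):
    """Fold the old d into swap, then shrink f and rename it to d."""
    new = dict(label)
    b_size, b_off = new["b"]
    d_size, d_off = new.pop("d")
    # The old d must sit directly after swap for this trick to work.
    assert b_off + b_size == d_off
    new["b"] = (b_size + d_size, b_off)
    f_size, f_off = new.pop("f")
    # The new /var has to fit inside the old f.
    assert new_var_sectors <= f_size
    new["d"] = (new_var_sectors, f_off)
    return new


after = plan(before, 8 * 2**30 // SECTOR_BYTES)  # 8 GiB for the new /var
for name in sorted(after):
    size, offset = after[name]
    print(f"{name}: size={size} offset={offset} ({mib(size):.0f} MiB)")
```

Under those assumptions the old d comes out at 512 MiB, swap grows by exactly that amount, and the new d keeps the old f's starting offset.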
Prior to the change the on-disk layout was something like:
[ a (/) ] [ b (swap) ] [ d (/var) ] [ e (/usr) ] [ f (/backup2) ]
Now that I’ve made the changes the on-disk layout is more like:
[ a (/) ] [ b (bigger swap) ] [ e (/usr) ] [ d (/var) ]
The bsdlabel currently looks like:
# /dev/ad0s1:
8 partitions:
          size    offset  fstype [fsize bsize bps/cpg]
  a:   1048576         0  4.2BSD  2048 16384     8
  b:   9437184   1048576    swap
  c: 156296322         0  unused     0     0       # "raw" part, don't edit
  d:  16777216  62914560  4.2BSD  2048 16384 28552
  e:  52428800  10485760  4.2BSD  2048 16384 28552
The beauty of this (as far as I was concerned) was that everything was still contiguous: no holes between partitions, and no changing of partition letters for the filesystems that were staying. Next step was to newfs the new /var, mount it and restore the contents from the file on the other disk I previously mentioned. No major problems here, although I did manage to restore the contents of /var to both / and my personal home directory. Fortunately this mess was easy to clear up.
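The "no holes" claim is easy to sanity-check before writing a label to disk. The following is only a sketch: it walks the partitions from the new label in offset order and reports any gap or overlap. The function names are my own invention, and c is left out because it covers the whole slice.

```python
SLICE_SECTORS = 156296322  # size of the c ("raw") partition

# (size, offset) in sectors, copied from the new label
label = {
    "a": (1048576, 0),
    "b": (9437184, 1048576),
    "d": (16777216, 62914560),
    "e": (52428800, 10485760),
}


def find_problems(label):
    """Return a list of gaps and overlaps between consecutive partitions."""
    problems = []
    expected = 0  # where the next partition should start
    for name, (size, offset) in sorted(label.items(), key=lambda kv: kv[1][1]):
        if offset > expected:
            problems.append(f"gap of {offset - expected} sectors before {name}")
        elif offset < expected:
            problems.append(f"{name} overlaps its neighbour by {expected - offset} sectors")
        expected = offset + size
    return problems


def unallocated_tail(label, slice_sectors):
    """Sectors left over between the last partition and the end of the slice."""
    return slice_sectors - max(offset + size for size, offset in label.values())


print(find_problems(label) or "contiguous, no overlaps")
print(unallocated_tail(label, SLICE_SECTORS), "sectors unallocated at the end")
```

The partitions themselves butt up against each other; what remains of the old f simply sits unallocated after d at the end of the slice.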
So, with all of the files back, I rebooted the box. It didn’t come back up.
After a lot of time talking Inti through the console (which I couldn’t get, because the machine was having none of single-user mode serial) we discovered that the only reason the system wouldn’t boot was that I hadn’t removed the /backup2 entry from /etc/fstab! D’oh! A rookie mistake (but one that I always make).
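A check along these lines before the reboot would probably have caught it. This is a minimal sketch, not something I actually ran: it reads fstab-style text and lists entries whose device node doesn't exist. The sample fstab is hypothetical (it is not the real file from noisy), and the exists parameter is only there so the example can run on a machine without those devices.

```python
import os


def missing_fstab_devices(fstab_text, exists=os.path.exists):
    """Return (device, mountpoint) pairs whose device node is absent."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        device, mountpoint = fields[0], fields[1]
        # Only real device paths are checked; entries such as "proc" are skipped.
        if device.startswith("/dev/") and not exists(device):
            missing.append((device, mountpoint))
    return missing


# Hypothetical fstab for illustration only
sample = """\
# Device      Mountpoint  FStype  Options  Dump  Pass#
/dev/ad0s1b   none        swap    sw       0     0
/dev/ad0s1a   /           ufs     rw       1     1
/dev/ad0s1e   /usr        ufs     rw       2     2
/dev/ad0s1d   /var        ufs     rw       2     2
/dev/ad0s1f   /backup2    ufs     rw       2     2
"""

# Pretend only the partitions in the new label exist.
present = {"/dev/ad0s1" + letter for letter in "abde"}
print(missing_fstab_devices(sample, exists=present.__contains__))
```

On the real box you would pass the contents of /etc/fstab and leave exists at its default, so the check looks at the actual device nodes. With the sample above, the only entry flagged is /dev/ad0s1f on /backup2.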
Once we got this removed the system shot up. Allow a few more hours to get both bump and noisy up with LDAP working and we once again have a fully running CompSoc.
It certainly didn’t go as planned, but I believe the end result is a good one:
# df -h
Filesystem     Size  Used  Avail  Capacity  Mounted on
/dev/ad0s1a    496M  383M    73M       84%  /
devfs          1.0K  1.0K     0B      100%  /dev
/dev/ad0s1e     24G   20G   2.0G       91%  /usr
/dev/ad0s1d    7.7G  160M   7.0G        2%  /var
/dev/da0       541G  190G   308G       38%  /data
linprocfs      4.0K  4.0K     0B      100%  /usr/compat/linux/proc
procfs         4.0K  4.0K     0B      100%  /proc
devfs          1.0K  1.0K     0B      100%  /var/named/dev
We really need to work on getting serial output from FreeBSD working properly, not to mention installing a new network card so that we can use the internal 10/100 interface for IPMI, which will allow us serial-over-LAN and full remote power capabilities.
When I got home at 7PM I treated myself to a curry and an episode of Prison Break.
Apologies to anybody who was affected by the downtime!