Debugging the Linux boot process
Recently I had to debug a problem that occurred very early in the Linux boot process.
When Linux boots, the bootloader loads the kernel into memory (you can think of the kernel as a giant program that is the first thing to run), and it also loads an “initrd” image into memory. The initrd, or “initial RAM disk,” is the first filesystem mounted, and it is the filesystem that the first user-space process, init, sees. The initial RAM disk contains an “init” script, which looks sort of like a shell script but isn’t really one, because the program that interprets it is not a “real” shell such as sh, bash or csh. In my case (Red Hat), that interpreter was “nash,” short for “nano-shell,” so named because of its tiny size. Nash has only a few commands, almost all of which are built in (precompiled into the nash binary) rather than being separately compiled binaries.
When the “init” script runs into trouble, maybe because of a misbehaving storage driver, it’d be nice if you could get an interactive shell. Well, it turns out you can, though it’s a bit involved, especially if it’s the first time you’ve done anything like this.
So the initial RAM disk image, the “initrd” image, is this binary blob of stuff. How can you change it? Well, it turns out not to be that difficult: it’s just a gzipped cpio archive.
Look in /boot/grub/menu.lst to find which initrd image you’re using, then copy that image to a working directory, decompress it with gunzip, and unpack the result with cpio:
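You can check that claim against a particular image before unpacking it by looking at its magic numbers: a gzip stream begins with the bytes 1f 8b, and each header in the “newc” cpio format that the kernel unpacks begins with the ASCII string 070701. The following is a minimal sketch; the function name check_initrd_format is hypothetical, and it assumes od, gzip and a head that supports -c (GNU coreutils and BusyBox both do).

```shell
#!/bin/sh
# Hypothetical helper (not part of any distribution): report whether IMAGE
# looks like a gzip-compressed cpio archive in the kernel's "newc" format.
check_initrd_format() {
    image=$1
    # A gzip stream starts with the two bytes 0x1f 0x8b.
    gz_magic=$(od -An -tx1 -N2 "$image" | tr -d ' \n')
    if [ "$gz_magic" != "1f8b" ]; then
        echo "not gzip-compressed"
        return 1
    fi
    # Each newc cpio header starts with the ASCII magic "070701".
    # gzip's stderr is discarded in case head closes the pipe early.
    cpio_magic=$(gzip -dc < "$image" 2>/dev/null | head -c 6)
    if [ "$cpio_magic" = "070701" ]; then
        echo "gzipped newc cpio archive"
    else
        echo "gzipped, but not newc cpio"
        return 1
    fi
}
```

For example, check_initrd_format initrd-2.6.26.img should print “gzipped newc cpio archive” for an image like the one used here. An image that fails the first test may simply be compressed some other way.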
[root@zuul workdir]# gunzip - < initrd-2.6.26.img > unzipped
[root@zuul workdir]# ls
initrd-2.6.26.img  unzipped
[root@zuul workdir]# mkdir tmp
[root@zuul workdir]# cd tmp
[root@zuul tmp]# cpio -imdu < ../unzipped
17555 blocks
[root@zuul tmp]# ls
bin  dev  etc  init  lib  proc  sbin  sys  sysroot  usr
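The decompress-then-extract steps in that session can also be done in one pass, without the intermediate unzipped file, by piping gzip's output straight into cpio. A sketch, assuming GNU gzip and cpio; unpack_initrd and its two arguments are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: unpack the gzipped cpio image IMAGE into directory DEST
# in one pass, without writing an intermediate decompressed file.
unpack_initrd() {
    image=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # gzip opens IMAGE relative to the caller's directory; cpio runs in a
    # subshell inside DEST, so the caller's working directory is unchanged.
    # cpio flags: -i extract, -d create directories as needed,
    # -m keep modification times, -u overwrite existing files.
    gzip -dc < "$image" | (cd "$dest" && cpio -idmu)
}
```

Run from the working directory, unpack_initrd initrd-2.6.26.img tmp should leave the same tree in tmp as the session above.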
So that’s what’s in the initrd image. The “init” file is the init script. So the question is, how can we make this init script give us a real shell?
Well, we need to put a real shell in the bin directory. You could just copy bash in there, but then you’d also need all the libraries bash needs. Bash has some built-in commands, but lots of things aren’t built in, so you’d have to start copying other commands you might need (like vim) along with any libraries they need, and pretty soon that adds up to a lot of files and a lot of work figuring out exactly what you need.
My second thought was to try to compile nash. But nash is distributed as a source RPM, I started to run into a wall of dependencies, and I just hate RPMs in general. I wanted to modify nash, not just build it from the source RPM, and source RPMs are not programmer friendly. They might be considered system-administrator friendly, but I’m not even sure about that. The whole RPM design seems rather incompetent to me. But that’s neither here nor there. Forget about trying to debug nash; there’s no need (unless you’re very unlucky).
There’s a better way: go grab BusyBox. BusyBox is a single binary meant for use on embedded systems. It provides a shell with much of the functionality of a real one, plus built-in versions of lots of what would normally be external commands, and it doesn’t depend on many libraries.
To build BusyBox, you run “make menuconfig”, then “make” (similar to how the kernel is built). It’s done this way because, as with the kernel, there are many features you can turn on or off, depending on which commands your “embedded” system needs. For debugging nash init scripts and the early boot process, the default configuration, which has almost everything turned on, is fine.
Once busybox compiles, run ldd on it to see what libraries it needs. Copy busybox and the needed libraries into the bin and lib directories that you unpacked from the initrd image.
Nov 17, 2011: Nowadays you can build busybox statically linked (under Build Options, in the first item of menuconfig) and avoid having to include the libraries at all.
BusyBox contains all manner of commands, like “ls”, “vi”, “ln”, “dd”, and so on. You can invoke it as “busybox” with the command name as the first argument, or you can make a link from the command name to busybox: invoked as “ls”, busybox behaves as ls; invoked as “vi”, it behaves as (a very stripped-down) vi; and so on.
It’s probably a good idea to make symlinks from bin/ls to bin/busybox, and likewise for whatever other commands you know you’ll need. If you find during debugging that you’ve forgotten some, you can make new links from the running shell:
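Copying a dynamically linked busybox along with what ldd reports can be scripted. The sketch below parses ldd's usual output lines (a library name, “=>”, a path and an address, or a bare path for the dynamic loader) and copies each library to the same absolute path under the unpacked tree. That keeps the loader where the binary expects to find it, including on 64-bit systems that use /lib64 rather than /lib, so it mirrors full paths instead of copying everything into lib as described above. The function name copy_with_libs is hypothetical. For a static build, ldd reports no libraries and only the binary is copied.

```shell
#!/bin/sh
# Hypothetical helper: copy BINARY into ROOT/bin, and copy every shared
# library that ldd reports for it to the same absolute path under ROOT.
copy_with_libs() {
    binary=$1
    root=$2
    mkdir -p "$root/bin" || return 1
    cp "$binary" "$root/bin/" || return 1
    # ldd lines look like "libc.so.6 => /lib/libc.so.6 (0x...)" or
    # "/lib/ld-linux.so.2 (0x...)"; keep only the absolute paths.
    # A static binary yields no paths, so nothing extra is copied.
    for lib in $(ldd "$binary" 2>/dev/null | awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }'); do
        mkdir -p "$root$(dirname "$lib")" || return 1
        # -L copies the file a symlink points to, under the link's name.
        cp -L "$lib" "$root$lib" || return 1
    done
}
```

For example, from the working directory: copy_with_libs /path/to/busybox tmp.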
cd /bin
busybox ln -s busybox command
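Creating several links at once is a small loop. This sketch assumes the unpacked layout used here, with busybox in bin/; make_applet_links is a hypothetical name, and the applet names are whatever you pass in. It uses a relative link target, so the links still resolve once the image is unpacked at / during boot.

```shell
#!/bin/sh
# Hypothetical helper: in directory BINDIR, create a symlink named after each
# applet given on the command line, pointing at busybox in the same directory.
make_applet_links() {
    bindir=$1
    shift
    for applet in "$@"; do
        # Skip names that already exist (including busybox itself).
        if [ -e "$bindir/$applet" ] || [ -L "$bindir/$applet" ]; then
            continue
        fi
        # Relative target: bin/ls -> busybox, valid wherever bin/ ends up.
        ln -s busybox "$bindir/$applet" || return 1
    done
}
```

For example, from the unpacked tree: make_applet_links bin ls cat vi mount dmesg. BusyBox can also print the applets a given build contains with busybox --list, whose output could be passed to the same function.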
So, once busybox, its few needed libraries, and whatever symlinks you care to make are in place, edit the init script to invoke “busybox ash” at the point where you need to debug (ash is BusyBox’s small Bourne-style shell, descended from the Almquist shell). Then it’s time to pack up the initrd image.
To do that, run the following from inside the unpacked tree (tmp in the example above):
find . -print | cpio -H newc -o > ../myinitrd
cd ..
gzip -9 myinitrd
cp myinitrd.gz /boot
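Adding that line can be done by hand in an editor, or with a small awk filter. In this sketch, insert_debug_shell is a hypothetical name, and PATTERN must be a substring of a line that really exists in your init script (the line that mounts the root filesystem is a natural choice, but check your own script); the test data is made up. It writes /bin/busybox as an absolute path so as not to rely on how the script's interpreter searches for commands. Note that awk -v interprets backslash escapes in the pattern.

```shell
#!/bin/sh
# Hypothetical helper: insert the line "/bin/busybox ash" into the init script
# FILE just before the first line containing PATTERN (a plain substring, not a
# regex). Leaves FILE untouched and returns 1 if no line matches.
insert_debug_shell() {
    file=$1
    pattern=$2
    tmpfile="$file.tmp.$$"
    if ! awk -v pat="$pattern" '
        !done && index($0, pat) { print "/bin/busybox ash"; done = 1 }
        { print }
        END { if (!done) exit 1 }
    ' "$file" > "$tmpfile"; then
        rm -f "$tmpfile"
        echo "no line containing '$pattern' in $file (or file unreadable)" >&2
        return 1
    fi
    # Copy the contents back rather than renaming, so FILE keeps its
    # permissions (init must stay executable).
    cat "$tmpfile" > "$file" && rm -f "$tmpfile"
}
```

With the debug line in place, the tree is ready to be packed back up.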
Nov 17, 2011: RHEL6, for example, uses an initramfs instead of an initrd. Both are compressed cpio archives, but the command above won’t produce a working initramfs. Instead, repack the initramfs like this:
find . | cpio -R 0:0 -o -H newc | gzip -9 > ../initramfs.img
The important difference is the “-R 0:0” argument to cpio, which gives every file in the archive an owner and group ID of 0 (root). The init scripts in the RHEL6 initramfs seem to care about that, whereas in the RHEL5 initrd, file ownership apparently didn’t matter.
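The RHEL6-style repacking command can be wrapped in a function that takes the tree and the output path explicitly, so it doesn't matter which directory you run it from. A sketch assuming GNU cpio (for -R) and gzip; repack_initramfs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: pack the directory tree SRCDIR into a gzipped newc
# cpio archive OUT, recording every file as owned by uid 0 and gid 0.
repack_initramfs() {
    srcdir=$1
    out=$2
    # The subshell changes into SRCDIR so archive paths start with "./";
    # OUT is opened by the caller's shell, relative to the caller's directory.
    # cpio flags: -o copy-out, -H newc the format the kernel expects,
    # -R 0:0 store owner and group 0 in every archive header.
    (cd "$srcdir" && find . | cpio -o -H newc -R 0:0) | gzip -9 > "$out"
}
```

For example, from the working directory: repack_initramfs tmp initramfs-debug.img, then copy the result to /boot.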
Then edit /boot/grub/menu.lst to use this custom initrd image. When the machine boots, if everything has been done correctly, you should get a shell prompt when the init script reaches the “busybox ash” line. At that point you can run whatever commands you set up in bin, or any command compiled into busybox, and generally commence debugging.
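One way to edit menu.lst is to add a second entry rather than changing the existing one, so the original image stays one menu choice away if the custom one fails to boot. This sketch copies the first title stanza of a GRUB legacy menu.lst, renames it, and swaps in a new initrd line; debug_stanza and the new title are hypothetical. It assumes the first stanza is the one you normally boot, and the initrd path you pass must be written the way the existing entries write theirs (relative to GRUB's root device, so with a separate /boot partition it is /myinitrd.gz rather than /boot/myinitrd.gz).

```shell
#!/bin/sh
# Hypothetical helper: print a copy of the first "title" stanza in the GRUB
# legacy config MENU, retitled and pointing at the initrd NEW_INITRD.
debug_stanza() {
    menu=$1
    new_initrd=$2
    awk -v img="$new_initrd" '
        $1 == "title" { n++ }
        # Only stanza 1 is printed; global settings (n == 0) are skipped.
        n == 1 && $1 == "title" { print "title Debug (custom initrd)"; next }
        n == 1 && $1 == "initrd" { print "\tinitrd " img; next }
        n == 1 { print }
    ' "$menu"
}
```

Writing to a temporary file first avoids reading and appending to menu.lst at the same time: debug_stanza /boot/grub/menu.lst /myinitrd.gz > /tmp/stanza && cat /tmp/stanza >> /boot/grub/menu.lst. Then choose the new entry from the GRUB menu at boot.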
By this means I was able to debug a problem with a storage driver that, as it turned out, wasn’t allowing the open() call to succeed on the first attempt to read the partition tables. As a result, no partition sysfs entries were created, so udev never made any partition device files, and so the root filesystem could not be mounted. That would have been pretty tough to figure out without some visibility into what the init script was doing and without the ability to poke around in /sys during the early stages of booting.
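A check along those lines can be run from the debug shell: for each disk under /sys/block, list the partition entries the kernel created and whether udev made a matching node in /dev. A sketch assuming partitions appear as subdirectories named after their disk (sda1 under sda); the function name check_partitions and its optional arguments, which exist only so it can be tried on a fake tree, are hypothetical. It sticks to basic POSIX shell features, so it should also run under busybox ash.

```shell
#!/bin/sh
# Hypothetical helper: for each disk in SYSFS/block, print its partitions and
# whether a device node of the same name exists in DEV.
# SYSFS and DEV default to the real /sys and /dev.
check_partitions() {
    sysfs=${1:-/sys}
    dev=${2:-/dev}
    for disk in "$sysfs"/block/*; do
        [ -d "$disk" ] || continue
        name=${disk##*/}
        found=0
        # Partition entries are subdirectories whose names start with the
        # disk's name; an unmatched glob stays literal and fails the -d test.
        for part in "$disk/$name"*; do
            [ -d "$part" ] || continue
            found=1
            pname=${part##*/}
            if [ -e "$dev/$pname" ]; then
                echo "$pname: ok"
            else
                echo "$pname: in sysfs, missing from $dev"
            fi
        done
        [ "$found" -eq 1 ] || echo "$name: no partitions in sysfs"
    done
}
```

Disks that legitimately have no partitions (RAM disks, for example) are reported as having none too. In a failure like the one described, you would expect the affected disk to show up with no partitions in sysfs.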