7. Troubleshooting, or The Agony of Defeat

When building bootdisks, the first few tries often will not boot. The general approach to building a root disk is to assemble components from your existing system and try to get the diskette-based system to the point where it displays messages on the console. Once it starts talking to you, the battle is half over, because you can see what it is complaining about and fix individual problems until the system works smoothly. If the system just hangs with no explanation, finding the cause can be difficult. The recommended procedure for investigating the problem where the system will not talk to you is as follows:

Once these general aspects have been covered, here are some more specific files to check:

  1. Make sure init is included as /sbin/init or /bin/init. Make sure it is executable.

  2. Run ldd init to check init's libraries. Usually this is just libc.so, but check anyway. Make sure you included the necessary libraries and loaders.

  3. Make sure you have the right loader for your libraries -- ld.so for a.out or ld-linux.so for ELF.

  4. Check the /etc/inittab on your bootdisk filesystem for the calls to getty (or some getty-like program, such as agetty, mgetty or getty_ps). Double-check these against your hard disk inittab, and check the man pages of the program you use to make sure the calls make sense. inittab is possibly the trickiest part because its syntax and content depend on the init program used and the nature of the system. The only way to tackle it is to read the man pages for init and inittab and work out exactly what your existing system is doing when it boots. Also make sure /etc/inittab has a system initialization entry. This should contain a command to execute the system initialization script, which must exist.

  5. As with init, run ldd on your getty to see what it needs, and make sure the necessary library files and loaders were included in your root filesystem.

  6. Be sure you have included a shell program (e.g., bash or ash) capable of running all of your rc scripts.

  7. If you have a /etc/ld.so.cache file on your rescue disk, remake it.
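Some of the checks above can be partly automated. The following is a minimal sketch, assuming the bootdisk root filesystem is mounted somewhere on your working system; the function names missing_libs and check_binary are made up for illustration. Note that ldd run on the working system reports the libraries as resolved there, so the paths it prints are only a guide to what must be present on the root disk.

```shell
# Sketch only: the function names and the idea of passing the mount point
# as the first argument are assumptions, not part of any standard tool.

# Read ldd-style output on stdin and print every absolute library or
# loader path that does not exist under the root filesystem given as $1.
missing_libs() {
    root=$1
    # ldd prints either "name => /path (addr)" or "/path (addr)";
    # pick the first field on each line that looks like an absolute path.
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) { print $i; break } }' |
    while read -r lib; do
        [ -e "$root$lib" ] || echo "$lib"
    done
}

# Check that a binary (e.g. /sbin/init or your getty) exists and is
# executable under the root filesystem, then list the libraries it
# needs that are absent from that filesystem.
check_binary() {
    root=$1 bin=$2
    if [ ! -x "$root$bin" ]; then
        echo "$bin: missing or not executable"
        return 1
    fi
    ldd "$root$bin" 2>/dev/null | missing_libs "$root"
}
```

With the root filesystem mounted on /mnt (an assumed mount point), check_binary /mnt /sbin/init prints nothing when init and everything ldd lists for it are present. To remake the cache mentioned in item 7, ldconfig -r /mnt rebuilds /mnt/etc/ld.so.cache using /mnt as the root directory.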

If init starts, but you get a message like:
        Id xxx respawning too fast: disabled for 5 minutes  
it is coming from init, usually indicating that getty or login is dying as soon as it starts up. Check the getty and login executables and the libraries they depend upon. Make sure the invocations in /etc/inittab are correct. If you get strange messages from getty, it may mean the calling form in /etc/inittab is wrong.
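One quick way to narrow this down is to confirm that every program named in a respawn entry actually exists on the root disk. The sketch below assumes a SysV-style inittab with id:runlevels:action:process fields and a root filesystem mounted on your working system; check_respawn is a hypothetical helper name. It does not check the arguments passed to getty, which still need to be compared against the man page.

```shell
# Sketch only: assumes SysV-style inittab lines of the form
#   id:runlevels:action:process
# and that $1 is the mount point of the bootdisk root filesystem.

# Print each respawn entry whose program is missing or not executable.
check_respawn() {
    root=$1
    # Drop comment lines, then split each remaining line on colons.
    grep -v '^[[:space:]]*#' "$root/etc/inittab" |
    while IFS=: read -r id levels action process; do
        [ "$action" = respawn ] || continue
        prog=${process%% *}    # first word of the command line
        [ -x "$root$prog" ] || echo "id $id: $prog missing or not executable"
    done
}
```

Any id it reports is a likely source of the respawning message, since init will keep restarting a command that cannot run.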

If you get a login prompt, and you enter a valid login name but the system prompts you for another login name immediately, the problem may be with PAM or NSS. See Section 4.4. The problem may also be that you use shadow passwords and didn't copy /etc/shadow to your bootdisk.
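The shadow password case is easy to test for. The sketch below assumes the usual convention that a password field of "x" in /etc/passwd means the real entry lives in /etc/shadow, and that the root filesystem is mounted on your working system; check_shadow is a hypothetical helper name.

```shell
# Sketch only: $1 is the assumed mount point of the bootdisk root
# filesystem. Reports when passwd refers to shadow entries that the
# root disk does not carry.
check_shadow() {
    root=$1
    # A password field of "x" conventionally points to /etc/shadow.
    if awk -F: '$2 == "x" { found = 1 } END { exit !found }' "$root/etc/passwd" &&
       [ ! -f "$root/etc/shadow" ]; then
        echo "passwd uses shadow passwords but /etc/shadow is missing"
        return 1
    fi
    return 0
}
```

If this reports a problem, copy /etc/shadow to the root disk. It says nothing about PAM or NSS, which have to be checked separately.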

If you try to run an executable that is on your rescue disk, such as df, but get a message like df: not found, check two things: (1) make sure the directory containing the binary is in your PATH, and (2) make sure you have the libraries (and loaders) the program needs.
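Both checks can be run from the rescue system itself. This is a minimal sketch, assuming ldd is available on the rescue disk (it often is not, in which case run the second check from your working system instead); diagnose is a hypothetical helper name.

```shell
# Sketch only: reports whether a program is reachable through PATH and,
# if ldd is available, which of its libraries cannot be located.
diagnose() {
    prog=$1
    path=$(command -v "$prog") || {
        echo "$prog: not in PATH ($PATH)"
        return 1
    }
    echo "$prog: found at $path"
    # ldd marks libraries it cannot locate with "not found".
    ldd "$path" 2>/dev/null | grep 'not found' || true
}
```

For example, diagnose df either reports that df is not in PATH or prints where it was found, followed by any libraries ldd could not locate. If the binary is found and no libraries are reported missing, suspect the loader: the shell can also print "not found" when the loader the binary asks for is absent.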