Test RAM with Memtest86+ and ignore bad parts with BadRAM in GRUB

Linux
Author

Vinh Nguyen

Published

March 30, 2012

Recently, my computer kept freezing whenever I started conkeror (with 100+ buffers loading from a previous session). Folks in #conkeror on freenode suggested that faulty RAM might be the cause and recommended testing it with Memtest86+, which is installed by default on Ubuntu.

If you have multiple sticks of RAM, test one stick at a time. The test can take hours, so it's easiest to test one stick per night. To run the test, restart your computer and open the GRUB menu (hold Shift during boot if the menu doesn't display automatically), then select the "Memtest86+" boot option. Once Memtest86+ is running, press "c", "4", and "3" to display the error locations in BadRAM syntax; converting the faulty memory addresses shown by default into that syntax by hand is not obvious to me or to others. If you skip this step, you may end up wasting time fixing your boot options (details later).

If you know which RAM sticks are bad and they are under warranty, replace them. If they are not under warranty and you can't afford new RAM, you can make use of BadRAM, which is supported by default in GRUB 2, per the documentation. That is, edit /etc/default/grub and specify the faulty memory addresses with the GRUB_BADRAM option.
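As a rough sketch of what that edit looks like: GRUB_BADRAM takes a comma-separated list of address,mask pairs. The values below are placeholders, not real fault addresses; use the patterns Memtest86+ reported for your own machine. The `check_badram` helper is hypothetical (it is not part of GRUB) and assumes every value is written in hex, which is stricter than GRUB itself may require.

```shell
# Line to place in /etc/default/grub. The values are placeholders in the
# addr,mask[,addr,mask...] form, not real fault addresses.
GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Hypothetical helper: succeeds only if its argument is a comma-separated
# list of hex addr,mask pairs (an even number of 0x... values).
check_badram() {
  printf '%s\n' "$1" |
    grep -Eq '^0x[0-9a-fA-F]+,0x[0-9a-fA-F]+(,0x[0-9a-fA-F]+,0x[0-9a-fA-F]+)*$'
}

# Sanity-check the value before rebooting with it.
if check_badram "$GRUB_BADRAM"; then
  echo "GRUB_BADRAM looks well formed"
else
  echo "GRUB_BADRAM does not look like addr,mask pairs" >&2
fi

# After saving /etc/default/grub, regenerate grub.cfg on Ubuntu with:
#   sudo update-grub
```

A check like this only catches formatting mistakes; it cannot tell whether the addresses actually match the faulty regions.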

More information on running Linux with broken memory can be found here.

When I tried this out, I did not use the proper memory address syntax, so my computer failed to boot. What made things even worse was that my hard drive was encrypted. Luckily, I could still access GRUB, and after many trials and tribulations I fixed the problem by booting the computer from an Ubuntu live disk (USB), mounting the first, unencrypted partition of the hard drive (/dev/sda1), which stored /boot, and removing the badram option from /boot/grub/grub.cfg (replace "boot" with the mount path). Before figuring out this solution, I had been trying to mount /dev/sda5, the encrypted partition, according to this and this, as I thought that was where /boot resided. I also thought I had to generate a new initrd image. Luckily I didn't have to (and didn't succeed in trying), as that would have further complicated my boot options, as I have experienced in the past.
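The recovery step can be sketched roughly as follows. The `strip_badram` helper and the /mnt/boot mount point are hypothetical names chosen for illustration, and the sketch assumes the generated grub.cfg carries the option as a line starting with `badram`; check your own file before deleting anything.

```shell
# Hypothetical helper: keep a backup of the given grub.cfg, then delete
# every line that starts with the "badram" command.
strip_badram() {
  cfg="$1"
  cp -- "$cfg" "$cfg.bak" || return 1
  sed -i '/^[[:space:]]*badram[[:space:]]/d' "$cfg"
}

# From the live session, assuming /boot lives on /dev/sda1 as in this post
# and /mnt/boot is a free mount point:
#   sudo mkdir -p /mnt/boot
#   sudo mount /dev/sda1 /mnt/boot
#   sudo sed -i.bak '/^[[:space:]]*badram[[:space:]]/d' /mnt/boot/grub/grub.cfg
#   sudo umount /mnt/boot
```

Note that this only edits the generated file. The GRUB_BADRAM line in /etc/default/grub is still there, so it should be corrected or removed once the system boots again; otherwise the next regeneration of grub.cfg would bring the bad option back.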

After removing the bad RAM, conkeror still crashed for me. Either something is wrong with other pieces of my hardware, or xulrunner is sucking up my system resources. I was able to stop the crashes by placing this in my conkeror rc file.