Early last Sunday afternoon I noticed that the battery-charge indicator had vanished from (main laptop) Sable's Gnome panel. (That's sort of like the row of icons and such you see along the bottom of the screen on a Mac, except that I've configured it to go vertically down the left-hand edge, where it doesn't reduce the height of my browser window too much.)

Hmm, says I to myself, maybe it will come back after a reboot. So I did that, and logging in presented me with an empty screen background. ??? A little more experimentation showed that only the Gnome-2 desktop was affected; the Ubuntu one (which I detest) worked fine. So did a console terminal, and SSH. The obvious next step was to run fsck, the file-system checker (and many hackers' favorite stand-in for a certain four-letter expletive).

Well, not quite the next step. Since I figured that fixing file-system corruption might possibly make things worse, I moved over to one of my spare laptops, Raven, sat Sable on the shelf next to my desk, and logged in on Sable with SSH. Then I went to the top of my working tree and ran make status to see what needed to be checked in. I think I've mentioned MakeStuff before -- it's basically a multi-function build tool based on GNU Make, and one of the things it can do is find every git repository under the top-level directory, and do things like check its status, or pull. (Commit takes a little more thought, so you don't want to do it indiscriminately.)
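As a rough illustration of the find-every-repository idea (this is not MakeStuff's actual implementation, and `status_all` is a made-up name), a shell function along these lines would list the repositories under a directory that have uncommitted changes:

```shell
# Sketch only: report every Git repository under a directory that has
# uncommitted changes. status_all is a hypothetical helper, not part of
# MakeStuff.
status_all() {
    top=${1:-.}
    # A repository is recognized here by its .git directory; -prune keeps
    # find from descending into it.
    find "$top" -type d -name .git -prune | while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        # --porcelain gives stable, script-friendly output; empty means clean.
        changes=$(git -C "$repo" status --porcelain)
        if [ -n "$changes" ]; then
            printf '%s\n%s\n' "== $repo" "$changes"
        fi
    done
}
```

Running `status_all` from the top of a working tree prints nothing when everything is checked in, which is the answer you want before doing anything risky to the disk.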

Then I ran MakeStuff/scripts/scripts/pull-all on Raven. Done.

Well, almost. There are a few things in my home directory that aren't under my working tree, mostly Desktop, Documents, Downloads, my Firefox bookmarks, and my Gnome Panel configuration. I hauled out a USB stick and fired up tar (like zip, except that it can save everything about a file, not just what DOS knows about), and ran straight into the fact that USB sticks are usually formatted with a FAT filesystem, which limits files to 4GB. Growf! Faced with the unappetizing prospect of shipping 17GB of backups over WiFi, I carried Sable over to my server and plugged in the ethernet cable that I leave hanging off the router for just such occasions. The command I actually used, because I probably forgot a few things (and should have excluded a few more, like Ruby and Perl), was

    rsync -a --exclude vv --exclude ?cache --exclude ?golang . \
          nova:/vv/backups/steve\@sable

After that finished, I fired up Firefox, bookmarked all my tabs, and exported tabs and bookmarks to an HTML file. Should have done that before I backed up everything, but I didn't think of it.
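For what it's worth, the FAT file-size limit can be sidestepped by cutting the archive into pieces as it is written. This is not what was done here (the backup went over ethernet instead); it is only a sketch, assuming GNU tar and split, with hypothetical helper names and an arbitrary default piece size:

```shell
# Sketch only: write a tar archive as fixed-size pieces so that no single
# file exceeds a FAT32 volume's per-file limit (just under 4 GiB).
# split_backup and restore_backup are hypothetical helpers; the 1G default
# piece size is an arbitrary choice.
split_backup() {
    src=$1
    dest_prefix=$2
    chunk=${3:-1G}
    # tar writes the archive to stdout; split cuts that stream into files
    # named <prefix>aa, <prefix>ab, and so on.
    tar -C "$src" -cf - . | split -b "$chunk" - "$dest_prefix"
}

# Reassemble the pieces in name order and unpack them into a directory.
restore_backup() {
    dest_prefix=$1
    target=$2
    mkdir -p "$target"
    cat "$dest_prefix"* | tar -C "$target" -xf -
}
```

The pieces are useless individually, so all of them have to make it back intact before anything can be restored.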

Finally, I was ready to run fsck and find out the bad news. I plugged in the USB stick with the Ubuntu live installer (one does not run fsck on a mounted filesystem!), brought up a terminal, and ran

    e2fsck -cfp /dev/sda5 # check for bad blocks, force, preen

(Force means to do a full check even if the disk claims it's okay; "preen" means to make all repairs that can be done without human approval.) Naturally, after turning up a few dozen bad blocks, it told me that I had to run it manually. I could have replaced the -p option with -y, to say "yes" to all requests for approval; instead I left it off and hit Enter a hundred times or so. Almost all the problems were "doubly-claimed blocks", mostly shared between some other file and the swapfile. Of course. Fsck offered to clone those blocks, and I took it up on that offer. Then I ran it again to make sure it hadn't missed anything. It hadn't. But the Gnome desktop was still broken, no doubt because of all those corrupted files.

So this morning, after a couple of searches, I installed the debsums program, which finds all of the files you've installed, and compares their checksums against the ones in the packages they came from. The following command then takes that list, and re-installs any package containing a file with a bad checksum:

    apt-get install --reinstall $(dpkg -S $(debsums -c) \
           | cut -d : -f 1 | sort -u)

Sable now "works" again. I know at least one zip file was corrupted (it was a download, and I was able to find it again), and fsck doesn't appear to have kept a log, so broken files will keep turning up for a while. I know there aren't any bad zip files left because there's an option in unzip, -t, that compares checksums, just the way debsums does, so I could loop through all my downloads with:

    for f in *.zip; do echo -n "$f: "; unzip -tqq "$f"; echo; done

I have two remaining tasks, I think: one is to validate all of my Git working trees (worst case -- just blow them away and re-clone them), and then comes the really hard one: deciding whether I still trust Sable's SSD, or need to get a new one. And if I get a new one, how big? Sable and its 500GB drive were purchased together, used, from eBay, and brand-new 1TB SSDs are pretty cheap right now. So there's that.
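For the Git half of that, one plausible approach is to run `git fsck` in every repository and re-clone whichever ones complain. The following is a sketch under that assumption; `fsck_all` is a made-up name. Note that `git fsck` checks the object database, not the checked-out files, so it is a first pass rather than a guarantee:

```shell
# Sketch only: run "git fsck" in every repository under a directory and
# list the ones that fail. fsck_all is a hypothetical helper name.
fsck_all() {
    top=${1:-.}
    find "$top" -type d -name .git -prune | while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        # git fsck exits non-zero when it finds corrupt or missing objects.
        if ! git -C "$repo" fsck --full >/dev/null 2>&1; then
            echo "BAD: $repo"
        fi
    done
}
```

Anything listed as BAD is a candidate for the blow-it-away-and-re-clone treatment, after first checking that it has no unpushed commits.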