My ThinkPad's 500GB hard drive had been slowly failing (which I knew courtesy of smartd), so I got an advanced replacement from Hitachi and copied the failing drive, minus its two already failed sectors, with GNU ddrescue.
While we're at it, I recommend that people run smartd with the following in their smartd.conf:
DEVICESCAN -R 194 -R 231 -I 9 -W 5 -a -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/share/smartmontools/smartd-runner
and that you put this in your crontab, so you can analyze later what's been going on with your drives, even after the fact:
2 1 * * * root DIR=/var/log/smart; mkdir -p $DIR ; FILE=$DIR/`date '+\%F'`.log; for i in /dev/sd?; do echo $i; smartctl -a $i; echo; echo; done > $FILE
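For the "later analysis" part, here is a minimal sketch of how those daily logs could be mined for one attribute over time. It assumes the usual ATA attribute table printed by smartctl -a (ten columns, raw value last) and the "echo $i" device line written by the cron job above; the function names and the choice of Reallocated_Sector_Ct are mine, not anything smartmontools ships.

```python
import glob
import os

def parse_smart_log(text, attribute="Reallocated_Sector_Ct"):
    """Return {device: raw_value} for one daily log written by the cron job."""
    values = {}
    device = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1 and fields[0].startswith("/dev/"):
            # The "echo $i" line that precedes each drive's smartctl output.
            device = fields[0]
        elif len(fields) >= 10 and fields[1] == attribute and device:
            # Assumed layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
            # WHEN_FAILED RAW_VALUE; keep the first token of RAW_VALUE.
            values[device] = fields[9]
    return values

def attribute_history(log_dir="/var/log/smart", attribute="Reallocated_Sector_Ct"):
    """Return [(log file name, {device: raw_value}), ...] in date order."""
    history = []
    for path in sorted(glob.glob(os.path.join(log_dir, "*.log"))):
        with open(path, errors="replace") as f:
            history.append((os.path.basename(path),
                            parse_smart_log(f.read(), attribute)))
    return history
```

Because the log files are named by date (date +%F), sorting the file names is enough to get them in chronological order; printing attribute_history() then shows whether the raw count is creeping up on any drive.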
The harder part was figuring out which files were going to be partially lost due to those sectors. After racking my brain on how I could convert those sectors into filenames and getting nowhere, I realized there was a very simple way of finding out: just read all the files and log the errors.
A day or so later I had read the entire filesystem and narrowed it down to the two files that were damaged. Sometimes the low tech option is the best.
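I didn't keep the exact commands I used, but the "read everything and log the errors" approach can be sketched roughly like this. The function name and chunk size are arbitrary choices for illustration; the idea is simply that reading a file sitting on a bad sector should fail with an I/O error, which shows up in Python as an OSError.

```python
import os

def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root; return [(path, error), ...]."""
    failures = []

    def on_walk_error(err):
        # Directories that cannot be listed are recorded as failures too.
        failures.append((err.filename, str(err)))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and other special files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    # Read to the end in chunks and discard the data;
                    # only the errors are of interest.
                    while f.read(chunk_size):
                        pass
            except OSError as err:
                failures.append((path, str(err)))
    return failures

# Example use: for path, err in find_unreadable_files("/mnt/olddisk"): print(path, err)
```

The /mnt/olddisk path in the comment is only a placeholder for wherever the failing filesystem is mounted. Note that os.walk will happily cross into other mounted filesystems, so point it at the one you care about.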
While I was doing that, I was trying some new tools and two new boot CDs I had built, something any self-respecting sysadmin should carry: UBCD and/or UBCD4Win. UBCD4Win is basically a Windows live CD you build from your Windows install media, and UBCD has a lot of DOS boot floppies (disk check, BIOS, rescue and more) and a nice version of Parted Magic, a recovery Linux live distro that runs from RAM.
The only small downside is that gparted has a bug that, on at least one occasion, kind of shredded an ext3 partition of mine, but it worked fine the rest of the time, even for resizing NTFS partitions.
The cool part was when I tried to rescue the ext3 partition for practice (I had backups) and saw Ted Ts'o online in my gtalk friend list (I'm trying to remember when I added him, maybe when I helped him out with his G1 at a Linux conference). He nicely helped me out, trying to fake out resize2fs and then debugfs, but in the end the filesystem was too mangled.
Anyway, GNU ddrescue, UBCD and UBCD4Win rule (and mine now contain extra tools, including Image for Linux and Image for Windows).