Marc's Public Blog - Linux Hacking






π 2008-01-16 01:01 in Computers, Linux, Public
magic died from a double disk failure on Dec 31st 2007. At the time, I had to fail over all my services to moremagic, which thankfully I had set up and put online just for an eventuality like that one.
It was a then top of the line VA Linux 2250 with hot swappable SCSI SCA drives which I likely built around 2000 or so (don't have exact records nor pictures):

looks like it was at MFN early on while I was at google (before that it was at VA Linux)

moved from MFN to via.net in 2003

When I got home to fix it, the server was in a sad state because one of the raid5 drives was dead (you could hear the head scraping the disk when you turned it on) and a second drive would die as soon as you accessed some portions of it. So, when I managed to manually bring the raid5 array up on 4 drives instead of 5, that second drive would go offline as soon as I accessed some of its sectors (as /usr was being mounted).
Unfortunately, all the partitions were logical volumes on one big volume group, and the boot process would try to mount /usr from /dev/intraid5/usr and bring the raid5 array down, along with the volume group and all other logical volumes.
Now, I already had a backup of almost all the data there, but not my ftp site, as it was too big to bother backing up. This was what I was trying to get back.
After a little effort, I actually got it to work: I manually brought up a not up to date raid array with
mdadm -A --run --force /dev/md5 /dev/sd{a,b,c,d}3
(--run because one drive was missing and --force because /dev/sdc3 was not up to date with the rest of the drives, but at that point, I didn't give a shit, I just wanted it to come up with whatever data it did have).
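
For reference, here is a minimal sketch of that forced, degraded assembly wrapped in a dry-run helper. The function name assemble_degraded and the DRY_RUN variable are made up for illustration; only the mdadm flags and device names come from the command above. It prints the command by default so it can be read and checked before anything touches real disks.

```shell
#!/bin/sh
# Sketch only: print (or run) the forced, degraded assembly described above.
# DRY_RUN is a hypothetical knob; it defaults to 1 so nothing is executed.
DRY_RUN="${DRY_RUN:-1}"

assemble_degraded() {
    # $1 = md device, remaining args = member partitions that are still readable
    md="$1"; shift
    if [ "$DRY_RUN" = "1" ]; then
        # Show what would be run
        echo "mdadm -A --run --force $md $*"
    else
        # --run: start even though a member is missing
        # --force: accept a member whose metadata is out of date
        mdadm -A --run --force "$md" "$@"
    fi
}

assemble_degraded /dev/md5 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3
```

Forcing a stale member into the array can serve stale data, so this only makes sense as a last resort when the goal is to copy off whatever is still readable.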

At that point, I would try to mount /usr to get some binaries and everything would go down and fail. Eventually, I managed to rescue it by mounting just /var/ftp without touching/mounting any other logical volume, and I was able to bring networking up and make use of tar and nc which were both in /bin to copy the data off and save the day (rescuing a system without any binary in /usr can be a bit challenging :) )
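
The tar and nc copy can be sketched roughly as follows. The exact commands used at the time are not recorded here, so this is an assumption about the general shape: tar writes the tree to stdout, something carries the bytes, and a second tar unpacks them. The stream_copy function and TRANSPORT variable are hypothetical, as are the host name, port and destination path in the comments.

```shell
#!/bin/sh
# Sketch: stream a directory tree through a pipe with tar, no temp file needed.
# TRANSPORT is a hypothetical knob so the pipeline can be tried locally with
# plain cat before putting a network hop in the middle.
TRANSPORT="${TRANSPORT:-cat}"

stream_copy() {
    # $1 = source directory, $2 = destination directory (must already exist)
    tar -cf - -C "$1" . | $TRANSPORT | tar -xf - -C "$2"
}

# Over the network the pipeline is split in two halves, roughly like this
# (assumed host, port and paths; the listen syntax differs between netcat
# variants, some want "nc -l 9000" instead of "nc -l -p 9000"):
#   receiver:  nc -l -p 9000 | tar -xf - -C /backup/ftp
#   sender:    tar -cf - -C /var/ftp . | nc receiver.example 9000
```

The appeal in a rescue situation is that nothing is written on the failing machine, and both tools lived in /bin, so nothing from /usr was needed.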

While I was working on bringing magic back up, I decided to build a new server: I decided against fixing the old server as is, just because all my SCSI/SCA drives were old, and replacement ones would be bound to fail again (not counting the fact that they are low capacity too). The old motherboard was also a dual P3 800Mhz with one failed CPU, and RAM that still worked, but wasn't new either.
Instead, I used a lesser VA Linux 2230 system which didn't have any hotswappable drives or SCA midplane (compared to the previous 2250 I had), and used a free server motherboard that a coworker gave me. In hindsight, maybe I should have bought a new server board and populated it, but I eventually made it work: the major challenge was that it didn't have SCSI onboard, and only one PCI slot (also, it unfortunately does not allow for bios console redirection on serial port).

I ended up with a frankenstein machine with:

  • dual P3 1.4Ghz CPUs (vs one 800Mhz)
  • no CD-Rom or floppy as I removed them to add 2 drives on which I jerry-rigged a fan
  • the 2 sata drives worked with a sata to ide converter to run them off the system IDE bus
  • In total 6x 250G Sata drives (4 running off Sata in addition to the 2 running off the IDE bus)
  • PCI connection doubler with IDSEL on 27, for one Sata 3114 sil board and one SCSI NCR 53c895 board (see below)
  • 24 to 20 pin power connector and extender
  • The PCI bus was a problem since it only allowed one card, but I was able to find a PCI extender with bus selector for the second card from logicsupply.com. This was challenging to get working (and only one of the two cards will post at boot time), but in the end both cards show up in linux and work, which is all I cared about.

    In the process, I also managed to smoke a SATA card by putting it in the wrong direction in the PCI slot (don't laugh, it has no header bracket and it's very hard to see its front from its back, especially on an angled bracket). I also got some smoke to come off the motherboard: it wasn't screwed to the chassis, slid while powered on, and touched a chassis bolt, which shorted a solder trace on the back of the board.
    Amazingly, I'm now stress testing the board and it seems to still work fine, for all of CPU, Ram, and PCI. I've restored my last backup of magic on it, ready for going back in production tomorrow.


    After that, I went to via.net to rack the new frankenstein magic, after having just finished copying live data back from moremagic and switching mail/web/everything back over. This is what it looked like, racked up a few months later, still running that huge disk tray filled with obsolete SCSI/SCA drives :)


    We're now back up with: 2TB of disk, 2GB of Ram, and 2x1.4Ghz CPUs (ok, not stellar, but it'll do). The 2TB was through an external SCA/SCSI disk array I had before and could still connect to.

    Here are a few pictures of the new server:

    this was the pci riser/doubler with idsel

    The power connector was also 24 pin instead of 20, and too short. Thankfully I had a power connector extension that came in very handy

    I had to adapt the motherboard specific front panel connector (for intel 550GX) to a power connector that worked with a regular motherboard

    4 sata drives on the right, in proper slots, and 2 sata drives on the left, one in the CD-Rom slot and one in the floppy slot, with a jerry-rigged fan I added

    those IDE to sata converters (which actually have a sata controller that takes IDE commands and translates them into brand new sata commands) came in quite handy to add 2 more drives when I only had 4 sata slots on the controller card

    As I write this after the fact, I do wonder "what on earth was I doing?" That was a fair amount of complex engineering to re-use free hardware that wasn't really meant for the job. Once I factor in my engineering time to make it all work, including the PCI doubler config and making Sata drives work on IDE ports because I didn't have enough ports on my free Sata card, I probably could have just bought an 8 port Sata card, even back then.

    For version 3, it looks like I eventually removed the SCSI card that ran that obsolete external array with its power hungry drives, and somehow managed to connect a single 1 or 2TB drive, sitting on top of the motherboard with a CPU cooler and fan:

    some of my best work, haha

    This allowed me to get rid of the obsolete disk array filled with drives that got replaced by a single drive sitting on top of the motherboard :) which was non redundant but contained non essential data. It still had 6 sata drives I also got for free (250GB each) running Raid5 for the main system.
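
As a rough capacity check, and assuming all six drives sit in a single RAID5 array (the post does not spell out the exact layout), RAID5 gives up one drive's worth of space to parity. The helper name raid5_usable_gb is made up for this sketch, and it ignores filesystem and metadata overhead.

```shell
#!/bin/sh
# RAID5 keeps one drive's worth of parity, so usable space is (n - 1) * size.
raid5_usable_gb() {
    # $1 = number of drives, $2 = size of each drive in GB
    echo $(( ($1 - 1) * $2 ))
}

# Six 250GB drives, as in the machine described above
raid5_usable_gb 6 250
```

Under that assumption this prints 1250, so about 1.25TB usable for the main system.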

    That new system went on to run from 2008 to 2019, until the motherboard finally died in Aug 2019, apparently from a failed capacitor. Not a bad run for such a weirdly pieced together machine that blew smoke from a short when I built it.

    Further reading

  • rescuing/rebuilding magic, and magic back online and live
  • Moremagic v1 died after 18 years of service
  • Magic v3 died, upgrade to V4, Dell Poweredge 2950 and 64bit linux!
  • Magic v5: From Dell Poweredge 2950 to Raspberry Pi 5 (skipping Dell DSS1510)
  • Finishing Upgrade of Year 2000 Linux System From i386 to amd64 to arm64 for Raspberry Pi5 with mailman 2.1.7 for Python 2 (the last 5% that took 70% of the time)
  • Exim4 Mailman2 allow insecure tainted data local parts and local part data (what sadly made this migration a lot less fun around the end)
