Marc's Public Blog - Linux Hacking






2019-08-15 01:01 in Computers, Linux, Public
It's almost amazing that this "what was I thinking in building this server out of free parts that really didn't belong together" magic v2-v3 I rebuilt in Jan 2008 survived for 11.5 years, but it did, until it died in the middle of Aug 2019 from a blown capacitor, conveniently while I was home and not while I was on my way to, or already at, burning man just after. I had just enough time to fix it and build a new one.

You can see the puffy capacitors; one of them died and took the motherboard with it

First, I tried to see if I could move all of this to another VA Linux server I had lying around at home, a mighty dual P3 800MHz. It would have been a downgrade, but it would have brought the machine back up:


However, I quickly found out that the machine was way too old: it was incapable of booting from SATA when a SATA card was plugged in, or from anything plugged in via USB. Maybe the fact that the motherboard was from the late 90's had something to do with it :)
This time, I thought I'd be smarter and use the 2 unused Dell Poweredge 2950 servers I had been keeping at home for a rainy day (maybe for up to 10 years). This time, I'd use a proper server and it would be easier :)

Also, I would finally upgrade to a 64bit kernel. My previous servers could only run 32bit, which had started to be a problem with some software that was 64bit only or badly tested on 32bit, never mind code that needed more than 2GB of RAM (technically you could use more than 2GB of RAM with a 32bit kernel, but it went through bounce buffers and non-contiguous memory, and each user space process was still limited to 2GB).

But of course, I hadn't looked up that those servers dated from 2006 and were totally obsolete in 2019. I figured it wouldn't matter: they were still better than the previous, even more obsolete, free motherboard a coworker had given me and that I had managed to fit in a VA Linux server case. Also, they had a serial console, and even remote serial console over the network via a separate IP. Luxury!!!

They also had nice disk hardware for SATA drives, so things were going to be great. The required PERC RAID card (required because of the backplane the drives were plugged into, and the cabling) did allow some kind of SATA passthrough, so that I could do software RAID with Linux (I've never trusted hardware RAID, it's always been too slow and vendor dependent). Sadly, that passthrough was terribly slow. I then went on the internet and bought some upgraded H700 RAID cards that were supposed to be faster. First I found out they needed new cables, sigh, but even once I got all of this, the entire system's disk I/O was still unbearably slow.
In the end, with time running out (days had gone by while I had the old drives running out of an open server shell on my desk at home, still serving queries and Email), I had to settle for the hardware as it was, slow RAID card and all.

So this was the new server; it looked nice inside:


Unfortunately I was forced to use the weird and slow RAID card, which was still slow even after upgrading to an H700:


4 Cores, 16GB of RAM, luxury in 2006 :)

Did I say it was slow? Here is a RAID rebuild. I think reads were ok, but writes were terribly slow, probably because they were forced to go through some writeback cache on the RAID card that I did try to turn off. Yeah, this first rebuild shows about a week for 7TB :-//

md3 : active raid5 sda6[7] sde6[6] sdf6[4] sdd6[8] sdb6[9] sdc6[2]
      7118325760 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/5] [_UUUUU]
      [=======>............]  recovery = 42.9% (611422712/1423665152) finish=3325.8min speed=4070K/sec

Interestingly, if I rebuild the array on the 7th drive, connected directly via SATA, it's 10x faster:

md1 : active raid6 sdg3[9] sda3[6] sde3[5] sdd3[7] sdb3[8] sdc3[2]
      419164160 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UUUU_U]
      [======>.............]  recovery = 38.7% (40655068/104791040) finish=19.8min speed=53705K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk
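
For reference, this is the generic way to watch a rebuild and check whether the kernel's own md throttles are in the way. The likely bottleneck here was the RAID card's write path, not these limits, but they're the first thing worth ruling out (a sketch; the values are just examples, in KB/sec):

watch -n 60 cat /proc/mdstat

# md throttles resync/rebuild speed per device; check the defaults and
# raise them when the disks themselves are not the bottleneck
cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 500000 > /proc/sys/dev/raid/speed_limit_max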

The previous server was really built on happy thoughts and good luck; it's amazing it ran for 11 years without any major double disk failure (I did have some drives to replace over time, but nothing major, never an emergency, and never any downtime). This time around, I did things seriously:

  • 6 drives in raid6 for all the filesystems that mattered
  • raid1 over 6 drives with a standalone basic linux system for recovery (not the real system, but enough to log in and fix things). This would work even if 5 drives died!
  • a 7th drive sitting on the motherboard to act as a hot spare, so that I could bring the raids back to full redundancy after a drive died, without having to urgently go replace it in person
  • a bootable USB key I could remotely boot from via serial console if everything else failed. This would allow inspecting things, doing recoveries and even remote re-imaging, since you can't easily re-image a filesystem you booted from. I never had to use it, but to this day I still think it was a really cool feature I'm proud of :)
  • I had dual serial working (real serial and network serial), but in the end I was not able to use the real serial since I had nothing to plug it into on the colo side, and the network serial ended up being good enough.
  • dual power supply, never died, never faltered.
  • somehow the motherboard still worked fine even though it was 19 years old, although it didn't run continuously for 19 years.
  • I really really wanted to just put in a big SATA card and plug all the drives into it, but the Dell backplane and drive power routing made that mostly impossible. So after the RAID card upgrade, I realized it was a lost cause and changed plans to boot from a 4TB SSD and use it as the main boot and data drive:

    You can see the 7th drive sitting on top of the motherboard and using a spare SATA port, and the 4TB SSD on top

    Now, the 4TB SATA SSD, which I got for free after one in a set died an early death, taking all its data with it, did not fill me with confidence, but it was fast and fixed my slow RAID issues, so I made the call to use it as the boot drive. Thanks to my btrfs send/receive snapshot setup, the SSD was backed up hourly to spare filesystems on the raid, and it would have been easy to switch over remotely with virtually no data loss if it died.
    I found out in Oct 2025 that it survived 6 years without dying. I still wouldn't trust it much, but it survived, so good enough :)
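
    The hourly backup amounts to something like the following: a read-only snapshot, then an incremental send of only the delta to the raid copy. This is a minimal sketch of the idea; the paths and snapshot names are illustrative, and the real setup is scripted with more error handling:

    #!/bin/bash
    SRC=/mnt/btrfs_pool1        # SSD pool (illustrative path)
    DST=/mnt/btrfs_pool1b       # raid6 copy of it (illustrative path)

    # previous snapshot, if any, to send the delta against
    PREV=$(ls -1d "$SRC"/snap_* 2>/dev/null | tail -1)
    NEW="$SRC/snap_$(date +%Y%m%d_%H%M)"

    # read-only snapshot of the live filesystem
    btrfs subvolume snapshot -r "$SRC" "$NEW"

    if [ -n "$PREV" ]; then
        # incremental: only what changed since the last snapshot goes over
        btrfs send -p "$PREV" "$NEW" | btrfs receive "$DST"
    else
        # first run: full copy
        btrfs send "$NEW" | btrfs receive "$DST"
    fi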

    As laughable as it was to be setting up what turned out to be a 2006 server in 2019, and putting aside the very slow write speed through the RAID card that I never solved, this was the most overbuilt system I ever built.

    What the filesystems looked like:

    sdh4          400G  0 part  /mnt/btrfs_boot
    sdh5          1.6T  0 part  /mnt/btrfs_pool1
    md0           953M  0 raid1 /boot2 (spare ext4 linux boot/recovery system)
    md1         399.8G  0 raid6 /mnt/btrfs_bootb (btrfs send/receive of sdh4)
    md2           1.6T  0 raid6 /mnt/btrfs_pool1b (btrfs send/receive of sdh5)

    Quad boot system:

  • USB flash key with basic recovery tools and a linux boot
  • /dev/sdi1          63   24191   24129 11.8M  1 FAT12
    /dev/sdi2       24192   48383   24192 11.8M  1 FAT12
    /dev/sdi3  *    48384 4011839 3963456  1.9G  b W95 FAT32
  • 4TB SSD
  • /dev/sdh1        2048       4095       2048    1M BIOS boot
    /dev/sdh2        4096    2111487    2107392    1G EFI System
    /dev/sdh3     2111488   39063551   36952064 17.6G Linux extended boot
    /dev/sdh4    39063552  877924351  838860800  400G Linux filesystem
    /dev/sdh5   877924352 4283699199 3405774848  1.6T Linux filesystem
    /dev/sdh6  4283699200 4317253631   33554432   16G Linux swap
    /dev/sdh7  4317253632 4585689087  268435456  128G Intel Fast Flash
    /dev/sdh8  4585689088 7501476494 2915787407  1.4T Linux filesystem

  • md0: raid1 with a small bootable linux system, enough to log in, look around, and fix things
  • md1: raid6 backup for linux boot partition with btrfs send/receive
  • Then 2 more raids with data:

  • md2: raid6 backup for linux data partition with btrfs send/receive
  • md3: extra expandable data, 7TB
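
    For flavor, this is roughly how arrays like these get created and how the 7th drive becomes a hot spare with mdadm. It's a generic sketch with illustrative device and partition names, not the exact commands used on this box:

    # 6-way raid1 for the small standalone recovery system
    # (any single surviving drive can still boot it)
    mdadm --create /dev/md0 --level=1 --raid-devices=6 /dev/sd[a-f]2

    # raid6 for the data that matters (survives any 2 drive failures)
    mdadm --create /dev/md2 --level=6 --raid-devices=6 /dev/sd[a-f]5

    # the 7th drive's matching partition goes in as a hot spare, so a
    # failed disk starts rebuilding without a trip to the colo
    mdadm --add /dev/md2 /dev/sdg5

    # the spare shows up as (S) until it's needed
    cat /proc/mdstat
    mdadm --detail /dev/md2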
  • This is actually the only version of magic I built that didn't die. I only replaced it with, ironically enough, a duct-taped frankenstein Raspberry Pi 5 that really doesn't look solid, but that I hope will keep working as I write this in 2025. The reason for the switch is that it uses 20x less power and is almost 3x faster, so it's pretty much 60 times more efficient, and it takes the power use from 400W-ish down to 15W.


    Price of power use

    Dell 2950: ancient CPU, 4 real cores (not HT), on a 65nm die from 2006. Average usage: 3,504 kWh per year.

    Price of power is no joke:

    Silicon Valley Power (Santa Clara): $0.175/kWh	$613/year
    Palo Alto Utilities (Palo Alto): $0.22/kWh	$771/year
    Pacific Gas & Electric (PG&E): $0.425/kWh	$1,489/year
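
    The arithmetic behind those figures is just the 400W-ish continuous draw mentioned above, run for a year, times each utility's rate (a quick sanity check, not a metered number):

    awk 'BEGIN {
        kwh = 0.4 * 24 * 365                          # ~3,504 kWh/year
        printf "%.0f kWh/year\n", kwh
        printf "Santa Clara $%.0f/year\n", kwh * 0.175
        printf "Palo Alto   $%.0f/year\n", kwh * 0.22
        printf "PG&E        $%.0f/year\n", kwh * 0.425
    }'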

    Further reading

  • rescuing/rebuilding magic, and magic back online and live
  • Moremagic v1 died after 18 years of service
  • Magic v3 died, upgrade to V4, Dell Poweredge 2950 and 64bit linux!
  • Magic v5: From Dell Poweredge 2950 to Raspberry Pi 5 (skipping Dell DSS1510)
  • Finishing Upgrade of Year 2000 Linux System From i386 to amd64 to arm64 for Raspberry Pi5 with mailman 2.1.7 for Python 2 (the last 5% that took 70% of the time)
  • Exim4 Mailman2 allow insecure tainted data local parts and local part data (what sadly made this migration a lot less fun around the end)


