Replacing a 16 year old Sandy Bridge Server running 12 Spinning Rust Drives with something more efficient
My old Intel Sandy Bridge server, gargamel, was built in 2010, initially with a dual core CPU and later upgraded to a quad core with hyperthreading, and was now 16 years old. It was still working, but I had already replaced the drives multiple times, from 2TB to 4TB, 6TB, and eventually 12TB, as the previous drives got old and started failing (my first ridiculous NAS was 2TB, with 26 SCSI SCA drives in 3 enclosures, circa 2002). I set up that last server with 10 SATA drives in 2 enclosures of 5 drives each. It has been running for over 15 years with just a few drive upgrades and replacements, and is now at 64TB of spinning rust. It turns out I didn't really need that much, but on the last drive upgrade I went directly from 6TB to 12TB.
The server still works fine, but it is ultimately still running a Debian install from 1999 that has been upgraded all these years, including a 32/64bit dual userland without systemd. But fighting "progress" only goes so far, and my 2nd disk array, with 10+ year old 4TB drives, was starting to have more drive failures. I also realized that 250W+ of power is a bit more than needed, so I decided to upgrade to an rPi5 with 16GB of RAM and see if I could make a decent Linux server out of it.
Considering a rPi5 with 20 SSDs
Here is what I did:
/dev/mapper/dshelf1   30T  5.4T   24T  19% /mnt/btrfs_pool1 => 10x cheap TLC + QLC SSDs in raid6
/dev/mapper/dshelf2   25T  128G   25T   1% /mnt/btrfs_pool2 => 8x more expensive MLC/TLC enterprise drives in raid5
/dev/mapper/dshelf3  447G  6.1M  445G   1% /mnt/btrfs_pool3 => left over space from some QLC drives that are 4.09TB
/dev/mapper/dshelf4  447G  6.1M  445G   1% /mnt/btrfs_pool4 => left over space from MLC drives
The next problem is "how do you power 18 directly connected external drives?". You're going to tell me to just get drive enclosures, but it turns out there are few, if any, external enclosures for 2.5" drives that offer direct SATA connections as well as their own power. You would think it shouldn't be too hard to buy a reasonably sized standalone 12V/5V power supply for SATA drives that offers more than 20A of 5V (even NVME drives can take more than 1A each), but I didn't find any short of buying a full bore ATX power supply and dealing with it not coming on by itself because it's not connected to a motherboard. So I had to make my own: I took a 40A 5V power supply I had laying around for LEDs, joined it with a 12V 7A power supply, and made my own SATA power bus. From there, I could indeed have 18 drives hang off the SATA power plugs ;) Or I could do something a bit better, and I found some nice enclosures. Unfortunately they cost $90 each even though they don't provide their own power, and sadly the built-in fans require 12V, so I have to send them dual power just for that; otherwise I'd be able to power the entire thing from 5V.
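To sanity check a homemade 5V rail like this one, a quick current budget helps. The sketch below is only back-of-the-envelope arithmetic: the 40A supply and the 18 drives come from the text above, but the 1.5A peak draw per drive is an assumed figure for illustration, not something I measured, and the function names are made up.

```python
# Rough 5V current budget for a shared SATA power bus.
# Supply rating (40A) and drive count (18) are from the post;
# ASSUMED_PEAK_AMPS is a hypothetical per-drive peak, not a measurement.

def rail_headroom(supply_amps: float, drives: int, amps_per_drive: float) -> float:
    """Spare current left on the rail in amps; negative means overloaded."""
    return supply_amps - drives * amps_per_drive


def max_amps_per_drive(supply_amps: float, drives: int) -> float:
    """Largest per-drive draw the supply can sustain if all drives peak together."""
    return supply_amps / drives


if __name__ == "__main__":
    DRIVES = 18
    ASSUMED_PEAK_AMPS = 1.5  # assumption: worst-case draw of one SSD on 5V

    for supply in (20.0, 40.0):
        spare = rail_headroom(supply, DRIVES, ASSUMED_PEAK_AMPS)
        limit = max_amps_per_drive(supply, DRIVES)
        print(f"{supply:.0f}A supply: headroom {spare:+.1f}A, "
              f"up to {limit:.2f}A per drive")
```

With that assumed peak, a 20A supply would be overloaded while the 40A one keeps a comfortable margin, which is consistent with wanting "more than 20A" in the first place.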
Making all this work on an rPi5
So you're going to tell me that maybe an rPi5 wasn't really meant to have a PCI bus, never mind to run 20 SSDs (18 Sata + 2 M2 NVME), and maybe you'd be right, but I got excited when I got this quad NVME expander board for my Pi5:
![]()
I mean it does look pretty and exciting ;)
![]()
Routing the Pi5 ribbon is a bit tricky and requires longer ribbon cables to reach the middle splitter board
SSDs and Prices, using cheaper DRAM-less SSDs and QLC RDAT drives with Raid5/6
Obviously I picked the wrong time to buy a bunch of SSDs. Proper 4TB SSDs run around $700, if not worse, so I went for low grade DRAM-less TLC or QLC drives off eBay (still around $300 apiece). I figured that with RAID6 it would not be so bad, and for one of my 2 arrays, write performance and many rewrites were not a concern. I also found out that the Teamgroup 4TB drives were a mix of TLC and QLC, with 3 different kinds of controllers, and some were 4.09TB where others were just 4.00TB. Then I found out about discard/TRIM support and this:
/dev/sdr * Deterministic read ZEROs after TRIM
/dev/sdi * Deterministic read data after TRIM
The better, expensive drives guarantee RZAT (read zeros after TRIM), and the cheap ones are only RDAT (read data after TRIM). The RDAT drives cannot support TRIM through raid5 or raid6, because raid requires that drives return 0 after TRIM so that parity works out later, and RDAT drives do not give that guarantee. Linux raid nicely detects that and turns off discard support. This however also means that after deleting data, there is no way to mark that flash as free for the drives: you can't trim or fstrim. The only sad thing with btrfs is that it does wear leveling of the underlying drives, which means that over time all the SSD blocks get used and there is no way to tell the drives which blocks are free. This is not ideal, especially for QLC drives, as they are quite slow to rewrite blocks when they don't have plenty of free space.
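To make the parity argument concrete, here is a toy model of one raid5-style stripe (two data blocks plus XOR parity) after every member has been trimmed. It is not how the Linux md driver is implemented; the function names and the "arbitrary bytes" each RDAT drive returns are made up for illustration.

```python
from functools import reduce

# Toy model of a 3-member stripe: 2 data blocks + 1 XOR parity block.
# ARBITRARY stands in for whatever an RDAT drive happens to return after
# TRIM; the values are invented for this example.
ARBITRARY = [bytes([0x11] * 4), bytes([0x22] * 4), bytes([0x44] * 4)]


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_after_trim(kind, arbitrary):
    """What a drive returns when reading a trimmed block."""
    if kind == "RZAT":
        return bytes(len(arbitrary))  # guaranteed zeros
    if kind == "RDAT":
        return arbitrary  # deterministic, but not specified to be zeros
    raise ValueError(f"unknown drive kind: {kind}")


def stripe_after_trim(kind):
    """Return (data_blocks, parity_block) as read back after trimming the stripe."""
    blocks = [read_after_trim(kind, a) for a in ARBITRARY]
    return blocks[:2], blocks[2]


def stripe_is_consistent(data_blocks, parity_block):
    """Parity is only usable for rebuilds if it equals the XOR of the data."""
    return xor_blocks(data_blocks) == parity_block


if __name__ == "__main__":
    for kind in ("RZAT", "RDAT"):
        data, parity = stripe_after_trim(kind)
        print(kind, "stripe consistent after TRIM:", stripe_is_consistent(data, parity))
```

With RZAT members, zeros XOR zeros is still zeros, so the trimmed stripe stays consistent. With RDAT members nothing ties the three blocks together anymore, so a later rebuild from parity would produce bytes that differ from what the missing drive would have returned.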
Knowing that, I made sure to use that array as mostly write-once, which makes the write penalty less important.
For my other array, used for backups and lots of rewriting, I made sure to use higher grade TLC Samsung and Micron enterprise drives with DRAM that I had laying around. I still had one drive in that array that didn't support RZAT, but with those higher grade drives, not having TRIM was not as bad (they do a better job rewriting and do well enough with their reserved space).
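If you want to sort a pile of drives into RZAT and RDAT before building an array, something like the following could do it. It is a sketch that assumes the feature lines look like the two quoted earlier (a `*` marker followed by the feature text, as printed by `hdparm -I`); the function names and return labels are my own.

```python
import subprocess


def trim_behaviour(identify_text: str) -> str:
    """Classify a drive from `hdparm -I` style text.

    Only lines carrying the '*' (feature enabled) marker are considered,
    matching the format of the lines quoted in this post.
    """
    features = [
        line.split("*", 1)[1].strip().lower()
        for line in identify_text.splitlines()
        if "*" in line
    ]
    if any(f.startswith("deterministic read zeros after trim") for f in features):
        return "RZAT"
    if any(f.startswith("deterministic read data after trim") for f in features):
        return "RDAT"
    if any(f.startswith("data set management trim supported") for f in features):
        return "TRIM without deterministic reads"
    return "no TRIM reported"


def identify(device: str) -> str:
    """Run `hdparm -I` on a device (needs root) and return its text output."""
    result = subprocess.run(
        ["hdparm", "-I", device], capture_output=True, text=True, check=True
    )
    return result.stdout
```

As root, `print(trim_behaviour(identify("/dev/sdr")))` would then print the label for that drive. The parser can also be fed saved output, which is handy when the drives sit in another machine.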
Stressing the rPi5 and the ASM1184e PCI switch
I then learned a bunch about the limitations of PCI port switches like the ASM1184e. Once I started using mine seriously, I got a bunch of weird errors and disconnects, until Gemini found that it's a known issue with them overheating under load. I just put an RC plane video chip radiator on the chip, and now the radiator is hot and the chip seems to work reliably. Then I found out that my cheap Teamgroup 4TB DRAM-less drives (the real TLC ones with DRAM are now hovering between $600 and $700 a pop for 4TB) are fine, until they stall during a big copy, a btrfs scrub, or whatever.
When they stall, they eventually time out the PCI bus, which, behind the quad PCI switch, causes the rPi to reset everything. In the end this caused enough PCI mayhem that the SATA cards were reset, and 3 of the Teamgroup drives crashed and failed to write what they had, to the point that they were corrupted enough for Linux to not be able to use their partitions anymore. Yes, a single drive stall caused a PCI timeout long enough to crash/reset the SATA controllers, which apparently managed to get the cheap Teamgroup drives to corrupt the partition table blocks and leave those blocks unmapped, unreadable, and unwritable:
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[57247.067230] ata8.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[57247.076133] ata8: SError: { RecovData Handshk }
[57247.081246] ata8.00: failed command: READ DMA
[57247.086014] ata8.00: cmd c8/00:08:c8:03:5b/00:00:00:00:00/e1 tag 2 dma 4096 in
[57247.086014] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[57247.101469] ata8.00: status: { DRDY }
[57247.105822] ata8: hard resetting link
[57247.153797] nvme nvme0: 3/0/0 default/read/poll queues
[57247.587051] ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[57247.630423] ata8.00: supports DRM functions and may not be fully accessible
[57247.750869] ata8.00: supports DRM functions and may not be fully accessible
[57247.807025] ata8.00: configured for UDMA/133
[57247.811957] sd 7:0:0:0: [sdh] tag#2 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=32s
[57247.822653] sd 7:0:0:0: [sdh] tag#2 Sense Key : 0xb [current]
[57247.829121] sd 7:0:0:0: [sdh] tag#2 ASC=0x0 ASCQ=0x0
[57247.835477] sd 7:0:0:0: [sdh] tag#2 CDB: opcode=0x88 88 00 00 00 00 00 01 5b 03 c8 00 00 00 08 00 00
[57247.845243] I/O error, dev sdh, sector 22741960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[57247.855511] ata8: EH complete
[57247.872535] ata8.00: Enabling discard_zeroes_data
[60367.285453] ata9.00: exception Emask 0x10 SAct 0x0 SErr 0x400100 action 0x6 frozen
[60367.293666] ata9.00: irq_stat 0x08000000, interface fatal error
[60367.300313] ata9: SError: { UnrecovData Handshk }
[60367.306530] ata9.00: failed command: WRITE DMA EXT
[60367.311966] ata9.00: cmd 35/00:00:78:8c:f7/00:05:1e:00:00/e0 tag 9 dma 655360 out
[60367.311966] res 50/00:00:ff:03:f7/00:00:1e:00:00/e0 Emask 0x10 (ATA bus error)
[60367.328871] ata9.00: status: { DRDY }
[60367.333036] ata9: hard resetting link
[60367.805496] ata9: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[60367.863205] ata9.00: configured for UDMA/133
[60367.868064] ata9: EH complete
[60397.357520] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[60397.453616] nvme nvme0: 3/0/0 default/read/poll queues
[60398.929509] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[60398.959616] ata1: SError: { RecovData Handshk }
[60398.966761] ata1.00: failed command: READ DMA
[60398.972859] ata1.00: cmd c8/00:08:78:b9:4a/00:00:00:00:00/e2 tag 22 dma 4096 in
[60398.972859] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[60398.990825] ata1.00: status: { DRDY }
[60398.995717] ata1: hard resetting link
[60399.473455] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[60399.541525] ata1.00: configured for UDMA/133
[60399.546657] sd 0:0:0:0: [sda] tag#22 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=DRIVER_OK cmd_age=32s
[60399.557577] sd 0:0:0:0: [sda] tag#22 Sense Key : 0xb [current]
[60399.564532] sd 0:0:0:0: [sda] tag#22 ASC=0x0 ASCQ=0x0
[60399.570665] sd 0:0:0:0: [sda] tag#22 CDB: opcode=0x88 88 00 00 00 00 00 02 4a b9 78 00 00 00 08 00 00
[60399.580758] I/O error, dev sda, sector 38451576 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[60399.590585] ata1: EH complete
[60399.640204] ata1.00: Enabling discard_zeroes_data
[72688.943036] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[72688.951084] ata1: SError: { RecovData Handshk }
[72688.956422] ata1.00: failed command: WRITE DMA
[72688.961594] ata1.00: cmd ca/00:20:00:ac:82/00:00:00:00:00/e5 tag 14 dma 16384 out
[72688.961594] res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[72688.977731] ata1.00: status: { DRDY }
[72688.981969] ata1: hard resetting link
[72688.986211] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[72688.994663] ata2: SError: { RecovData Handshk }
[72688.999881] ata2.00: failed command: WRITE DMA
[72689.005000] ata2.00: cmd ca/00:20:c0:b0:82/00:00:00:00:00/e5 tag 19 dma 16384 out
[72689.005000] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[72689.022962] ata2.00: status: { DRDY }
[72689.027430] ata2: hard resetting link
[72689.499039] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[72689.506396] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[72689.611777] ata1.00: configured for UDMA/133
[72689.616890] ata1: EH complete
[72689.723181] ata2.00: configured for UDMA/133
[72689.728156] ata2: EH complete
[72689.865333] ata1.00: Enabling discard_zeroes_data
[72689.871277] ata2.00: Enabling discard_zeroes_data
[73227.538624] nvme nvme1: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[73227.640436] nvme nvme1: D3 entry latency set to 8 seconds
[73227.658550] nvme nvme1: 1/0/0 default/read/poll queues
[86766.334170] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[86766.442187] nvme nvme0: 3/0/0 default/read/poll queues
[86766.863105] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[86766.877356] ata6: SError: { RecovData Handshk }
[86766.884232] ata6.00: failed command: WRITE DMA
[86766.891103] ata6.00: cmd ca/00:80:18:95:b5/00:00:00:00:00/e6 tag 20 dma 65536 out
[86766.891103] res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[86766.908556] ata6.00: status: { DRDY }
[86766.914377] ata6: hard resetting link
[86766.919016] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[86766.930307] ata2: SError: { RecovData Handshk }
[86766.937937] ata2.00: failed command: READ DMA
[86766.943738] ata2.00: cmd c8/00:38:a0:e6:3b/00:00:00:00:00/e5 tag 4 dma 28672 in
[86766.943738] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[86766.965459] ata2.00: status: { DRDY }
[86766.970640] ata2: hard resetting link
[86766.976782] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x400001 action 0x6 frozen
[86766.989369] ata3: SError: { RecovData Handshk }
[86767.001777] ata3.00: failed command: WRITE DMA
[86767.010295] ata3.00: cmd ca/00:80:18:95:b5/00:00:00:00:00/e6 tag 21 dma 65536 out
[86767.010295] res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[86767.060215] ata3.00: status: { DRDY }
[86767.071409] ata3: hard resetting link
[86767.550253] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[86767.563271] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[86767.572715] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[86767.585598] ata6.00: supports DRM functions and may not be fully accessible
[86767.616959] ata6.00: supports DRM functions and may not be fully accessible
[86767.631980] ata3.00: configured for UDMA/133
[86767.639404] ata3: EH complete
[86767.643354] ata6.00: configured for UDMA/133
[86767.661059] ahci 0001:03:00.0: port does not support device sleep
[86767.663591] ata3.00: Enabling discard_zeroes_data
[86767.676336] ata6: EH complete
[86767.745871] ata2.00: configured for UDMA/133
[86767.754280] ata2: EH complete
[86767.772933] ata2.00: Enabling discard_zeroes_data
[95256.566913] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[95256.574928] nvme nvme1: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[95256.679475] nvme nvme1: D3 entry latency set to 8 seconds
[95256.689110] nvme nvme0: 2/0/0 default/read/poll queues
[95256.694718] nvme nvme1: 1/0/0 default/read/poll queues
[95256.697626] I/O error, dev nvme0n1, sector 264208 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2
[95256.712397] I/O error, dev nvme0n1, sector 264208 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2
[95256.722258] md: super_written gets error=-5
[95256.727133] md/raid1:md0: Disk failure on nvme0n1p2, disabling device.
[95256.727133] md/raid1:md0: Operation continuing on 1 devices.
[95256.742401] I/O error, dev nvme0n1, sector 77334752 op 0x1:(WRITE) flags 0x4000800 phys_seg 1 prio class 2
[95256.753375] BTRFS error (device nvme0n1p3): bdev /dev/nvme0n1p3 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
[95256.764177] I/O error, dev nvme0n1, sector 77335776 op 0x1:(WRITE) flags 0x4000800 phys_seg 1 prio class 2
[95256.774805] BTRFS error (device nvme0n1p3): bdev /dev/nvme0n1p3 errs: wr 2, rd 1, flush 0, corrupt 0, gen 0
[97602.825969] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[97602.833948] ata6.00: failed command: WRITE DMA EXT
[97602.839911] ata6.00: cmd 35/00:00:78:5a:4c/00:04:09:00:00/e0 tag 22 dma 524288 out
[97602.839911] res 40/00:01:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[97602.858583] ata6.00: status: { DRDY }
[97602.863617] ata6: hard resetting link
[97603.337938] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[97603.346750] ata6.00: supports DRM functions and may not be fully accessible
[97603.370306] ata6.00: supports DRM functions and may not be fully accessible
[97603.430476] ata6.00: configured for UDMA/133
[97603.445466] ahci 0001:03:00.0: port does not support device sleep
[97603.452251] ata6: EH complete
[97637.643844] BTRFS warning (device dm-1): csum failed root 263 ino 3692950 off 386400256 csum 0xd04e5f48 expected csum 0x6b9afaa1 mirror 1
[97637.657936] BTRFS error (device dm-1): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[97638.110104] BTRFS warning (device dm-1): csum failed root 263 ino 3692950 off 386400256 csum 0xd04e5f48 expected csum 0x6b9afaa1 mirror 1
[97638.123856] BTRFS error (device dm-1): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
[97662.159091] BTRFS warning (device dm-1): csum failed root 263 ino 3692950 off 386400256 csum 0xd04e5f48 expected csum 0x6b9afaa1 mirror 1
[97662.173941] BTRFS error (device dm-1): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
[97662.906008] BTRFS warning (device dm-1): csum failed root 263 ino 3692950 off 386400256 csum 0xd04e5f48 expected csum 0x6b9afaa1 mirror 1
[97662.920993] BTRFS error (device dm-1): bdev /dev/mapper/dshelf2 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
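A log like this is easier to reason about once it is tallied per device. The script below is a sketch of how that could be done, not what I used at the time: the event names are invented, and the patterns only cover three of the message types that appear in the log above.

```python
import re
from collections import Counter

# Event names are made up for this sketch; the message fragments are taken
# from the kernel log lines shown above.
PATTERNS = {
    "sata_link_reset": re.compile(r"\b(ata\d+): hard resetting link"),
    "nvme_controller_down": re.compile(r"\bnvme (nvme\d+): controller is down"),
    "io_error": re.compile(r"I/O error, dev (\w+),"),
}


def summarize(log_text: str) -> Counter:
    """Count (event, device) pairs found in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        for event, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(event, match.group(1))] += 1
    return counts


# A few lines copied from the log above, used as a built-in demo.
SAMPLE = """\
nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[57247.105822] ata8: hard resetting link
[57247.845243] I/O error, dev sdh, sector 22741960 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 2
[60367.333036] ata9: hard resetting link
[95256.697626] I/O error, dev nvme0n1, sector 264208 op 0x1:(WRITE) flags 0x29800 phys_seg 1 prio class 2
"""

if __name__ == "__main__":
    for (event, device), count in sorted(summarize(SAMPLE).items()):
        print(f"{event:22} {device:8} {count}")
```

Calling `summarize()` on the saved output of `dmesg` gives the same tally for a whole run, which makes it easier to see whether the resets cluster on the ports behind one controller or are spread across all of them.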
Recovering unusable DRAM-less Teamgroup drives
By then it was impossible to read or write to the 3 Teamgroup drives that failed, so I had to blkdiscard (TRIM) all 3 crashed drives (out of 7) in their entirety to restart with 0's everywhere (which meant full data loss, of course), and start over.
Gemini gave me Linux kernel SATA and PCI options to make it less likely for this to happen again, but it also warned me that it very much can happen again, and that DRAM-less drives should never be behind a PCI switch. At the same time, it became painfully obvious that the rPi5 has single lane PCI, and that stacking PCI switches, which add more channels while dividing the single lane's bandwidth and make things slower and slower, was a bit of a fool's errand.
By then, I had to admit defeat. Since I wanted to run frigate for my cameras anyway, Gemini suggested I get an N355 based server, which has an H264 and H265 ASIC for all those video streams (while the rPi5 would have to do it in software) and at least 4 PCI lanes, which is much better (it's still 4x single lane M2 NVME, but at least 4 times faster, and without a PCI switch to confuse things and cause full hangs if a single SATA drive freezes while writing its data).
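To put a rough number on "dividing the single lane bandwidth", here is a small calculation. It assumes the Pi 5's external link runs as PCIe Gen 2 x1 at about 500 MB/s per direction before protocol overhead; that figure is an assumption for this sketch, not something measured on this build, and real throughput will be lower. The 20 drives and the 5.4T of data on the first array come from earlier in the post (treated here as 5.4e12 bytes for simplicity).

```python
# Best-case bandwidth sharing behind a single PCIe lane.
# ASSUMED_LINK_MB_S is an assumption (PCIe Gen 2 x1, before overhead).
ASSUMED_LINK_MB_S = 500.0


def per_drive_share(link_mb_s: float, active_drives: int) -> float:
    """Best-case MB/s per drive when all drives are busy at once."""
    return link_mb_s / active_drives


def hours_to_transfer(total_bytes: float, link_mb_s: float) -> float:
    """Lower bound on the time needed to move total_bytes over the link."""
    seconds = total_bytes / (link_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    drives = 20  # 18 SATA + 2 NVME, as described in the post
    share = per_drive_share(ASSUMED_LINK_MB_S, drives)
    print(f"{share:.1f} MB/s per drive with {drives} drives active")

    scrub_hours = hours_to_transfer(5.4e12, ASSUMED_LINK_MB_S)
    print(f"at least {scrub_hours:.1f} hours to read 5.4 TB through one lane")
```

Even in this best case, each drive gets a small fraction of what a single SATA SSD can deliver on its own, and any stall on one device holds up everything else sharing the lane.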