How to use Btrfs raid5/6
Since I couldn't find good documentation on the state of Btrfs raid5/raid6, I ran some tests and, with some help from list members, can now write this page.
This is as of kernel 3.14 with btrfs-tools 3.12. If you are using a kernel, and especially tools, older than that, there is a good chance things will work less well.
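If you're not sure what you're running, both are quick to check:

polgara:~# uname -r
polgara:~# btrfs --version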
Btrfs raid5/6 in a nutshell
It is important to know that raid5/raid6 is more experimental than btrfs itself. Do not use it for production systems; or if you do and things break, you were warned :)
If you're coming from the mdadm raid5 world, the sections below walk through what you need to know: creating an array, replacing a drive that hasn't failed yet, and replacing a missing drive.
Create a raid5 array
polgara:/dev/disk/by-id# mkfs.btrfs -f -d raid5 -m raid5 -L backupcopy /dev/mapper/crypt_sd[bdfghijkl]1

WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
Turning ON incompat feature 'raid56': raid56 extended format
adding device /dev/mapper/crypt_sdd1 id 2
adding device /dev/mapper/crypt_sdf1 id 3
adding device /dev/mapper/crypt_sdg1 id 4
adding device /dev/mapper/crypt_sdh1 id 5
adding device /dev/mapper/crypt_sdi1 id 6
adding device /dev/mapper/crypt_sdj1 id 7
adding device /dev/mapper/crypt_sdk1 id 8
adding device /dev/mapper/crypt_sdl1 id 9
fs created label backupcopy on /dev/mapper/crypt_sdb1
    nodesize 16384 leafsize 16384 sectorsize 4096 size 4.09TiB
polgara:/dev/disk/by-id# mount -L backupcopy /mnt/btrfs_backupcopy
polgara:/mnt/btrfs_backupcopy# df -h .
Filesystem              Size  Used Avail Use% Mounted on
/dev/mapper/crypt_sdb1  4.1T  3.0M  4.1T   1% /mnt/btrfs_backupcopy
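If you want to double-check which profiles are actually in use, btrfs fi df reports allocation per profile (a sketch: the exact output format varies by btrfs-tools version, and the values are elided here):

polgara:/mnt/btrfs_backupcopy# btrfs fi df .
Data, RAID5: total=..., used=...
System, RAID5: total=..., used=...
Metadata, RAID5: total=..., used=...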
You could also use -d raid5 -m raid1 to get raid1 metadata with raid5 data. That particular combination isn't very useful in itself, but it shows that the data and metadata profiles can be set independently; see the sketch below.
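Such a mixed-profile filesystem would be created like this (a sketch: the label and device names are placeholders, not from my setup):

mkfs.btrfs -f -d raid5 -m raid1 -L mylabel /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1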
Replacing a drive that hasn't failed yet on a running raid5 array
btrfs replace does not work:
polgara:/mnt/btrfs_backupcopy# btrfs replace start -r /dev/mapper/crypt_sem1 /dev/mapper/crypt_sdm1 .
Mar 23 14:56:06 polgara kernel: [53501.511493] BTRFS warning (device dm-9): dev_replace cannot yet handle RAID5/RAID6
No big deal; this can be done in two steps: add the new device, then delete the old one.
polgara:/mnt/btrfs_backupcopy# btrfs device add -f /dev/mapper/crypt_sdm1 .
polgara:/mnt/btrfs_backupcopy# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
    Total devices 11 FS bytes used 114.35GiB
    devid    1 size 465.76GiB used 32.14GiB path /dev/dm-0
    devid    2 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdd1
    devid    4 size 465.76GiB used 32.14GiB path /dev/dm-2
    devid    5 size 465.76GiB used 32.14GiB path /dev/dm-3
    devid    6 size 465.76GiB used 32.14GiB path /dev/dm-4
    devid    7 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdi1
    devid    8 size 465.76GiB used 32.14GiB path /dev/dm-6
    devid    9 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdk1
    devid   10 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdl1
    devid   11 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sde1
    devid   12 size 465.75GiB used 0.00 path /dev/mapper/crypt_sdm1
polgara:/mnt/btrfs_backupcopy# btrfs device delete /dev/mapper/crypt_sde1 .
Mar 23 11:13:31 polgara kernel: [40145.908207] BTRFS info (device dm-9): relocating block group 945203314688 flags 129
Mar 23 14:51:51 polgara kernel: [53245.955444] BTRFS info (device dm-9): found 5576 extents
Mar 23 14:51:57 polgara kernel: [53251.874925] BTRFS info (device dm-9): found 5576 extents
polgara:/mnt/btrfs_backupcopy#
Note that this is slow: 3.5 hours for just 115GB of data, or roughly 33GB/h. At that rate, a multi-terabyte array could take days.
polgara:/mnt/btrfs_backupcopy# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
    Total devices 10 FS bytes used 114.35GiB
    devid    1 size 465.76GiB used 13.14GiB path /dev/dm-0
    devid    2 size 465.76GiB used 13.14GiB path /dev/mapper/crypt_sdd1
    devid    4 size 465.76GiB used 13.14GiB path /dev/dm-2
    devid    5 size 465.76GiB used 13.14GiB path /dev/dm-3
    devid    6 size 465.76GiB used 13.14GiB path /dev/dm-4
    devid    7 size 465.76GiB used 13.14GiB path /dev/mapper/crypt_sdi1
    devid    8 size 465.76GiB used 13.14GiB path /dev/dm-6
    devid    9 size 465.76GiB used 13.14GiB path /dev/mapper/crypt_sdk1
    devid   10 size 465.76GiB used 13.14GiB path /dev/mapper/crypt_sdl1
    devid   12 size 465.75GiB used 13.14GiB path /dev/mapper/crypt_sdm1
There we go: I'm back to 10 devices. It's almost as good as btrfs replace; it simply took two steps (generic recap below).
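In generic form (a sketch: NEWDEV and OLDDEV are placeholders for your device names, and "." is the mounted filesystem), the two-step replacement is:

polgara:/mnt/btrfs_backupcopy# btrfs device add -f /dev/mapper/NEWDEV .
polgara:/mnt/btrfs_backupcopy# btrfs device delete /dev/mapper/OLDDEV .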
Replacing a missing drive on a running raid5 array
Normal mount will not work:
polgara:~# mount -v -t btrfs -o compress=zlib,space_cache,noatime LABEL=backupcopy /mnt/btrfs_backupcopy
mount: wrong fs type, bad option, bad superblock on /dev/mapper/crypt_sdj1,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail or so
Mar 21 22:29:45 polgara kernel: [ 2288.285068] BTRFS info (device dm-8): disk space caching is enabled
Mar 21 22:29:45 polgara kernel: [ 2288.285369] BTRFS: failed to read the system array on dm-8
Mar 21 22:29:45 polgara kernel: [ 2288.316067] BTRFS: open_ctree failed

So we do the mount with -o degraded:

polgara:~# mount -v -t btrfs -o compress=zlib,space_cache,noatime,degraded LABEL=backupcopy /mnt/btrfs_backupcopy
/dev/mapper/crypt_sdj1 on /mnt/btrfs_backupcopy type btrfs (rw,noatime,compress=zlib,space_cache,degraded)
Mar 21 22:29:51 polgara kernel: [ 2295.042421] BTRFS: device label backupcopy devid 8 transid 3446 /dev/mapper/crypt_sdj1
Mar 21 22:29:51 polgara kernel: [ 2295.065951] BTRFS info (device dm-8): allowing degraded mounts
Mar 21 22:29:51 polgara kernel: [ 2295.065955] BTRFS info (device dm-8): disk space caching is enabled
Mar 21 22:30:32 polgara kernel: [ 2336.189000] BTRFS: device label backupcopy devid 3 transid 8 /dev/dm-9
Mar 21 22:30:32 polgara kernel: [ 2336.203175] BTRFS: device label backupcopy devid 3 transid 8 /dev/dm-9
Then we add the new drive:
polgara:/mnt/btrfs_backupcopy# btrfs device add -f /dev/mapper/crypt_sde1 .
polgara:/mnt/btrfs_backupcopy# df .
/dev/dm-0  5.1T  565G  4.0T  13% /mnt/btrfs_backupcopy  <- bad, it should be 4.5T, but I get space for 11 drives
https://btrfs.wiki.kernel.org/index.php/FAQ#What_does_.22balance.22_do.3F says:
"On a filesystem with damaged replication (e.g. a RAID-1 FS with a dead and removed disk), it will force the FS to rebuild the missing copy of the data on one of the currently active devices, restoring the RAID-1 capability of the filesystem."
See also: https://btrfs.wiki.kernel.org/index.php/Balance_Filters
If we have written data since the drive was removed, or if we are recovering from an unfinished balance, a filter on devid=3 tells balance to only rewrite data and metadata that have a chunk on missing device #3. (This is also a good way to finish the balance in multiple passes if you have to reboot in between, or if the filesystem deadlocks during the balance, which unfortunately is still common as of kernel 3.14.)
polgara:/mnt/btrfs_backupcopy# btrfs balance start -ddevid=3 -mdevid=3 -v .
Mar 22 13:15:55 polgara kernel: [20275.690827] BTRFS info (device dm-9): relocating block group 941277446144 flags 130
Mar 22 13:15:56 polgara kernel: [20276.604760] BTRFS info (device dm-9): relocating block group 940069486592 flags 132
Mar 22 13:19:27 polgara kernel: [20487.196844] BTRFS info (device dm-9): found 52417 extents
Mar 22 13:19:28 polgara kernel: [20488.056749] BTRFS info (device dm-9): relocating block group 938861527040 flags 132
Mar 22 13:22:41 polgara kernel: [20681.588762] BTRFS info (device dm-9): found 70146 extents
Mar 22 13:22:42 polgara kernel: [20682.380957] BTRFS info (device dm-9): relocating block group 937653567488 flags 132
Mar 22 13:26:12 polgara kernel: [20892.816204] BTRFS info (device dm-9): found 71497 extents
Mar 22 13:26:14 polgara kernel: [20894.819258] BTRFS info (device dm-9): relocating block group 927989891072 flags 129
As the balance proceeds, data is moved off devid 3 (the missing one) and onto devid 11 (the one we just added):
polgara:~# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
    Total devices 11 FS bytes used 564.54GiB
    devid    1 size 465.76GiB used 63.14GiB path /dev/dm-0
    devid    2 size 465.76GiB used 63.14GiB path /dev/dm-1
    devid    3 size 465.75GiB used 30.00GiB path  <- this device is missing
    devid    4 size 465.76GiB used 63.14GiB path /dev/dm-2
    devid    5 size 465.76GiB used 63.14GiB path /dev/dm-3
    devid    6 size 465.76GiB used 63.14GiB path /dev/dm-4
    devid    7 size 465.76GiB used 63.14GiB path /dev/mapper/crypt_sdi1
    devid    8 size 465.76GiB used 63.14GiB path /dev/mapper/crypt_sdj1
    devid    9 size 465.76GiB used 63.14GiB path /dev/dm-7
    devid   10 size 465.76GiB used 63.14GiB path /dev/dm-8
    devid   11 size 465.76GiB used 33.14GiB path /dev/mapper/crypt_sde1  <- this device was added
You can see status with:
polgara:/mnt/btrfs_backupcopy# while :
> do
> btrfs balance status .
> sleep 60
> done
1 out of about 72 chunks balanced (2 considered), 99% left
2 out of about 72 chunks balanced (3 considered), 97% left
3 out of about 72 chunks balanced (4 considered), 96% left
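If you have the standard watch utility installed, a one-liner achieves the same polling:

polgara:/mnt/btrfs_backupcopy# watch -n 60 btrfs balance status .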
At the end (and this can take hours to days), you get:
polgara:/mnt/btrfs_backupcopy# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
    Total devices 11 FS bytes used 114.35GiB
    devid    1 size 465.76GiB used 32.14GiB path /dev/dm-0
    devid    2 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdd1
    devid    3 size 465.75GiB used 0.00 path  <---- drive is freed up now
    devid    4 size 465.76GiB used 32.14GiB path /dev/dm-2
    devid    5 size 465.76GiB used 32.14GiB path /dev/dm-3
    devid    6 size 465.76GiB used 32.14GiB path /dev/dm-4
    devid    7 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdi1
    devid    8 size 465.76GiB used 32.14GiB path /dev/dm-6
    devid    9 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdk1
    devid   10 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdl1
    devid   11 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sde1
Btrfs v3.12
But the array still shows 11 drives with one missing, and it will not mount without -o degraded. To get back to a clean 10-drive array, remove the phantom missing device:
polgara:/mnt/btrfs_backupcopy# btrfs device delete missing .
polgara:/mnt/btrfs_backupcopy# btrfs fi show
Label: backupcopy  uuid: eed9b55c-1d5a-40bf-a032-1be6980648e1
    Total devices 10 FS bytes used 114.35GiB
    devid    1 size 465.76GiB used 32.14GiB path /dev/dm-0
    devid    2 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdd1
    devid    4 size 465.76GiB used 32.14GiB path /dev/dm-2
    devid    5 size 465.76GiB used 32.14GiB path /dev/dm-3
    devid    6 size 465.76GiB used 32.14GiB path /dev/dm-4
    devid    7 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdi1
    devid    8 size 465.76GiB used 32.14GiB path /dev/dm-6
    devid    9 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdk1
    devid   10 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sdl1
    devid   11 size 465.76GiB used 32.14GiB path /dev/mapper/crypt_sde1
And there we go, we're back in business!
From the above, you've also learned how to grow a raid5 array (add a drive, then run a balance) and how to shrink one (just run btrfs device delete; the automatic relocation restripes your data across the remaining n-1 drives). See the recap below.
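In generic form (a sketch: /dev/sdX1 and the mountpoint are placeholders for your own devices and paths):

# grow: add a drive, then restripe existing data across all drives
btrfs device add /dev/sdX1 /mnt/btrfs_backupcopy
btrfs balance start /mnt/btrfs_backupcopy

# shrink: deleting a device relocates its data onto the remaining drives
btrfs device delete /dev/sdX1 /mnt/btrfs_backupcopy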