Marc's Public Blog - Btrfs

I've been using btrfs since 2012, and while as of 2014 it's far from done, it has come a long way in that time. I thought I'd post some tips and scripts from things I've learned through my own use and from sharing with others, hence this page.
More generally, you'll find Btrfs documentation on the btrfs wiki.


2014/05/04 Fixing Btrfs Filesystem Full Problems

π 2014-05-04 01:01 in Btrfs, Linux

Clear space now

If you have historical snapshots, the quickest way to get space back so that you can look at the filesystem and apply better fixes and cleanups is to drop the oldest historical snapshots.

Two things to note:

If you have historical snapshots as described here, delete the oldest ones first, and wait (see below). However, if you just deleted 100GB of files and replaced them with another 100GB that failed to fully write, giving you an out of space error, you will have to delete all your snapshots to free the blocks of the old files you just removed to make space for the new ones. (If you know exactly which file it is, you can go into each of your snapshots and delete it manually, but in the common case it'll be multiple files and you won't know which ones, so you'll have to drop all your snapshots before you get the space back.)

After deleting snapshots, it can take a minute or more for btrfs fi show to show the freed space. Be patient: run btrfs fi show in a loop and check whether the number changes every minute. If it does not, carry on and delete other snapshots or look at rebalancing.
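The "run it in a loop" advice can be scripted. This is a minimal sketch, assuming one mounted filesystem and the `btrfs fi show` output format shown in the examples on this page; the helper names `devid_used` and `wait_for_space_change` and the 10-minute default are made up for illustration.

```shell
#!/bin/bash
# Sketch: after deleting snapshots, poll `btrfs fi show` once a minute and
# report when the per-device "used" (allocated) figure changes.
# devid_used and wait_for_space_change are hypothetical helper names.

# Read `btrfs fi show` output on stdin and print the "used" value of each
# devid line, e.g. "751.04GiB".
devid_used() {
    awk '$1 == "devid" { for (i = 1; i < NF; i++) if ($i == "used") print $(i + 1) }'
}

# Usage: wait_for_space_change <mountpoint> [max_minutes]
# Returns 0 as soon as the figure changes, 1 if it never does.
wait_for_space_change() {
    local mnt=$1 tries=${2:-10} n=0 before now
    before=$(btrfs fi show "$mnt" | devid_used)
    while [ "$n" -lt "$tries" ]; do
        n=$((n + 1))
        sleep 60
        now=$(btrfs fi show "$mnt" | devid_used)
        echo "$(date +%T) used: $now"
        if [ "$now" != "$before" ]; then
            echo "Changed (was: $before)"
            return 0
        fi
    done
    echo "No change after $tries minutes: delete more snapshots or rebalance" >&2
    return 1
}

# Only poll when given a mountpoint, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    wait_for_space_change "$@"
fi
```

For example, run `wait_for_space_change /mnt/btrfs_pool1` right after your `btrfs subvolume delete` commands.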

Note that even in the cases described below, you may have to delete one or more snapshots to make space before btrfs balance can run. More generally, btrfs can get into states where it's hard to get it out of 'no space'. As a result, even if you don't need snapshots, keeping at least one around, so that you can delete it to free up space should you hit that mis-feature/bug, can be handy.

Is your filesystem really full? Mis-balanced metadata and/or data chunks

Below, you'll see how to rebalance data blocks and metadata. If you are unlucky enough to get a filesystem full error before you balance, try running this first:

legolas:~# btrfs balance start -musage=0 /mnt/btrfs_pool1
legolas:~# btrfs balance start -dusage=0 /mnt/btrfs_pool1

A null rebalance (one that only relocates completely empty chunks) will help in some cases; if it doesn't, read on.

Also, if you are really unlucky, you might hit a 'no space left' error that requires adding a temporary block device to your filesystem to allow balance to run. See below for details.

Pre-emptively rebalancing your filesystem

In an ideal world, btrfs would do this for you, but it does not.
I personally recommend you do a rebalance weekly or nightly as part of a btrfs scrub cron job. See the btrfs-scrub script.

Is your filesystem really full? Mis-balanced data chunks

Look at filesystem show output:

legolas:~# btrfs fi show
Label: btrfs_pool1  uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6
	Total devices 1 FS bytes used 441.69GiB
	devid    1 size 865.01GiB used 751.04GiB path /dev/mapper/cryptroot

Only about half of the space is used (441 out of 865GiB), but about 87% of the device is allocated (751 out of 865GiB). Unfortunately, it's not uncommon for a btrfs device to fill up because btrfs does not rebalance chunks on its own (3.18+ has started freeing empty chunks, which is a step in the right direction).
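Both percentages can be computed from the same `btrfs fi show` output. This is a rough sketch, assuming one filesystem per invocation; `alloc_report` is a hypothetical name, and with DUP or RAID1 profiles the raw allocation counts some data twice, so treat the numbers as indicative only.

```shell
#!/bin/bash
# Sketch: compare "FS bytes used" with the space allocated to chunks on
# each device in `btrfs fi show` output (read on stdin). A large gap means
# partly empty chunks that a balance could reclaim.
alloc_report() {
    awk '
        # Convert a size such as "865.01GiB" to GiB (plain numbers are bytes).
        function gib(v,   n) {
            n = v + 0
            if (v ~ /TiB$/) return n * 1024
            if (v ~ /GiB$/) return n
            if (v ~ /MiB$/) return n / 1024
            if (v ~ /KiB$/) return n / 1048576
            return n / 1073741824
        }
        /FS bytes used/ { fs_used = gib($NF) }
        $1 == "devid" {
            for (i = 1; i < NF; i++) {
                if ($i == "size") size += gib($(i + 1))
                if ($i == "used") alloc += gib($(i + 1))
            }
        }
        END {
            if (size == 0) exit 1
            printf "FS bytes used: %.0f%% of devices, allocated to chunks: %.0f%%\n",
                   100 * fs_used / size, 100 * alloc / size
        }'
}

if [ "$#" -gt 0 ]; then
    btrfs fi show "$1" | alloc_report
fi
```

On the output above this prints `FS bytes used: 51% of devices, allocated to chunks: 87%`; on the post-balance output further down it would report 59% allocated.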

In the case above, because the filesystem is only about half full, I can ask balance to rewrite all chunks that are less than 55% used. Rebalancing those chunks means taking the data in them and packing it into fuller chunks, so that you end up being able to free the less used ones.
This means the bigger the -dusage value, the more work balance has to do (i.e. taking fuller and fuller chunks and trying to free them up by putting their data elsewhere). Also, if your FS is 55% full, using -dusage=55 is ok, but there isn't a 1 to 1 correlation and you'll likely be ok with a smaller dusage number, so start small and ramp up as needed.
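"Start small and ramp up" can be turned into a loop. This is a sketch, not the btrfs-scrub script mentioned earlier: the `DUSAGE_STEPS` progression and the function names `dusage_steps` and `ramp_balance` are illustrative assumptions. Each pass only touches data chunks below the given usage, so the early passes are cheap; pass a lower maximum if you don't need the later, more expensive ones.

```shell
#!/bin/bash
# Sketch: ramp up -dusage in steps instead of starting with a big value.
# DUSAGE_STEPS is an arbitrary example progression; tune it to taste.
DUSAGE_STEPS="0 5 10 20 30 40 50 55"

# Print the thresholds from DUSAGE_STEPS that do not exceed $1.
dusage_steps() {
    local max=$1 u
    for u in $DUSAGE_STEPS; do
        if [ "$u" -le "$max" ]; then
            echo "$u"
        fi
    done
}

# Usage: ramp_balance <mountpoint> [max_dusage]
# Runs one balance per threshold, smallest first; stops at the first error.
ramp_balance() {
    local mnt=$1 max=${2:-55} u
    for u in $(dusage_steps "$max"); do
        echo "== btrfs balance start -dusage=$u $mnt"
        btrfs balance start -dusage="$u" "$mnt" || return
    done
}

if [ "$#" -gt 0 ]; then
    ramp_balance "$@"
fi
```

For example, `ramp_balance /mnt/btrfs_pool1 30` runs passes at 0, 5, 10, 20 and 30.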


legolas:~# btrfs balance start -dusage=55 /mnt/btrfs_pool1 &

# Follow the progress along with:
legolas:~# while :; do btrfs balance status -v /mnt/btrfs_pool1; sleep 60; done
Balance on '/mnt/btrfs_pool1' is running
10 out of about 315 chunks balanced (22 considered),  97% left
Dumping filters: flags 0x1, state 0x1, force is off
  DATA (flags 0x2): balancing, usage=55
Balance on '/mnt/btrfs_pool1' is running
16 out of about 315 chunks balanced (28 considered),  95% left
Dumping filters: flags 0x1, state 0x1, force is off
  DATA (flags 0x2): balancing, usage=55
(...)

When it's over, the filesystem now looks like this (note devid used is now 513GB instead of 751GB):

legolas:~# btrfs fi show
Label: btrfs_pool1  uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6
	Total devices 1 FS bytes used 441.64GiB
	devid    1 size 865.01GiB used 513.04GiB path /dev/mapper/cryptroot

Before you ask, yes, btrfs should do this for you on its own, but currently doesn't as of 3.14.

Is your filesystem really full? Mis-balanced metadata

Unfortunately, btrfs has another failure case where the metadata space can fill up. When this happens, even though you have data space left, no new files will be writable.

In the example below, you can see Metadata DUP at 9.5GB used out of 10GB. Btrfs keeps 0.5GB for itself, so in this case metadata is full and prevents new writes.

One suggested way out is to force a full rebalance, though in the example below even a -dusage=0 balance brings metadata back down to 7.39GB. Yes, here again, it would be nice if btrfs did this on its own. It will one day (some of it is now in 3.18).

Sometimes, just using -dusage=0 is enough to rebalance metadata (this is now done automatically in 3.18 and above), but if it's not enough, you'll have to increase the number.


legolas:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=800.42GiB, used=636.91GiB
System, DUP: total=8.00MiB, used=92.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=10.00GiB, used=9.50GiB
Metadata, single: total=8.00MiB, used=0.00

legolas:/mnt/btrfs_pool2# btrfs balance start -v -dusage=0 /mnt/btrfs_pool2
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
  Done, had to relocate 91 out of 823 chunks

legolas:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=709.01GiB, used=603.85GiB
System, DUP: total=8.00MiB, used=88.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=10.00GiB, used=7.39GiB
Metadata, single: total=8.00MiB, used=0.00
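The "metadata is full once only about 0.5GB is left" rule can be checked from `btrfs fi df` output. A minimal sketch: the 1GiB warning threshold is an assumption chosen to leave some margin above the 0.5GB btrfs keeps for itself, empty leftover chunks (used=0.00) are skipped, and `metadata_check` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: flag metadata that is close to full in `btrfs fi df` output (stdin).
# Prints "WARN" or "ok" per non-empty Metadata line; exits 1 if any WARN.
# Usage: btrfs fi df /mnt/btrfs_pool2 | metadata_check [limit_mib]
metadata_check() {
    awk -v limit="${1:-1024}" '
        # Convert "9.50GiB", "8.00MiB", "0.00" (bytes) and so on to MiB.
        function mib(v,   n) {
            n = v + 0
            if (v ~ /TiB$/) return n * 1048576
            if (v ~ /GiB$/) return n * 1024
            if (v ~ /MiB$/) return n
            if (v ~ /KiB$/) return n / 1024
            return n / 1048576
        }
        /^Metadata/ {
            # e.g. "Metadata, DUP: total=10.00GiB, used=9.50GiB"
            split($0, f, "total=|, used=")
            total = mib(f[2]); used = mib(f[3])
            if (used == 0) next            # empty leftover chunk, ignore
            prof = f[1]; sub(/: *$/, "", prof)
            head = total - used
            status = (head < limit + 0) ? "WARN" : "ok"
            if (status == "WARN") bad = 1
            printf "%s %s headroom=%.0fMiB\n", status, prof, head
        }
        END { exit bad }'
}

if [ "$#" -gt 0 ]; then
    btrfs fi df "$1" | metadata_check
fi
```

On the first `btrfs fi df` output above it reports `WARN Metadata, DUP headroom=512MiB`; on the second one it reports ok.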

Balance cannot run because the filesystem is full

If a null rebalance (-musage=0 and then -dusage=0 explained above) doesn't work, one last trick to get around this is to add a device (even a USB key will do) to your btrfs filesystem. This should allow balance to start, and then you can remove the device with btrfs device delete when the balance is finished.

Note that a filesystem can even get so full that you cannot delete snapshots to free space. This shows how you would work around it:

root@polgara:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=159.67GiB, used=80.33GiB
System, single: total=4.00MiB, used=24.00KiB
Metadata, single: total=8.01GiB, used=7.51GiB   <<<< BAD

root@polgara:/mnt/btrfs_pool2# btrfs balance start -v -dusage=0 /mnt/btrfs_pool2
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
  Done, had to relocate 0 out of 170 chunks

root@polgara:/mnt/btrfs_pool2# btrfs balance start -v -dusage=1 /mnt/btrfs_pool2
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
ERROR: error during balancing '/mnt/btrfs_pool2' - No space left on device
There may be more info in syslog - try dmesg | tail

root@polgara:/mnt/btrfs_pool2# dd if=/dev/zero of=/var/tmp/btrfs bs=1G count=5
5+0 records in
5+0 records out
5368709120 bytes (5.4 GB) copied, 7.68099 s, 699 MB/s

root@polgara:/mnt/btrfs_pool2# losetup -v -f /var/tmp/btrfs
Loop device is /dev/loop0

root@polgara:/mnt/btrfs_pool2# btrfs device add /dev/loop0 .
Performing full device TRIM (5.00GiB) ...

# optional step if you have snapshots to delete; if not, try the balance below
root@polgara:/mnt/btrfs_pool2# btrfs subvolume delete space2_daily_20140603_00:05:01
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140603_00:05:01'
root@polgara:/mnt/btrfs_pool2# for i in *daily*; do btrfs subvolume delete $i; done
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140604_00:05:01'
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140605_00:05:01'
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140606_00:05:01'
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140607_00:05:01'
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140608_00:05:01'
Delete subvolume '/mnt/btrfs_pool2/space2_daily_20140609_00:05:01'

root@polgara:/mnt/btrfs_pool2# btrfs balance start -v -dusage=1 /mnt/btrfs_pool2
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=1
  Done, had to relocate 5 out of 169 chunks

root@polgara:/mnt/btrfs_pool2# btrfs device delete /dev/loop0 .

root@polgara:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=154.01GiB, used=80.06GiB
System, single: total=4.00MiB, used=28.00KiB
Metadata, single: total=8.01GiB, used=4.88GiB   <<< GOOD
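The temporary device trick from the transcript above can be wrapped into one function. This is a hedged sketch rather than a tested tool: the image path, the 5GiB size and the -dusage=1 balance are assumptions taken from the example, and the image must live on a different filesystem that has free space. `with_temp_device` is a hypothetical name. If `btrfs device delete` fails, the sketch deliberately leaves the loop device attached, since the filesystem may still have data on it.

```shell
#!/bin/bash
# Sketch: temporarily add a loop device so a full btrfs filesystem has
# unallocated space for balance, then remove it again.
# Usage: with_temp_device <mountpoint> [image_path] [size_mib]
with_temp_device() {
    local mnt=$1 img=${2:-/var/tmp/btrfs} size_mib=${3:-5120} loopdev
    # Fully written (not sparse) backing file, like the dd in the transcript.
    dd if=/dev/zero of="$img" bs=1M count="$size_mib" || return
    loopdev=$(losetup -f --show "$img") || return
    echo "Using $loopdev"
    if ! btrfs device add "$loopdev" "$mnt"; then
        losetup -d "$loopdev"; rm -f "$img"
        return 1
    fi
    # Optionally delete snapshots at this point, as in the transcript.
    btrfs balance start -dusage=1 "$mnt" || echo "balance failed, removing device anyway" >&2
    # device delete migrates any chunks on the loop device back first.
    if ! btrfs device delete "$loopdev" "$mnt"; then
        echo "device delete failed: leaving $loopdev ($img) attached" >&2
        return 1
    fi
    losetup -d "$loopdev"
    rm -f "$img"
}

if [ "$#" -gt 0 ]; then
    with_temp_device "$@"
fi
```

Don't reboot between the `device add` and a successful `device delete`: the loop device won't come back on its own, and a btrfs filesystem with a missing device will not mount normally.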

Misc Balance Resources

For more info, please read:

https://btrfs.wiki.kernel.org/index.php/FAQ#Raw_disk_usage

https://btrfs.wiki.kernel.org/index.php/Balance_Filters

2014/05/19 Btrfs-diff Between Snapshots

π 2014-05-19 01:01 in Btrfs, Linux

Differences between two btrfs snapshots

When you have historical snapshots, it may be useful to know what changed between 2 snapshots.

The best way to do this long term is to modify "btrfs send" to compute changes between the snapshots and just output the filelist instead of a stream with data.

However, until then, there is a hack that shows you files that got added or modified between two snapshots. It's not bulletproof like btrfs send, but it can give you a quick, mostly working diff between two snapshots (*it will not show renames or deletes*). See more caveats on this original serverfault post.

legolas:/mnt/btrfs_pool1# btrfs-diff usr_ro.20140513_05:00:01/ usr_ro.20140514_06:00:02/
share/doc/linux-image-3.15.0-rc5-amd64-i915-preempt-20140216s1/buildinfo.gz
share/doc/linux-image-3.15.0-rc5-amd64-i915-preempt-20140216s1/Buildinfo.gz
share/doc/linux-image-3.15.0-rc5-amd64-i915-preempt-20140216s1/changelog.Debian.gz
share/doc/linux-image-3.15.0-rc5-amd64-i915-preempt-20140216s1/Changes.gz
(...)

You can download my latest snapshot of btrfs-diff. Note that I am not the author; it was copied from this serverfault post.

#!/bin/bash

# Author: http://serverfault.com/users/96883/artfulrobot
# License: Unknown
#
# This script will show most files that got modified or added.
# Renames and deletions will not be shown.
# Read limitations on:
# http://serverfault.com/questions/399894/does-btrfs-have-an-efficient-way-to-compare-snapshots
#
# btrfs send is the best way to do this long term, but as of kernel
# 3.14, btrfs send cannot just send a list of changed files without
# scanning and sending all the changed data blocks along.

usage() { echo "$@" >&2; echo "Usage: $0 <older-snapshot> <newer-snapshot>" >&2; exit 1; }

[ $# -eq 2 ] || usage "Incorrect invocation"
SNAPSHOT_OLD=$1
SNAPSHOT_NEW=$2

[ -d "$SNAPSHOT_OLD" ] || usage "$SNAPSHOT_OLD does not exist"
[ -d "$SNAPSHOT_NEW" ] || usage "$SNAPSHOT_NEW does not exist"

# Asking for changes newer than an absurdly high generation lists nothing
# and prints only the final "transid marker was N" line, where N is the
# old snapshot's current generation.
OLD_TRANSID=$(btrfs subvolume find-new "$SNAPSHOT_OLD" 9999999)
OLD_TRANSID=${OLD_TRANSID#transid marker was }
[ -n "$OLD_TRANSID" ] && [ "$OLD_TRANSID" -gt 0 ] || usage "Failed to find generation for $SNAPSHOT_OLD"

# List extents in the new snapshot written after that generation: drop the
# trailing "transid marker" line, keep the file name (field 17 onwards, so
# names with spaces survive), and de-duplicate.
btrfs subvolume find-new "$SNAPSHOT_NEW" "$OLD_TRANSID" | sed '$d' | cut -f17- -d' ' | sort -u
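With a series of historical snapshots, the script above can be run over each consecutive pair. A minimal sketch, assuming snapshot names sort chronologically (as with `usr_ro.20140513_05:00:01`) and that btrfs-diff is in `$PATH`; `diff_consecutive` and the `DIFF_CMD` override are hypothetical names introduced here.

```shell
#!/bin/bash
# Sketch: run btrfs-diff on each consecutive pair of snapshots sharing a
# name prefix, oldest first. DIFF_CMD is a hypothetical override hook.
# Usage: diff_consecutive /mnt/btrfs_pool1/usr_ro.
diff_consecutive() {
    local prefix=$1 prev="" snap
    for snap in "$prefix"*/; do
        [ -d "$snap" ] || continue      # no match: the glob stays literal
        snap=${snap%/}
        if [ -n "$prev" ]; then
            echo "=== $prev -> $snap"
            "${DIFF_CMD:-btrfs-diff}" "$prev" "$snap"
        fi
        prev=$snap
    done
}

if [ "$#" -gt 0 ]; then
    diff_consecutive "$1"
fi
```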

2014/05/20 Historical Snapshots of Backups With Btrfs

π 2014-05-20 01:01 in Btrfs, Linux

How to manage historical snapshots of backups with Btrfs

I have a setup where I back up a certain number of machines to a central server. There are multiple ways to do historical backups with btrfs.

snapshots and rsync on top

http://marc.merlins.org/linux/talks/Btrfs-LC2014-JP/html/img33.html

not great because the COW relationship is lost to Unix tools

not great because backing up this server to another one with btrfs send requires many more snapshots of snapshots, even for old data that never changes

works for deduping data that partially changes or changes owners

cp -a --link + rsync

http://marc.merlins.org/linux/talks/Btrfs-LC2014-JP/html/img34.html

newer btrfs should be ok with many hardlinks

du can figure out the data saved

you can use hardlinks.py instead of bedup

can be transferred via cp/rsync without losing links

but hardlinks will not work across subvolumes

cp -a --reflink + rsync

http://marc.merlins.org/linux/talks/Btrfs-LC2014-JP/html/img35.html

very nice, does not require hardlinks which you can't do across subvolumes

works for deduping data that partially changes or changes owners

but totally lost as soon as you copy it anywhere else (unless you use btrfs send)
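The last approach (cp -a --reflink + rsync) could look something like this. A minimal sketch, not an actual backup script from this setup: it assumes the backups live on btrfs (`cp --reflink=always` fails elsewhere), GNU cp and rsync, and a hypothetical `<dest_root>/<timestamp>` layout; `reflink_backup` and `latest_backup` are made-up names. `--inplace` together with `--no-whole-file` is meant to make rsync rewrite mostly just the changed parts of changed files, so unchanged extents stay shared with the previous backup.

```shell
#!/bin/bash
# Sketch of "cp -a --reflink + rsync": each new backup starts as a reflink
# copy of the previous one (sharing all data blocks), then rsync applies
# only the changes. Layout <dest_root>/<timestamp> is a hypothetical example.
# Usage: reflink_backup <source_dir> <dest_root> [timestamp]

# Print the newest backup directory under $1 (names sort chronologically).
latest_backup() {
    local d last=""
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            last=${d%/}
        fi
    done
    printf '%s\n' "$last"
}

reflink_backup() {
    local src=$1 dest_root=$2 stamp=${3:-$(date +%Y%m%d_%H:%M:%S)} prev new
    new="$dest_root/$stamp"
    prev=$(latest_backup "$dest_root")
    if [ -n "$prev" ]; then
        # Needs a reflink-capable filesystem; fails rather than silently
        # making a full copy.
        cp -a --reflink=always "$prev" "$new" || return
    else
        mkdir -p "$new" || return
    fi
    # --inplace updates files in place instead of writing a new copy, and
    # --no-whole-file forces the delta algorithm for local copies.
    rsync -a --delete --inplace --no-whole-file "$src/" "$new/"
}

if [ "$#" -ge 2 ]; then
    reflink_backup "$@"
fi
```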

2014/05/21 My Btrfs Talk at Linuxcon JP 2014

π 2014-05-21 01:01 in Btrfs, Linux

This all started with my trying Btrfs after Avi Miller's talk at Linux.conf.au. I started asking some questions on the mailing list, and since the wiki wasn't up to date on them, I did the obvious thing and started updating the Main Btrfs Wiki here and there.

After I had learned a fair amount about Btrfs, I felt others would benefit from an introduction to it, as well as best practices and the features that would make you want to switch, so I ended up submitting a talk to Linuxcon. In the process of writing the talk, I learned even more about Btrfs, and wrote some of it up on my Btrfs blog while linking to or putting the other relevant bits on the Main Btrfs Wiki.

The fancy talk description is here:

The presentation will give you everything you need to know to get up to speed with btrfs, why you should want to trust your data to btrfs, how it offers a lot of what ZFS offers without the licensing problems, as well as best practices for using it.
I will go into:
the basics of administration of a btrfs filesystem

How btrfs, swraid, dmcrypt, and lvm fit or don't fit together

how to work with a single storage pool and create all your partitions from it without ever having to resize them, or requiring LVM as a slow and somewhat unreliable block layer.

how to have virtually as many snapshots as you want and why you really want this

how to do very efficient block level backups of changes only and much faster than rsync ever will

how those block backups can be used to deploy OS upgrades at the file level like I explained in my talk on how Google maintains its many servers last year.

The video:

The quality isn't great due to poor lighting, sorry about this, but you can get the talk slides here or view them inline below.
For easier to click on links, you may prefer the pdf version or the libreoffice version.

> To click on URLs in the presentation, click on 'with contents' in the upper left, and use those links (open to a new tab)
