Day 2: Conference: Tux2 Failsafe Filesystem

Daniel Phillips told us about his Tux2 filesystem.

Daniel came up with the idea 12 years ago but didn't get to implement it on Linux until recently. In the meantime, Network Appliance came up with a similar idea, which they use in WAFL on their filers.

Tux2 is failsafe like a journalling filesystem, but it has no journal. Since there is no journal to replay, no recovery procedure is needed.

There have been other approaches to solving this problem: journalling, soft updates, and logging filesystems.
There are also other atomic update algorithms that use shadow blocks on disk which are switched all at once, like Auragen and WAFL.

NetApp patented in WAFL an idea that had been discovered independently 5 years earlier, but Tux2 uses a different algorithm, so it shouldn't have any patent problems. Daniel is nevertheless against NetApp's patent, which he thinks is invalid because of his prior art.

In Tux2, writes go to a new tree, which is switched to with a single atomic block write. In other words, changes are written to new blocks, and the parent is then atomically updated to point to the new blocks.
Blocks to be freed are put on a defree list and are only freed once the atomic switch has actually been done.
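
To make the mechanism above concrete, here is a minimal toy model in Python. It is only a sketch of the general idea (write changes to new blocks, switch with one atomic write, free old blocks afterwards), not Tux2's actual code or on-disk layout. The class `ToyDisk`, its method names, and the use of block 0 as the root pointer are all assumptions made for illustration.

```python
class ToyDisk:
    """Toy copy-on-write store; block 0 holds the number of the live metadata block."""

    def __init__(self):
        self.blocks = {}          # block number -> contents
        self.next_unused = 1      # block 0 is reserved for the root pointer
        self.free = []            # blocks that may be reused right now
        self.deferred_free = []   # blocks to release only after the next switch
        self.blocks[0] = self.alloc({})   # empty metadata: name -> data block

    def alloc(self, contents):
        # Reuse a free block if there is one, otherwise take a fresh one.
        if self.free:
            n = self.free.pop()
        else:
            n = self.next_unused
            self.next_unused += 1
        self.blocks[n] = contents
        return n

    def read(self, name):
        meta = self.blocks[self.blocks[0]]
        return self.blocks[meta[name]]

    def stage(self, changes):
        """Write changes to new blocks; nothing is visible until commit()."""
        old_meta_no = self.blocks[0]
        new_meta = dict(self.blocks[old_meta_no])
        for name, data in changes.items():
            if name in new_meta:
                # The old data block is still part of the live tree: defer it.
                self.deferred_free.append(new_meta[name])
            new_meta[name] = self.alloc(data)
        self.deferred_free.append(old_meta_no)
        return self.alloc(new_meta)

    def commit(self, new_meta_no):
        self.blocks[0] = new_meta_no          # the single atomic switch
        self.free.extend(self.deferred_free)  # old blocks are safe to reuse now
        self.deferred_free.clear()


disk = ToyDisk()
disk.commit(disk.stage({"a": "v1"}))
staged = disk.stage({"a": "v2"})
print(disk.read("a"))   # still the old tree: a crash here loses nothing
disk.commit(staged)
print(disk.read("a"))
```

If the old blocks were returned to the free list as soon as they were replaced, `alloc` could overwrite them while the root pointer still referred to the old tree, and a crash before the switch would leave a damaged filesystem. Deferring the frees until after the switch avoids that.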
Daniel explained that he only needed to make minor edits to the ext2 code to get Tux2 working. One difference is that ext2 puts metadata and data in different places within its groups of blocks, whereas Tux2 mixes them by design, which also gives a little extra performance.
The other good news is that data and metadata are allocated from a common pool. The downside is that this makes fsck more complex, because it has to find the inodes. (In theory you don't need fsck, but your filesystem could always break for some other reason, such as a hardware failure.)

Problems with Tux2 are:

Extra metadata writes, but they shrink to an extra 1% as the phases get longer.
Fragmentation, and permutation of blocks within a single file, but these can be worked around somewhat and are not as fatal as they sound.

What's nice about this scheme is that you get checkpoints for free, just like on NetApp's filers, and you can therefore keep prior versions of your filesystem, which greatly facilitates backups and file restores.
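
The reason checkpoints come almost for free can be sketched in a few lines of Python: since updates never overwrite blocks of the current tree, keeping an old version only means remembering its root and not freeing its blocks. This is a simplified illustration, not the Tux2 or NetApp implementation. `CheckpointedStore` and its methods are hypothetical names, and the sketch never frees any blocks at all.

```python
class CheckpointedStore:
    """Toy store where every update builds a new tree and old roots can be kept."""

    def __init__(self):
        self.blocks = {}        # block number -> contents, never overwritten
        self.next_block = 0
        self.checkpoints = {}   # label -> root block number that is kept alive
        self.root = self._write({})   # current tree: name -> data block number

    def _write(self, contents):
        n = self.next_block
        self.next_block += 1
        self.blocks[n] = contents
        return n

    def put(self, name, data):
        tree = dict(self.blocks[self.root])   # copy, never modify in place
        tree[name] = self._write(data)
        self.root = self._write(tree)         # stands in for the atomic switch

    def checkpoint(self, label):
        # Taking a checkpoint is just remembering the current root.
        self.checkpoints[label] = self.root

    def get(self, name, label=None):
        root = self.root if label is None else self.checkpoints[label]
        return self.blocks[self.blocks[root][name]]


store = CheckpointedStore()
store.put("report.txt", "first draft")
store.checkpoint("monday")
store.put("report.txt", "second draft")
print(store.get("report.txt"))             # current version
print(store.get("report.txt", "monday"))   # version as of the checkpoint
```

A backup tool could read a consistent view of the filesystem through the checkpoint's root while writes continue against the current one.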

You can find more details in Daniel's slides in PDF format, and other documentation in text and HTML on his web page.

2001/01/28 (12:44): Version 1.0