Marc's Public Blog - Linux Hacking



This page has a few of my blog entries about linux, but my main linux page is here

Here is a list of older linux event reports I made before my blog was started; the rest are below.
1996/11/18-21: Linux Pavilion Comdex Fall 1996 (photos only). I've been going since then to help at the linux pavilion.
1997/11/18-21: Linux Pavilion Comdex Fall 1997 (photos only)
1998/05/28-30: Linuxexpo 1998 (photos only)
1998/11/16-20: Linux Pavilion Comdex Fall 1998 (full report)
1998/11/11: Silicon Valley Tea Party (report with pictures)
1999/02/15: Windows Refund Day (report with pictures)
1999/03/20: SVLUG KTEH night (photos only)
1999/03/01-04: LinuxWorld Expo Winter 99 (complete report with many pictures)
1999/03/31: Mozilla Party one year anniversary (photos only)
1999/05/18-22: Linuxexpo 1999 (complete report with many pictures)
1999/06/07: June 99 Balug meeting with Linus
1999/08/09-12: LinuxWorld Expo Summer 99 (complete report with many pictures)
1999/11/15-19: Linux Business Show at Comdex Fall 1999 (full report with pictures)
2000/08/14-17: LinuxWorld Expo Summer 2000 (complete report with many pictures)
2001/01/17-20: Linux.conf.au/LCA 2001 (complete report with pictures)
2001/07/25-28: OLS 2001 (photos only)
2001/08/25: Linux 10th Anniversary (report with pictures)
2001/09/27-30: LinuxWorld Expo Summer 2001 (report with pictures)
2001/11/05-10: ALS 2001 (photos only)
2002/06/26-29: OLS 2002 (photos only)
2003/01/20-25: LCA 2003 (photos only)
2003/07/23-26: OLS 2003 (photos only)
2004/01/12-17: LCA 2004 (photos only)
2004/07/21-24: OLS 2004 (photos only)
2005/04/18-23: LCA 2005 (photos only)
2006/01/24-28: LCA 2006 (photos only)
2007/01/17-21: LCA 2007 (photos only)

Here is a list of all the talks I've given:

And below are my blog posts:




2012/05/01 Handy tip to save on inodes and disk space: finddupes, fdupes, and hardlink.py
π 2012-05-01 01:01 in Linux
I've been rsyncing my linux machines to my disk server for the last 10 years, and while I've tried to save space by using the trick below, clearly I hadn't applied it carefully everywhere, and it didn't consolidate files across backups from multiple servers.

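For a single server, the trick to keep a history of backup snapshots without using a lot of extra space is to rsync into a directory named current, and then run cp -al current oldbackup_20120501. You can then keep rsyncing to current: oldbackup is made of hardlinks, so it costs almost nothing until a file in current changes. By default rsync writes a changed file as a new copy and renames it into place (it does not overwrite in place unless you pass --inplace), so the old snapshot keeps the old version.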
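
To make the cp -al step concrete, here is a minimal Python sketch of what it roughly does. This is not what cp runs internally, and the function name snapshot_with_hardlinks is made up for illustration. It recreates the directory tree and hardlinks every file instead of copying it, and it ignores directory ownership, permissions and timestamps, which cp -a would preserve.

```python
import os

def snapshot_with_hardlinks(current, snapshot):
    """Recreate the tree under `current` at `snapshot`, hardlinking files.

    Hypothetical helper, a rough stand-in for `cp -al current snapshot`.
    Directory ownership, permissions and timestamps are NOT preserved here.
    """
    for dirpath, dirnames, filenames in os.walk(current):
        rel = os.path.relpath(dirpath, current)
        target_dir = os.path.normpath(os.path.join(snapshot, rel))
        os.makedirs(target_dir, exist_ok=True)
        # os.walk lists symlinks to directories in dirnames but does not
        # descend into them, so symlinks are recreated from both lists.
        for name in dirnames + filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if os.path.islink(src):
                os.symlink(os.readlink(src), dst)
            elif not os.path.isdir(src):
                # Same inode, second name: no extra data blocks are used.
                os.link(src, dst)
```

Calling snapshot_with_hardlinks("current", "oldbackup_20120501") would give you a second tree where every file shares its inode with the one in current, which is why the snapshot is nearly free in disk space (you still pay for the directories themselves).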

While this served me well, it turns out it wasn't perfect: there were some admin errors in the past, and there were duplicates across the different servers I back up. So, I looked for dupe finders so that I could re-hardlink identical files after the fact.
The first thing I quickly found was that comparing all files with the same size was going to be way, way too slow, so I had to limit the deduping to files that had the same name, or the pool of files to dedupe would just be way too big.

  • apt-get install fdupes: has lots of options for recursive scanning, can delete, hardlink, or even symlink. I could not find how to tell it to only compare files with the same names.
  • http://code.google.com/p/hardlinkpy/ : it's in python, but it actually runs faster than fdupes for me, and has useful options for working on huge trees: hardlink.py -c -f -x options.txt -x Makefile dir1 dir2 dir3. Its one flaw right now is that it runs out of RAM on my 4GB system when run on 27 million files. To save time when deduping system backups, it's useful to tell hardlink.py to only compare files with the same name.
  • http://www.pixelbeat.org/fslint/ : I didn't try this one but it looked nice when you need a GUI.
  • http://svn.red-bean.com/bbum/trunk/hacques/dupinator.py : is a simple python script you can hack on if you just need to find dupes and act on them.
  • apt-get install hardlink (yes, another one). hardlink -v -f -p -t -x options.txt -x Makefile dir1 dir2. Mmmh, that one took so much memory on my 4GB server that within 20 minutes it was swapping hard.
  • hardlink.py is my favourite for now. Over several days of runs (after all, there are many files to scan and compare), I've already saved 5,646,995 files and about 300GB, not bad :)
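
If you'd rather hack on something yourself, here is a minimal Python sketch of the "only compare files with the same name" idea. It is not how hardlink.py or fdupes are implemented, and the function names are made up for illustration. It groups regular files by (name, size, device), hashes only the files inside a group, and replaces duplicates with hardlinks. It assumes you don't care about differences in ownership, permissions or timestamps, which a real tool may take into account, and it keeps every path in memory, so it would have the same RAM problem on tens of millions of files.

```python
import hashlib
import os
import stat
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hardlink_same_name_dupes(roots, dry_run=True):
    """Hardlink identical files that share a name; return how many were
    (or, with dry_run, would be) replaced. Hypothetical helper."""
    candidates = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                # Skip symlinks, special files and empty files.
                if not stat.S_ISREG(st.st_mode) or st.st_size == 0:
                    continue
                # Hardlinks cannot cross filesystems, hence st_dev.
                candidates[(name, st.st_size, st.st_dev)].append(path)

    linked = 0
    for paths in candidates.values():
        if len(paths) < 2:
            continue
        keepers = {}  # digest -> first path seen with that content
        for path in paths:
            keeper = keepers.setdefault(file_digest(path), path)
            if keeper == path or os.path.samefile(keeper, path):
                continue  # first copy, or already hardlinked
            if not dry_run:
                # Link under a temporary name, then rename over the
                # duplicate, so the path never goes missing.
                tmp = path + ".hardlink-tmp"
                os.link(keeper, tmp)
                os.replace(tmp, path)
            linked += 1
    return linked
```

Run it with dry_run=True first, for example hardlink_same_name_dupes(["dir1", "dir2"]), to see how many files it would relink before letting it touch your backups.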

