Marc's Public Blog - Linux

This page has a few of my blog entries about linux, but my main linux page is here

Here is a list of older linux event reports I wrote before my blog was started; the rest are below

1996/11/18-21: Linux Pavilion Comdex Fall 1996 (photos only). I've been going since then to help at the linux pavilion.

1997/11/18-21: Linux Pavilion Comdex Fall 1997 (photos only)

1998/05/28-30: Linuxexpo 1998 (photos only)

1998/11/16-20: Linux Pavilion Comdex Fall 1998 (full report)

1998/11/11: Silicon Valley Tea Party (report with pictures)

1999/02/15: Windows Refund Day (report with pictures)

1999/03/20: SVLUG KTEH night (photos only)

1999/03/01-04: LinuxWorld Expo Winter 99 (complete report with many pictures)

1999/03/31: Mozilla Party one year anniversary (photos only)

1999/05/18-22: Linuxexpo 1999 (complete report with many pictures)

1999/06/07: June 99 Balug meeting with Linus

1999/08/09-12: LinuxWorld Expo Summer 99 (complete report with many pictures)

1999/11/15-19: Linux Business Show at Comdex Fall 1999 (full report with pictures)

2000/08/14-17: LinuxWorld Expo Summer 2000 (complete report with many pictures)

2001/01/17-20: Linux.conf.au/LCA 2001 (complete report with pictures)

2001/07/25-28: OLS 2001 (photos only)

2001/08/25: Linux 10th Anniversary (report with pictures)

2001/09/27-30: LinuxWorld Expo Summer 2001 (report with pictures)

2001/11/05-10: ALS 2001 (photos only)

2002/06/26-29: OLS 2002 (photos only)

2003/01/20-25: LCA 2003 (photos only)

2003/07/23-26: OLS 2003 (photos only)

2004/01/12-17: LCA 2004 (photos only)

2004/07/21-24: OLS 2004 (photos only)

2005/04/18-23: LCA 2005 (photos only)

2006/01/24-28: LCA 2006 (photos only)

2007/01/17-21: LCA 2007 (photos only)

Here is a list of all the talks I've given:

And below are my blog posts:


2008/11/12 MythTVs

π 2008-11-12 01:01 in Linux

Since we have two TVs, each in its own room, I figured it would make sense to have two MythTVs instead of the one I had. My other motivation was that my current MythTV was getting a bit old and was unable to play 1080p content encoded in H264.
The solution was simple: just build a second mythtv box, move my main mythtv setup to the new hardware, make the old hardware a secondary frontend, and upgrade the hardware in the older PC after that. That was a good plan on paper.

So, the first part, the new PC, went ok because I used a bit of brains and threw money at the problem: I'm just too old to fuck around with PC hardware and build my own HTPC case. There are too many things that may not work together, requiring multiple trips to the store to exchange parts, take stuff out and put it back in...
I sent a bid to microcenter, and they actually did a good job building the HTPC. I got a good enough case, was able to get drivers to talk to the front panel LCD, and effectively everything worked except the built in IR port that was hardwired to only talk to a microsoft remote (no thank you). After adding a PVR-350 and wiring its IR receiver, everything worked hardware-wise: Asus P5E-VM HDMI G35, dual core duo 3GHz, and I got the built in intel video chip to work with Xorg. The case is an Antec Fusion Black 430 HTPC, which is not small but fairly nice.

Mmmh, and then:

Moving my mythtv setup to work on the new box cost me a lot of hair and sleep. This is where the DB in mythtv is a pain in the ass. I had to hand edit the DB to change the IP of my main mythtv server (I didn't even try anything as foolish as renaming the hostname, especially as I had called my main myth server 'myth', making a search/replace in the DB a guaranteed failure).
What happened is that I overwrote a hostname value that was supposed to be NULL and later set it back to NULL, except that phpmyadmin was nice enough to store the ASCII string 'NULL' instead of a real NULL. That made the debug output perplexing, since mythtv was effectively looking for NULL and not finding it, while the DB looked like it contained NULL.
Thanks to Mikal for steering me in the right direction for debugging this after about a week of pulling my hair... After that, everything was working.
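
For anyone who hits the same trap, here is a minimal sketch of why a real NULL and the string 'NULL' do not match each other. It uses Python's built in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), and the table and column names are hypothetical, not a claim about the actual mythtv schema.

```python
import sqlite3

# In-memory database standing in for the MySQL DB (assumption for the demo).
conn = sqlite3.connect(":memory:")

# Hypothetical table and column names, for illustration only.
conn.execute("CREATE TABLE settings (value TEXT, data TEXT, hostname TEXT)")
conn.executemany(
    "INSERT INTO settings VALUES (?, ?, ?)",
    [
        ("GlobalSetting", "a", None),    # a real SQL NULL
        ("BrokenSetting", "b", "NULL"),  # the four character string 'NULL'
    ],
)


def values_where(clause):
    """Return the 'value' column of all rows matching a WHERE clause."""
    rows = conn.execute(
        f"SELECT value FROM settings WHERE {clause} ORDER BY value"
    )
    return [row[0] for row in rows]


# Code that looks for a real NULL only finds the first row...
real_nulls = values_where("hostname IS NULL")
# ...while the row "fixed" through a GUI is only found as a string.
string_nulls = values_where("hostname = 'NULL'")

print("IS NULL matches:  ", real_nulls)
print("= 'NULL' matches: ", string_nulls)
```

Both rows look the same in many table viewers, which is what makes this one so painful to spot; querying with IS NULL is a quick way to tell them apart.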

This is where a smart person would have quit while he was ahead, but no, I had that 3rd task which was to upgrade the CPU in my old myth box (an AMD Sempron 3100+). I ordered an AMD 4000+ (2.6GHz instead of 1.8GHz), the fastest socket 754 upgrade available for that motherboard. I hoped that it would be fast enough to decode H264/1080p, but it turned out not to be that easy to find out.

My old HTPC

The CPU of Doom

One of my many attempts at making it work: a beefier PS from my desktop PC

So the full story took over a month, but basically the new CPU has an integrated memory controller that is very subtly incompatible with my motherboard (I probably have an older bad revision of the hardware).
End result: the CPU works fine if I limit the memory in linux to 252MB. Anything beyond that and it'll crash. Lovely!
(yes, yes, I really tried everything: other memory, other slots, better power supply, memtest, standing on one foot, etc...).
And the best part, kinda? After about a month of trying I did get linux to boot and work with 252MB, and was able to verify that even overclocked at 2.8GHz, the new CPU can't decode H264 at 1080p anyway (including with the enhanced windows software decoder you pay for).

Boy, I want my 20 hours back!

2008/11/29 Solved Disk Array Instability

π 2008-11-29 01:01 in Linux

Oh boy, do I feel like I have egg on my face...
I finally found the problem that caused me so much grief when I upgraded 5 of my drives from 250GB to 1TB a bit more than a year ago, and the reason why, since that upgrade, I've had repeated failures with my other array comprised of 500GB drives.
I spent countless hours debugging port multiplier problems and once that was stable enough to run (although it would still log loads of warnings/errors/retries), my 500GB drives started to be somewhat unreliable, and would have a high likelihood of dying during the monthly scrub (/usr/share/mdadm/checkarray).

So, I'll give you the answer right away: my 600W power supply wasn't delivering enough power to the drives through the disk array. It's unclear how or why, as the said disk array had multiple power connectors, but everything was working fine when I first set it up for power and load, back when I had 250GB drives.
It's only later, as I upgraded the drives, that the new ones turned out to be just a bit too power hungry and that the disk array's poor power routing caused some occasional unreliability (i.e. it worked well enough and long enough that I didn't suspect that a power problem had come back). The fix was pretty simple: power each disk array from a different power source (one now uses a molex power strand while the other uses a sata power strand). Just for fun, I'll add that the entire system actually only uses 200W out of its 600W power supply, so it didn't seem obvious at the time (and still isn't) that I was simply overloading one of the power branches, or that the disk arrays really needed more than one connector to be plugged in.

This was really the kind of problem where you can cook a frog by slowly warming up the water it is in. I never noticed that I got into a situation where the power was marginal, because it happened slowly, and I got unclear symptoms: errors on PMP, but I started using PMP back when it was unstable and errors were common, and I was getting drive failures on my 500GB drives while the 1TB ones were rock solid (on the same power bus, go figure). The worst part is that the seagate drives would develop real bad sectors as a result, so it just looked like PMP still wasn't very stable and that the seagate drives I had were crap (for the record, those drives are still iffy as they do not reallocate bad blocks by themselves, which is not supposed to happen, marginal power or not).
The aha moment finally came when I was testing my 3rd brand "new remanufactured" drive from seagate. That drive was having issues too, even though it only had 2 hours of runtime. Then I noticed with smartctl -HAi /dev/device that the drive had 168 power on events... in 2 hours! Yes, from there I could tell it had been losing power. The rest is history...
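
The check described above can be automated. What follows is a minimal sketch, not a definitive tool: it parses the attribute table printed by smartctl -A and compares power cycles against power on hours. The sample text is illustrative; only the 2 hours and 168 power cycles come from the story above, and the other columns are made up to look like typical output. The attribute names Power_On_Hours and Power_Cycle_Count are the usual ones, but some drives report them differently.

```python
import re

# Illustrative smartctl -A style output. Only the raw values 2 and 168
# come from the text above; every other column is a placeholder.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       168
"""


def parse_smart_attributes(text):
    """Map attribute names to the leading integer of their raw value."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def power_cycles_per_hour(attrs):
    """Return power cycles per power on hour, or None if data is missing."""
    hours = attrs.get("Power_On_Hours")
    cycles = attrs.get("Power_Cycle_Count")
    if hours is None or cycles is None:
        return None
    return cycles / max(hours, 1)  # avoid dividing by zero on new drives


attrs = parse_smart_attributes(SAMPLE)
print(attrs, power_cycles_per_hour(attrs))
```

A drive that power cycles dozens of times per hour of runtime is not being turned off and on by a human, so a ratio like this one points at the power feed rather than at the drive.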

I'm happy I finally found the problem, but I must have put 40 hours down the drain over the last 2 years as a result of this power issue :(

2008/11/29 Ubuntu Intrepid Ibex Upgrade From Hell and Network Manager Sucks

π 2008-11-29 01:01 in Linux

This started with me trying to debug a networking issue where my machine was jumping between wireless networks behind my back. It was a pretty minor problem, but it had been annoying me a bit, so I figured I'd tackle it.
Against better judgement, I figured I'd first upgrade my ubuntu hardy to the just released Intrepid (I guess the name said it all). After the upgrade, I had no more networking, and no more X. Swell...

Networking was easy to bring up temporarily: I had to bring up the interface by hand, as networkmanager would no longer see loss of link and bring down eth0, which in turn used to trigger one of my scripts to bring up wireless on eth1.

The upgrade to Xorg 1.5.3 was supposed to be a good thing, but it made X crash every 10 minutes or so with fglrx (which was nicely upgraded for me), or with the radeonhd driver. I first had to upgrade my kernel to 2.6.27.7 (from 2.6.24) to stop the crashes, and after a fair amount of work, got the radeon and radeonhd drivers working with my mobility firegl V5200 (3d almost works, it just crashes with radeonhd and is very slow with radeon, but when I have time, I'll do some svn pull to get even later drivers and it should work I'm told).
At least the good news is that I'm now running an OSS radeon driver, no more fglrx binary blob. I also get 3D and for the very first time: AIGLX and compositing in enlightenment.

Networkmanager might be supposed to be cool, but it totally fucks up your life if you're not using it exactly the way it was intended. I had auto plugging working, and that stopped after an upgrade. I had auto switching from wired to wireless (through dhcp scripts), and that stopped working too after networkmanager took over the function of ifplugd. After that, I even got networkmanager to just SEGV in protest.
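
To show what the wired to wireless auto switching boils down to, here is a rough sketch of the decision only. It is not the script mentioned above: it assumes the kernel exposes link state in /sys/class/net/<iface>/carrier (1 when a cable is plugged in), and the function names are hypothetical. The demo at the end runs against a fake sysfs tree so that it works without real hardware.

```python
import tempfile
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier (link) on the interface."""
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # Missing interface, or interface administratively down.
        return False


def pick_interface(wired="eth0", wireless="eth1", sysfs_root="/sys/class/net"):
    """Prefer the wired interface when it has link, else fall back to wireless."""
    return wired if link_is_up(wired, sysfs_root) else wireless


# Demo against a fake sysfs tree (hypothetical layout built for the example).
with tempfile.TemporaryDirectory() as root:
    eth0 = Path(root) / "eth0"
    eth0.mkdir()
    (eth0 / "carrier").write_text("0\n")  # cable unplugged
    unplugged = pick_interface(sysfs_root=root)
    (eth0 / "carrier").write_text("1\n")  # cable plugged back in
    plugged = pick_interface(sysfs_root=root)

print(unplugged, plugged)
```

A real setup would still need something like ifplugd to call this on link changes and then run ifup/ifdown on the chosen interface.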

Then, I tried to make networkmanager work, but I soon found out that it's riddled with bugs and does not play nice with the rest of the system if you have non standard configs or need to admin some interfaces by hand. Sure, it has an exclude mode, but it will now not even bring the interface down on loss of link (it used to), forcing me to go back to ifplugd or the newer wicd.
The old network manager had null asserts that I reported and that were never fixed.
The new one is even worse, it segvs if I manually bring up eth1:


NetworkManager: <info>  Unmanaged Device found; state CONNECTED forced. (see http://bugs.launchpad.net/bugs/191889)
NetworkManager: <WARN>  nm_supplicant_interface_add_cb(): Unexpected supplicant error getting interface: wpa_supplicant couldn't grab this interface.
[1]+  Segmentation fault      NetworkManager --no-daemon

Then I tried starting clean by removing all my interfaces from /etc/network/interfaces, and networkmanager refused to manage my interface anyway. It looks like it's one of the many problems that people have been seeing.
I like the fix, which says:

As a workaround removed network-manager
sudo apt-get remove network-manager
And i started my network device with:
sudo ifup eth0
Hope this helps

I filed my bug here anyway

And then, as I read the pretty light docs with no info on real troubleshooting, or WTF won't it even manage my eth0, I see gems like these:

you may want to restart the system-settings daemon using the command:
"sudo killall nm-system-settings" to apply those changes.

Err, what? You have to kill a daemon with killall to re-read config files? WTFBBQ?

NetworkManager, you're not managing any of my networks anymore. It looks like wicd will do the job, and if not I'll just go back to ifplugd and custom scripts.

Ubuntu folks: you put out a good distro, but your love affair with gnome and utter shite like networkmanager is not making you look good.

2008/11/30 Magic Motherboard Crash And Raid Rebuild With DD Rescue

π 2008-11-30 01:01 in Linux

Less than a year after I built it, magic started rebooting almost daily while one of its drives was exhibiting some worrisome SMART errors. On the way back from Palo Alto Airport, with my fiancée's visiting family in tow, I thought I'd stop by the data center, swap the power supply and the bad drive. It was supposed to be a 10 minute job.
Yes, you already know the rest, it wasn't.

First, the machine never rebooted after I put in the new power supply, nor would it power up with the old one (well, the fans started, but no POST). I eventually gave up and brought the machine home for further diagnostics. I found out in the end that one of the CPU slots on the motherboard donated by benley went bad, and the machine would not boot with any CPU in it (the CPUs themselves still seemed ok).
Luckily, I got an old machine called 'ins1' a while ago, as a spare should something like this happen, so it was just a matter of switching motherboards and CPUs. Good thing I had planned for that.

The part where I screwed up is that I had to replace sda with a new drive that I had prepared. I had 6 drives in the machine and no way to know which one was which outside of a label I had made on the front of the box, for a case just like this. So, I pulled the drive, and put a new one in and rebooted the machine with one CPU. I had meant to boot single user mode, but I messed up the boot command line, and when I tried to sysrq to stop multiuser, it wasn't working and the machine eventually booted in multi user mode and started to write on the degraded raid set. (turns out I had a mini keyboard that didn't support sending sysrq)
It's only a bit later that I logged in and realized that I had pulled the wrong drive and since I had written on the raidset I couldn't just shut down and put the good drive back in without some amount of filesystem corruption (I did have to do this once because I had no choice, but it's not something you do first).
(oh, and it was the wrong drive because during the install, I replaced that sata board with another one, and the other board had its ports in reverse order, so my labels were also in reverse order...)

By then, I only had one choice left: rebuild on a drive that was already good by using the failing drive, and sure enough the failing drive had bad sectors that prevented the rebuild from completing. I still could have forced the raid to discard the bad drive and rebuild the raidset by forcing options to use the drive I was rebuilding on as a good drive. That works perfectly if you didn't write on the raidset in between, but since I had, I figured I'd try to just clone the bad drive since it only had about 5 bad blocks.

First, I went with dd conv=noerror,sync bs=512, but then found by googling during the long copy that there was a better way: GNU ddrescue (not to be confused with the older dd_rescue and its dd_rhelp wrapper). ddrescue is really mostly the same, except that it copies bigger blocks until it finds an error, keeps a logfile so that a recovery can be resumed, and will retry bad blocks a few times before giving up on them. dd just skips them and replaces them with zeros, which you won't find with rsync unless you call rsync with -c, and even then you still need to figure out which file(s) have 0s inside, which is very non trivial with a filesystem over lvm over raid5.

The magic command is therefore: ddrescue -v -r 10 -d /dev/sda4 /dev/sdd4 log which takes about 3H on a 250GB drive at 25MB/s average speed.
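
To make the difference with plain dd concrete, here is a simplified sketch of the strategy described above: copy in big blocks, drop down to single sectors when a block fails, retry each bad sector a few times, and leave zeros (plus a list of bad sectors) where nothing could be read. This is not how GNU ddrescue is actually implemented, which does several passes and keeps a richer logfile. FlakyDevice is a hypothetical stand-in for a failing disk so the example can run without one.

```python
SECTOR = 512


class FlakyDevice:
    """Hypothetical failing disk: listed sectors fail a set number of reads."""

    def __init__(self, data, failures):
        self.data = data
        self.failures = dict(failures)  # sector index -> failed reads left

    def read(self, offset, length):
        first = offset // SECTOR
        last = (offset + length - 1) // SECTOR
        for sector in range(first, last + 1):
            if self.failures.get(sector, 0) > 0:
                self.failures[sector] -= 1
                raise OSError(f"read error in sector {sector}")
        return self.data[offset:offset + length]


def rescue(device, size, block=8 * SECTOR, retries=3):
    """Copy size bytes from device; return (image, list of unreadable sectors)."""
    image = bytearray(size)  # unreadable areas stay zero filled, like dd
    bad = []
    for start in range(0, size, block):
        length = min(block, size - start)
        try:
            # Fast path: read a whole block at once.
            image[start:start + length] = device.read(start, length)
            continue
        except OSError:
            pass
        # Slow path: go sector by sector, retrying each one a few times.
        for offset in range(start, start + length, SECTOR):
            n = min(SECTOR, start + length - offset)
            for _ in range(retries):
                try:
                    image[offset:offset + n] = device.read(offset, n)
                    break
                except OSError:
                    continue
            else:
                bad.append(offset // SECTOR)  # gave up on this sector
    return bytes(image), bad


data = bytes(range(256)) * 64  # 32 sectors of test data
# Sector 5 recovers after a couple of failed reads, sector 20 never does.
device = FlakyDevice(data, {5: 2, 20: 100})
image, bad = rescue(device, len(data))
print("unreadable sectors:", bad)
```

The list of bad sectors is the part dd never gives you: it tells you exactly where the zeros are instead of leaving you to hunt for them in the filesystem later.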

If ddrescue isn't able to rescue the bad blocks, in theory I should be able to compute the parity for just those blocks from the other drives (including the one I was rebuilding on), hoping/assuming that those blocks weren't ones that got changed in the short amount of time the good drive was removed from the raid. Unfortunately, doing so is pretty non trivial, and there are no tools that I could find to hand pick sectors to rebuild in one direction vs another direction (not counting that it would be super error prone).
The good news is that ddrescue -r 10 was about right: it tried to re-read my bad block 3 times and was able to get the data off the 3rd time, so I got a perfect mirror copy of my drive with issues and won't have to wonder later which portion of which filesystem got a bunch of 0s in the middle of it. Yeah! :)
(the actual data wasn't that important, I had backups of most of it, but it would have been a bit of a pain to recreate, and I always use such an opportunity to learn about the different recovery techniques and tools so that I know what to do the day I come across something very important to restore, hopefully not my data :) )
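
The parity idea mentioned above rests on one property of raid5: within a stripe, the parity block is the XOR of the data blocks, so any single missing block is the XOR of all the others. The sketch below only shows that arithmetic on tiny made up blocks. It says nothing about the hard part, which is finding which on-disk sectors belong to the same stripe under lvm and md, and the function name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    blocks = list(blocks)
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Made up 4 byte "blocks" for one stripe of a 4 drive raid5 (3 data + parity).
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
d2 = b"\xaa\xbb\xcc\xdd"
parity = xor_blocks([d0, d1, d2])

# Pretend d1 sat on the bad sector: rebuild it from the survivors and parity.
rebuilt_d1 = xor_blocks([d0, d2, parity])
print(rebuilt_d1 == d1)
```

This only gives the right answer if none of the surviving blocks in that stripe changed after the missing one was last valid, which is exactly the caveat above about writes that hit the raidset while the good drive was out.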

