Marc's Public Blog - Linux Hacking


All | Aquariums | Arduino | Btrfs | Cars | Cats | Clubbing | Computers | Dining | Diving | Electronics | Exercising | Festivals | Flying | Halloween | Hbot | Hiking | Linux | Linuxha | Mexico | Monuments | Museums | Outings | Public | Rc | Sciencemuseums | Solar | Tfsf | Trips

This page has a few of my blog entries about linux, but my main linux page is here
Picture of Linus

Here is a list of older linux event reports I made before my blog was started, then the rest are below
1996/11/18-21:Linux Pavillion Comdex Fall 1996 (photos only). I've been going since then to help at the linux pavillion.
1997/11/18-21: Linux Pavillion Comdex Fall 1997 (photos only)
1998/05/28-30: Linuxexpo 1998 (photos only)
1998/11/16-20: Linux Pavillion Comdex Fall 1998 (full report)
1998/11/11: Silicon Valley Tea Party (report with pictures)
1999/02/15: Windows Refund Day (report with pictures)
1999/03/20: SVLUG KTEH night (photos only)
1999/03/01-04: LinuxWorld Expo Winter 99 (complete report with many pictures)
1999/03/31: Mozilla Party one year anniversary (photos only)
1999/05/18-22: Linuxexpo 1999 (complete report with many pictures)
1999/06/07: June 99 Balug meeting with Linus
1999/08/09-12: LinuxWorld Expo Summer 99 (complete report with many pictures)
1999/11/15-19: Linux Business Show at Comdex Fall 1999 (full report with pictures)
2000/08/14-17: LinuxWorld Expo Summer 2000 (complete report with many pictures)
2001/01/17-20: Linux.conf.au/LCA 2001 (complete report with pictures)
2001/07/25-28: OLS 2001 (photos only)
2001/08/25: Linux 10th Anniversary (report with pictures)
2001/09/27-30: LinuxWorld Expo Summer 2001 report with pictures)
2001/11/05-10: ALS 2001 (photos only)
2002/06/26-29: OLS 2002 (photos only)
2003/01/20-25: LCA 2003 (photos only)
2003/07/23-26: OLS 2003 (photos only)
2004/01/12-17: LCA 2004 (photos only)
2004/07/21-24: OLS 2004 (photos only)
2005/04/18-23: LCA 2005 (photos only)
2006/01/24-28: LCA 2006 (photos only)
2007/01/17-21: LCA 2007 (photos only)

Here is a list of all the talks I've given:

And below are my blog posts:



>>> Back to post index <<<

2008/11/29 Solved Disk Array Instability
π 2008-11-29 01:01 in Linux
Oh boy, do I feel like putting an egg in my face...
I finally found the problem that caused me soo much grief when I upgraded 5 of my drives from 250GB to 1TB a bit more than a year ago, and then the reason why since that upgrade, I've had repeated failures with my other array comprized of 500GB drives.
I spent countless hours debugging port multiplier problems and once that was stable enough to run (although it would still log loads of warnings/errors/retries), my 500GB drives started to be somewhat unreliable, and would have a high likelyhood of dying during the monthly scrub (/usr/share/mdadm/checkarray).

So, I'll give you the answer right away: my 600W power supply wasn't delivering enough power to the drives through the disk array. It's unclear how or why, the said disk array had multiple power connectors, but everything was working fine when I first set it up for power and load, back when I had 250GB drives.
It's only later as I upgraded the drives that the new ones were just a bit too power hungry, and that the disk array had poor power routing, causing some occasional unreliability (i.e. it worked well enough and long enough that I didn't suspect that a power problem had come back). The fix was pretty simple, power each disk array from a different power source (one now uses a molex power strand while the other uses a sata power strand). Just for fun, I'll add that the entire system actually only uses 200W out of its 600W power supply, so it didn't seem obvious at the time (and still isn't), that I was simply overloading one of the power branches, or that the disk arrays really needed more than one connector to be plugged in.

This was really the problem where you can cook a frog by slowly warming up the water it is in. I never noticed that I got into a situation where the power was marginal, because it happened slowly, and I got unclear symptoms: errors on PMP, but I started using PMP back from when it was unstable and errors were common, and I was getting drive failures on my 500GB drives while the 1TB ones were rock solid (on the same power bus, go figure). The worst part is that the seagate drives would develop real bad sectors as a result, so it just looked like PMP wasn't very stable still and that the seagate drives I had were crap (for the record, those drives are still iffy as they do not reallocate bad blocks by themselves, which is not supposed to happen, marginal power or not).
The haha moment was finally when I was testing my 3rd brand "new remanufactured" drive from seagate, that drive was having issues too, even though it only had 2 hours of runtime. Then I noticed with smartctl -HAi /dev/device that the drive had 168 power on events... in 2 hours! Yes, from there I could tell it had been losing power. The rest is history...

I'm happy I finally found the problem, but I must have put 40 hours down the drain over the last 2 years as a result of this power issue :(


More pages: February 2004 March 2004 November 2004 April 2005 August 2005 January 2006 July 2006 August 2007 November 2007 January 2008 October 2008 November 2008 December 2008 January 2009 May 2009 July 2009 August 2009 September 2009 November 2009 December 2009 January 2010 March 2010 April 2010 June 2010 August 2010 October 2010 January 2011 July 2011 August 2011 December 2011 January 2012 March 2012 May 2012 August 2012 December 2012 January 2013 March 2013 May 2013 September 2013 November 2013 January 2014 March 2014 April 2014 May 2014 October 2014 January 2015 March 2015 May 2015 January 2016 February 2016 March 2016 June 2016 July 2016 August 2016 October 2016 January 2017 September 2017 January 2018 March 2018 December 2018 January 2019 January 2020 May 2020 September 2021 March 2023 April 2023

>>> Back to post index <<<

Contact Email