Marc's Public Blog - All Items

My main server, magic.merlins.org, which you are reading this page on, had its biggest downtime in a while: 5 to 8 hours depending on the services (www came back up first).
I could actually have brought the services back up quicker by failing over to my secondary live server, but because of state involved, and work involved in making my secondary server, primary for mail, and then switching back (this includes making my mailman backup primary too, and then dealing with queues, archives, and all that fun stuff).
After asserting that I'd be able to bring magic back up, I just opted to ride the downtime and not worry about switching the services to moremagic, and then back to magic a few hours later: too much work was involved, and I had enough work on my hands recovering magic as is.
That said, if magic were to really die one day, like the hardware dying (and it could happen, I found out that one of my two CPUs in there actually has died and that the server is continuing to work with one CPU left), then I would do a bona fide switchover to moremagic.

So what happened?
I went to the colo to upgrade the drives in my external array (from 36G to 180G, upping the external storage to 1TB).
Unfortunately, while I was swapping the drives on the live server, for some reason, I decided to run rescan-scsi-bus to see my new drives were being seen, and something went very wrong there: that command caused something very bad to happen on my primary system SCSI bus and caused the system array to fail.
When I rebooted (oh and that was with a new kernel, since I used the reboot to upgrade kernels too), my raid5 array was not being seen, and I only had my root filesystem: no /usr, /var, or anything else.
From there, I started debugging, and trying the typical commands to bring back a raid array that was killed, but it would only bring one drive back out of 5, which was insufficient.
At that point, the next step is to rebuild the raid5 array on top of itself, which is supposed to bring every back up. I had done this in the very distant past, and it had worked.
Unfortunately, it worked enough for my raid5 array to function as a physical volume for my lvm volume group, and it even showed my logical volumes within that VG. I thought I was home free, until I got the dreaded error that none of my filesystems were mountable or even looked like ext3.
After several reboots which were not fun because I had to boot with init=/bin/bash due to a problem with the new kernel (I didn't know that yet), and then manually bring up udev, udevd, lvm, and raid5 (it's become non trivial to do this nowadays), I realized that the new mdadm tools created a different default raid5 array when the tools from 2002, so I had overlayed new md blocks that weren't compatible with the data I had on disk (yet, it was close since I could see my VG and LVs). After more time and more reboots, I realized that the chunck size for raid had changed from 32K to 64K and that the new default raid layout was left-symmetric instead of left-asymmetric (WTF did they have to change that).
Well, 2H later, I had my raid array back up, with my VG and LVs. I was then able to mount all my filesystems, except /var which had been damaged beyond e2fsck recovery (i.e the entire filesystem was in pieces in lost+found). In hindsight, I should have backed up that data before wiping it, but at the time, I felt the data was toast, and I didn't have the time to wait for a 10GB copy to another partition.
My recovery plan was to copy /var from moremagic, which would be close, but not quite the same (it was as different machine, but I had some shared data pieces that were rsynced daily), and then rsync/overlay the real data that I had on an almost full machine backup on my main disk server at home.
Then, I had to add the missing pieces (like recent pictures), from my laptop.
In the end, it took 4 to 6 hours of copies to get most of the system back to where it was, with very little data loss. I did lose files that had recently been uploaded to my ftp server (I don't back that up, it's too big), and I did lose 8 hours of work and frustration to piece everything back together.

I was then able to bring apache back up first, but I had to wait longer for Email for a 2GB mailman sync to finish. As I write this, I'm still rsyncing logs back and it'll probably take another 12H or so, but the server has been back up and working since about 17:30.
On one side, I'm glad I had reasonable backups and lost virtually nothing, as well as the fact that I was able to rebuild the server in place instead of bringing it back home and having to make a new one from scratch, but on the other side, the 8 or so hours I spent doing this, sucked.
I'm also concerned that I was able to lose an entire partition just for running rescan-scsi-bus, which I had run many times in the past without such problems.

Update1
Actually, I found out that I lost most of my archived web logs from 1999 to 2005. I'm kind of sad about that, but such is life I guess. It could have been much worse...

Update2
Never mind, I actually didn't lose anything, except a lot of time. After rebooting this morning (after my last backup restores had finished over night, a full 24H after the machine went down), I just realized that /var/ftp, which I thought I lost was indeed a separate partition (duh!) and therefore wasn't lost when /var was lost. This means that in the end I didn't lose any data at all, except a lot of time.
I can't quite say that I haven't lost anything on a raid5 array anymore, but at least I didn't lose the actual data since I had backups of it all. Pffeew...

2007/11/17 Sometimes, I'm wondering WTF linux is going

π 2007-11-17 17:31 by Merlin in Linux

In the old days, we had ifconfig, dhclient, APM, and things were simple.

First, came ACPI. This is not linux's fault, but boy did it make something simple as putting your laptop to sleep a real pain in the ass sometimes. I'm not sure how many hours I spent learning the acpi system, and getting it work on my thinkpad back before distros made it mostly work in most cases (but seriously APM, just worked, and ACPI was a pain in the ass)
I recently upgraded to a new laptop (thinkpad Z61p), on which I figured I'd put a brand new ubuntu feisty (now upgraded to gutsy), and I'm still running a recent kernel.org kernel instead of the vendor provided one. Maybe I'm getting punished for refusing to run gnome/KDE (I really tried, but gnome still sucks, and KDE still didn't quite do it, so I'm back to enlightenment), but simple things don't work:

For some obscure reason, Fn+F4 calls acpi_fakekey, which then does nothing (apparently, it might still be talking to the wrong /dev/input/event0), instead of just simply calling the sleep script. Why so complicated? I mean this crap:
```
cat /etc/acpi/sleepbtn.sh
#!/bin/bash
. /usr/share/acpi-support/key-constants
#acpi_fakekey $KEY_SLEEP
/etc/acpi/sleep.sh
```
Seriously, WTF is acpi_fakekey, and why is there no documentation for it?
tpb (thinkpad display) just worked, but was replaced by some complicated hotkey-setup package that does autodetection and still did the wrong thing for my laptop, and still doesn't do anything useful on my laptop with enlightenment (I had to hand re-install tpb, which ubuntu nicely made incompatible with hotkey-setup and ubuntu-desktop)
pulseaudio just did not work due to a misbuild (/tmp/.esd vs /tmp/esd-uid), yielding broken sound laptop-wide
but the best one is by far avahi, dhcdbd, and other network autoconfiguration stuff. Long are the days of simple ifplugd autoconfigure and /etc/network/interfaces is simply empty. Keeping up with all this stuff is starting to be really a mess, especially as documentation there is pretty light too.

I suppose that by the time all this is working, I'll still end up with a better config than what I can do on windows, but damn, it seems like it's getting unnecessarly hard...

2007/11/16 Thunderhill: Ariel Atom Experience and Team Racing Track Day

π 2007-11-16 19:53 by Merlin in Cars, Ncars

Jason and I had been keeping track of the weather in Willows while impatiently waiting for the day we would be able to each drive an Ariel Atom at thunderhill, thanks to the folks from Ariel Atom Experience . You can see what that car looks like in top gear
The next day, I was able to line up a Bonni track day, making the trip to Willows, even more worth it. The flight there was interesting since it was pretty dark flying up, but I got there within slightly less than one hour, well in time to get some rest for the next morning.

I was impressed by the setup and organization at the Ariel Atom Experience. The organizers definitely cared about safety, customer service, and offering everything they could. Some people first expressed that $1500 was expensive for a track day, but I personally thought it was a bargain to drive a car I would never be able to drive otherwise, and it was without counting the extras that were offered:

Access to the cars, obviously. A track day in my car tends to be expensive if I factor in the set of pads I'll be wearing out, the high rotor wear and tear, and the tire use (for the Ariels they had to worry about flat spots since they had no ABS).
one on one instruction the whole day
GPS equipped cars to generate a video of your last session, showing you on the track, along with 2 separate camera shots of you, the car and track, and your position on the track
Headset communication with your instructor, and radio communication with the base if they had an emergency or need to shut down the track
Eflags, which are basically a visual representation of flag status inside your car (yellow, black...), so that you can't miss an outside flagger.
Oh, and of course, an entire track for 4 to 8 cars (well, not quite, more on that later)

See the other pictures of that day and the video of my best lap.
The "problem" (from my point of view that is), is that the event was marketed to anyone, including people who had never driven at a track (apparently, they even had someone show up who did not even know how to drive a stick shift). As a result, they not only had to make sure we knew how to drive the car (i.e. no ABS, no traction control, a lot of power and fast response, and tires that gripped well, but did not seem to give a lot of warning when they were at the edge of traction), but they also had to teach people how to drive around a track without going off the next turn, or oversteering, going sideways in a ditch, and flipping the car.
Quite frankly, my kudos to the instructors who had to sit along for the ride, and didn't have much control over the car if things went haywire. My afternoon instructor mostly just sat there silently as I went around lapping faster and faster throughout the sessions, and reminded me that I should not stay too close to the people in front of me because I could not expect them to do rational things or not to spin out (good point indeed).
Now, I thought the main problem, from someone reasonably experienced with track days, and the track in question, was that the morning was mostly a throwaway: 3 sessions of lead follow where I only got to pass a couple of cars on the last session, and that session was still very slow. Again, I can't blame them for doing that when some of the other drivers didn't know what they were doing, but I really wish we had been put in skill groups: we only had 4 cars on a half track, and even with equal power, I would lap each of them at least once during each 20mn session (after being stuck behind them for a while).
While I understand they had a chicane in the straight so that people didn't go 130mph+ before trying to brake and turn (which can be hard with wheels that tend to lock up), I really wish we had had the entire track as opposed to a half track. I just have no idea how fast I was going compared to my F430 because I only had half track times which I could not do much with (also, the full track would have alleviated the problem with slower people getting in the way). Unfortunately, they didn't have enough flaggers to man the whole track. I really hope they do next year, especially as I can imagine the half track being a parking lot with 8 cars on it including slow people, instead of 4 (we were on the last day, and they had fewer people then, hence more room for us).
Maybe in addition to making skill groups on each day, since they had 4 days, they could also recommend a day for beginners who've never been at a track, and a day for experienced people. Within that group, you can even split it between people who know the track in question, and those who don't.

All that said, the cars were fun to drive, definitely required driver skill and gave you a sense for the road. Outside of the very mixed driver skill problem, the event was very well organized, and they all really deserve credit for organizing an event where you get to drive a car that you would not have been able to, otherwise. I think it's an awesome idea and I hope other such events pop up with other cars.

After a fun day of driving the Ariel Atom around, I went to chill out for the evening, and came back the next morning with my F430 that Scott nicely drove over for me.
The Pagids pads worked fine, and the Pirelli Pzero Corsa Rosso tires (tread wear 60), were great. For the first time, I brought my Thunderhill lap time under 2:10 (2:09 three times, including one when I passed someone), and I'm pretty sure I could have gone a bit lower still, but I had a problem with pretty bad vibration due to a missing wheel weight, or some other misbalance problem. Driving the car with that shake was a bit worrisome, but traction control and/or tire stickiness kept it on the road. No idea what it cost me time-wise though.
However, the extra speed cost me a lot more pad wear, and I ran out of pad after the 5th session, down to the plate. Doh! But, eh, I can't complain, I was apparently the fastest street car of the day. The flipside is that I ran my brakes pads down to the backing plates by the 5th session. I guess going faster meant wearing the pads faster too :)

Unfortunately, someone in the beginner group apparently had an ABS fluke/failure or some other problem and had his car shoot sideways when trying to brake before a turn, had the car go sideways off the turn, catch a tire in the dirt, and flip a few times. He was ok, but the car not as much.

See the other pictures of that day .
Since I was out of brakes, I decided to call it a day, which gave me the chance of flying home with daylight, which was good since I was able to keep an eye on the weather which was likely to be overcast and/or foggy, making takeoff and landing potentially harder. I turned out to be lucky at that time, and got clear skies for takeoff and landing, but that could have changed at any time.

2007/11/03 Darpa Urban Challenge

π 2007-11-03 14:46 by Merlin in Cars, Ncars, Public

It was a bit last minute, but I got the option of flying to the Darpa Urban Challenge in Vacaville in Southern California.
It was quite interesting to see those cars driving around by themselves with no one in them, and yet stop at stop signs, yield, pass one another, and everything else. Quite impressive!

the fake city had more than 50 cars pretending to be real traffic

I got to see this contestant being confused about where it was on the course (GPS failure?) and trying to turn in the wrong place until its sensors saw that it was going to hit the center divider, and it stopped. Most other cars had a much better view around them and could see the road layout, even if the GPS signal wasn't working

Stanford's Junior, first to cross the finish line

CMU's caterpillar, which actually won the 2 million dollar price

Carterpillar from Carnegin Mellon ended up winning
and Junior from Stanford finished the circuit first, but ranked second and had to settle for the second price of a mere million dollars :)
The plane ride back was quite interesting as I got the chance to ride in the cockpit and see the takeoff and landing phases.

While it did suck to get up so early to make the flight and spend the day there, it was a quite interesting day and flight. Here are more pictures of the darpa urban challenge .

More pages: May 2025 April 2025 March 2025 February 2025 January 2025 December 2024 November 2024 October 2024 September 2024 August 2024 July 2024 June 2024 May 2024 April 2024 March 2024 February 2024 January 2024 December 2023 November 2023 October 2023 September 2023 August 2023 July 2023 June 2023 May 2023 April 2023 March 2023 February 2023 January 2023 December 2022 November 2022 October 2022 September 2022 August 2022 July 2022 June 2022 May 2022 April 2022 March 2022 February 2022 January 2022 December 2021 November 2021 October 2021 September 2021 August 2021 July 2021 June 2021 May 2021 April 2021 March 2021 February 2021 January 2021 December 2020 November 2020 October 2020 September 2020 August 2020 July 2020 June 2020 May 2020 April 2020 March 2020 February 2020 January 2020 December 2019 November 2019 October 2019 September 2019 August 2019 July 2019 June 2019 May 2019 April 2019 March 2019 February 2019 January 2019 December 2018 November 2018 October 2018 September 2018 August 2018 July 2018 June 2018 May 2018 April 2018 March 2018 February 2018 January 2018 December 2017 November 2017 October 2017 September 2017 August 2017 July 2017 June 2017 May 2017 April 2017 March 2017 February 2017 January 2017 December 2016 November 2016 October 2016 September 2016 August 2016 July 2016 June 2016 May 2016 April 2016 March 2016 February 2016 January 2016 December 2015 November 2015 October 2015 September 2015 August 2015 July 2015 June 2015 May 2015 April 2015 March 2015 February 2015 January 2015 December 2014 November 2014 October 2014 September 2014 August 2014 July 2014 June 2014 May 2014 April 2014 March 2014 February 2014 January 2014 December 2013 November 2013 October 2013 September 2013 August 2013 July 2013 June 2013 May 2013 April 2013 March 2013 February 2013 January 2013 December 2012 November 2012 October 2012 September 2012 August 2012 July 2012 June 2012 May 2012 April 2012 March 2012 February 2012 January 2012 December 2011 November 2011 October 2011 September 2011 August 2011 July 2011 June 2011 May 2011 April 2011 March 2011 February 2011 January 2011 December 2010 November 2010 October 2010 September 2010 August 2010 July 2010 June 2010 May 2010 April 2010 March 2010 February 2010 January 2010 December 2009 November 2009 October 2009 September 2009 August 2009 July 2009 June 2009 May 2009 April 2009 March 2009 February 2009 January 2009 December 2008 November 2008 October 2008 September 2008 August 2008 July 2008 June 2008 May 2008 April 2008 March 2008 February 2008 January 2008 December 2007 November 2007 October 2007 September 2007 August 2007 July 2007 June 2007 May 2007 April 2007 March 2007 February 2007 January 2007 December 2006 November 2006 October 2006 September 2006 August 2006 July 2006 June 2006 May 2006 April 2006 March 2006 February 2006 January 2006 December 2005 November 2005 October 2005 September 2005 August 2005 July 2005 June 2005 May 2005 April 2005 March 2005 February 2005 January 2005 December 2004 November 2004 October 2004 September 2004 August 2004 July 2004 June 2004 May 2004 April 2004 March 2004 February 2004 January 2004 October 2003 August 2003 July 2003 May 2003 April 2003 March 2003 January 2003 November 2002 October 2002 July 2002 May 2002 April 2002 March 2002 February 2002 November 2001 October 2001 September 2001 August 2001 July 2001 June 2001 May 2001 April 2001 March 2001 February 2001 January 2001 December 2000 November 2000 October 2000 September 2000 August 2000 July 2000 June 2000 April 1999 March 1999 September 1997 August 1997 July 1996 September 1993 July 1991 December 1988 December 1985 January 1980

Contact Email