2014-10-05 01:01
in Btrfs, Linux
Simple cronjob to detect btrfs kernel hangs
As of kernel 3.16, btrfs can still deadlock (it happens more frequently with older kernels).
You don't want a pile of stuck processes and a half-working server that you need to debug later. The cronjob below (a template for you to modify) will alert you when your load average is too high, show you which processes are blocked (the cause could be something other than btrfs), and also show you if your swap is running out (btrfs has memory bugs with quotas and snapshots that can eat all your memory).
SHELL=/bin/bash
# If load average is more than MAXLA, show load average and all blocked processes
# At any time, show anything blocked on wait_current_trans.isra.15 (used to be a btrfs hang bug)
# Also show swap if it drops below MINSWAP
# We pipe into bc because shell comparison doesn't do floating point.
*/5 * * * * nobody MAXLA=25; MINSWAP=10; if [ "$(echo "$(awk '{print $1}' < /proc/loadavg) > $MAXLA" | bc)" = 1 ]; then cat /proc/loadavg; ps -eo state,pid,etime,wchan:30,args | grep -v "^[RS]"; fi; ps -eo pid,etime,wchan:30,args | grep '[w]ait_current_trans.isra.15'; if [ "$(echo "$(free | grep 'Swap' | awk '{t = $2; f = $4; print (f/t*100)}') < $MINSWAP" | bc)" = 1 ]; then free; fi
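
If the one-liner gets hard to maintain, the same three checks can live in a small script that cron calls. The following is only a sketch under a few assumptions: the function names (`float_gt`, `swap_free_pct`, `check_hang`) and the `run` argument are made up for illustration, awk does the floating-point comparison in place of bc, and the match is on `wait_current_trans` without the `.isra.15` suffix, since that suffix is compiler-generated and may differ on your kernel.

```shell
#!/bin/bash
# Hypothetical script version of the cron one-liner; names and defaults are illustrative.
MAXLA=${MAXLA:-25}      # load average threshold
MINSWAP=${MINSWAP:-10}  # minimum free swap, in percent

# Succeed if the first number is strictly greater than the second.
# awk handles the floating-point comparison here (swapped in for bc).
float_gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Print free swap as a percentage of total; print 100 when there is no swap at all.
swap_free_pct() {
    awk -v t="$1" -v f="$2" 'BEGIN { if (t + 0 == 0) print 100; else printf "%.1f\n", f / t * 100 }'
}

check_hang() {
    local la total free_kb
    la=$(awk '{print $1}' /proc/loadavg)
    if float_gt "$la" "$MAXLA"; then
        cat /proc/loadavg
        # Keep everything that is not running (R) or sleeping (S).
        # The ps header line starts with "S", so it is dropped as well.
        ps -eo state,pid,etime,wchan:30,args | grep -v '^[RS]'
    fi
    # Always report processes waiting in wait_current_trans.
    ps -eo pid,etime,wchan:30,args | grep '[w]ait_current_trans'
    # Columns 2 and 4 of the Swap line of free are total and free.
    read -r total free_kb < <(free | awk '/^Swap/ {print $2, $4}')
    if float_gt "$MINSWAP" "$(swap_free_pct "$total" "$free_kb")"; then
        free
    fi
}

# Only run the checks when asked to, so the file can also be sourced.
if [[ "${1:-}" == "run" ]]; then
    check_hang
fi
```

Assuming you save it as /usr/local/bin/btrfs-hang-check (a hypothetical path), the crontab line shrinks to `*/5 * * * * nobody /usr/local/bin/btrfs-hang-check run`. As with the one-liner, the script only prints when something looks wrong, so cron only sends mail when there is something to look at.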