I’ve been using Munin to replace Cacti and have generally liked it. Even after using Cacti for a long time, I never quite got my head around it: something would go wrong, and finding out what was failing was never as easy as I thought it should be. What I really liked about Munin, though, was its ability to send me email alerts. My monitoring needs are not grandiose, so this was fine for me. The default email alerts are short, but they aren’t the easiest to read.
Subject: Munin notification int.inutility.net :: host.int.inutility.net :: Memory usage in percent
CRITICALs: Shared memory is 100.00 (outside range [:98]), Cached memory is 100.00 (outside range [:98]).
OKs: Swap space is 0.00, Physical memory is 26.54, Virtual memory is 13.65, Memory buffers is 6.72.
What I wanted was something easy to glance at that gives a quick feel for whether things were terrible, bad, or ‘eh’. The Munin documentation could be a little easier to read, but for the most part it is instructive. It’s also missing some useful variables. I think they moved their documentation, so I wonder whether those variables haven’t made it across yet or got lost in the transfer.
I found Herman Schistad’s post a good start and, combined with a Serverfault question, I came up with the following settings for my /etc/munin/munin.conf:
contact.admin.command mail -s "[munin] ${var:worst} alert for ${var:host} / ${var:graph_title}" user@monitor.inutility.net
contact.admin.text \
${var:worst} alert for "${var:graph_title}" on node "${var:host}"\n\n\
${if:cfields * CRITICAL (${var:numcfields}):\n \
${loop< >:cfields * ${var:label} is ${var:value} (outside range [${var:crange}]) ${if:extinfo : ${var:extinfo}}\n}\n}\
${if:wfields * WARNING (${var:numwfields}):\n \
${loop< >:wfields * ${var:label} is ${var:value} (outside range [${var:wrange}]) ${if:extinfo : ${var:extinfo}}\n}\n}\
${if:ofields * OK (${var:numofields}):\n \
${loop< >:ofields * ${var:label} is ${var:value} ${if:extinfo : ${var:extinfo}}\n}\n}\
Further details: http://monitor.inutility.net/munin/${var:group}/${var:host}/${var:plugin}.html
Which produces output like this:
Subject: [munin] CRITICAL alert for host.int.inutility.net / Memory usage in percent

CRITICAL alert for "Memory usage in percent" on node "host.int.inutility.net"

* CRITICAL (2):
 * Cached memory is 100.00 (outside range [:98])
 * Shared memory is 100.00 (outside range [:98])

* OK (4):
 * Swap space is 0.00
 * Virtual memory is 13.89
 * Memory buffers is 6.76
 * Physical memory is 27.01

Further details: http://monitor.inutility.net/int.inutility.net/host.int.inutility.net/snmp_host.int.inutility.net_df_ram.html
This puts the useful information in the subject, such as the level of the alert (CRITICAL, WARNING or OK). The body is spaced out enough that a glance gives me the heads-up, and any alert level with no fields is hidden.
In particular, the ${var:worst} variable wasn’t in the alerts documentation, yet it was useful in providing an easy way to see whether the alert was terrible or actually just coming back to normal.
You can also use ${fofields} instead of ${ofields} to show the fields which are coming back to OK from a warning or critical state. However, a bug meant it behaved the same as ${ofields} in my version of Munin on Debian 8/Jessie.
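As a rough sketch of that swap, the OK block of the contact.admin.text template could look something like the lines below. This assumes fofields can be used inside ${if:...} and ${loop...} exactly the way ofields is used in the template above. The RECOVERED label is only an illustrative choice, and I have left out the field count because I am not certain which counter variable applies here.

```
${if:fofields * RECOVERED:\n \
${loop< >:fofields * ${var:label} is ${var:value} ${if:extinfo : ${var:extinfo}}\n}\n}\
```

These two lines would take the place of the ofields pair, with the "Further details" line still following them. On a Munin version affected by the bug, the output would presumably be unchanged apart from the heading.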
Your mileage may vary but it might be useful to someone.
