March 1, 2010

Setting up Monit on Ubuntu

Posted in Software at 00:18 by graham

Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:

  • When a process dies.
  • When a machine stops responding to network requests.
  • When your machine has too high a load average, memory consumption, or CPU usage.
  • When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.

It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.

Here’s how to get it working on Ubuntu:

Editing the config file

sudo apt-get install monit
sudo vim /etc/default/monit

Change the single line to startup=1, so that the init script is allowed to start Monit.

The config file that comes with Monit is well commented, but just in case, here's a breakdown.

sudo vim /etc/monit/monitrc

Set Monit to check services every two minutes (120 seconds), and to log through syslog's daemon facility, which on Ubuntu ends up in /var/log/daemon.log:

set daemon 120
set logfile syslog facility log_daemon

Set up email alerts:

set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com

Switch on the HTTP interface, allow access from anywhere, and require a username and password. Make it a decent password, because the HTTP interface allows you to stop and start services.

set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
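If you want a script to read that interface rather than a browser, here is a rough Python sketch that builds an HTTP Basic auth request for it. The host, port, and credentials are the placeholders from the config above. The /_status?format=xml path is my assumption about where Monit serves a machine-readable status page, so check it against your Monit version before relying on it.

```python
import base64
import urllib.request


def build_status_request(host, port, user, password):
    """Build an authenticated request for the Monit HTTP interface."""
    # Assumed path: Monit's XML status page. Adjust if your version differs.
    url = "http://%s:%d/_status?format=xml" % (host, port)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def fetch_status(request, timeout=10):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", "replace")
```

Calling fetch_status(build_status_request("myserver.domain.com", 2812, "myusername", "mypassword")) from a machine that can reach port 2812 should return the status document. Basic auth sends the password almost in the clear, which is one more reason to pick a password you don't use elsewhere.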

Monitor the machine itself:

check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
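To get a feel for what those load average rules mean, here is a small Python sketch that applies the same two limits to a pair of readings. It is only an illustration of the comparison, not how Monit is implemented; the function and constant names are made up for this example.

```python
# Limits copied from the "check system" block above.
LOAD_LIMITS = {"1min": 4.0, "5min": 3.0}


def load_alerts(load_1min, load_5min, limits=LOAD_LIMITS):
    """Return the names of the load averages that exceed their limit."""
    readings = {"1min": load_1min, "5min": load_5min}
    return sorted(name for name, value in readings.items() if value > limits[name])
```

On a Unix machine you can feed it live numbers with one, five, _ = os.getloadavg() and then load_alerts(one, five). An empty list means neither rule would fire.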

Then monitor all the services running on that box.

If you monitor the totalcpu resource, note that it is a percentage of all CPUs. On a 4 CPU machine, 25% represents a process consuming 100% of one CPU.
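The arithmetic is simple enough, but it is easy to get backwards when picking a threshold. A tiny Python sketch of the conversion, with a hypothetical function name:

```python
def share_of_machine(percent_of_one_cpu, cpu_count):
    """Convert a per-CPU percentage into a percentage of the whole machine."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return percent_of_one_cpu / float(cpu_count)
```

For example, share_of_machine(100, 4) gives 25.0, matching the case above. Going the other way, a totalcpu limit of 20% on a 4 CPU box would correspond to a process using 80% of one CPU.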

For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
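If you are not sure a PID file is the right one, you can test it by hand. The Python sketch below shows roughly what a PID file check boils down to: read the number, then ask the kernel whether that process exists. It is my own illustration, not Monit's code, and the function names are invented for the example.

```python
import os


def pid_from_file(pidfile):
    """Read the process ID stored in a PID file, or return None."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pidfile):
    """Return True if the PID file names a live process (Unix only)."""
    pid = pid_from_file(pidfile)
    if pid is None:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without killing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

Running is_running("/var/run/apache2.pid") while Apache is up should return True. If it returns False, the path in your Monit config is probably wrong.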

Here are the service monitoring lines from my config:

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program  = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
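The "for 2 cycles" and "for 5 cycles" conditions in the Apache check are counted in poll intervals, so how long they take depends on the "set daemon" value. A quick Python sketch of that arithmetic, assuming the 120 second interval set earlier (the names are made up for the example):

```python
POLL_INTERVAL_SECONDS = 120  # from "set daemon 120"


def seconds_before_action(cycles, interval=POLL_INTERVAL_SECONDS):
    """Roughly how long a condition must persist before Monit acts on it."""
    return cycles * interval
```

With these settings the alert comes after about 240 seconds and the restart after about 600. The figures are approximate, because it depends on where in the cycle the problem starts.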

Start monit, and query it:

sudo /etc/init.d/monit start
sudo monit status

The ‘status’ command gets its data from the running daemon through the HTTP interface, so it only works if that interface is switched on.

Monitoring MySQL replication

To monitor MySQL replication, create a script that touches a file if replication is still running. Put that script in cron. Get Monit to check that file.

The idea comes from the post "replication monitoring with monit", which uses Ruby.

I ported the script to Python, as a Django management command.

The crontab:

# m h  dom mon dow   command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh

The shell script:

#!/bin/bash
cd /usr/local/myproject
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1

The Django command:

import os
import logging

from django.core.management.base import BaseCommand
from django.db import connection

# Monit watches this file's timestamp
WATCH = '/usr/local/myproject/mysql_monit_watchdog'

def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone but returns a dict instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None

    # cursor.description has one entry per column; item 0 is the column name
    names = [column[0] for column in cursor.description]
    return dict(zip(names, data))


class Command(BaseCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle(self, *args, **options):
        'Called by BaseCommand'

        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        if row is None:
            # No rows means this server is not configured as a slave
            logging.error('*ERROR*: No slave status returned')
            return

        if row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Create the file if needed, then update its timestamp
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: Replication not running (IO: %s, SQL: %s)',
                          row['Slave_IO_Running'], row['Slave_SQL_Running'])

Add these lines to /etc/monit/monitrc:

check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
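To check the watchdog by hand, the same test is easy to reproduce. This Python sketch reports whether a file's modification time is older than a limit, using the same 3 minutes as the rule above. Treating a missing file as stale is my assumption for the sketch; Monit reports a missing file in its own way.

```python
import os
import time


def is_stale(path, max_age_seconds=180, now=None):
    """Return True if the file is missing or was last modified too long ago."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return True  # a missing watchdog file counts as stale in this sketch
    return (now - modified) > max_age_seconds
```

If is_stale("/usr/local/myproject/mysql_monit_watchdog") returns True while replication is healthy, look at the cron job first: it runs every minute, so the file should never be more than a minute or so old.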

Happy monitoring!

9 Comments »

  1. Muthukumar said,

    July 11, 2013 at 14:29

    Hi everyone,

    I need to get an email alert if %CPU is above 95% for any currently running service. Can anyone help me with this script?

    Thanks

  2. Ramesh said,

    March 22, 2012 at 11:46

    Hi,

    Can I use “monit” to monitor Cognos services? We have Cognos running on Ubuntu Linux, and we want to start, stop, and restart Cognos using the monit tool, the same as other services like apache2.

    Could you please advise me whether monit is suitable for monitoring Cognos services?

    I can't find any .pid file for Cognos in /var/run. If there isn't one, how can I change my monit configuration to monitor Cognos services?

    Thanks in Advance !!

    Ram

  3. Alex Overton said,

    September 28, 2011 at 21:41

    Hi, thanks for the info. I am about to install monit on my ubuntu 10.3 with plesk 10. Should it be OK, or am I heading for problems?

    It's the plesk bit I am worried about!

    thanks for any help Alex

  4. Arnaud said,

    July 8, 2011 at 11:12

    sweeeeeet !! thanks man your tut really helped me!

  5. Ben said,

    December 21, 2010 at 18:20

    Great post – thanks for that.

  6. Graham King said,

    December 6, 2010 at 19:53

    @hari In the example given, the HTTP interface would be at http://myserver.domain.com:2812. You might need to open that port on your firewall. Otherwise it should just work.

    @Rik Bignell I’m not sure, but I’d guess the problem is in your exim config. I’m using postfix. Try telnet-ing to port 25 on your mail server from your monit machine, and seeing if you can send a mail by hand to / from the addresses in your monit config.

  7. hari said,

    September 19, 2010 at 21:23

    thanks for this post. I have followed everything and started monit, but how do I view the http interface that you mentioned?

    thanks

  8. Rik Bignell said,

    August 30, 2010 at 10:43

    Thanks, it seems to be working, although I have two issues:

    One, I’m using monit to monitor clam-daemon via its pid.

    It seems to be doing as it should and restarts the daemon if it fails, although I don’t get an email when it fails and the status always says “Connection failed”:

    Process 'clamd'
      status                  Connection failed
      monitoring status       monitored
      pid                     20937
      parent pid              1
      uptime                  22m
      childrens               0
      memory kilobytes        137512
      memory kilobytes total  137512
      memory percent          7.1%
      memory percent total    7.1%
      cpu percent             0.0%
      cpu percent total       0.0%
      data collected          Mon Aug 30 10:33:56 2010

    My exim mainlog looks as if it’s trying to send a mail:

    2010-08-30 10:37:03 1Oq0nR-0006dw-CK SMTP connection lost after final dot H=(scud) [192.168.1.254] P=smtp
    2010-08-30 10:37:05 H=(scud) [192.168.1.254] Warning: HELO (scud) is no FQDN (contains no dot) (See RFC2821 4.1.1.1)
    2010-08-30 10:37:11 1Oq0nZ-0006e7-Gv H=(scud) [192.168.1.254] Warning: DEBUG load_avgx1000: 179 recipients_count: 1 1 defered_recipients: 0 failed_recipients: 0 spam_score: -97.2 message_size: 513

  9. JSC said,

    March 27, 2010 at 18:47

    Thank you, this was very helpful.
