Setting up Monit on Ubuntu

February 28, 2010. Tags: software, monit, monitoring, sysadmin, ubuntu

Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:

When a process dies.
When a machine stops responding to network requests.
When your machine's load average, memory consumption, or CPU usage is too high.
When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.

It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.

Here’s how to get it working on Ubuntu:

Editing the config file

sudo apt-get install monit
sudo vim /etc/default/monit

Change the single setting in that file to startup=1. Otherwise the init script won't start Monit.
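If you'd rather script this step (say, when provisioning several servers), here is a minimal Python sketch. It assumes the setting sits on its own line as startup=0 or startup=1, as in the stock Ubuntu file, so check yours first. The function names are made up for illustration, and writing the file needs root.

```python
import re

DEFAULTS_FILE = '/etc/default/monit'


def enable_startup(text):
    """Return text with its startup=N line changed to startup=1."""
    # [ \t]* rather than \s* so a trailing newline is never swallowed
    new_text, count = re.subn(r'^startup=\d+[ \t]*$', 'startup=1', text,
                              flags=re.MULTILINE)
    if count == 0:
        raise ValueError('no startup= line found; edit the file by hand')
    return new_text


def enable_startup_in_file(path=DEFAULTS_FILE):
    """Rewrite the defaults file in place (hypothetical helper; run as root)."""
    with open(path) as f:
        original = f.read()
    with open(path, 'w') as f:
        f.write(enable_startup(original))
```

Raising an error when no line matches is deliberate: if the file format differs from what the sketch assumes, it's safer to stop than to write a file Monit's init script can't read.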

The main config file that ships with Monit is well commented, but here’s a breakdown just in case.

sudo vim /etc/monit/monitrc

Set Monit to check services every two minutes (120 seconds), and to log via syslog’s daemon facility, which on Ubuntu ends up in /var/log/daemon.log:

set daemon 120
set logfile syslog facility log_daemon
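Monit measures rule durations in these polling cycles, so a condition such as "for 5 cycles" (used in the Apache check further down) lasts five polling intervals. A tiny Python sketch of the conversion, assuming the 120-second interval set here; the names are illustrative only:

```python
DAEMON_INTERVAL = 120  # seconds, matching "set daemon 120"


def cycles_to_seconds(cycles, interval=DAEMON_INTERVAL):
    """Approximate wall-clock time covered by a "for N cycles" condition."""
    return cycles * interval


print(cycles_to_seconds(2))  # 240: the Apache alert rule
print(cycles_to_seconds(5))  # 600: the Apache restart rule
```

So the Apache restart rule waits roughly ten minutes of sustained load. Shortening the daemon interval makes every cycle-based rule react faster, at the cost of more frequent checks.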

Set up email alerts:

set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com
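Alerts only arrive if something on localhost accepts mail on port 25. One way to check before relying on Monit, sketched with Python's standard smtplib and email modules: build a message with the same From and To addresses as the config and hand it to the local mail server. The subject, body, and function names are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender='monit@myserver.domain.com',
                       recipient='sysadmin@domain.com'):
    """Build a plain-text message using the addresses from monitrc."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = 'Test: can localhost relay Monit alerts?'  # placeholder
    msg.set_content('If this arrives, "set mailserver localhost" should work.')
    return msg


def send_via_localhost(msg, host='localhost', port=25):
    """Hand the message to the same SMTP server Monit will use."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(msg)
```

Call send_via_localhost(build_test_message()) on the server; if it raises a connection error, Monit's alerts would fail the same way.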

Switch on the HTTP interface, allow access from anywhere, and require a username and password. Choose a strong password: anyone who logs in to the HTTP interface can stop and start your services.

set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
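The rule allow 0.0.0.0/0.0.0.0 is an address/netmask pair whose netmask has no bits set, so it matches every IPv4 client. If you only reach the interface from a known network, a narrower rule such as allow 192.168.1.0/255.255.255.0 (a hypothetical LAN) is safer. The Python sketch below uses the standard ipaddress module to show which clients each rule admits; it illustrates the netmask arithmetic and is not how Monit itself does the matching.

```python
import ipaddress


def rule_allows(client_ip, rule):
    """True if client_ip falls inside an allow rule written as address/netmask."""
    network = ipaddress.ip_network(rule, strict=False)
    return ipaddress.ip_address(client_ip) in network


# The rule from the config above has prefix length 0, so it matches everything
print(rule_allows('203.0.113.7', '0.0.0.0/0.0.0.0'))             # True
# A 255.255.255.0 netmask only admits the 192.168.1.x network
print(rule_allows('203.0.113.7', '192.168.1.0/255.255.255.0'))   # False
print(rule_allows('192.168.1.20', '192.168.1.0/255.255.255.0'))  # True
```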

Monitor the machine itself:

check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
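The load-average rules compare against the same numbers the kernel publishes in /proc/loadavg. To get a feel for how often they would fire on your box, here is a Python sketch that applies the two loadavg thresholds from this block to a set of readings. The function names are hypothetical, and the memory and CPU rules are left out.

```python
import os

# Thresholds copied from the "check system" block above
LOADAVG_LIMITS = {'1min': 4.0, '5min': 3.0}


def parse_proc_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg's contents."""
    return tuple(float(x) for x in text.split()[:3])


def loadavg_alerts(one, five, limits=LOADAVG_LIMITS):
    """List the loadavg rules that would trigger 'then alert' for these readings."""
    alerts = []
    if one > limits['1min']:
        alerts.append('loadavg (1min) %.2f > %.2f' % (one, limits['1min']))
    if five > limits['5min']:
        alerts.append('loadavg (5min) %.2f > %.2f' % (five, limits['5min']))
    return alerts


def current_alerts():
    """Check live readings; os.getloadavg() returns the same three averages."""
    one, five, _ = os.getloadavg()
    return loadavg_alerts(one, five)
```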

Then monitor all the services running on that box.

If you monitor the totalcpu resource, note that it is a percentage of all CPUs combined. On a 4-CPU machine, 25% represents a process consuming 100% of one CPU.
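The conversion is just division by the number of CPUs. A Python sketch with hypothetical function names, which also shows what the Apache rule below means on a 4-CPU box:

```python
import os


def to_totalcpu(percent_of_one_cpu, ncpus):
    """Convert 'percent of a single CPU' into Monit's totalcpu scale."""
    return percent_of_one_cpu / ncpus


def to_percent_of_one_cpu(totalcpu, ncpus):
    """Inverse: how much of one CPU a totalcpu threshold corresponds to."""
    return totalcpu * ncpus


print(to_totalcpu(100, 4))           # 25.0: one CPU fully busy on a 4-CPU machine
print(to_percent_of_one_cpu(20, 4))  # 80: the Apache "totalcpu > 20%" rule
# One full CPU on this machine (os.cpu_count() can return None, hence "or 1")
print(to_totalcpu(100, os.cpu_count() or 1))
```

In other words, the same totalcpu threshold becomes stricter or looser for a single-threaded process as you add CPUs, so revisit these numbers when moving a config to different hardware.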

For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
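A PID file is just the process ID written as text, and the pidfile checks below build on the idea of reading that number and asking whether such a process exists. The Python sketch below does the same by hand, which is handy for confirming a path before putting it in monitrc. Monit's own check does more (it also gathers CPU and memory figures), so treat this as an illustration with hypothetical function names.

```python
import os


def read_pid(pidfile):
    """Return the integer PID stored in pidfile."""
    with open(pidfile) as f:
        return int(f.read().strip())


def pidfile_process_running(pidfile):
    """True if pidfile exists, holds a valid PID, and that process is alive."""
    try:
        pid = read_pid(pidfile)
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbage contents
    if pid <= 0:
        return False  # os.kill would target a process group, not one process
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False  # no such process: a stale PID file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


# e.g. pidfile_process_running('/var/run/apache2.pid')
```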

Here are the service monitoring lines from my config:

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program  = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
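Apart from Apache, these stanzas all follow one pattern. If you manage several machines, you could generate them from a table instead; here is a minimal Python sketch. The service list mirrors the entries above, and the helper names are hypothetical. Stanzas with extra rules, like the Apache one, are probably easier to keep by hand.

```python
# (check name, pidfile, init script name), copied from the config above
SERVICES = [
    ('nginx', '/var/run/nginx.pid', 'nginx'),
    ('gearmand', '/var/run/gearman/gearmand.pid', 'gearman-job-server'),
    ('memcached', '/var/run/memcached.pid', 'memcached'),
    ('mysqld', '/var/run/mysqld/mysqld.pid', 'mysql'),
]


def stanza(name, pidfile, init_script):
    """Render one 'check process' block with start and stop programs."""
    return (
        'check process %s with pidfile %s\n'
        '    start program = "/etc/init.d/%s start"\n'
        '    stop program = "/etc/init.d/%s stop"\n'
    ) % (name, pidfile, init_script, init_script)


def render(services=SERVICES):
    """Join the stanzas with blank lines, ready to paste into monitrc."""
    return '\n'.join(stanza(*service) for service in services)


print(render())
```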

Start Monit and query it:

sudo /etc/init.d/monit start
sudo monit status

The ‘status’ command talks to the running daemon through its HTTP interface, so it only works with that interface switched on.
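The same HTTP interface can be queried from a script. Below is a Python sketch using only the standard library. It assumes your Monit version serves a machine-readable status page at /_status?format=xml (an assumption to check against your version), and it reuses the hostname, port, and credentials from the set httpd block. The function names are hypothetical.

```python
import base64
import urllib.request


def status_request(host, user, password, port=2812):
    """Build an authenticated request for Monit's status page (path is an assumption)."""
    url = 'http://%s:%d/_status?format=xml' % (host, port)
    credentials = ('%s:%s' % (user, password)).encode('utf-8')
    request = urllib.request.Request(url)
    # HTTP Basic auth, matching "allow myusername:mypassword"
    request.add_header('Authorization',
                       'Basic ' + base64.b64encode(credentials).decode('ascii'))
    return request


def fetch_status(host, user, password, port=2812, timeout=10):
    """Return the raw status document as text."""
    with urllib.request.urlopen(status_request(host, user, password, port),
                                timeout=timeout) as response:
        return response.read().decode('utf-8', 'replace')


# e.g. print(fetch_status('myserver.domain.com', 'myusername', 'mypassword'))
```

Basic auth over plain HTTP sends the password unencrypted, which is one more reason to pick a password you don't use anywhere else.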

Monitoring MySQL replication

To monitor MySQL replication, create a script that touches a file while replication is running, run that script from cron, and have Monit alert when the file stops being updated.
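The pattern boils down to two small operations: update a file's modification time while things are healthy, and treat the file as a failure signal once it is older than some limit. A Python sketch of both, independent of Django and with hypothetical names. The staleness check approximates Monit's timestamp test by looking only at the modification time.

```python
import os
import time


def touch(path):
    """Create path if needed and set its modification time to now, like touch(1)."""
    with open(path, 'a'):
        os.utime(path, None)


def is_stale(path, max_age_seconds=180, now=None):
    """True if path is missing or was last modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return True  # a missing watchdog file counts as a failure
    return age > max_age_seconds
```

The healthy side never reports anything; silence is the failure signal. That means a crashed script, a broken cron, or a stopped MySQL all end up raising the same alert.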

The idea comes from an article on replication monitoring with monit, which uses Ruby.

I ported the script to Python, as a Django management command.

The crontab:

# m h  dom mon dow   command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh

The shell script:

#!/bin/bash
cd /usr/local/myproject
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1

The Django command:

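import os
import logging

# BaseCommand works on old and new Django; NoArgsCommand was removed in Django 1.10
from django.core.management.base import BaseCommand
from django.db import connection

# Monit watches this file; it is touched whenever replication is healthy
WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone, but returns a dict keyed by column name instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None
    # cursor.description holds one sequence per column; item 0 is the column name
    names = [column[0] for column in cursor.description]
    return dict(zip(names, data))


class Command(BaseCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle(self, *args, **options):
        'Called by manage.py mysql_replication_monit'

        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)

        # No row at all means this server is not configured as a replication slave
        if row is None:
            logging.error('*ERROR*: SHOW SLAVE STATUS returned no rows')
            return

        if row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Create the file if needed, then set its mtime to now (like touch)
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            # Leave the file alone: Monit alerts once it is older than the limit
            logging.error('*ERROR*: Slave IO or SQL thread not running (IO=%s, SQL=%s)',
                          row['Slave_IO_Running'], row['Slave_SQL_Running'])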

Add these lines to /etc/monit/monitrc:

check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
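With these numbers you can estimate the worst-case delay between replication breaking and the alert. The last successful touch happens no later than the failure, the file then has to age past 3 minutes, and Monit may take up to one polling interval (120 seconds, from set daemon 120) to notice. A back-of-the-envelope Python sketch, with a hypothetical function name and ignoring mail delivery time:

```python
def worst_case_alert_delay(threshold_seconds=180, poll_interval_seconds=120):
    """Upper bound, in seconds, from replication failing to Monit raising an alert.

    The last touch happens at or before the failure; the file becomes stale
    threshold_seconds after that touch, and Monit notices on its next poll.
    """
    return threshold_seconds + poll_interval_seconds


print(worst_case_alert_delay())  # 300, i.e. about five minutes
```

Lowering the threshold speeds up alerts, but keep it comfortably above the one-minute cron interval so that a single slow or skipped cron run doesn't trigger a false alarm.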

Happy monitoring!

Graham King
