Graham King

Solvitas perambulum

Setting up Monit on Ubuntu

Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:

  • When a process dies.
  • When a machine stops responding to network requests.
  • When your machine’s load average, memory consumption, or CPU usage is too high.
  • When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.

It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.

Here’s how to get it working on Ubuntu:

Editing the config file

sudo apt-get install monit
sudo vim /etc/default/monit

Change the single line in that file to startup=1, which allows the init script to start Monit.

The config file that comes with Monit is well commented, but just in case, here’s the breakdown.

sudo vim /etc/monit/monitrc

Set Monit to check services every two minutes (120 seconds), and to log to /var/log/daemon.log:

set daemon 120
set logfile syslog facility log_daemon

Set up email alerts:

set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com

Switch on the HTTP interface, allow access from anywhere, and require a username and password. Make it a decent password, because the HTTP interface allows you to stop and start services.

set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
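
Since that password guards an interface that can stop services, it is worth generating one rather than inventing it. A minimal Python sketch, assuming a password made of letters and digits is acceptable for your setup; the function name make_password and the length of 24 are illustrative choices, not anything Monit requires:

```python
import secrets
import string

# Letters and digits only, so the value should not need quoting in monitrc.
ALPHABET = string.ascii_letters + string.digits


def make_password(length=24):
    """Return a random password drawn from a cryptographically secure source."""
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == '__main__':
    print(make_password())
```

Paste the result in place of mypassword on the allow line.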

Monitor the machine itself:

check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert

Then monitor all the services running on that box.

If you monitor the totalcpu resource, note that it is a percentage of all CPUs combined. On a 4 CPU machine, 25% represents a process consuming 100% of one CPU.
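
That arithmetic is easy to get backwards, so here it is as a small Python sketch. The helper name totalcpu_percent is hypothetical; the only assumption is the one stated above, that the figure is a share of all CPUs combined.

```python
def totalcpu_percent(busy_cpus, cpu_count):
    """Convert 'number of fully busy CPUs' into a percentage of all CPUs."""
    if cpu_count <= 0:
        raise ValueError('cpu_count must be positive')
    return 100.0 * busy_cpus / cpu_count


if __name__ == '__main__':
    # One saturated CPU on a 4 CPU machine.
    print(totalcpu_percent(1, 4))  # prints 25.0
```

By the same arithmetic, a 20% threshold on a 4 CPU machine would correspond to a process using 0.8 of one CPU.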

For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
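
Roughly speaking, a pidfile check reads the process ID from that file and asks whether such a process still exists. The Python sketch below imitates that idea on Unix-like systems. It is an approximation for illustration, not Monit's actual implementation, and pid_is_running is a hypothetical helper.

```python
import os


def pid_is_running(pidfile):
    """Return True if the PID stored in pidfile belongs to a live process."""
    try:
        with open(pidfile) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed PID file.
        return False
    if pid <= 0:
        return False

    try:
        # Signal 0 performs error checking only; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


if __name__ == '__main__':
    print(pid_is_running('/var/run/apache2.pid'))
```

If this prints False while Apache is clearly up, the PID file path is probably wrong, which is worth checking before pointing Monit at it.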

Here are the service monitoring lines from my config:

check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program  = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
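
The "for 2 cycles" and "for 5 cycles" conditions on the Apache check count polling cycles, so how long they take depends on the "set daemon" interval. A small Python sketch of that conversion, assuming the 120 second interval configured earlier (cycles_to_seconds is an illustrative name):

```python
POLL_INTERVAL = 120  # seconds, from "set daemon 120"


def cycles_to_seconds(cycles, interval=POLL_INTERVAL):
    """Approximate time covered by the given number of polling cycles."""
    return cycles * interval


if __name__ == '__main__':
    for cycles in (2, 5):
        minutes = cycles_to_seconds(cycles) / 60
        print('%d cycles is about %g minutes' % (cycles, minutes))
```

So with these settings the alert comes after roughly four minutes of high CPU usage, and the restart after roughly ten.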

Start monit, and query it:

sudo /etc/init.d/monit start
sudo monit status

The ‘status’ command only works if the HTTP interface is switched on, because the command-line client talks to the running daemon through it.

Monitoring MySQL replication

To monitor MySQL replication, create a script that touches a file if replication is still running. Put that script in cron. Get Monit to check that file.

The idea comes from replication monitoring with monit, where they use Ruby.

I ported the script to Python, as a Django management command.

The crontab:

# m h  dom mon dow   command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh

The shell script:

#!/bin/bash
cd /usr/local/myproject
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1

The Django command:

import os
import logging

from django.core.management.base import NoArgsCommand
from django.db import connection

WATCH = '/usr/local/myproject/mysql_monit_watchdog'

def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone, but returns a dict instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None

    # The first item of each cursor.description entry is the column name
    columns = [column[0] for column in cursor.description]
    return dict(zip(columns, data))


class Command(NoArgsCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle_noargs(self, **options):
        'Called by NoArgsCommand'

        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        # row is None when the server is not configured as a slave
        if row and row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Opening in append mode creates the file if it is missing
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: MySQL replication not running')
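
If you want to try the watchdog logic without a Django project, the two decisions it makes (is replication healthy, and touch the file) can be pulled out into plain functions. This is a sketch under that assumption. The names replication_ok and touch are hypothetical, and the row is expected to be a dict like the one mysql_fetch_one_dict returns.

```python
import os


def replication_ok(row):
    """Return True only if both replication threads report 'Yes'."""
    if not row:
        # SHOW SLAVE STATUS returns no rows on a server that is not a replica.
        return False
    return (row.get('Slave_IO_Running') == 'Yes'
            and row.get('Slave_SQL_Running') == 'Yes')


def touch(path):
    """Create path if needed and set its modification time to now."""
    with open(path, 'a'):
        os.utime(path, None)


if __name__ == '__main__':
    sample = {'Slave_IO_Running': 'Yes', 'Slave_SQL_Running': 'No'}
    print(replication_ok(sample))
```

Keeping the check separate from the database call also makes it easy to feed it hand-written rows and see when the file would stop being touched.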

Add these lines to /etc/monit/monitrc:

check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
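
In effect, this rule compares the file's modification time with the current time. A rough Python equivalent, for illustration only, since Monit performs the check itself; the function name is_stale and the choice to treat a missing file as stale are assumptions of this sketch:

```python
import os
import time


def is_stale(path, max_age_seconds=180, now=None):
    """Return True if path was last modified more than max_age_seconds ago."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # Treat a missing or unreadable file as stale in this sketch.
        return True
    return now - modified > max_age_seconds


if __name__ == '__main__':
    print(is_stale('/usr/local/myproject/mysql_monit_watchdog'))
```

Because cron touches the file every minute, a three minute limit tolerates a couple of missed runs before Monit alerts.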

Happy monitoring!