March 1, 2010

Setting up Monit on Ubuntu

Posted in Software at 00:18 by Graham King

Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:

  • When a process dies.
  • When a machine stops responding to network requests.
  • When your machine's load average, memory consumption, or CPU usage is too high.
  • When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.

It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.

Here’s how to get it working on Ubuntu:

Installing and configuring


sudo apt-get install monit
sudo vim /etc/default/monit

Change the single line in that file to startup=1. Otherwise the init script will not start Monit.

The config file that ships with Monit is well commented, but here is a breakdown just in case.


sudo vim /etc/monit/monitrc

Set Monit to check services every two minutes (120 seconds) and to log through syslog to /var/log/daemon.log:


set daemon 120
set logfile syslog facility log_daemon

Set up email alerts:


set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com
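Monit can only deliver these alerts if the mail server on localhost actually accepts mail. Before relying on it, you could send a test message from the same sender to the same recipient. This is a minimal Python 3 sketch, assuming an MTA listening on localhost port 25; the addresses are the placeholders from the config above, and `build_test_alert` and `send` are hypothetical helper names.

```python
import smtplib
from email.message import EmailMessage


def build_test_alert(sender, recipient):
    """Build a test message using the same from/to addresses as monitrc."""
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = 'Monit mail test'
    msg.set_content('If you can read this, localhost accepts mail for Monit alerts.')
    return msg


def send(msg, host='localhost'):
    """Hand the message to the SMTP server Monit is configured to use."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

Usage: send(build_test_alert('monit@myserver.domain.com', 'sysadmin@domain.com')). If that message never arrives, Monit's alerts will not arrive either.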

Switch on the HTTP interface, allow access from anywhere, and require a username and password. Choose a strong password, because anyone who logs in to the HTTP interface can stop and start services.


set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword

Monitor the machine itself:


check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
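The load-average rules compare the 1- and 5-minute figures that the kernel reports. To see how your own machine's numbers sit against these limits before settling on thresholds, here is a small Python sketch that parses a line in the format of Linux's /proc/loadavg and applies the same limits as the config above. It only illustrates the comparison; it is not how Monit itself is implemented, and `parse_loadavg` and `load_alerts` are hypothetical names.

```python
def parse_loadavg(line):
    """Return the 1, 5 and 15 minute load averages from a /proc/loadavg line."""
    one, five, fifteen = (float(x) for x in line.split()[:3])
    return {'1min': one, '5min': five, '15min': fifteen}


# Same limits as the "check system" block: loadavg (1min) > 4, loadavg (5min) > 3
LIMITS = {'1min': 4.0, '5min': 3.0}


def load_alerts(line, limits=LIMITS):
    """List the load averages that exceed their limit, sorted by name."""
    loads = parse_loadavg(line)
    return [name for name, limit in sorted(limits.items()) if loads[name] > limit]
```

On a Linux box you could run load_alerts(open('/proc/loadavg').read()) a few times during a busy period to check that the thresholds are not so low that they alert constantly.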

Then monitor all the services running on that box.

If you monitor the totalcpu resource, note that it is a percentage of all CPUs combined. On a four-CPU machine, 25% represents a process consuming 100% of one CPU.
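Converting between "percent of one CPU" and "percent of all CPUs" is plain division. Here is a small sketch, assuming the machine has `ncpus` identical CPUs as described above. The function names are hypothetical.

```python
def total_cpu_percent(percent_of_one_cpu, ncpus):
    """Express usage measured against a single CPU as a share of all CPUs."""
    return percent_of_one_cpu / ncpus


def one_cpu_equivalent(total_percent, ncpus):
    """Express a share of all CPUs as usage of a single CPU."""
    return total_percent * ncpus
```

For example, on a four-CPU machine the Apache rule "totalcpu > 20%" fires once Apache and its child processes use more than the equivalent of 80% of one CPU (one_cpu_equivalent(20, 4) is 80).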

For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
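A "check process ... with pidfile" rule is only useful if the file names a live process, so it is worth confirming the path before adding it to monitrc. This is a Python sketch of that check, not Monit's own code. It reads the PID and sends signal 0, which tests whether the process exists without affecting it. `pidfile_alive` is a hypothetical name.

```python
import os


def pidfile_alive(path):
    """Return True if the PID stored in `path` belongs to a running process."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable or malformed PID file
        return False
    if pid <= 0:
        # 0 and negative values would signal process groups, not one process
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user (e.g. root)
        return True
    return True
```

For example, pidfile_alive('/var/run/apache2.pid') should return True while Apache is running. If it returns False, check the path in /etc/apache2/envvars.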

Here are the service monitoring lines from my config:



check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program  = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart
 
check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
 
check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"
 
check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"
 
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
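The "for N cycles" rules count consecutive checks, and with "set daemon 120" each cycle is two minutes. Here is a quick sketch of the timing arithmetic. It assumes checks run exactly every `interval` seconds, although in practice a slow check can stretch a cycle. `action_delay_range` is a hypothetical name.

```python
def action_delay_range(cycles, interval=120):
    """Earliest and latest delay (seconds) between a problem starting and
    Monit acting on an 'if ... for `cycles` cycles' rule, assuming checks
    run exactly every `interval` seconds."""
    if cycles < 1:
        raise ValueError('cycles must be at least 1')
    # The condition must be seen on `cycles` consecutive checks. If it starts
    # just before a check, the last of those checks is (cycles - 1) intervals
    # later; if it starts just after one, it waits up to one more interval.
    return (cycles - 1) * interval, cycles * interval
```

For the Apache rules above, that means an alert 2 to 4 minutes after CPU usage climbs (action_delay_range(2) is (120, 240)) and a restart 8 to 10 minutes after (action_delay_range(5) is (480, 600)).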

Start Monit and query it:


sudo /etc/init.d/monit start
sudo monit status

You need the HTTP interface enabled to use the status command, because the monit client asks the running daemon for its status over HTTP.
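Because the command-line client goes through the HTTP interface, your own scripts can read the same information. This is a Python 3 sketch. It assumes your Monit version serves an XML status report at /_status?format=xml (check its documentation before relying on this) and reuses the placeholder host and credentials from the "set httpd" section. `monit_status_request` and `fetch_status` are hypothetical names.

```python
import base64
import urllib.request


def monit_status_request(host, user, password, port=2812):
    """Build an authenticated request for Monit's XML status report (assumed URL)."""
    url = 'http://%s:%d/_status?format=xml' % (host, port)
    token = base64.b64encode(('%s:%s' % (user, password)).encode()).decode()
    req = urllib.request.Request(url)
    # HTTP basic auth, matching 'allow myusername:mypassword' in monitrc
    req.add_header('Authorization', 'Basic ' + token)
    return req


def fetch_status(req, timeout=10):
    """Return the raw status document as bytes."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Usage: fetch_status(monit_status_request('myserver.domain.com', 'myusername', 'mypassword')). Building the request is kept separate from sending it so that the URL and credentials can be checked without a running Monit.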

Monitoring MySQL replication

To monitor MySQL replication, write a script that touches a file while replication is running, run that script from cron, and have Monit check the file's timestamp.

The idea comes from replication monitoring with monit, which uses a Ruby script.

I ported the script to Python, as a Django management command.

The crontab:


# m h  dom mon dow   command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh

The shell script:


#!/bin/bash
# Stop if the project directory is missing rather than running manage.py elsewhere
cd /usr/local/myproject || exit 1
/usr/bin/python manage.py mysql_replication_monit > /dev/null 2>&1

The Django command:



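import os
import logging

from django.core.management.base import BaseCommand
from django.db import connection

# The file Monit watches; its timestamp is refreshed while replication runs
WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone, but returns a dict keyed by column name instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None
    # Each cursor.description entry is a sequence whose first item is the column name
    names = [col[0] for col in cursor.description]
    return dict(zip(names, data))


class Command(BaseCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle(self, *args, **options):
        'Called by Django when the command runs'
        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        if row is None:
            # An empty result means this server is not configured as a slave
            logging.error('*ERROR*: SHOW SLAVE STATUS returned no rows')
            return
        if row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Equivalent of `touch`: create the file if needed, then update its times
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: replication stopped (Slave_IO_Running=%s, Slave_SQL_Running=%s)',
                          row['Slave_IO_Running'], row['Slave_SQL_Running'])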
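The row-to-dict helper only uses the DB-API's fetchone() and cursor.description, so you can try the decision logic without a MySQL slave. This sketch uses Python's built-in sqlite3 module. A hypothetical table, slave_status, stands in for the output of SHOW SLAVE STATUS and holds only the two columns the command reads. `replication_ok` and `demo` are also hypothetical names.

```python
import sqlite3


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone, but returns a dict keyed by column name"
    data = cursor.fetchone()
    if not data:
        return None
    names = [col[0] for col in cursor.description]
    return dict(zip(names, data))


def replication_ok(row):
    "True only if a status row exists and both replication threads report 'Yes'"
    return (row is not None
            and row['Slave_IO_Running'] == 'Yes'
            and row['Slave_SQL_Running'] == 'Yes')


def demo(io_state, sql_state):
    "Run the check against an in-memory stand-in for SHOW SLAVE STATUS"
    conn = sqlite3.connect(':memory:')
    try:
        cur = conn.cursor()
        cur.execute('CREATE TABLE slave_status '
                    '(Slave_IO_Running TEXT, Slave_SQL_Running TEXT)')
        cur.execute('INSERT INTO slave_status VALUES (?, ?)', (io_state, sql_state))
        cur.execute('SELECT * FROM slave_status')
        return replication_ok(mysql_fetch_one_dict(cur))
    finally:
        conn.close()
```

demo('Yes', 'Yes') is True, while demo('Yes', 'No') and demo('Connecting', 'Yes') are False, so the watchdog file is only touched when both threads are healthy.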

Add these lines to /etc/monit/monitrc:


check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
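The timestamp rule alerts when the watchdog file has not been updated for longer than the limit. To test the setup by hand, here is a rough Python equivalent. It is a sketch, not Monit's implementation. It uses the file's modification time, which the Django command's os.utime call refreshes, and a default of 180 seconds to mirror "3 minutes". `is_stale` is a hypothetical name.

```python
import os
import time


def is_stale(path, max_age=180, now=None):
    """True if `path` is missing or was last modified more than `max_age` seconds ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Treat a missing watchdog file as stale too
        return True
    return now - mtime > max_age
```

With cron touching the file every minute, is_stale('/usr/local/myproject/mysql_monit_watchdog') should stay False. Stop the slave (for example with STOP SLAVE) and it should become True after three minutes, shortly before Monit's alert arrives.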

Happy monitoring!

1 Comment

  1. JSC said,

    March 27, 2010 at 18:47

    Thank you, this was very helpful.
