March 1, 2010
Setting up Monit on Ubuntu
Monit tells you if something goes wrong on your server, and tries to fix it. It can, for example, alert you:
- When a process dies.
- When a machine stops responding to network requests.
- When the load average, memory consumption, or CPU usage on your machine is too high.
- When a file changes, hasn’t changed for a period of time, or grows beyond a certain size.
It can run a script of your choosing to attempt to fix the problem. It has an HTTP interface that shows you essential stats about the services you are monitoring. For detailed graphs, I recommend Munin.
Here’s how to get it working on Ubuntu:
Editing the config file
sudo apt-get install monit
sudo vim /etc/default/monit
Edit the single line in that file to read startup=1.
The config file that comes with Monit is well commented, but just in case, here’s the breakdown.
sudo vim /etc/monit/monitrc
Set Monit to check services every two minutes (120 seconds), and log to /var/log/daemon.log via syslog:
set daemon 120
set logfile syslog facility log_daemon
Set up email alerts:
set mailserver localhost
set mail-format { from: monit@myserver.domain.com }
set alert sysadmin@domain.com
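Before relying on alerts, it can help to confirm that the mail server accepts a message from the sender address. The following is a minimal Python 3 sketch using the standard smtplib module. The host and addresses are the placeholders from the config above, and the function names and message text are made up for illustration.

```python
import smtplib
from email.message import EmailMessage

# Placeholder values from the monitrc example; replace with your own.
MAILSERVER = "localhost"
SENDER = "monit@myserver.domain.com"
RECIPIENT = "sysadmin@domain.com"


def build_test_message(sender=SENDER, recipient=RECIPIENT):
    """Build a small test email resembling an alert (hypothetical wording)."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Test message for Monit alert delivery"
    message.set_content("If you can read this, the mail server accepts mail for Monit alerts.")
    return message


def send_test_message(host=MAILSERVER, port=25, timeout=10):
    """Send the test message over SMTP; raises an exception if delivery fails."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(build_test_message())
```

Calling send_test_message() on the Monit machine should either deliver the mail or raise an error that points at the mail server configuration.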
Switch on the HTTP interface, allow access from anywhere, and require a username and password. Make it a decent password, because the HTTP interface allows you to stop and start services.
set httpd port 2812
    use address myserver.domain.com
    allow 0.0.0.0/0.0.0.0
    allow myusername:mypassword
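To check from a script that the HTTP interface is reachable and that the credentials work, something like this Python 3 sketch may be enough. It assumes the interface uses HTTP Basic authentication with the username and password from the allow line. The URL, credentials, and function names are placeholders, not part of Monit.

```python
import base64
import urllib.request

# Placeholder values from the monitrc example; replace with your own.
MONIT_URL = "http://myserver.domain.com:2812/"
USERNAME = "myusername"
PASSWORD = "mypassword"


def build_request(url, username, password):
    """Return a urllib Request carrying an HTTP Basic auth header."""
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token)
    return request


def fetch_status_page(url=MONIT_URL, username=USERNAME, password=PASSWORD, timeout=10):
    """Fetch the Monit web page; raises URLError or HTTPError on failure."""
    request = build_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

A wrong password should surface as an HTTPError with code 401, and a blocked port as a URLError.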
Monitor the machine itself:
check system myserver.domain.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 3 then alert
    if memory usage > 75% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
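When choosing load thresholds, it can be useful to see how the current numbers compare. This small Python 3 sketch applies the same two load average limits as the block above to the values reported by os.getloadavg(). It is only an illustration of the comparison, not how Monit itself is implemented, and the function name is made up.

```python
import os

# Thresholds copied from the "check system" example.
LOAD_1MIN_LIMIT = 4.0
LOAD_5MIN_LIMIT = 3.0


def load_alerts(loadavg, limit_1min=LOAD_1MIN_LIMIT, limit_5min=LOAD_5MIN_LIMIT):
    """Return one message per load threshold that is exceeded.

    loadavg is a (1min, 5min, 15min) tuple, as returned by os.getloadavg().
    """
    one, five, _fifteen = loadavg
    alerts = []
    if one > limit_1min:
        alerts.append("loadavg (1min) {:.2f} > {}".format(one, limit_1min))
    if five > limit_5min:
        alerts.append("loadavg (5min) {:.2f} > {}".format(five, limit_5min))
    return alerts


if __name__ == "__main__":
    current = os.getloadavg()  # available on Unix-like systems only
    print("load averages:", current)
    for message in load_alerts(current) or ["no load thresholds exceeded"]:
        print(message)
```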
Then monitor all the services running on that box.
If you monitor the totalcpu resource, note that it is a percentage of all CPUs. On a 4-CPU machine, 25% represents a process consuming 100% of one CPU.
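The arithmetic behind that note, as a short Python 3 sketch. The function names are made up for illustration, and the conversion assumes totalcpu is measured against all CPUs combined, as described above.

```python
def totalcpu_to_single_cpu(totalcpu_percent, cpu_count):
    """Convert a percentage of all CPUs into the equivalent percentage of one CPU."""
    return totalcpu_percent * cpu_count


def single_cpu_to_totalcpu(single_cpu_percent, cpu_count):
    """Convert a percentage of one CPU into a percentage of all CPUs."""
    return single_cpu_percent / cpu_count


if __name__ == "__main__":
    # On a 4-CPU machine, 25% of all CPUs is one fully busy CPU.
    print(totalcpu_to_single_cpu(25, 4))
    # A 20% totalcpu threshold on the same machine is 80% of one CPU.
    print(totalcpu_to_single_cpu(20, 4))
```

So on a multi-CPU box, a totalcpu threshold that looks low can still mean a process is keeping most of one CPU busy.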
For the Apache monitor, the PID file is defined in /etc/apache2/envvars, and is usually /var/run/apache2.pid.
Here are the service monitoring lines from my config:
check process apache with pidfile /var/run/apache2.pid
    start program = "/etc/init.d/apache2 start" with timeout 20 seconds
    stop program = "/etc/init.d/apache2 stop"
    if totalcpu > 20% for 2 cycles then alert
    if totalcpu > 20% for 5 cycles then restart

check process nginx with pidfile /var/run/nginx.pid
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process gearmand with pidfile /var/run/gearman/gearmand.pid
    start program = "/etc/init.d/gearman-job-server start"
    stop program = "/etc/init.d/gearman-job-server stop"

check process memcached with pidfile /var/run/memcached.pid
    start program = "/etc/init.d/memcached start"
    stop program = "/etc/init.d/memcached stop"

check process mysqld with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
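Each of these checks depends on the pidfile path being right. If a service is reported as not running when it clearly is, a quick way to test a path by hand is a Python 3 sketch like the one below. It reads the PID from the file and asks the kernel whether that process exists. The function names are made up, and this is only an approximation of what a pidfile check involves, not Monit's own code.

```python
import os


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid):
    """Return True if a process with this PID exists (Unix only)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Path taken from the Apache example; adjust for the service you are checking.
    pidfile = "/var/run/apache2.pid"
    pid = read_pid(pidfile)
    print(pidfile, "->", pid, "running" if pid_is_running(pid) else "not running")
```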
Start monit, and query it:
sudo /etc/init.d/monit start
sudo monit status
The ‘status’ command talks to the running daemon through the HTTP interface, so it only works if that interface is switched on.
Monitoring MySQL replication
To monitor MySQL replication, create a script that touches a file if replication is still running. Put that script in cron. Get Monit to check that file.
The idea comes from replication monitoring with monit, where they use Ruby.
I ported the script to Python, as a Django management command.
The crontab:
# m h dom mon dow command
* * * * * /usr/local/myproject/mysql-watchdog-cron.sh
The shell script:
#!/bin/bash
cd /usr/local/myproject
/usr/bin/python manage.py mysql_replication_monit >> /dev/null 2>&1
The Django command:
import os
import logging

from django.core.management.base import NoArgsCommand
from django.db import connection

WATCH = '/usr/local/myproject/mysql_monit_watchdog'


def mysql_fetch_one_dict(cursor):
    "Like DB-API's fetchone but returns a dict instead of a tuple"
    data = cursor.fetchone()
    if not data:
        return None
    # cursor.description holds one tuple per column; item 0 is the column name
    names = [column[0] for column in cursor.description]
    return dict(zip(names, data))


class Command(NoArgsCommand):
    'Touch a file if MySQL replication is running'

    help = 'Touch a file if MySQL replication is running. Call from cron. Monit checks that file'

    def handle_noargs(self, **options):
        'Called by NoArgsCommand'
        cursor = connection.cursor()
        cursor.execute('SHOW SLAVE STATUS')
        row = mysql_fetch_one_dict(cursor)
        # row is None when the server returns no slave status at all
        if row and row['Slave_IO_Running'] == 'Yes' and row['Slave_SQL_Running'] == 'Yes':
            # Create the file if needed, then set its modification time to now
            with open(WATCH, 'a'):
                os.utime(WATCH, None)
        else:
            logging.error('*ERROR*: Slave IO or SQL thread not running')
Add these lines to /etc/monit/monitrc:
check file mysql_replication with path /usr/local/myproject/mysql_monit_watchdog
    if timestamp > 3 minutes then alert
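To test the watchdog by hand before trusting the alert, the same staleness rule can be sketched in Python 3. It compares the file's modification time with the current time, using the three minute limit from the check above. The function names are made up, and treating a missing file as stale is a choice made for this sketch, not necessarily how Monit reports it.

```python
import os
import time

WATCH = "/usr/local/myproject/mysql_monit_watchdog"  # path from the article
MAX_AGE_SECONDS = 3 * 60  # mirrors "if timestamp > 3 minutes"


def touch(path):
    """Create the file if needed and set its modification time to now."""
    with open(path, "a"):
        os.utime(path, None)


def is_stale(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if path is missing or was last modified more than max_age seconds ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # assumption for this sketch: a missing file counts as stale
    return (now - mtime) > max_age


if __name__ == "__main__":
    print("watchdog stale:", is_stale(WATCH))
```

If this prints True while replication is healthy, the cron job or the management command is the place to look.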
Happy monitoring!
Muthukumar said,
July 11, 2013 at 14:29
Hi every one…
i need to get email alert if %CPU is above 95% for any current running services can help me for this script…
Thanks
Ramesh said,
March 22, 2012 at 11:46
Hi,
can i use “monit” to monitor cognos services… we have cognos running on ubuntu-linux, we wanted to do cognos start/stop/restart using monit tool same as other services like apache2, etc …
Could you please advice me will it suite for monitoring cognos services using monit.
Because i am not finding any .pid for cognos on /var/run, if not so how can i change my monit configuration to monitor cognos services.
Thanks in Advance !!
Ram
Alex Overton said,
September 28, 2011 at 21:41
Hi, Thanks for the info, I am about to install monit on my ubuntu 10.3 with plesk 10 should it be ok ….or ….heading for problems,
Its the plesk bit I am worried about !
thanks for any help Alex
Arnaud said,
July 8, 2011 at 11:12
sweeeeeet !! thanks man your tut really helped me!
Ben said,
December 21, 2010 at 18:20
Great post – thanks for that.
Graham King said,
December 6, 2010 at 19:53
@hari In the example given, the HTTP interface would be at http://myserver.domain.com:2812. You might need to open that port on your firewall. Otherwise it should just work.
@Rik Bignell I’m not sure, but I’d guess the problem is in your exim config. I’m using postfix. Try telnet-ing to port 25 on your mail server from your monit machine, and seeing if you can send a mail by hand to / from the addresses in your monit config.
hari said,
September 19, 2010 at 21:23
thanks for this post. I have followed everything and started monit, but how do I view the http interface that you mentioned?
thanks
Rik Bignell said,
August 30, 2010 at 10:43
Thanks, it seems to be working although i have two issues:
One, i’m using monit to monitor clam-deamon via its pid
It seems to be doing as it should and restarts the deamon if it fails although i dont get an email when it fails and the status always says “Connection failed”:
Process ‘clamd’
  status Connection failed
  monitoring status monitored
  pid 20937
  parent pid 1
  uptime 22m
  childrens 0
  memory kilobytes 137512
  memory kilobytes total 137512
  memory percent 7.1%
  memory percent total 7.1%
  cpu percent 0.0%
  cpu percent total 0.0%
  data collected Mon Aug 30 10:33:56 2010
My exim mainlog looks as if its trying to send a mail:
2010-08-30 10:37:03 1Oq0nR-0006dw-CK SMTP connection lost after final dot H=(scud) [192.168.1.254] P=smtp
2010-08-30 10:37:05 H=(scud) [192.168.1.254] Warning: HELO (scud) is no FQDN (contains no dot) (See RFC2821 4.1.1.1)
2010-08-30 10:37:11 1Oq0nZ-0006e7-Gv H=(scud) [192.168.1.254] Warning: DEBUG load_avgx1000: 179 recipients_count: 1 1 defered_recipients: 0 failed_recipients: 0 spam_score: -97.2 message_size: 513
JSC said,
March 27, 2010 at 18:47
Thank you, this was very helpful.