August 10, 2009

Choosing a message queue for Python on Ubuntu on a VPS

Posted in Software at 06:05 by Graham King

More and more, my web apps need to run things in the background: sending email, recalculating values, fetching website thumbnails, etc. In short, I need a message queue in my toolbox.

Luckily for me, message queues are this year’s Hot New Thing, so there are some good options. I looked at RabbitMQ, Gearman, Beanstalkd and StompServer.

I’d like the message queue to play nice with Python and with Ubuntu, to take almost no memory (I’m on a Virtual Private Server), and to stay up forever. I want small and solid.

Summary

              | RabbitMQ             | Gearman                 | Beanstalkd  | StompServer
Language      | Erlang               | C                       | C           | Ruby
In Ubuntu?    | Yes: rabbitmq-server | Yes: gearman-job-server | No          | No, it’s a Ruby gem
Python lib    | amqplib              | gearman                 | pybeanstalk | stomp.py
In PyPI?      | Yes                  | Yes                     | No          | No
Memory (RSS)  | 9 MB                 | 1.4 MB                  | 0.5 MB      | 7 MB
Protocol      | AMQP                 | Custom                  | Custom      | STOMP
License       | MPL                  | BSD                     | GPL         | MIT

Memory size is the resident set size, obtained like so: ps -Ao pid,rsz,args | grep <name>. If there is a better way of estimating memory please let me know in the comments.
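On Linux the same figure can also be read straight from /proc, which avoids the grep process matching itself. A minimal sketch, assuming a Linux /proc filesystem; vmrss_kb and rss_kb are hypothetical helper names, not part of any tool mentioned here:

```python
def vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      512 kB"
            return int(line.split()[1])
    # Kernel threads have no VmRSS line at all.
    return None


def rss_kb(pid):
    """Resident set size of process `pid` in kB (Linux only)."""
    with open("/proc/%d/status" % pid) as f:
        return vmrss_kb(f.read())
```

For example, rss_kb(1234) for a server whose PID is 1234. Like the rsz column of ps, the result is in kilobytes.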

RabbitMQ

An all-singing, all-dancing “complete and highly reliable Enterprise Messaging system”. With language like that you’d expect horrible bloat and per-CPU licensing, but happily that’s not the case. It’s straightforward to set up and relatively lean.

The protocol, AMQP, comes from the financial world, and is intended to replace Tibco’s Rendezvous, the backbone of most investment banks. There’s lots of documentation, lots of users, a healthy ecosystem, and it looks good on your CV.

I tried RabbitMQ first, and liked it so much I almost stopped my evaluation right there and deployed it.

The best tutorial for using it from Python is here: Rabbits and Warrens

Publisher


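import time

from amqplib import client_0_8 as amqp

# Connect to the local RabbitMQ broker with the default guest account.
conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest", virtual_host="/", insist=False)
chan = conn.channel()

# The "sorting_room" exchange is declared by the consumer, so start the consumer first.
i = 0
try:
    while True:
        msg = amqp.Message('Message %d' % i)
        msg.properties["delivery_mode"] = 2  # persistent: survives a broker restart if the queue is durable
        chan.basic_publish(msg, exchange="sorting_room", routing_key="testkey")
        i += 1
        time.sleep(1)
finally:
    # Reached on Ctrl-C (KeyboardInterrupt) or any error.
    chan.close()
    conn.close()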

Consumer


from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest", virtual_host="/", insist=False)
chan = conn.channel()

# A durable queue and exchange survive a broker restart.
chan.queue_declare(queue="po_box", durable=True, exclusive=False, auto_delete=False)
chan.exchange_declare(exchange="sorting_room", type="direct", durable=True, auto_delete=False)

# Route messages published to "sorting_room" with key "testkey" into "po_box".
chan.queue_bind(queue="po_box", exchange="sorting_room", routing_key="testkey")

def recv_callback(msg):
    print(msg.body)

# no_ack=True: the broker forgets each message as soon as it is delivered.
chan.basic_consume(queue='po_box', no_ack=True, callback=recv_callback, consumer_tag="testtag")

try:
    while True:
        chan.wait()  # blocks until the next message arrives, then calls recv_callback
finally:
    chan.basic_cancel("testtag")
    chan.close()
    conn.close()
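The consumer binds the queue po_box to the direct exchange sorting_room with the routing key testkey. A direct exchange copies each message to every queue whose binding key equals the message’s routing key exactly, and by default drops messages that match no binding, which is why the publisher has to use exactly the same key. The toy in-memory model below only illustrates that rule; DirectExchange and its methods are hypothetical names, not RabbitMQ’s or amqplib’s API:

```python
from collections import defaultdict, deque


class DirectExchange(object):
    """Toy model of AMQP direct-exchange routing (illustration only)."""

    def __init__(self):
        self.bindings = defaultdict(set)  # routing key -> set of queue names
        self.queues = {}                  # queue name -> deque of message bodies

    def queue_declare(self, name):
        self.queues.setdefault(name, deque())

    def queue_bind(self, queue, routing_key):
        self.bindings[routing_key].add(queue)

    def publish(self, body, routing_key):
        """Deliver to every queue bound with exactly this key; return how many."""
        targets = self.bindings.get(routing_key, set())
        for name in targets:
            self.queues[name].append(body)
        return len(targets)  # 0 means the message was dropped


ex = DirectExchange()
ex.queue_declare("po_box")
ex.queue_bind("po_box", "testkey")
ex.publish("Message 0", "testkey")   # → 1, delivered to po_box
ex.publish("Message 1", "otherkey")  # → 0, no binding matches, so it is dropped
```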

Gearman

Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.

Developed by Danga Interactive (essentially Brad Fitzpatrick, who brought us Memcached and Perlbal). Used by LiveJournal, Digg and Yahoo.
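The core idea is that clients submit jobs by function name, and the job server hands each one to a worker that has registered that name. A single-process sketch of that dispatch, using hypothetical names (ToyJobServer, submit_background, work_once); it is not the gearman library’s API and ignores networking, priorities and retries:

```python
from collections import deque


class ToyJobServer(object):
    """Single-process model of Gearman-style dispatch by function name."""

    def __init__(self):
        self.functions = {}     # function name -> callable registered by a worker
        self.pending = deque()  # (function name, argument) pairs waiting for a worker

    def register_function(self, name, func):
        self.functions[name] = func

    def submit_background(self, name, arg):
        # Background job: the client neither waits for nor receives the result.
        self.pending.append((name, arg))

    def work_once(self):
        """Run the oldest job that some worker can handle; return its result."""
        for _ in range(len(self.pending)):
            name, arg = self.pending.popleft()
            if name in self.functions:
                return self.functions[name](arg)
            self.pending.append((name, arg))  # nobody can do it yet, keep it queued
        return None
```

For example, a 'speak' job submitted before any worker has registered simply stays queued, and the first work_once() after register_function('speak', ...) runs it, much like gearmand holding background jobs until a capable worker connects.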

Ubuntu users: make sure you install the package gearman-job-server, which is the newer, leaner C version of Gearman. Don’t install gearman-server; that is the old Perl version. Also install the package gearman-tools to get the command-line tool.

Client


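import time

from gearman import GearmanClient

client = GearmanClient(["127.0.0.1"])

i = 0
while True:
    # Fire and forget: queue a 'speak' job on the job server and return at once.
    # Job arguments travel over the wire as strings.
    client.dispatch_background_task('speak', str(i))
    print('Dispatched %d' % i)
    i += 1
    time.sleep(1)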

Worker


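from gearman import GearmanWorker

def speak(job):
    # job.arg holds the argument the client dispatched.
    r = 'Hello %s' % job.arg
    print(r)
    return r

# The job servers are given as a list, not as a string that looks like one.
worker = GearmanWorker(["127.0.0.1"])
worker.register_function('speak', speak, timeout=3)
worker.work()  # loops forever, running jobs as they arrive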

Beanstalkd

Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was designed for use in reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.

Beanstalkd was developed for a very popular Facebook application. It has the smallest memory footprint: after starting it up, connecting, and sending a few messages, its resident memory size (rsz) was still only 0.5 MB!
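In beanstalkd a producer puts a job, a worker reserves it, and the worker deletes it when done; if the worker instead lets the job’s time-to-run (TTR) expire, the server returns the job to the ready queue for someone else. A toy in-memory sketch of that lifecycle: ToyTube and its methods are hypothetical names loosely mirroring the protocol’s put/reserve/delete/release commands, time is passed in explicitly to keep it deterministic, and unlike the real server, reserve() returns None instead of blocking:

```python
from collections import deque


class ToyTube(object):
    """In-memory model of a beanstalkd tube: ready -> reserved -> deleted."""

    def __init__(self):
        self.next_id = 1
        self.ready = deque()  # job ids waiting for a worker
        self.reserved = {}    # job id -> deadline (reserve time + ttr)
        self.jobs = {}        # job id -> (body, ttr)

    def put(self, body, ttr=120):
        job_id = self.next_id
        self.next_id += 1
        self.jobs[job_id] = (body, ttr)
        self.ready.append(job_id)
        return job_id

    def _expire(self, now):
        # A reserved job whose TTR has run out goes back to the ready queue.
        for job_id, deadline in list(self.reserved.items()):
            if now >= deadline:
                del self.reserved[job_id]
                self.ready.append(job_id)

    def reserve(self, now):
        """Hand the oldest ready job to a worker, or return None if there is none."""
        self._expire(now)
        if not self.ready:
            return None
        job_id = self.ready.popleft()
        body, ttr = self.jobs[job_id]
        self.reserved[job_id] = now + ttr
        return job_id, body

    def delete(self, job_id):
        # The worker finished the job: remove it for good.
        if self.reserved.pop(job_id, None) is not None:
            del self.jobs[job_id]

    def release(self, job_id):
        # The worker hands the job back unfinished.
        if self.reserved.pop(job_id, None) is not None:
            self.ready.append(job_id)
```

The TTR is what lets a queue like this survive a crashed worker: nothing is lost, the job just becomes available again once its deadline passes.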

To install the server:

  • sudo apt-get install libevent-dev
  • wget http://xph.us/dist/beanstalkd/beanstalkd-1.3.tar.gz
  • tar xvzf beanstalkd-1.3.tar.gz
  • cd beanstalkd-1.3
  • ./configure
  • make (there’s no install step; it just builds the file ‘beanstalkd’)

To install the Python library:

  • wget http://pybeanstalk.googlecode.com/files/pybeanstalk-0.11.1.tar.gz
  • extract it, and cd into the extracted directory
  • sudo python setup.py install

There’s a good tutorial here: http://parand.com/say/index.php/2008/10/12/beanstalkd-python-basic-tutorial/

Producer


import time

from beanstalk import serverconn
from beanstalk import job

def producer_main(connection):
    i = 0
    while True:
        data = 'This is data to be consumed (%s)!' % (i,)
        print(data)
        j = job.Job(jid=i, data=data, conn=connection)
        j.Queue()  # put the job on the server
        time.sleep(1)
        i += 1

# 11300 is beanstalkd's default port.
connection = serverconn.ServerConn('localhost', 11300)
producer_main(connection)

Consumer


from beanstalk import serverconn
from beanstalk import job

def consumer_main(connection):
    while True:
        j = connection.reserve()  # blocks until a job is ready
        print('got work: %s' % j.data)
        j.Finish()  # tell the server the job is done

connection = serverconn.ServerConn('localhost', 11300)
connection.job = job.Job  # so that reserve() hands back job.Job objects
consumer_main(connection)

StompServer

StompServer is a lightweight pure Ruby STOMP server.
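STOMP is a simple text protocol: each frame is a command line, some header:value lines, a blank line, the body, and a terminating NUL byte. A minimal sketch of the SEND frame behind a call like conn.send('Message 0', destination='/queue/test'); build_frame is a hypothetical helper that does no header escaping, and a real client library may add further headers of its own:

```python
def build_frame(command, headers, body=""):
    """Serialise one STOMP frame (no header escaping; illustration only)."""
    lines = [command]
    for key in sorted(headers):  # sorted only to make the output deterministic
        lines.append("%s:%s" % (key, headers[key]))
    # A blank line separates the headers from the body; a NUL byte ends the frame.
    return "\n".join(lines) + "\n\n" + body + "\x00"


frame = build_frame("SEND", {"destination": "/queue/test"}, "Message 0")
# frame == "SEND\ndestination:/queue/test\n\nMessage 0\x00"
```

Being plain text is what makes STOMP easy to implement in any language, at the cost of the richer routing AMQP offers.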

To install the server on Ubuntu:

  • sudo apt-get install ruby-dev rubygems
  • sudo gem install stompserver

To install the Python library:

  • wget http://stomppy.googlecode.com/files/stomp.py-2.0.1.tar.gz
  • extract it, and cd into the extracted directory
  • sudo python setup.py install

There’s a good Python/StompServer tutorial here: http://morethanseven.net/2008/09/14/using-python-and-stompserver-get-started-message-q/

Sender


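import time

import stomp

conn = stomp.Connection()  # defaults to localhost:61613
conn.start()
conn.connect()

i = 0
try:
    while True:
        conn.send('Message %d' % i, destination='/queue/test')
        i += 1
        time.sleep(1)
finally:
    # Reached on Ctrl-C (KeyboardInterrupt) or any error.
    conn.disconnect()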

Listener


import time

import stomp

class MyListener(object):
    def on_error(self, headers, message):
        print('received an error %s' % message)

    def on_message(self, headers, message):
        print('received a message %s' % message)

conn = stomp.Connection()
conn.set_listener('', MyListener())
conn.start()
conn.connect()

# ack='auto': the server treats a message as delivered as soon as it is sent.
conn.subscribe(destination='/queue/test', ack='auto')

# Messages are handled on stomp.py's receiver thread; keep the main thread alive.
while True:
    time.sleep(2)
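On the receiving side the server pushes MESSAGE frames, and the client library splits each one into a headers dict and a body before calling the listener’s on_message(headers, message). A rough sketch of that step with a hypothetical parse_frame/dispatch pair, using the same callback signature as MyListener; it assumes one complete frame at a time and ignores content-length and header escaping:

```python
def parse_frame(data):
    """Split one complete STOMP frame into (command, headers, body)."""
    frame = data.strip("\n")  # tolerate newlines around the frame
    if not frame.endswith("\x00"):
        raise ValueError("incomplete frame")
    frame = frame[:-1]  # drop the terminating NUL
    head, _, body = frame.partition("\n\n")
    lines = head.split("\n")
    headers = {}
    for line in lines[1:]:
        key, _, value = line.partition(":")
        headers.setdefault(key, value)  # if a header repeats, keep the first value
    return lines[0], headers, body


def dispatch(listener, data):
    """Call the listener callback that matches the frame's command."""
    command, headers, body = parse_frame(data)
    if command == "MESSAGE":
        listener.on_message(headers, body)
    elif command == "ERROR":
        listener.on_error(headers, body)
```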

Results and Conclusions

I’d be happy working with any of these four. All four were easy to set up, fast, decent in memory consumption, and had good Python libraries.

RabbitMQ has the most mindshare (it is the only one which registers on Google Trends), but it took the most memory and is the most complex to use. It looks like a great product, but it’s Message Oriented Middleware, not an in-memory job queue, so it’s not what I’m looking for.

StompServer had the least documentation and took several times more memory than Gearman and Beanstalkd. It seems the most immature project, but would probably be a good choice for someone working in Ruby.

Beanstalkd is great. I would like to see it in the Ubuntu repositories and its Python library in PyPI, but aside from that, I can’t fault it. I’m not choosing it only because Gearman is even better.

Gearman was designed for exactly the problem I have, takes almost no memory (1.4 MB), has a great pedigree (Danga), is widely deployed (LiveJournal, Digg, Yahoo), is in Ubuntu, has a Python library in PyPI, and someone helped me out on the #gearman IRC channel straight away. It even has queue persistence and clustering. So, Gearman it is.

3 Comments

  1. Graham King said,

    October 20, 2009 at 19:58

    @Rich

    I’ve been using Gearman in production for two months now and it just works. Nothing to report. It sits there, shunting messages between my Django and my workers. Memory usage has hardly changed, I’ve never had to restart it, it’s perfect.

  2. Björn Lindqvist said,

    October 19, 2009 at 13:16

    Thanks a lot for the information! I have the exact same setup with django + lighttpd + python on a vps and your analysis is very helpful.

  3. Rich said,

    October 14, 2009 at 04:22

    Hi Graham.
    Thanks for posting this; it’s exactly what we are looking for right now!

    I wonder, have you got any further with gearman and are you still happy with it?

    Cheers

    rich
