August 10, 2009
Choosing a message queue for Python on Ubuntu on a VPS
More and more, my web apps need to run things in the background: sending email, re-calculating values, fetching website thumbnails, etc. In short, I need a message queue in my toolbox.
Luckily for me, message queues are this year’s Hot New Thing, so there are some good options. I looked at RabbitMQ, Gearman, Beanstalkd and StompServer.
I’d like the message queue to play nice with Python, with Ubuntu, and take almost no memory, as I’m on a Virtual Private Server, and I’d like it to stay up forever. I want small and solid.
Summary
| | RabbitMQ | Gearman | Beanstalkd | StompServer |
|---|---|---|---|---|
| Language | Erlang | C | C | Ruby |
| In Ubuntu? | Yes: rabbitmq-server | Yes: gearman-job-server | No | No, it’s a Ruby gem |
| Python lib | amqplib | gearman | pybeanstalk | stomp-py |
| In PyPI? | Yes | Yes | No | No |
| Memory (RSS) | 9 MB | 1.4 MB | 0.5 MB | 7 MB |
| Protocol | AMQP | Custom | Custom | STOMP |
| License | MPL | BSD | GPL | MIT |
Memory size is the resident set size, obtained like so: ps -Ao pid,rsz,args | grep <name>. If there is a better way of estimating memory please let me know in the comments.
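If you would rather collect the same numbers from Python, something like the following might do. It is only a sketch, assuming a Linux ps that accepts the options above and reports rsz in kilobytes. The helper names parse_rss and rss_for are made up for illustration, and the sample output is invented, not a measurement.

```python
import subprocess


def parse_rss(ps_output, name):
    """Return {pid: rss_kb} for ps lines whose command contains `name`."""
    result = {}
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)  # pid, rsz, args
        if len(parts) < 3:
            continue
        pid, rsz, args = parts
        if name in args:
            result[int(pid)] = int(rsz)
    return result


def rss_for(name):
    """Run ps and report resident set size (in KB) per matching process."""
    out = subprocess.check_output(["ps", "-Ao", "pid,rsz,args"])
    return parse_rss(out.decode("utf-8", "replace"), name)


# Invented sample in the shape ps prints; real values will differ.
sample = """  PID   RSZ COMMAND
  101  1432 gearmand -d
  202   512 ./beanstalkd -d
"""
print(parse_rss(sample, "gearmand"))
```

Calling rss_for("gearmand") on a live box runs ps for real. Unlike the grep pipeline, it never counts the grep process itself.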
RabbitMQ
An all-singing all-dancing “complete and highly reliable Enterprise Messaging system”. With language like that you’d expect horrible bloat and per-cpu licensing, but happily that’s not the case. It’s straightforward to set up and relatively lean.
The protocol, AMQP, comes from the financial world, and is intended to replace Tibco’s RendezVous, the backbone of most investment banks. There’s lots of documentation, lots of users, a healthy ecosystem, and it looks good on your CV.
I tried RabbitMQ first, and liked it so much I almost stopped my evaluation right there and deployed it.
The best tutorial for using it from Python is here: Rabbits and Warrens
Publisher
import time
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest", virtual_host="/", insist=False)
chan = conn.channel()

# The consumer declares the "sorting_room" exchange, so start the consumer first.
i = 0
try:
    while True:
        msg = amqp.Message('Message %d' % i)
        msg.properties["delivery_mode"] = 2  # 2 = persistent message
        chan.basic_publish(msg, exchange="sorting_room", routing_key="testkey")
        i += 1
        time.sleep(1)
finally:
    # Reached on Ctrl-C: close the channel, then the connection.
    chan.close()
    conn.close()
Consumer
from amqplib import client_0_8 as amqp

conn = amqp.Connection(host="localhost:5672", userid="guest", password="guest", virtual_host="/", insist=False)
chan = conn.channel()

# Declare the queue and the exchange, then bind them with the publisher's routing key.
chan.queue_declare(queue="po_box", durable=True, exclusive=False, auto_delete=False)
chan.exchange_declare(exchange="sorting_room", type="direct", durable=True, auto_delete=False)
chan.queue_bind(queue="po_box", exchange="sorting_room", routing_key="testkey")

def recv_callback(msg):
    print msg.body

chan.basic_consume(queue='po_box', no_ack=True, callback=recv_callback, consumer_tag="testtag")
while True:
    chan.wait()

#chan.basic_cancel("testtag")
#chan.close()
#conn.close()
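The exchange / queue / routing key trio is the part of AMQP that takes a moment to sink in. Here is a toy, in-memory model of what a direct exchange does with the names used above. It is only an illustration: the DirectExchange class is hypothetical and is neither amqplib nor how RabbitMQ is implemented.

```python
from collections import defaultdict, deque


class DirectExchange(object):
    """Toy in-memory model of a direct exchange; not the RabbitMQ implementation."""

    def __init__(self):
        self.bindings = defaultdict(list)  # routing key -> bound queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, body, routing_key):
        # A direct exchange delivers only to queues bound with an equal key.
        queues = self.bindings.get(routing_key, [])
        for queue in queues:
            queue.append(body)
        return len(queues)


po_box = deque()
sorting_room = DirectExchange()
sorting_room.bind(po_box, "testkey")

delivered = sorting_room.publish("Message 0", "testkey")   # matches the binding
dropped = sorting_room.publish("Message 1", "otherkey")    # no queue bound to this key
print(list(po_box))
```

This is also why a typo in the routing key is so easy to miss: the publish succeeds, but the message never reaches a queue.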
Gearman
Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.
Developed by Danga Interactive (essentially Brad Fitzpatrick, who brought us Memcached and Perlbal). Used by LiveJournal, Digg and Yahoo.
Ubuntu users: Make sure you install package gearman-job-server, which is the newer leaner C version of Gearman. Don’t install gearman-server, that is the old Perl version. Also install package gearman-tools to get the command line tool.
Client
import time
from gearman import GearmanClient

client = GearmanClient(["127.0.0.1"])
i = 0
while True:
    client.dispatch_background_task('speak', i)
    print 'Dispatched %d' % i
    i += 1
    time.sleep(1)
Worker
from gearman import GearmanWorker

def speak(job):
    r = 'Hello %s' % job.arg
    print r
    return r

worker = GearmanWorker(["127.0.0.1"])  # a list of job servers, as in the client
worker.register_function('speak', speak, timeout=3)
worker.work()
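To make the division of labour concrete, here is a toy model of the three roles: the client names a function and hands over an argument, the job server queues it, and a worker that registered that name runs it. Everything here (ToyJobServer, ToyWorker, work_once) is hypothetical and lives in one process. The real gearman library talks to a separate server over the network.

```python
from collections import deque


class ToyJobServer(object):
    """In-memory stand-in for a job server; hypothetical, not the gearman library."""

    def __init__(self):
        self.jobs = deque()

    def dispatch_background_task(self, func_name, arg):
        # Background tasks are queued; the caller does not wait for a result.
        self.jobs.append((func_name, arg))


class ToyWorker(object):
    def __init__(self, server):
        self.server = server
        self.functions = {}

    def register_function(self, name, func):
        self.functions[name] = func

    def work_once(self):
        """Run the oldest queued job; return its result, or None if idle."""
        if not self.server.jobs:
            return None
        func_name, arg = self.server.jobs.popleft()
        return self.functions[func_name](arg)


def speak(arg):
    return 'Hello %s' % arg


server = ToyJobServer()
worker = ToyWorker(server)
worker.register_function('speak', speak)
server.dispatch_background_task('speak', 0)
server.dispatch_background_task('speak', 1)
results = [worker.work_once(), worker.work_once(), worker.work_once()]
print(results)
```

The client only ever knows the function name, which is what lets workers be written in another language or run on another machine.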
Beanstalkd
Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was designed for use in reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.
Developed for a very popular Facebook Application. The smallest memory footprint: after startup, connecting, and sending a few messages, its resident set size (rsz) was still only 0.5 MB!
To install the server:
- sudo apt-get install libevent-dev
- wget http://xph.us/dist/beanstalkd/beanstalkd-1.3.tar.gz
- tar xvzf beanstalkd-1.3.tar.gz
- cd beanstalkd-1.3
- ./configure
- make (there’s no install step, it just generates the file ‘beanstalkd’)
To install the Python library:
- wget http://pybeanstalk.googlecode.com/files/pybeanstalk-0.11.1.tar.gz
- extract it
- sudo python setup.py install
There’s a good tutorial here: http://parand.com/say/index.php/2008/10/12/beanstalkd-python-basic-tutorial/
Producer
import time
from beanstalk import serverconn
from beanstalk import job

def producer_main(connection):
    i = 0
    while True:
        data = 'This is data to be consumed (%s)!' % (i,)
        print data
        data = job.Job(jid=i, data=data, conn=connection)
        data.Queue()
        time.sleep(1)
        i += 1

connection = serverconn.ServerConn('localhost', 11300)
#connection.job = job.Job
producer_main(connection)
Consumer
from beanstalk import serverconn
from beanstalk import job

def consumer_main(connection):
    while True:
        j = connection.reserve()
        print 'got work: %s' % j.data
        j.Finish()

connection = serverconn.ServerConn('localhost', 11300)
connection.job = job.Job
consumer_main(connection)
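The consumer's reserve-then-finish pattern is the heart of a work queue: a job is handed to exactly one consumer and only goes away once that consumer says it is done. Here is a toy model of that lifecycle, using the put / reserve / delete / release verbs from the beanstalkd protocol. ToyWorkQueue is hypothetical and ignores priorities, delays and timeouts, all of which the real server handles.

```python
from collections import deque


class ToyWorkQueue(object):
    """Toy model of a put / reserve / delete job lifecycle; not beanstalkd itself."""

    def __init__(self):
        self.ready = deque()    # job ids waiting for a consumer
        self.reserved = set()   # job ids handed out but not yet finished
        self.data = {}
        self.next_id = 0

    def put(self, body):
        jid = self.next_id
        self.next_id += 1
        self.data[jid] = body
        self.ready.append(jid)
        return jid

    def reserve(self):
        # Hand the oldest ready job to one consumer and mark it as taken.
        jid = self.ready.popleft()
        self.reserved.add(jid)
        return jid, self.data[jid]

    def delete(self, jid):
        # Finishing a job removes it for good.
        self.reserved.remove(jid)
        del self.data[jid]

    def release(self, jid):
        # A consumer that cannot finish puts the job back for someone else.
        self.reserved.remove(jid)
        self.ready.append(jid)


q = ToyWorkQueue()
q.put('thumbnail page 1')
q.put('thumbnail page 2')
first_id, first_body = q.reserve()
q.release(first_id)                    # gave up; the job goes back to ready
second_id, second_body = q.reserve()
q.delete(second_id)                    # done; the job is gone
remaining = [q.data[jid] for jid in q.ready]
print(remaining)
```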
StompServer
StompServer is a lightweight pure Ruby STOMP server.
To install the server on Ubuntu:
- sudo apt-get install ruby-dev rubygems
- sudo gem install stompserver
To install the Python library:
- wget http://stomppy.googlecode.com/files/stomp.py-2.0.1.tar.gz
- extract it
- sudo python setup.py install
There’s a good Python / Stompserver tutorial here: http://morethanseven.net/2008/09/14/using-python-and-stompserver-get-started-message-q/
Sender
import time
import stomp

conn = stomp.Connection()
conn.start()
conn.connect()
i = 0
while True:
    conn.send('Message %d' % i, destination='/queue/test')
    i += 1
    time.sleep(1)
conn.disconnect()
Listener
import time
import stomp

class MyListener(object):
    def on_error(self, headers, message):
        print 'received an error %s' % message

    def on_message(self, headers, message):
        print 'received a message %s' % message

conn = stomp.Connection()
conn.set_listener('', MyListener())
conn.start()
conn.connect()
conn.subscribe(destination='/queue/test', ack='auto')
while True:
    time.sleep(2)
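Part of STOMP's appeal is that the wire format is plain text: a command line, header lines of the form key:value, a blank line, the body, and a terminating NUL byte. Here is a rough sketch of what the sender's send() call would put on the wire. build_frame and parse_frame are hypothetical helpers, not part of stomp.py, and a real client sends more headers than this.

```python
def build_frame(command, headers, body=''):
    """Encode one STOMP frame: command line, header lines, blank line, body, NUL."""
    lines = [command]
    for key in sorted(headers):
        lines.append('%s:%s' % (key, headers[key]))
    return '\n'.join(lines) + '\n\n' + body + '\x00'


def parse_frame(frame):
    """Split a single encoded frame back into (command, headers, body)."""
    head, _, body = frame.rstrip('\x00').partition('\n\n')
    lines = head.split('\n')
    headers = dict(line.split(':', 1) for line in lines[1:])
    return lines[0], headers, body


frame = build_frame('SEND', {'destination': '/queue/test'}, 'Message 0')
print(repr(frame))
```

Because it is just text, you can poke at a STOMP server by hand with telnet, which is handy when debugging.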
Results and Conclusions
I’d be happy working with any of these four. All four were easy to set up, fast, decent in memory consumption, and had good Python libraries.
RabbitMQ has the most mindshare (it is the only one which registers on Google Trends), but it took the most memory and is the most complex to use. It looks like a great product, but it’s Message Oriented Middleware, not an in-memory job queue, so it’s not what I’m looking for.
StompServer had the least documentation, and took several times more memory than Gearman and Beanstalkd. It seems the most immature project, but would probably be a good choice for someone working in Ruby.
Beanstalkd is great. I would like to see it in the Ubuntu repositories, and its Python lib in PyPI, but aside from that, I can’t fault it. I’m not choosing it, because Gearman is even better.
Gearman was designed for exactly the problem I have, takes almost no memory (1.4 MB), has a great pedigree (Danga), is widely deployed (LiveJournal, Digg, Yahoo), is in Ubuntu, has a Python library in PyPI, and someone helped me out on the #gearman IRC channel straight away. It even has queue persistence and clustering. So, Gearman it is.
Graham King said,
October 20, 2009 at 19:58
@Rich
I’ve been using Gearman in production for two months now and it just works. Nothing to report. It sits there, shunting messages between my Django and my workers. Memory usage has hardly changed, I’ve never had to restart it, it’s perfect.
Björn Lindqvist said,
October 19, 2009 at 13:16
Thanks a lot for the information! I have the exact same setup with django + lighttpd + python on a vps and your analysis is very helpful.
Rich said,
October 14, 2009 at 04:22
Hi Graham.
Thanks for posting this; it’s exactly what we are looking for right now!
I wonder, have you got any further with gearman and are you still happy with it?
Cheers
rich