August 10, 2009

Choosing a message queue for Python on Ubuntu on a VPS

Posted in Software at 06:05 by graham

Updated Sep 13, 2011 to include redis, remove stompserver, and update beanstalkd

More and more, my web apps need to run things in the background: Sending email, re-calculating values, fetching website thumbnails, etc. In short, I need a message queue in my toolbox.

Luckily for me, message queues are plentiful, so there are some excellent options. I looked at RabbitMQ, Gearman, Beanstalkd, and Redis.

I’d like the message queue to play nicely with Python and Ubuntu, and to use almost no memory, since I’m on a Virtual Private Server and I’d like it to stay up forever. I want small and solid.

All of the Python packages listed in the table are on PyPI. They can be installed with pip install <package>.

Summary

                RabbitMQ         Gearman             Beanstalkd  Redis
Language        Erlang           C                   C           C
Ubuntu package  rabbitmq-server  gearman-job-server  beanstalkd  redis-server
Python lib      amqplib          gearman             beanstalkc  redis
Memory          9 MB             1.4 MB              0.7 MB      1.3 MB
License         MPL              BSD                 GPL         BSD

Memory size is the resident set size, obtained like so: ps -Ao pid,rsz,args | grep <name>. If there is a better way of estimating memory, please let me know in the comments.

RabbitMQ

An all-singing, all-dancing “complete and highly reliable Enterprise Messaging system”. With language like that you’d expect horrible bloat and per-CPU licensing, but happily that’s not the case. It’s straightforward to set up and relatively lean.

The protocol, AMQP, comes from the financial world, and is intended to replace Tibco’s RendezVous, the backbone of most investment banks. There’s lots of documentation, lots of users, a healthy ecosystem, and it looks good on your CV. I tried RabbitMQ first, and liked it so much I almost stopped my evaluation right there and deployed it.

The best tutorial for using it from Python is here: Rabbits and Warrens

Publisher

import time

from amqplib import client_0_8 as amqp

conn = amqp.Connection(
    host="localhost:5672",
    userid="guest",
    password="guest",
    virtual_host="/",
    insist=False)
chan = conn.channel()

i = 0
while True:
    msg = amqp.Message('Message %d' % i)
    # delivery_mode 2 marks the message as persistent
    msg.properties["delivery_mode"] = 2

    # The "sorting_room" exchange is declared by the consumer
    # below, so start the consumer first
    chan.basic_publish(msg,
        exchange="sorting_room",
        routing_key="testkey")
    i += 1
    time.sleep(1)

chan.close()
conn.close()

Consumer

from amqplib import client_0_8 as amqp

conn = amqp.Connection(
    host="localhost:5672",
    userid="guest",
    password="guest",
    virtual_host="/",
    insist=False)
chan = conn.channel()

# Declare a durable queue and a durable direct exchange,
# then bind them together on the "testkey" routing key
chan.queue_declare(
    queue="po_box",
    durable=True,
    exclusive=False,
    auto_delete=False)
chan.exchange_declare(
    exchange="sorting_room",
    type="direct",
    durable=True,
    auto_delete=False)

chan.queue_bind(
    queue="po_box",
    exchange="sorting_room",
    routing_key="testkey")

def recv_callback(msg):
    print msg.body

# no_ack=True: the broker considers a message delivered as soon
# as it is sent, without waiting for an acknowledgement
chan.basic_consume(
    queue='po_box',
    no_ack=True,
    callback=recv_callback,
    consumer_tag="testtag")

while True:
    chan.wait()

#chan.basic_cancel("testtag")
#chan.close()
#conn.close()

Gearman

Gearman is a system to farm out work to other machines: dispatching function calls to machines that are better suited to do the work, doing work in parallel, load-balancing lots of function calls, or calling functions between languages.

Developed by Danga Interactive (essentially Brad Fitzpatrick, who brought us Memcached and Perlbal). Used by LiveJournal, Digg and Yahoo.

Ubuntu users: Make sure you install package gearman-job-server, which is the newer leaner C version of Gearman. Don’t install gearman-server, that is the old Perl version. Also install package gearman-tools to get the command line tool.

Client

import time

from gearman import GearmanClient

client = GearmanClient(["127.0.0.1"])

i = 0
while True:
    # Queue a background job for the 'speak' worker
    client.dispatch_background_task('speak', i)
    print 'Dispatched %d' % i
    i += 1
    time.sleep(1)

Worker

from gearman import GearmanWorker

def speak(job):
    r = 'Hello %s' % job.arg
    print(r)
    return r

worker = GearmanWorker(["127.0.0.1"])
worker.register_function('speak', speak, timeout=3)
worker.work()

Beanstalkd

Beanstalkd is a fast, distributed, in-memory workqueue service. Its interface is generic, but was designed for use in reducing the latency of page views in high-volume web applications by running most time-consuming tasks asynchronously.

Developed for a very popular Facebook application. The smallest memory footprint of the group: after starting it up, connecting, and sending a few messages, its resident memory size (rsz) was only 0.7 MB!

The Python library depends on PyYAML, so you need:

pip install pyyaml beanstalkc

Andreas Bolka has a beanstalk tutorial here

Producer

import time
import beanstalkc

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
i = 0
while True:
    beanstalk.put('Message %d' % i)
    i += 1
    time.sleep(1)

Consumer

import beanstalkc

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
while True:
    job = beanstalk.reserve()
    print(job.body)
    job.delete()

Redis

Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets. It also works effectively as either a message bus or a message queue.

Simon Willison has an excellent Redis tutorial, which covers all the other things it can do.

Publisher

import redis
import time

r = redis.Redis()

i = 0
while True:
    r.rpush('queue', 'Message %d' % i)
    i += 1
    time.sleep(1)

Consumer

import redis

r = redis.Redis()
while True:
    # blpop blocks until a message is available, and returns
    # a (queue_name, value) tuple
    queue_name, val = r.blpop('queue')
    print(val)
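
Redis also works as a message bus: instead of pushing onto a list, a publisher broadcasts to a channel and every current subscriber receives the message. A minimal sketch using redis-py’s pub/sub support (the channel name 'announcements' is just an example):

import redis

r = redis.Redis()

# Publisher side: broadcast to everyone currently subscribed
r.publish('announcements', 'Hello subscribers')

# Subscriber side: block and print each message as it arrives
p = r.pubsub()
p.subscribe('announcements')
for message in p.listen():
    if message['type'] == 'message':
        print(message['data'])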

Results and Conclusions

I’d be happy working with any of these. All of them were easy to setup, fast, decent in memory consumption, and had good Python libraries.

RabbitMQ is popular, but it took the most memory and is the most complex to use. It looks like a great product, but it’s Message Oriented Middleware, not an in-memory job queue, so it’s not what I’m looking for.

Beanstalkd is great. It can do persistent queues, and it’s the only true work server here, in that you tell it when a job completes. For the others you’d need to implement your own protocol. And beanstalkd takes almost no memory.
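
For example, with the beanstalkc library from earlier, a worker deletes the job when the work succeeds, or releases it back onto the queue to be retried when it fails. A rough sketch (the do_work function here is hypothetical):

import beanstalkc

def do_work(body):
    # Hypothetical task handler; raises on failure
    print(body)

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
while True:
    job = beanstalk.reserve()   # take the next job off the queue
    try:
        do_work(job.body)
        job.delete()            # tell beanstalkd the job completed
    except Exception:
        job.release(delay=10)   # put it back, retry in 10 seconds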

Gearman was designed for exactly the problem I have, takes little memory (1.4 MB), has a great pedigree (Danga), is widely deployed (LiveJournal, Digg, Yahoo), and someone helped me out on the #gearman IRC channel straight away. It even has queue persistence and clustering.

Until Sep 2011, I was happily using Gearman. Unfortunately it doesn’t seem to have much mindshare lately, and Redis is emerging as the new winner. Luckily, Redis is fantastic too.

Redis has by far the most mindshare (see Google Trends). It does all sorts of other things as well as being a message queue. Most likely it will be in your stack anyway. Queues persist automatically. It takes a very small amount of memory. It takes exactly two lines of code to send or receive a message. Redis is the new winner.

19 Comments

  1. Share a post on message queue. | I'm a software engineer… said,

    May 8, 2013 at 06:48

    [...] post about message queue to be very useful. I have attached the link with this post below. http://www.darkcoding.net/software/choosing-a-message-queue-for-python-on-ubuntu-on-a-vps/ This entry was posted in Repost and tagged Image Crawler, Message queue, Python, RabbitMQ, Redis [...]

  2. graham said,

    October 19, 2012 at 22:20

    @Geoffrey I updated the post in Sep 2011. I now use Redis as my message queue, for all the reasons described in the (updated) conclusion.

  3. Geoffrey Hoffman said,

    October 19, 2012 at 21:27

    I’d like to know if you have any changes to report since 2009, when this awesome post was originally published.

  4. JackeyZ said,

    September 25, 2012 at 03:20

    Great comments on these queue/messaging products and it brings me a fresh understanding of current popular ones.

    Thanks, Graham!

    BTW: I’m using beanstalkd and it works well.

  5. Hello, Gearman « Web Developer Notes said,

    June 11, 2012 at 15:20

    [...] Choosing a message queue for Python on Ubuntu on a VPS [...]

  6. David said,

    September 26, 2011 at 11:04

    @graham I didn’t choose 0MQ for a few reasons: 1) the quality of the Python extensions was poor, with portability troubles between Windows and Linux, 2) for duplex communication you need 2 connections, 3) I ran into trouble with automatic reconnect, 4) it’s missing persistence

    I believe 0MQ is great and very fast for LAN/local computations where each component runs simultaneously all the time. My system is geographically distributed, with components that run at arbitrary intervals, so reliable automatic reconnect and delivery was the primary requirement.

  7. Graham King said,

    September 24, 2011 at 01:46

    @David snakeMQ looks cool. Why didn’t you use ZeroMQ? I’m guessing you wanted a pure-python solution.

    Currently I’m still using Gearman on the original project I did this study for (fablistic.com), and I’m using Redis on a more recent project. I’d like to get a chance to use beanstalkd – it’s so lightweight and also a true work queue (it has ‘job completed’ semantics). I’m also loving Redis, because it does, well, everything.

    (BTW, 2s2b.com is an awesome idea!)

  8. David said,

    September 23, 2011 at 08:27

    Hi!

    I created a message queuing library, snakeMQ http://www.snakemq.net, for similar purposes. I needed reliable communication between components in my project. My goal was to create a brokerless (no other third-party components) and easy-to-use library with persistent queues, where you don’t need to worry about connectivity. If you need the pub-sub pattern, you can create a very simple broker on top of the library.

    What solution are you using now?

    David

  9. Graham King said,

    August 19, 2011 at 15:29

    @Doug ZeroMQ does sound interesting, thanks. On my list to investigate.

    @Abe Wow, that’s high volume, over 800 jobs / second. If that’s on a single machine, you may be hitting some OS or hardware limitation. I found an article [1] saying Digg moves 300k jobs a day through Gearman, which is nothing in comparison.

    I tried emailing you, but the email you used for your comment is invalid. I’d like to know more about your setup, if you really are moving that amount of data.

    [1] http://highscalability.com/blog/2009/1/13/product-gearman-open-source-message-queuing-system.html

  10. Abe Chen said,

    August 18, 2011 at 18:21

    We’ve struggled to get gearman working consistently in a high-volume environment (25M+ jobs/day).

    We’ve had to death-watch queue consumers because of a particular behavior where trying to retrieve a job will block and not time out. It seems to run more reliably under a quarter of the volume.

  11. » links for 2010-11-21 (Dhananjay Nene) said,

    November 21, 2010 at 21:03

    [...] Graham King » Choosing a message queue for Python on Ubuntu on a VPS Message Queues for #python http://ff.im/-u5QHx (tags: via:packrati.us python) [...]

  12. Doug said,

    November 20, 2010 at 22:18

    0MQ? http://www.zeromq.org/

    Looks pretty cool, sorry to see it didn’t make your bake-off.

  13. frymaster said,

    November 19, 2010 at 04:40

    One advantage of STOMP is that you can use activemq as your message server. You can then access it not only via STOMP, but also via the other protocols that activemq supports. That being said, the second time around I just installed the ruby STOMP server as well, since that was the only protocol I used (and I saved the relatively high per-process overhead that java has).

  14. Jake said,

    October 19, 2010 at 04:13

    You don’t know how many articles I’ve perused trying to decide between BeanStalk and Gearman and for some reason yours just made my decision much easier :)

  15. Une API asynchrone avec Gearman, Sinatra et mongoID « Je code donc je suis said,

    June 22, 2010 at 09:17

    [...] there is an “old” (1 year) comparison of a few job servers: Choosing a message queue for Python on Ubuntu on a VPS. To add to the confusion, among them you can find key/value NoSQL servers like redis [...]

  16. Dhruv said,

    April 18, 2010 at 07:20

    You mentioned the memory for each of these queues on startup. However, during operation, the memory requirements may vary quite considerably, especially if your producers are faster than your consumers and the queue is caching messages (or is an in-memory queue). I too had a queue requirement that needed to be easily manageable and monitorable. Also, debugging apps using this queue should be simple, without requiring any special code in the app to do rate limiting, etc. I also wanted to be able to close the queue in one direction at will, dump/load queues from files, and load balance across different storage media. Hence, I came up with pymq (http://code.google.com/p/pymq/) which talks HTTP (anyone can write a client) and offers all the above-mentioned features. You might want to have a look.

  17. Graham King said,

    October 20, 2009 at 19:58

    @Rich

    I’ve been using Gearman in production for two months now and it just works. Nothing to report. It sits there, shunting messages between my Django and my workers. Memory usage has hardly changed, I’ve never had to restart it, it’s perfect.

  18. Björn Lindqvist said,

    October 19, 2009 at 13:16

    Thanks a lot for the information! I have the exact same setup with django + lighttpd + python on a vps and your analysis is very helpful.

  19. Rich said,

    October 14, 2009 at 04:22

    Hi Graham. Thanks for posting this; it’s exactly what we are looking for right now!

    I wonder, have you got any further with gearman and are you still happy with it?

    Cheers

    rich
