Monday, May 03, 2010

Consistent Hashing in Python

Update: This is not really a consistent hash. I'll fix the issues then repost. (Fixed here.)

Consistent Hashing is useful. I'm using it to build a Redis client with support for redundancy, and failover / node insertion.

import hashlib


class ConsistentHash(object):
def __init__(self, partition_size=2):
self.partition_size = partition_size
self.hashes = []

def partition(self, arg):
md5 = hashlib.md5(str(arg)).hexdigest()
partition = int(md5[:self.partition_size], 16)
return partition

def add(self, hash):
self.hashes.append(hash)
self.hashes.sort(cmp=
lambda a,b: cmp(self.partition(a), self.partition(b)))

def __getitem__(self, i):
return self.hashes[i%len(self.hashes)]

def hash(self, key):
i = self.partition(key) % len(self.hashes)
return self[i], self[i+1], self[i+2]




Testing with the below code produces this beautiful graph. Perfect. :-) If you want to add more weight to a particular hash (server address) just add it more than once.
if __name__ == "__main__":
from pygooglechart import PieChart2D
c = ConsistentHash()
keys = "ABCDEFG"
print len(keys)
for k in keys:
c.add(k)
d = {}
for i in xrange(10000):
h = c.hash(i)[0]
count = d.get(h, 0)
d[h] = count + 1
chart = PieChart2D(128, 128)
chart.add_data([d[i] for i in keys])
print chart.get_url()

2 comments:

Kent Johnson said...

Take a look at http://amix.dk/blog/post/19367

Simon Wittber said...

The algorithm in this post is incorrect. I've fixed it here.

I think the hash_ring implementation is overly complicated and hard to follow.

Popular Posts