Monday, May 03, 2010

Consistent Hashing in Python, Redux.

import hashlib
import bisect


class ConsistentHash(object):
    def __init__(self, D=5):
        # Sorted list of (partition, node) pairs forming the hash ring.
        self.keyspace = []
        # Number of virtual replicas per node.
        self.D = D

    def partition(self, arg):
        # Map an arbitrary value to one of 2**16 ring positions using
        # the first 4 hex digits of its MD5 digest.
        md5 = hashlib.md5(str(arg)).hexdigest()
        return int(md5[:4], 16)

    def add(self, hash):
        # Insert D virtual replicas of the node, keeping the ring sorted.
        for i in xrange(self.D):
            k = self.partition("%s.%s" % (hash, i))
            bisect.insort(self.keyspace, (k, hash))

    def remove(self, hash):
        # Drop every replica belonging to the node.
        self.keyspace = [i for i in self.keyspace if i[1] != hash]

    def __getitem__(self, i):
        # Index into the ring with wraparound.
        return self.keyspace[i % len(self.keyspace)][1]

    def hash(self, key):
        # Map the key onto the ring and return the node whose replica
        # point immediately precedes it (wrapping around the ring).
        p = self.partition(key)
        i = bisect.bisect_left(self.keyspace, (p, None))
        return self[i - 1]
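
For readers who want to see it in action, here is a minimal usage sketch, assuming the ConsistentHash class above; the node names and key strings are made up. Build a ring, route a couple of keys, then drop a node and note that only the keys on that node get remapped.

# Usage sketch (hypothetical node names; plain Python 2 session).
ring = ConsistentHash(D=100)          # more replicas per node => smoother spread
for node in ("cache-a", "cache-b", "cache-c"):
    ring.add(node)

print ring.hash("user:1234")          # one of the three nodes
print ring.hash("session:abcd")

ring.remove("cache-b")                # only keys that mapped to cache-b move
print ring.hash("user:1234")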

1 comment:

Jim Wilcoxson said...

Very beautiful! A few suggestions:

- if even distribution is important, the number of replicas needs to be much higher, like 4K, 8K, or even more. For memcached I think they use 160 replicas, and other sites have recommended 100-200. I think a default of 5 is a little misleading, although for millions of keys it may not be important (a quick way to check this is sketched after these notes)

- instead of using only the first 4 hex digits of the hash, you may want to use 8 or 16, especially if using a high number of replicas, to get a more even distribution

- I think there may be an edge case problem with bisect. See:
http://michaelnielsen.org/blog/consistent-hashing/
When I compared your version with Michael's, both using the same number of hash bits, yours required many more replicas to get a similarly even distribution of keys.
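
A rough check of the replica-count point, assuming the ConsistentHash class from the post; the spread helper, node names, and key strings below are made up. It hashes a batch of synthetic keys and compares the smallest and largest per-node counts as the number of replicas grows.

def spread(D, nodes=5, keys=10000):
    # Build a ring with D virtual replicas per node and count how many
    # of the synthetic keys land on each node.
    ring = ConsistentHash(D=D)
    for n in xrange(nodes):
        ring.add("node%d" % n)
    counts = {}
    for k in xrange(keys):
        node = ring.hash("key%d" % k)
        counts[node] = counts.get(node, 0) + 1
    return counts

for D in (5, 100, 1000):
    c = spread(D)
    print D, min(c.values()), max(c.values())

With only 16 bits of ring space, very high replica counts will also start to land on duplicate points, which is roughly where the wider-hash suggestion above comes in.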
