import hashlib
import bisect
class ConsistentHash(object):
def __init__(self, D=5):
self.keyspace = []
self.D = D
def partition(self, arg):
md5 = hashlib.md5(str(arg)).hexdigest()
return int(md5[:4], 16)
def add(self, hash):
for i in xrange(self.D):
k = self.partition("%s.%s"%(hash,i))
bisect.insort(self.keyspace, (k, hash))
def remove(self, hash):
self.keyspace = [i for i in self.keyspace if i[1] != hash]
def __getitem__(self, i):
return self.keyspace[i%len(self.keyspace)][1]
def hash(self, key):
p = self.partition(key)
i = bisect.bisect_left(self.keyspace, (p,None))
return self[i-1]
Monday, May 03, 2010
Consistent Hashing in Python, Redux.
Subscribe to:
Post Comments (Atom)
Popular Posts
-
These are the robots I've been working on for the last 12 months. They each weigh about 11 tonnes and have a 17 meter reach. The control...
-
So, you've created a car prefab using WheelCollider components, and now you can apply a motorTorque to make the whole thing move along. ...
-
The procedural planet package has been updated to version 1.4, and you can see the new demo here . It features better city light control, be...
-
Why would I ask that question? Python 3 has been available for some time now, yet uptake is slow. There aren't a whole lot of packages i...
-
I've just finished refactoring an awful C# class. I had been delaying the job for a while because I didn't want to do it. Then, whil...
-
Summary: NodeJS wins. Test Program ab -n 10000 -c 5 http://localhost/ Gevent Code from gevent import wsgi class WebServer(object): def a...
-
Often, when building a game, you need to test if objects are colliding. The objects could be spaceships, rocks, mouse pointers, laser beams....
-
Dear Lazyweb. Imagine a nice RESTful interface for working with Tags. The URL: /tags/ will return a list of all the tags. The URL: /tags/fo...
-
I have just spent an hour trying to track down a weird bug in some Javascript interpolation code. The offending code looks like this: var n ...
-
I've built sites with Django, TurboGears and Pylons. I've come to prefer Pylons. Why? Pylons gets out of the way, and stays out of t...
2 comments:
Very beautiful! A few suggestions:
- if even distribution is important, the number of replicas must be very high, like 4K, 8K, or even higher. For memcached, I think they use 160 replicas, and other sites have recommended 100-200. I think of default of 5 is a little misleading, although for millions of keys, it may not be important
- instead of using only 4 bytes of the hash, you may want to use 8 or 16, especially if using a high number of replicas to get a more even distribution
- I think there may be an edge case problem with bisect. See:
http://michaelnielsen.org/blog/consistent-hashing/
When I compared your version with Michael's, both using the same number of hash bits, yours required many more replicas to get similarly even distribution of keys
Could you explain in layman's terms what each function does please? I'd really appreciate that! Thanks
Post a Comment