import hashlib
import bisect
class ConsistentHash(object):
def __init__(self, D=5):
self.keyspace = []
self.D = D
def partition(self, arg):
md5 = hashlib.md5(str(arg)).hexdigest()
return int(md5[:4], 16)
def add(self, hash):
for i in xrange(self.D):
k = self.partition("%s.%s"%(hash,i))
bisect.insort(self.keyspace, (k, hash))
def remove(self, hash):
self.keyspace = [i for i in self.keyspace if i[1] != hash]
def __getitem__(self, i):
return self.keyspace[i%len(self.keyspace)][1]
def hash(self, key):
p = self.partition(key)
i = bisect.bisect_left(self.keyspace, (p,None))
return self[i-1]
Monday, May 03, 2010
Consistent Hashing in Python, Redux.
Subscribe to:
Post Comments (Atom)
Popular Posts
-
These are the robots I've been working on for the last 12 months. They each weigh about 11 tonnes and have a 17 meter reach. The control...
-
This hard-to-see screenshot is a Generic Node Graph Editing framework I'm building. I'm hoping it can be used for any kind of node...
-
So, you've created a car prefab using WheelCollider components, and now you can apply a motorTorque to make the whole thing move along. ...
-
MiddleMan: A Pub/Sub and Request/Response server in Go. This is my first Go project. It is a rewrite of an existing Python server, based o...
-
Why would I ask that question? Python 3 has been available for some time now, yet uptake is slow. There aren't a whole lot of packages i...
-
It is about 8 degrees C this morning. So cold, especially when last week we had high twenties. To help solve the problem, a friend suggeste...
-
I'm now using bzr instead of svn. I'm pushing my repositories to: http://exactlysimilar.org/bzr/ I'm also auto publishing docume...
-
After my last post, I decided to benchmark the scaling properties of Stackless, Kamaelia, Fibra using the same hackysack algorithm. Left axi...
-
Possibly slightly more correct lighting. The rim light is now only applied in the direction of the sun, rather than being purely based on vi...
-
I've just read a newspaper article (courtesy of Kranzky ) from WA Business News documenting the malfeasance, gross negligence and misc...
1 comment:
Very beautiful! A few suggestions:
- if even distribution is important, the number of replicas must be very high, like 4K, 8K, or even higher. For memcached, I think they use 160 replicas, and other sites have recommended 100-200. I think of default of 5 is a little misleading, although for millions of keys, it may not be important
- instead of using only 4 bytes of the hash, you may want to use 8 or 16, especially if using a high number of replicas to get a more even distribution
- I think there may be an edge case problem with bisect. See:
http://michaelnielsen.org/blog/consistent-hashing/
When I compared your version with Michael's, both using the same number of hash bits, yours required many more replicas to get similarly even distribution of keys
Post a Comment