import hashlib
import bisect
class ConsistentHash(object):
def __init__(self, D=5):
self.keyspace = []
self.D = D
def partition(self, arg):
md5 = hashlib.md5(str(arg)).hexdigest()
return int(md5[:4], 16)
def add(self, hash):
for i in xrange(self.D):
k = self.partition("%s.%s"%(hash,i))
bisect.insort(self.keyspace, (k, hash))
def remove(self, hash):
self.keyspace = [i for i in self.keyspace if i[1] != hash]
def __getitem__(self, i):
return self.keyspace[i%len(self.keyspace)][1]
def hash(self, key):
p = self.partition(key)
i = bisect.bisect_left(self.keyspace, (p,None))
return self[i-1]
Monday, May 03, 2010
Consistent Hashing in Python, Redux.
Subscribe to:
Post Comments (Atom)
Popular Posts
-
These are the robots I've been working on for the last 12 months. They each weigh about 11 tonnes and have a 17 meter reach. The control...
-
This hard-to-see screenshot is a Generic Node Graph Editing framework I'm building. I'm hoping it can be used for any kind of node...
-
Unfortunately I've not secured a venue for the GGJ. With 9 days left, things are not looking hopeful. It could be that GGJ Perth will no...
-
So, you've created a car prefab using WheelCollider components, and now you can apply a motorTorque to make the whole thing move along. ...
-
MiddleMan: A Pub/Sub and Request/Response server in Go. This is my first Go project. It is a rewrite of an existing Python server, based o...
-
Often, when building a game, you need to test if objects are colliding. The objects could be spaceships, rocks, mouse pointers, laser beams....
-
I've just read a newspaper article (courtesy of Kranzky ) from WA Business News documenting the malfeasance, gross negligence and misc...
-
Space is awesome. Especially when it is generated using Perlin noise, and some cool shaders. You can try it out over here.
-
I made something which lets you render very large worlds with a small farClipPlane. https://github.com/simonwittber/scaled-origin The d...
-
After my last post, I decided to benchmark the scaling properties of Stackless, Kamaelia, Fibra using the same hackysack algorithm. Left axi...
1 comment:
Very beautiful! A few suggestions:
- if even distribution is important, the number of replicas must be very high, like 4K, 8K, or even higher. For memcached, I think they use 160 replicas, and other sites have recommended 100-200. I think of default of 5 is a little misleading, although for millions of keys, it may not be important
- instead of using only 4 bytes of the hash, you may want to use 8 or 16, especially if using a high number of replicas to get a more even distribution
- I think there may be an edge case problem with bisect. See:
http://michaelnielsen.org/blog/consistent-hashing/
When I compared your version with Michael's, both using the same number of hash bits, yours required many more replicas to get similarly even distribution of keys
Post a Comment