Sunday, June 14, 2009

Stackless vs GIL: Benchmarks.

Following on from my last post, I decided I should check my assertion that "Stackless will outperform CPython, even with CPU bound tasks." I'm not saying the GIL is bad, I'm just pointing out how the same behavior can be achieved with Stackless. If these tasks called some random C function which blocked, but still yielded the GIL, it is likely that CPython would come out on top.
import time
from threading import Thread
import stackless

def factorial(c):
    T = 1
    for i in xrange(1, c):
        T *= i
    return T

# Benchmark stackless using 500 tasklets
T = time.clock()
for i in xrange(500):
    stackless.tasklet(factorial)(1024)
while stackless.getruncount() > 1:
    task = stackless.run(100)
    if task:
        task.insert()
print time.clock() - T

# Benchmark OS threads using 500 threads
T = time.clock()
threads = []
for i in xrange(500):
    thread = Thread(target=factorial, args=(1024,))
    thread.start()
    threads.append(thread)
for thread in threads:
    thread.join()
print time.clock() - T

>>> 0.5
>>> 0.77
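On my dual core machine, Stackless performs around 30%-40% faster than regular threads. This is usually not a surprise, we all know that IO bound threads always come with a penalty in CPython. However, these Stackless tasklets are being pre-empted in the same way that the GIL works. The call to stackless.run(100) interrupts the running tasklet after 100 interpreter instructions and hands it back to be re-inserted into the queue, much as CPython's check interval (100 instructions by default) periodically forces a thread to give up the GIL. This is something that my Fibra framework, and other similar frameworks based on generators, can never achieve.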
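
To show why generator-based scheduling cannot pre-empt a task, here is a minimal sketch of a cooperative round-robin scheduler. It is not Fibra's actual code; the names factorial_task, greedy_task and run_round_robin are made up for illustration. A task only gives up control where it explicitly yields, so a task that never yields inside its loop holds the scheduler until it is done.

```python
from collections import deque

def factorial_task(name, n, log):
    """Cooperative task: yields after every multiplication."""
    total = 1
    for i in range(1, n + 1):
        total *= i
        log.append(name)  # record which task is running
        yield             # the only place a switch can happen

def greedy_task(name, n, log):
    """Same work, but it never yields inside the loop."""
    total = 1
    for i in range(1, n + 1):
        total *= i
        log.append(name)
    yield                 # the scheduler gets control back only here

def run_round_robin(tasks):
    """Resume each generator in turn until all are exhausted."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)    # runs until the task's next yield
        except StopIteration:
            continue      # task finished, drop it
        queue.append(task)

log = []
run_round_robin([
    factorial_task("a", 3, log),
    greedy_task("b", 3, log),
    factorial_task("c", 3, log),
])
print(log)
```

In the log, "a" and "c" interleave, while all three steps of "b" appear back to back: the scheduler has no way to interrupt it. With stackless.run(100) in the benchmark above, the interpreter does the interrupting, so the factorial function needs no yield at all.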

Very Interesting!

Update: Fixed factorial function and timings.

1 comment:

Lawouach said...

I'm not sure I understand your post.

> This is usually not a surprise, we all know that IO bound threads always come with a penalty in CPython.

I thought you were discussing CPU bound threads rather than IO. Also I thought IO bound threads weren't usually impaired by the GIL in practice.

> However, these Stackless tasklets are being pre-empted in the same way that the GIL works.

Could you expand as it's a bit unclear to me.
