Sunday, June 14, 2009

Stackless vs GIL: It's a draw.

After thinking some more about David Beazley's research into the GIL, I realized the gravity of his comment that the GIL provides "Basically a kind of "cooperative" multitasking". I've spent 10 minutes knocking up some code in Stackless that demonstrates something similar to what David describes. I doubt many people knew Stackless has this capability.
import stackless

#A contrived CPU bound task
def factorial(c):
T = 1
for i in xrange(1, c):
T *= i
return T

#create two tasklets
stackless.tasklet(factorial)(512)
stackless.tasklet(factorial)(1024)

#used to track of how many task switches happen
switches = {}

#while there is more than the main task running...
while stackless.getruncount() > 1:
#run the schedule for 100 ticks
task = stackless.run(100)
#if we have a pre-empted task
if task:
#increment it's switch counter
C = switches.setdefault(task, 0)
switches[task] += 1
#insert it at the end of the schedule
task.insert()

print switches.values()

>>> [13, 26]

What is happening here?

Firstly, notice the factorial function. It is a contrived example of a CPU bound task that does not yield to the Stackless scheduler (by calling stackless.schedule) anywhere in it's body.

Next, two tasklets are installed into the schedule. One computes the factorial of 512, the other computes the factorial of 1024. If we call stackless.run() at this point, both tasks would run to completion, one after the other. This is because the factorial function never yields control back to the Stackless scheduler.

However, in this example, when we call stackless.run, we pass it an argument of 100. This asks Stackless to run each tasklet for a maximum of 100 'ticks'. If the tasklet exceeds this, it is interrupted and returned to main tasklet. This forces our factorial tasklet to yield control back to our main loop, at which point we can decide what to do with it. In this case, the best thing to do is insert at the back of the run queue to give other tasklets a chance to execute.

In the example above, the tasklets were forced to switch 13 and 26 times respectively. These few lines of code show how Stackless can almost achieve the same kind of threading that CPython provides, without using OS threads. I think that is pretty neat.

Why do I say "almost"? The Stackless solution does not provide everything the GIL does. For example, if a tasklet makes some C library call which blocks for a long time, all tasklets will block and wait, whereas in CPython other threads will get some CPU time. Still, the task switching in Stackless is much faster than OS thread switches, so if you're using mostly Python code, Stackless will outperform CPython, even with CPU bound tasks.

2 comments:

Anonymous said...

Function named factorial which doesn't calculate factorial? Weird...

Simon Wittber said...

Heh. Right. Duh. Fixing now. I did post this at 1AM! :-)

Popular Posts