import stackless
import time
import threading

def count(n):
    while n > 0:
        n -= 1

A normal sequential execution takes about 17 seconds.

T = time.clock()
count(100000000)
count(100000000)
print time.clock() - T

>>> 17.37
T = time.clock()
t1 = threading.Thread(target=count, args=(100000000,))
t1.start()
t2 = threading.Thread(target=count, args=(100000000,))
t2.start()
t1.join(); t2.join()
print time.clock() - T

>>> 53.22

This confirms David's experiment. It takes much longer, due to the threads fighting over the GIL.
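The snippets in this post are Python 2, and time.clock() was removed in Python 3.8. A rough Python 3 reproduction of the sequential-vs-threaded measurement (with a smaller n so it finishes quickly) might look like this sketch:

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; only one thread can hold the GIL at a time.
    while n > 0:
        n -= 1

N = 2_000_000  # much smaller than the post's 100000000, for a quick demo

t = time.perf_counter()
count(N)
count(N)
sequential = time.perf_counter() - t

t = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - t

print("sequential: %.2fs  threaded: %.2fs" % (sequential, threaded))
```

On a multicore machine running classic CPython, the threaded run is typically no faster than the sequential one, for the GIL-contention reason the post describes.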
T = time.clock()
stackless.tasklet(count)(100000000)
stackless.tasklet(count)(100000000)
while stackless.getruncount() > 1:
    task = stackless.run(100)
    if task:
        task.insert()
print time.clock() - T

>>> 25.34

Twice as fast as the threaded solution, and roughly 50% slower than the sequential solution. Cool.
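Stackless isn't part of standard CPython, but the round-robin scheduling that the loop above performs can be illustrated in plain Python. This is a hypothetical sketch (not Stackless's API): each generator plays a tasklet, each yield plays the preemption point that stackless.run(100) creates every 100 ticks, and re-appending to the queue plays task.insert().

```python
from collections import deque

def count(n, tick=100):
    # Decrement n, yielding control back to the scheduler every `tick`
    # iterations -- mimicking stackless.run(100) preempting a tasklet.
    i = 0
    while n > 0:
        n -= 1
        i += 1
        if i == tick:
            i = 0
            yield

def run(tasklets):
    # Round-robin scheduler: resume each runnable "tasklet" in turn and
    # put it back at the end of the queue (cf. task.insert()).
    queue = deque(tasklets)
    switches = 0
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # tasklet finished; drop it
        switches += 1
        queue.append(task)
    return switches

switches = run([count(1000), count(1500)])
print(switches)  # -> 25 preemptions: 10 for count(1000) + 15 for count(1500)
```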
We can get much better results by making the scheduler's granularity coarser. If we raise the tick count from 100 to 1000, the Stackless solution takes 17.71 seconds, and the threaded solution takes 22.25 seconds. This is interesting behavior, which I can't yet explain.
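For reference, the knob being tuned on the threaded side is sys.setcheckinterval in Python 2 (a bytecode-instruction count, default 100). CPython 3.2+ replaced it with the time-based sys.setswitchinterval; a small sketch of adjusting it:

```python
import sys

default = sys.getswitchinterval()   # 0.005 s (5 ms) by default in CPython 3
print("default switch interval:", default)

# A larger interval means fewer forced thread switches -- the same
# direction as raising the old check interval from 100 to 1000.
sys.setswitchinterval(0.05)
print("new switch interval:", sys.getswitchinterval())

sys.setswitchinterval(default)      # restore the default
```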
5 comments:
Here is something random, but.. in python2.6 there is also multiprocessing. I wanted to try this, but I only have a single core so it's a bit ridiculous I guess. And yes, the ranking comes out quite randomly, but Multiprocessing could be faster than Threading. How that changes when adding IPC is a big issue of course.
Two runs:
$ python2.6 vsthread.py
Sequential: 12.8867890835
Threaded: 13.4743509293
Multiproc: 13.142551899
$ python2.6 vsthread.py
Sequential: 13.8414778709
Threaded: 13.215129137
Multiproc: 13.1815469265
Code snippet http://python.pastebin.com/m7e0e5273
I stumbled upon a 'trivial solution' to the slowdown on multicore systems. What do you think?
For Zope (a threaded server) benchmarks showed that as a rule-of-thumb pystones/50 is a good value for checkinterval. Today it's mostly a value > 1000.
Some details and links can be found here:
http://plone.org/products/jarn.checkinterval
Surely the increased granularity simply reduces the number of context switches (whether involving locks or not), which would lead to a predictable speed-up?
Nice article!