Most Python coders know that the Global Interpreter Lock (GIL) in CPython effectively serializes execution of threaded programs, even on systems with multiple CPU cores.
Despite this, Python threads are still useful in some situations, especially when working with blocking IO. Or so we thought.
The GIL problem is much, much worse than I expected.
David Beazley has published some slides which summarize his research into the GIL, which demonstrate how the implementation is fundamentally broken. The slides are clear and easy to understand and it is definitely worth taking the time to read and digest.
To summarize, the implementation of the GIL causes threads (on multicore machines) to compete with each other as they attempt to acquire the GIL. The scary part is, as you add more cores and more threads, the problem will grow exponentially, causing a massive degradation in performance when compared to serial execution. This isn't the only issue. Read the slides if you ever intend to use threads in CPython.
Subscribe to:
Post Comments (Atom)
Popular Posts
-
These are the robots I've been working on for the last 12 months. They each weigh about 11 tonnes and have a 17 meter reach. The control...
-
This hard-to-see screenshot is a Generic Node Graph Editing framework I'm building. I'm hoping it can be used for any kind of node...
-
So, you've created a car prefab using WheelCollider components, and now you can apply a motorTorque to make the whole thing move along. ...
-
MiddleMan: A Pub/Sub and Request/Response server in Go. This is my first Go project. It is a rewrite of an existing Python server, based o...
-
Often, when building a game, you need to test if objects are colliding. The objects could be spaceships, rocks, mouse pointers, laser beams....
-
I've just read a newspaper article (courtesy of Kranzky ) from WA Business News documenting the malfeasance, gross negligence and misc...
-
After my last post, I decided to benchmark the scaling properties of Stackless, Kamaelia, Fibra using the same hackysack algorithm. Left axi...
-
Possibly slightly more correct lighting. The rim light is now only applied in the direction of the sun, rather than being purely based on vi...
-
Update: This is another planet shader, with more physical fidelity. Shader "Planet" { Properties { _MainTex ("Diff...
-
At the last few GameJams, I've seen an increase in the use of RAD game tools, some of them even being developed by the participants them...
14 comments:
Simon; his slides only show CPU bound execution, and everyone already knew adding threads to CPU bound apps within python caused a slowdown.
However, adding threads to I/O bound applications still gives a speedup, despite GIL contention.
@Jesse,
It appears from the slides that even 1 CPU bound thread could potentially kill the performance for IO bound threads and cause lots of signal noise between threads as the OS tries to give the IO threads priority but the GIL refuses to give up on the CPU bound thread.
So it sounds like 1 misplaced "nonexistant_needle in big_haystack" statement could bring an otherwise IO bound multithreaded process to its knees.
Technically, 1 misplaced "nonexistant_needle in big_haystack" statement could bring any _single_ _threaded_ process to its knees (forgetting to update an index variable in a loop, for instance). That is the nature of coding, multi or single threaded or whatnot. But you knew that already.
It would be nice to have support for "green threads" for CPU bound threading, on top of GIL/OS-threads for IO bound threading. The green threads would run in a single process, and multiple process support would have to be explicitly (for exampling, importing a module like "multiprocessing"), depending on the size and nature of the shared state of the threads.
It's a good thing that Google is helping us get rid of the GIL then: Unladen Swallow Project Plan
Thanks for the slides, they are great! Now I have something to link to when telling people to not use threads in python.
@manuelg: Agreed. Thats exactly what Stackless provides. Application controlled threading, built into the language runtime.
@Jesse: I understand what you are saying, its true, we always knew that there was _some_ slowdown. What astounded me from the slides was the way the problem grew exponentially as more cores were added.
This is pretty fatal for threading in CPython.
The green threading provided by stackless is not suitable for applications with very coarse grained units of work (i.e. nowhere suitable to yield to the other threads).
Thankfully the threading code I've written has been in IronPython, where we don't have the GIL, but the project I've been working on would be severely hampered by the GIL if we had started in CPython. (Threads are used on large shared datasets so multi-processes aren't suitable.)
@Michael: Right, just to be clear I'm not suggesting green threads are useful for solving problems which require genuine concurrency.
However, Stackless does have features which force a tasklet to yield after a number of ticks, which appears to be very similar to how regular Python works.
You might also be interested in David's talk, which he did at the ChiPy meeting this past Thursday. There's a bit more detail in his explanation than what you might get directly from the slides:
http://blip.tv/file/2232410
(props to Carl Karsten and Cosmin Stejerean for recording and getting the video up quickly :-)
Just to be clear, it's important to stress that my talk was about a very specific aspect of thread-programming involving CPU-bound processing. This obviously doesn't apply to every conceivable use of threads. For instance, if everything is I/O bound, you won't run into the problems I described. So, to say that threading is fundamentally broken is probably a bit extreme.
However, I think there is a class of problems where threads will be used with a certain mix of CPU-bound processing--even in applications with a lot of I/O. It's hard to say how the GIL impacts these applications, but it's probably worthy of some further study.
If anything, I think the main take-away from the talk should be just how difficult the underlying problem of the GIL really is--especially when multiple cores enter the picture. It will be interesting to see what the unladen swallow guys have up their sleeve.
I'm intrigued as to why the GIL is considered so difficult to remove in CPython when IronPython seems to not have it. What safety is the .NET runtime providing, and at what cost?
Actually, it's not that bad. The 'slowdown' disappears when you set sys.setcheckinterval(n) to a sane value. If you're not convinced, take a peek at my benchmark results.
Seun, you can see from my other posts that I've noted the same results.
I agree that a greater tick interval mitigates the problem.
However, a greater tick interval simply pushes multithreading closer to sequential execution. This is the same thing as saying: "My multi-threaded program is slow, but my sequential program is fast. Therefore, I can make my multi-threaded program faster by making it more sequential."
It sounds rather silly when you think about it like that.
Post a Comment