Saturday, June 13, 2009

CPython Threading is Fundamentally Broken.

Most Python coders know that the Global Interpreter Lock (GIL) in CPython effectively serializes execution of threaded programs, even on systems with multiple CPU cores.

Despite this, Python threads are still useful in some situations, especially when working with blocking IO. Or so we thought.

The GIL problem is much, much worse than I expected.

David Beazley has published some slides which summarize his research into the GIL, which demonstrate how the implementation is fundamentally broken. The slides are clear and easy to understand and it is definitely worth taking the time to read and digest.

To summarize, the implementation of the GIL causes threads (on multicore machines) to compete with each other as they attempt to acquire the GIL. The scary part is, as you add more cores and more threads, the problem will grow exponentially, causing a massive degradation in performance when compared to serial execution. This isn't the only issue. Read the slides if you ever intend to use threads in CPython.

14 comments:

Jesse said...

Simon; his slides only show CPU bound execution, and everyone already knew adding threads to CPU bound apps within python caused a slowdown.

However, adding threads to I/O bound applications still gives a speedup, despite GIL contention.

schmichael said...

@Jesse,

It appears from the slides that even 1 CPU bound thread could potentially kill the performance for IO bound threads and cause lots of signal noise between threads as the OS tries to give the IO threads priority but the GIL refuses to give up on the CPU bound thread.

So it sounds like 1 misplaced "nonexistant_needle in big_haystack" statement could bring an otherwise IO bound multithreaded process to its knees.

manuelg said...

Technically, 1 misplaced "nonexistant_needle in big_haystack" statement could bring any _single_ _threaded_ process to its knees (forgetting to update an index variable in a loop, for instance). That is the nature of coding, multi or single threaded or whatnot. But you knew that already.

It would be nice to have support for "green threads" for CPU bound threading, on top of GIL/OS-threads for IO bound threading. The green threads would run in a single process, and multiple process support would have to be explicitly (for exampling, importing a module like "multiprocessing"), depending on the size and nature of the shared state of the threads.

Ycros said...

It's a good thing that Google is helping us get rid of the GIL then: Unladen Swallow Project Plan

Nosklo said...

Thanks for the slides, they are great! Now I have something to link to when telling people to not use threads in python.

Simon Wittber said...

@manuelg: Agreed. Thats exactly what Stackless provides. Application controlled threading, built into the language runtime.

Simon Wittber said...

@Jesse: I understand what you are saying, its true, we always knew that there was _some_ slowdown. What astounded me from the slides was the way the problem grew exponentially as more cores were added.

This is pretty fatal for threading in CPython.

Michael Foord said...

The green threading provided by stackless is not suitable for applications with very coarse grained units of work (i.e. nowhere suitable to yield to the other threads).

Thankfully the threading code I've written has been in IronPython, where we don't have the GIL, but the project I've been working on would be severely hampered by the GIL if we had started in CPython. (Threads are used on large shared datasets so multi-processes aren't suitable.)

Simon Wittber said...

@Michael: Right, just to be clear I'm not suggesting green threads are useful for solving problems which require genuine concurrency.

However, Stackless does have features which force a tasklet to yield after a number of ticks, which appears to be very similar to how regular Python works.

James Snyder said...

You might also be interested in David's talk, which he did at the ChiPy meeting this past Thursday. There's a bit more detail in his explanation than what you might get directly from the slides:

http://blip.tv/file/2232410

(props to Carl Karsten and Cosmin Stejerean for recording and getting the video up quickly :-)

David Beazley said...

Just to be clear, it's important to stress that my talk was about a very specific aspect of thread-programming involving CPU-bound processing. This obviously doesn't apply to every conceivable use of threads. For instance, if everything is I/O bound, you won't run into the problems I described. So, to say that threading is fundamentally broken is probably a bit extreme.

However, I think there is a class of problems where threads will be used with a certain mix of CPU-bound processing--even in applications with a lot of I/O. It's hard to say how the GIL impacts these applications, but it's probably worthy of some further study.

If anything, I think the main take-away from the talk should be just how difficult the underlying problem of the GIL really is--especially when multiple cores enter the picture. It will be interesting to see what the unladen swallow guys have up their sleeve.

Ben Sizer said...

I'm intrigued as to why the GIL is considered so difficult to remove in CPython when IronPython seems to not have it. What safety is the .NET runtime providing, and at what cost?

Seun Osewa said...

Actually, it's not that bad. The 'slowdown' disappears when you set sys.setcheckinterval(n) to a sane value. If you're not convinced, take a peek at my benchmark results.

Simon Wittber said...

Seun, you can see from my other posts that I've noted the same results.

I agree that a greater tick interval mitigates the problem.

However, a greater tick interval simply pushes multithreading closer to sequential execution. This is the same thing as saying: "My multi-threaded program is slow, but my sequential program is fast. Therefore, I can make my multi-threaded program faster by making it more sequential."

It sounds rather silly when you think about it like that.

Popular Posts