What every programmer should know about the GIL

Posted January 28th, 2016 by adminguy

                        

 
Global interpreter lock (GIL) is a mechanism used in computer language interpreters to synchronize the execution of threads so that only one native thread can execute at a time. An interpreter that uses GIL always allows exactly one thread to execute at a time, even if run on a multi-core processor. Some popular interpreters that have GIL are CPython and Ruby MRI. -- The Wikipedia page on GIL
 
We live in a world of multi-core CPUs, which means that true parallelism is actually possible. This is both good and bad. It's good because our programs can run much faster if they are built to use concurrent threads, but it's bad because multi-threaded code is dangerous, difficult to test, and even good programmers often get it wrong.
 
The reference implementations of Python (CPython) and Ruby (MRI) both use a GIL to protect the interpreter's global state.
 
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) -- Python Wiki on the GIL
 
Presumably one of the reasons CPython needs the GIL is that its memory management relies on reference counting, which would need a major overhaul to work in a thread-safe way without the GIL. Greg Stein did submit patches for the CPython interpreter in 1999 that removed the GIL, but the new implementation was found to slow down single-threaded code by about 40%, which was clearly a non-starter, and the patches were eventually rejected. Python 3.2 did bring some significant improvements to the GIL, but it is still a global lock. At least for now, it seems the Python community has learned to live with the GIL. See this blog post for more details.
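
To make the reference counting point concrete, here is a minimal sketch that uses the standard `sys.getrefcount` function to watch CPython's count for one object change as references are added. The variable names are illustrative only, and the absolute numbers are an implementation detail of CPython; only the difference matters here.

```python
import sys

payload = object()
before = sys.getrefcount(payload)

alias = payload        # one more reference to the same object
holders = [payload]    # and another one, held by the list

after = sys.getrefcount(payload)

# Each new reference bumps the object's counter. Without a lock, two
# threads updating the same counter at once could lose an update.
print(after - before)
```

Every assignment, argument pass, and container insertion updates such a counter, which is why protecting each one with its own fine-grained lock tends to be expensive.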
 
The presence of a global lock doesn't mean we can't use parallelism, but it does mean that CPU-bound work has to be spread across multiple processes instead of multiple threads. This has a few disadvantages: threads share the same memory while processes do not, so processes have to use some message-passing mechanism to communicate with each other. There is also greater overhead in spawning new processes than in starting new threads. Nevertheless, in practice these issues often don't amount to much, and multiprocessing does have some definite benefits:
  • The code is easier to write because it does not need to protect shared memory with mutexes.
  • Since multi-threading is difficult to test, writing single-threaded code which runs in multiple processes often results in code that is easily testable and has fewer (thread-related) bugs.
  • Multiple processes can take advantage of a distributed architecture by executing on multiple machines, which can provide extremely high scalability.
  • Multiple processes can be started and stopped from outside the program, giving us more control over how much parallelism we want to achieve.
Here's an interesting Stack Overflow thread discussing the pros and cons of multiple threads and processes.
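
The multi-process approach described above can be sketched with the standard library's `multiprocessing.Pool`. This is only an illustration under stated assumptions: the `count_primes` workload, the `count_primes_parallel` helper, and the chosen limits are made up for this example and stand in for whatever CPU-bound work a real program does.

```python
import math
from multiprocessing import Pool


def count_primes(limit):
    """Count the primes below `limit` by trial division (CPU-bound on purpose)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=4):
    """Run count_primes for each limit in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        # Arguments and results are pickled and sent between processes.
        # No memory is shared, so the workers need no mutex.
        return pool.map(count_primes, limits)


if __name__ == "__main__":
    # Hypothetical workload sizes, chosen only to keep the demo quick.
    limits = [10_000, 20_000, 30_000, 40_000]
    print(dict(zip(limits, count_primes_parallel(limits))))
```

Each worker is a separate interpreter with its own GIL, so the four calls can run on four cores at once. The worker function itself stays plain single-threaded code that can be tested on its own.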
 
However, it is undeniable that multiple threads are the better solution for certain problems, such as GUI code, and for cases where spawning multiple processes is a clear overhead. For such cases you can always use Python and Ruby interpreters that do not have a GIL: Jython and IronPython do not have one, and neither does JRuby. So even though the reference implementations of Python and Ruby have a GIL, not all implementations suffer from that restriction, which makes life simpler for projects where the GIL is a real bottleneck. Here's a very good Stack Overflow thread with code samples showing how to write multi-threaded code in Python. And here's a beginner's guide to concurrency and parallelism in Python.
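
For comparison, here is a minimal sketch of the threaded style using the standard `threading` module. The shared `counter` and the `add_many` and `run_threads` helpers are hypothetical names invented for this example. It shows the extra care shared memory demands: even under the GIL, `counter += 1` is not guaranteed to be atomic, so the update is wrapped in a lock.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    """Increment the shared counter `times` times."""
    global counter
    for _ in range(times):
        # The increment is a read-modify-write spanning several bytecodes,
        # and a thread switch in between could lose an update.
        with counter_lock:
            counter += 1


def run_threads(num_threads=4, times=10_000):
    """Start the worker threads, wait for them, and return the final count."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_threads())
```

On CPython these threads take turns rather than running in parallel, but the same locking discipline is needed on interpreters without a GIL, where they really do run at the same time.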
 
If you would like to know more, you must read this excellent article, which explains the GIL in great detail. Also check out this video by David Beazley for everything you ever wanted to know about the GIL.