Powerful, flexible, and easy to program, Python is widely used for everything from web development to machine learning. By the two most cited measures, Python has even surpassed Java and C to become the most popular programming language of all. After years of huge popularity, Python might well seem unstoppable.
But Python faces at least one major hurdle to its future growth as a programming language. It’s called the GIL, the global interpreter lock, and Python developers have been trying to remove it from the default Python implementation for decades.
Although the GIL serves a fundamental purpose, namely to ensure thread safety, it also creates a serious bottleneck for multithreaded programs. In short, the GIL prevents Python from taking full advantage of multiprocessor systems. For Python to be a first-class language for concurrent programming, many believe that the GIL has to go.
So far, attempts to remove the GIL have failed. But a new wave of efforts is building to make the GIL a thing of the past and to make Python better equipped to meet the programming requirements of the future.
Why Python has a GIL
Strictly speaking, the global interpreter lock is not part of Python in the abstract. It is a component of the most widely used Python implementation, CPython, which is maintained by the Python Software Foundation.
The GIL ensures thread safety in CPython by allowing only one running thread at a time to execute Python bytecode. CPython’s memory management systems are not thread-safe, so the GIL is used to serialize access to objects and memory to avoid race conditions. If CPython didn’t have a GIL, it would have to handle concurrency and race conditions some other way.
What makes the GIL such a big problem? For one, it prevents true multithreading in the CPython interpreter. That makes a whole class of speedups, optimizations that are readily available in other programming languages, much more difficult to implement in Python.
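The effect can be seen with a small experiment. The following is a minimal sketch, not a rigorous benchmark: the function names (count_down, run_sequential, run_threaded) and the loop sizes are illustrative choices, not anything from CPython itself. It runs the same pure-Python loop several times back to back, then in several threads at once.

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; each iteration executes bytecode under the GIL."""
    while n > 0:
        n -= 1


def run_sequential(n, repeats):
    """Run the loop `repeats` times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, repeats):
    """Run the loop in `repeats` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, repeats = 5_000_000, 4  # arbitrary sizes chosen for illustration
    print(f"sequential: {run_sequential(n, repeats):.2f}s")
    print(f"threaded:   {run_threaded(n, repeats):.2f}s")
```

On a standard CPython build, the two timings are typically close to each other, because only one thread executes bytecode at any moment. Exact numbers vary by machine and Python version.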
Most developers circumvent the GIL in one way or another. The multiprocessing module, for example, allows you to run concurrent instances of the Python interpreter (each on its own physical thread) and share the work between them. However, because sharing data between Python instances creates a lot of overhead, multiprocessing only works well for certain classes of problems.
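As a rough illustration of that pattern, here is a minimal sketch using the standard library's multiprocessing.Pool. The helper names (sum_of_squares, split, parallel_sum_of_squares) and the worker count are hypothetical choices made for this example.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """CPU-bound work performed inside one worker process."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split a list into at most `parts` roughly equal contiguous chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, never zero
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, workers=4):
    """Send each chunk to a worker process and combine the partial sums."""
    with Pool(processes=workers) as pool:
        # Each chunk is pickled and sent to a worker; each result is pickled back.
        return sum(pool.map(sum_of_squares, split(data, workers)))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum_of_squares(numbers))
```

Each worker is a separate interpreter process with its own GIL, so the chunks can be processed on different cores. The cost is that every chunk and every result has to be copied between processes, which is the overhead described above.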
Another solution is to use Python extensions, usually written in C. These are executed outside of the Python interpreter, so their processing does not depend on the GIL. The problem is that this is only true as long as the work doesn’t involve Python objects, just C code and C data structures. So, like multiprocessing, C extensions only solve a small class of problems.
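The standard library's hashlib module is one example of C code that works this way: its documentation notes that the GIL is released while hashing data larger than 2047 bytes. The sketch below, with hypothetical helper names (digest, digest_all) and an arbitrary buffer size, hashes several large buffers in a thread pool. Because the heavy lifting happens in C on raw bytes, the threads may overlap even on a standard CPython build.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    """Hash a bytes buffer; the hashing itself runs in C inside hashlib."""
    return hashlib.sha256(data).hexdigest()


def digest_all(buffers, workers=4):
    """Hash several buffers in a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


if __name__ == "__main__":
    # Four 20 MB buffers; the size is an arbitrary choice for illustration.
    buffers = [bytes([i]) * 20_000_000 for i in range(4)]
    for hex_digest in digest_all(buffers):
        print(hex_digest)
```

The same thread pool would gain little if digest did its work in pure Python, since that work would again be serialized by the GIL.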
As Python’s popularity grows, so does the embarrassment over a shortcoming like the GIL in the language. And so various efforts, past and present, have been launched to dislodge the GIL.
Getting rid of the GIL
The problem, as you might guess, is that getting rid of the GIL is much easier said than done. The GIL serves an important purpose. Its replacement must not only ensure thread safety, but also meet a number of other requirements.
Among the many goals a GIL replacement must meet, these are the most crucial:
- Enable concurrency. The big reward for having a GIL-less Python is true concurrency in the language. Replacing the GIL with another mechanism that does not allow concurrency is not progress.
- Do not slow down single-threaded programs. Any GIL replacement that makes single-threaded programs run slower is a net loss, because the vast majority of Python software is single-threaded.
- Do not break compatibility with previous versions. Existing Python software should not only run as fast as before, it should behave as expected.
- Do not incur a significant maintenance cost. The Python development team does not have infinite resources or manpower. A GIL-less Python would have to be at least as maintainable as the existing interpreter.
Given the high bar any GIL replacement must clear, it is not surprising that all previous attempts to remove the GIL have stalled or failed.
Pablo Galindo, one of the five members of the Python Steering Council, which determines the direction of Python development, believes that removing the GIL is a realistic goal for Python, “but also very difficult.”
“The question is not really if it’s possible (we know it’s certainly possible),” Galindo said in an email interview. “The question is what is the real price, and if we as a community want to pay that price. This is also a complicated matter, because the price to pay is not evenly distributed either.”
The price of removing the GIL is paid not only by core Python developers, but also by all developers who use Python and those who maintain packages for the Python language.
Previous efforts to eliminate the GIL
Getting rid of the GIL is not a new idea. Previous efforts to de-GIL Python offer examples of the difficulties Galindo talks about.
The first formal attempts to get rid of the GIL date back to 1996, when Python was at version 1.4. Greg Stein created a patch to remove the GIL, mostly as an experiment. It worked, but single-threaded programs suffered a significant performance hit. Not only was the patch not adopted, but the experience made it clear that removing the GIL was difficult and would come at a huge development cost.
In recent years, as Python’s popularity has skyrocketed, more GIL removal projects have emerged. One widely discussed effort was Larry Hastings’s Gilectomy project, a fork of Python that made several significant changes to reference counting and other internal mechanisms. The Gilectomy showed some promise, but it broke most of the existing CPython API, and even the most valiant work on Hastings’s part couldn’t make the Gilectomy perform as well as CPython.
Several other projects involved forking Python and rewriting it to better support parallelism. PyParallel, one such project, worked around the GIL as a limit on parallelism without actually removing it. PyParallel added a new module, parallel, which allowed objects to communicate with each other over the TCP stack. While PyParallel successfully circumvented the GIL, the approach had limitations. For one thing, the parallel code had to communicate over the TCP stack (slow), rather than through a shared memory mechanism (fast). PyParallel hasn’t been updated since 2016.
PyPy, the alternative Python implementation built around a JIT compiler, not only has its own GIL, but also had its own GIL removal project. The goal of its STM (Software Transactional Memory) branch was to let multiple threads run in parallel in PyPy, but here too the cost was a significant hit to single-threaded performance, anywhere from 20% slower to 2x slower. The STM branch of PyPy is no longer in active development either.
Current efforts to eliminate the GIL
The poor record of previous attempts to remove the GIL has stimulated new thinking about the way forward. Perhaps the best approach is not to eliminate the GIL, but, as PyParallel tried, to make it less of a hindrance to parallelism by circumventing it, then offer that functionality to the average Python developer.
In theory, Python modules like multiprocessing and third-party projects like Dask already do this. One spins up several different copies of the interpreter, splits a task between them, and serializes object data between them if necessary. But multiprocessing comes with a lot of overhead, and third-party projects are just that: third-party offerings, not native components built into Python.
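The serialization step is where much of that overhead comes from. The sketch below is only an approximation of what happens when an object crosses a process boundary: multiprocessing uses pickle for this, but the round_trip helper and the sample record are hypothetical, and no second process is actually started.

```python
import pickle


def round_trip(obj):
    """Serialize an object and rebuild it, as if handing it to another process."""
    payload = pickle.dumps(obj)  # the bytes that would cross the process boundary
    return pickle.loads(payload), len(payload)


if __name__ == "__main__":
    record = {"id": 7, "tags": ["a", "b"], "values": list(range(5))}
    copy, size = round_trip(record)
    print(copy == record)  # the contents match
    print(copy is record)  # but the receiver holds a separate copy, not shared memory
    print(size)            # number of bytes copied for this one small object
```

Every object handed to another interpreter process pays this encode, copy, and decode cost, and the receiver ends up with its own duplicate rather than a view of the original.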
Some Python proposals are being worked on to improve this situation. None of them by themselves constitutes a solution; they are all just proposals. But collectively, they hint at the direction in which Python is moving.
Removing the GIL with subinterpreters
One project, PEP 684, is the “GIL per interpreter” project. The idea is to have multiple Python interpreters, each with its own GIL, running in a single process. In fact, Python has supported this since version 1.5, but interpreters in the same process have always shared too much global state to achieve true parallelism. PEP 684 moves as much of the shared state as possible into each interpreter, so they can run in parallel with minimal interdependence.
But a big problem with this approach is how to share Python objects between interpreters. Sharing raw data, such as byte streams, isn’t difficult, but it’s also not very useful. Sharing rich Python objects is much more useful, but also much more difficult. However, any plan to allow true concurrency must include a way to share Python objects.
Galindo says that the subinterpreter approach (as it is also called) is a prime candidate for working around the GIL and for providing a strategy for handling Python objects between interpreters. As Galindo told me in an email:
One of the attractive prospects of multiple interpreters is that it may be possible to pipe objects between these interpreters in the same memory space, without needing to marshal them across processes. This may also help with some aspects of the copy-on-write problem CPython has with multiple interpreters, but this remains to be seen as we are missing a complete implementation with a fully defined surface API.
In other words, there is a lot more work to be done on the internals of CPython before a per-interpreter GIL can happen.
Another proposal, originally raised in 2017, goes hand-in-hand with PEP 684. PEP 554 exposes multi-interpreter functionality to the average Python user as part of the standard library, rather than requiring them to write a C extension. Thus, as multiple interpreters become more useful, Python developers will have a standard way of working with them.
Other ideas to remove the GIL
Yet another proposal, raised in January 2023 and currently under active discussion, provides a way for developers to work on a GIL-less Python alongside existing Python.
PEP 703 adds a compile option to CPython to allow the interpreter to be compiled without a GIL. The default would still be to include the GIL, but Python developers could work to remove the GIL as part of CPython development directly, rather than in a separate project. Over time, and with enough work, the GIL-less version of Python could become the default build mode.
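For readers who want to see which kind of build they are running, here is a hedged sketch. It assumes an interpreter that includes the PEP 703 build option: CPython 3.13 and later define the Py_GIL_DISABLED build variable and the sys._is_gil_enabled() function, while older interpreters have neither, so the code falls back to assuming a normal GIL build. The gil_status helper name is hypothetical.

```python
import sys
import sysconfig


def gil_status():
    """Report whether this interpreter was built without a GIL and whether one is active."""
    # Py_GIL_DISABLED is 1 only in builds compiled without the GIL;
    # standard or older builds report 0 or None.
    built_without_gil = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() is absent on older interpreters; assume the GIL is on there.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_running = bool(check()) if check is not None else True

    return {"built_without_gil": built_without_gil, "gil_running": gil_running}


if __name__ == "__main__":
    print(gil_status())
```

On a standard build this reports that the interpreter was built with the GIL and that it is active. The two values are reported separately because a build compiled without the GIL can still turn the lock back on at run time.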
But this approach comes with multiple drawbacks. A major one is higher maintenance cost, not just for CPython but also for extensions that might break due to assumptions about CPython’s internals. Also, as with all previous attempts to remove the GIL, the changes in PEP 703 would result in a performance impact for single-threaded programs.
Whether Python makes the GIL optional, adopts subinterpreters, or takes another approach, the long history of effort and experimentation shows that there is no easy way to remove the GIL, not without huge development costs or setting Python back in other ways. But as data sets grow ever larger, and AI, machine learning, and other data processing workloads demand greater parallelism, finding an answer to the GIL will be a key element in making Python a language for the future and not just for the present.
Copyright © 2023 IDG Communications, Inc.