A guide to improving Python performance.

Python is an amazing programming language, but it has two huge handicaps compared to compiled languages.

Luckily there are several ways to speed up your Python code.

Bypass GIL (Concurrency / GIL persists)

In this approach, the target code is executed so that data is processed concurrently or in parallel. In essence, a single task is broken into several independent sub-tasks, and each sub-task is handled in a separate thread or process. This is known as multithreading or multiprocessing, respectively. The approach is effective only if your task can actually be split into independent sub-tasks.
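As a minimal sketch of the splitting idea, the example below divides a list into chunks and hands each chunk to a worker from the standard library's concurrent.futures module. The function names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """The sub-task: works on one chunk, independently of all the others."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Run one sub-task per chunk, then combine the partial results."""
    chunks = split_into_chunks(items, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
        return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1_000))
    print(parallel_sum_of_squares(data))
```

Passing executor_cls=ProcessPoolExecutor (also from concurrent.futures) would run the same sub-tasks in separate processes instead of threads. The split only works because each chunk can be processed without looking at the others.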

Before moving on, it is important to distinguish between a thread and a process. The two can be used in a similar fashion, but they serve different purposes. A single process can create multiple threads, up to a limit imposed by the OS, and all threads of a process share the same memory heap. Threads are cheap to create. However, in CPython a new thread does not mean a new CPU core: because of the GIL, only one thread executes Python code at a time, so threads do not speed up CPU-bound work. They do help when a program spends most of its time waiting, which is why they are mostly applied to I/O operations. In contrast, a new process gets its own memory heap (with the fork start method, it begins as a copy of the parent’s). Two processes cannot see each other’s memory heap, but they can run on separate CPU cores. Creating a new process is expensive and results in larger overall memory usage. In return, processes let you use additional CPU cores (up to the number available), and they can be created locally or on different computers within a cluster. [3,4]
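The shared heap and the I/O use case can be illustrated with a small sketch. Here fake_download is a hypothetical stand-in for a real I/O call, and time.sleep simulates the waiting; all threads write into the same dictionary because they live in the same process.

```python
import threading
import time

results = {}                      # lives on the shared heap: every thread sees it
results_lock = threading.Lock()


def fake_download(name, delay):
    """Stand-in for an I/O call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    with results_lock:            # guard the shared dict against concurrent writes
        results[name] = f"{name} done"


def run_downloads(names, delay=0.1):
    """Start one thread per name, wait for all of them, return elapsed seconds."""
    threads = [threading.Thread(target=fake_download, args=(n, delay)) for n in names]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_downloads(["a", "b", "c", "d"])
    print(sorted(results), f"{elapsed:.2f}s")
```

Because the waits overlap, the total time is typically close to a single delay rather than the sum of all of them. A separate process would not see the results dictionary at all; it would need an explicit channel such as a queue or a pipe.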

Here are some examples:

Python provides several built-in packages for concurrent processing.
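The standard library's concurrency modules include threading, multiprocessing, concurrent.futures, and asyncio. As a minimal sketch of asyncio, the example below runs two coroutines concurrently on a single thread. The fetch coroutine and its delays are hypothetical; asyncio.sleep stands in for a real network call.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical I/O-bound task; asyncio.sleep stands in for a network call."""
    await asyncio.sleep(delay)
    return f"{name} finished"


async def main():
    # gather() runs both coroutines concurrently and keeps the argument order
    return await asyncio.gather(fetch("first", 0.02), fetch("second", 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))
```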

Concurrency vs True Parallelism: Concurrency is not the same as true parallelism. Concurrent code makes progress on several tasks by interleaving them, whereas truly parallel code has several CPU cores literally working on the same task at the same time. True parallelism depends on the actual hardware and is, in general, a more complicated topic than concurrency. Within a single process, Python’s GIL prevents threads from executing Python code in parallel. Fortunately, there are ways to release the GIL and achieve truly parallel processing. [1]
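One way to observe the difference is to time a CPU-bound function run serially and then in threads. The sketch below assumes a standard CPython build with the GIL enabled; count_down and the job sizes are hypothetical, and the timings will vary by machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1
    return n


def timed(fn):
    """Call fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_serial(n, jobs):
    return [count_down(n) for _ in range(jobs)]


def run_threaded(n, jobs):
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(count_down, [n] * jobs))


if __name__ == "__main__":
    n, jobs = 1_000_000, 4
    _, serial_s = timed(lambda: run_serial(n, jobs))
    _, threaded_s = timed(lambda: run_threaded(n, jobs))
    print(f"serial:   {serial_s:.2f}s")
    print(f"threaded: {threaded_s:.2f}s")
```

On a GIL-enabled interpreter the two timings are usually similar, since the threads take turns rather than running on separate cores.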

No GIL (True parallelism) + Optional Static Typing

To release the GIL and speed up your Python code, you will need to get your hands at least a little dirty. Python is a high-level programming language whose interpreters are themselves written in other, lower-level languages. The original and most popular Python is implemented in C; this reference implementation is called CPython. However, there are other implementations of Python, each with its own advantages and issues. Each is developed by a different community; hence, every Python implementation is only as strong as its community. [5]
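To find out which implementation is running your code, the standard library exposes the implementation name. A minimal sketch, where describe_interpreter is a hypothetical helper name:

```python
import platform
import sys


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    return {
        "implementation": platform.python_implementation(),  # e.g. "CPython", "PyPy"
        "internal_name": sys.implementation.name,             # e.g. "cpython", "pypy"
        "version": platform.python_version(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```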

What about Python packages? At this point, you may ask yourself: are the Python packages I currently use also available in other Python implementations? Unfortunately, there is no single answer, since it depends on the implementation. Some incompatibilities are to be expected, especially for packages that rely on CPython’s C API. To be sure, always check the documentation of both the implementation and the package.
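Besides reading the documentation, a quick first check is to ask the interpreter itself whether it can locate a package. The sketch below uses importlib.util.find_spec; check_packages is a hypothetical helper, and the package names are only examples. Finding a package does not guarantee that it works correctly on that implementation.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if this interpreter can locate it."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # "numpy" and "pandas" are only example names; substitute your own dependencies.
    print(check_packages(["json", "numpy", "pandas"]))
```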

Here are some popular alternative implementations of Python.

The following are not Python implementations.

Important Note: Every approach described in this section has its own advantages and disadvantages, best-fit applications, best practices, and learning curve. Whichever you choose, you will need to get your hands dirty to get the best performance out of it.

No GIL (True parallelism) + Mandatory Static Typing

The following methods differ from all the methods above because they require sufficient knowledge of a second, low-level programming language such as C or C++. This section describes tools and methods that act as a “bridge” between Python and an extension written in a low-level programming language. Performance is therefore limited only by the low-level language and the bridge. Before we continue, two important questions must be asked. [5]

There is no single answer to these questions because it depends on what you want to do. Both compiler configuration and build automation can be managed from Python. This was traditionally done with the built-in distutils module, which is easy to use but was deprecated in Python 3.10 and removed in Python 3.12; setuptools is its recommended replacement. Whether the extension works across operating systems, however, depends on how you design it. When it comes to data types, there are three logical approaches.

Let the C side handle data types

Let the Python side handle data types

Let the bridge handle data types
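As one illustration of a bridge, the standard library's ctypes module can call a compiled C function directly, with the argument and return types declared on the Python side. This is a minimal sketch that assumes a POSIX system where the C standard library can be loaded; c_strlen is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C standard library;
# if it returns None, CDLL(None) falls back to symbols already loaded in the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature on the Python side: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call C's strlen on the UTF-8 encoding of a Python string."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("hello"))
```

Functions loaded through ctypes.CDLL release the GIL for the duration of the foreign call, so long-running C code called this way does not block other Python threads.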


To summarize, let’s make some very rough generalizations. Bypassing the GIL with threads or processes is easy to implement and can significantly improve overall performance. In addition, multithreading allows simultaneous I/O without blocking your main code. However, concurrency is not true parallelism, and it only pays off for tasks that can be split up. In the second section, we looked at ways to release the GIL while still writing Python. Some alternative Python implementations can achieve true parallelism by releasing the GIL without losing the flexibility of Python. Cython, on the other hand, may require some experience to get the most out of it. Similarly, data analysis packages like Pandas can deliver very high performance if used correctly, but their applicability depends on the nature of the task. Finally, for those who are not afraid to get their hands dirty, the GIL can be released via C extensions. C extensions give you full control over your code and the best execution speed. However, C knowledge is required, because that much control brings just as much responsibility.


  1. “Concurrency vs. Parallelism,” HowToDoInJava. [Online]. Available: [Accessed: 18-Mar-2020].
  2. K. W. Smith, Cython: A Guide for Python Programmers, 1st ed. Sebastopol, CA: O’Reilly, 2015.
  3. A. J. J. Davis, “Grok the GIL: Write Fast and Thread-Safe Python.” [Online]. Available: [Accessed: 18-Mar-2020].
  4. “Java Multi-threading Tutorials,” HowToDoInJava. [Online]. Available: [Accessed: 18-Mar-2020].
  5. M. Summerfield, Python in Practice: Create Better Programs Using Concurrency, Libraries, and Patterns. Addison-Wesley, 2013.
