asyncio - Python
intro to async Python
What is asynchronous programming? It is a programming paradigm that allows a computer program to execute other independent tasks while waiting for long-running operations, like file or network I/O, to complete. Instead of blocking the program’s execution during these operations, as seen in traditional ‘synchronous programming’, the program can continue executing other tasks and resume the waiting operation once it’s done. To cite a very simplified scenario, given a program of 5 lines, if line 1 is a long-running operation and lines 2 to 5 are independent of line 1, they will be evaluated while awaiting line 1’s completion.
Why does ‘async’ matter in Python? It is generally believed, with some empirical support, that Python is not always the best choice for writing highly performant and scalable programs, owing to the Global Interpreter Lock (GIL) and its relatively slow execution speed. However, these limitations have not stopped the language from being used in many systems and services across various industries, and there are ongoing efforts to reduce them. The asyncio package has been part of the standard library since Python 3.4, with the async/await syntax arriving in Python 3.5; it has seen significant improvements since and continues to be actively enhanced. It enables developers to write non-blocking code that scales better and, by extension, uses fewer resources. The content of this post is based on asyncio as of Python 3.12.
asyncio
Asyncio is a library for writing concurrent code using the async/await syntax. [1] Basically, it is Python’s built-in package that makes asynchronous programming possible. It is the backbone behind many Python asynchronous frameworks that provide high-performance network and web servers, database connection libraries, distributed task queues, etc. It provides a set of high-level APIs to make asynchronous programming less of a herculean task in Python. As mentioned in the official documentation, it is often a perfect fit for I/O-bound and high-level structured network code. It shines when handling multiple I/O-bound tasks simultaneously, like database queries or API requests.
I recently used asyncio in a closely related scenario. I was interacting with an API that required a filter to return results. The filter parameter could be type A or type B, but I needed items of both types. I created two coroutines, one filtering for type A and the other for type B, and ran them concurrently with asyncio’s gather function. The combined result held items of both types, without my having to make the two requests one after the other!
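A minimal sketch of that pattern is shown here. The fetch_items coroutine is a hypothetical stand-in for the real API call (the sleep only simulates network latency), so treat the names and return values as assumptions rather than the actual client code.

```python
import asyncio


# Hypothetical stand-in for the real API call; the sleep simulates network latency.
async def fetch_items(item_type: str) -> list[str]:
    await asyncio.sleep(0.1)
    return [f"{item_type}-1", f"{item_type}-2"]


async def fetch_both() -> list[str]:
    # gather runs both coroutines concurrently and returns results in argument order.
    type_a, type_b = await asyncio.gather(fetch_items("A"), fetch_items("B"))
    return type_a + type_b


print(asyncio.run(fetch_both()))  # ['A-1', 'A-2', 'B-1', 'B-2']
```

Both requests wait at the same time, so the total wait is roughly that of the slower request rather than the sum of the two.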
key features and concepts in asyncio
the event loop
An event loop can be likened to a chef who takes orders, cooks, and serves. In this case, the chef takes one order and starts working on it, but while the water is boiling or something is frying, they quickly pick up the second order and get started with that as well in tandem with the first, and the process goes on in that order till all orders are served. The event loop coordinates tasks or coroutines for a program. It ensures that while a long-running task is waiting (e.g., for a network response or a file to be read), other tasks can continue running. It does this by keeping track of tasks that are ready to run and those that are waiting.
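The chef analogy can be sketched in a few lines. The cook coroutine and the order names are made up for illustration; the point is only the order in which the log entries appear.

```python
import asyncio


async def cook(order: str, seconds: float, log: list[str]) -> None:
    log.append(f"start {order}")
    # While this order "boils", the event loop is free to work on another one.
    await asyncio.sleep(seconds)
    log.append(f"serve {order}")


async def kitchen() -> list[str]:
    log: list[str] = []
    await asyncio.gather(cook("soup", 0.2, log), cook("salad", 0.05, log))
    return log


print(asyncio.run(kitchen()))
# ['start soup', 'start salad', 'serve salad', 'serve soup']
```

The salad is started second but served first, because the loop switched to it while the soup was still waiting.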
coroutines
Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points.
They can be implemented with the async def statement. [2] On the surface, they are just functions defined with async def. Calling one does not run its body; it returns a coroutine object, an ‘awaitable’ that must be awaited with the await keyword (or scheduled as a task) to produce its result.
Awaiting suspends the calling coroutine at that point, which gives the event loop the chance to run other tasks until the awaited operation completes.
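As a small illustration (the add coroutine is invented for this sketch), calling a coroutine function only creates a coroutine object; the body runs once it is awaited.

```python
import asyncio
import inspect


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # hand control back to the event loop once
    return a + b


async def main() -> int:
    coro = add(1, 2)  # nothing has run yet; this is just a coroutine object
    assert inspect.iscoroutine(coro)
    return await coro  # awaiting runs the body and yields its return value


print(asyncio.run(main()))  # 3
```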
tasks
A task is a Future-like object that is used to run a Python coroutine in an event loop.
The high-level asyncio.create_task(...) API creates tasks. It takes a coroutine as its main parameter.
Creating tasks yourself to run multiple coroutines, rather than passing the coroutines directly to asyncio.gather(...), lets you manage each task individually if needed.
An individual task can be canceled, have its status checked, or have its result retrieved outside gather.
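Here is a rough sketch of managing tasks individually. The work coroutine and its delays are assumptions made up for the example.

```python
import asyncio


async def work(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"{name} done"


async def manage_tasks() -> tuple[str, bool]:
    # create_task schedules each coroutine on the running loop right away.
    fast = asyncio.create_task(work("fast", 0.05))
    slow = asyncio.create_task(work("slow", 10))

    result = await fast  # wait only for the task we care about
    slow.cancel()        # cancel the one we no longer need
    try:
        await slow
    except asyncio.CancelledError:
        pass             # expected: the task was cancelled on purpose
    return result, slow.cancelled()


print(asyncio.run(manage_tasks()))  # ('fast done', True)
```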
futures
Future objects are used to bridge low-level callback-based code with high-level async/await code. [3] In asyncio, an object is a future if it satisfies any of these three conditions:
1. an instance of asyncio.Future
2. an instance of asyncio.Task
3. Future-like object with a _asyncio_future_blocking attribute
Simply put, an asyncio Future is an object that represents a placeholder for a result that hasn’t been computed yet. It’s like a promise that an operation will be executed in the future.
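The sketch below shows a future being filled in by callback-style code and then awaited. It uses the low-level loop.create_future() and loop.call_later() calls, which application code rarely needs; it is only meant to make the placeholder idea concrete.

```python
import asyncio


async def future_demo() -> tuple[bool, bool, str]:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()  # an empty placeholder for a result

    # Callback-style code fills in the result a little later.
    loop.call_later(0.05, fut.set_result, "ready")

    task = asyncio.create_task(asyncio.sleep(0))
    checks = (asyncio.isfuture(fut), asyncio.isfuture(task))

    value = await fut  # suspends here until set_result is called
    await task
    return (*checks, value)


print(asyncio.run(future_demo()))  # (True, True, 'ready')
```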
subprocesses
A subprocess in Python is a separate process that a Python script can start to run an external program, command, or script. Subprocesses have diverse use cases, which include executing shell commands, running non-Python scripts, automating repetitive tasks by running external tools, and even offloading tasks to other processes without blocking your main program. The asyncio package provides high-level APIs for creating and managing subprocesses.
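As a minimal sketch, the following starts a child Python interpreter with asyncio.create_subprocess_exec and reads its output. The child command is an arbitrary choice for the example; any external program could take its place.

```python
import asyncio
import sys


async def run_child() -> tuple[int, str]:
    # Start the current Python interpreter as a child process.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('hello from child')",
        stdout=asyncio.subprocess.PIPE,
    )
    # communicate() waits for the process to exit and collects its output.
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


print(asyncio.run(run_child()))  # (0, 'hello from child')
```

While the child runs, the event loop stays free to schedule other tasks.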
asyncio primitives
Asyncio provides primitives for synchronization. They are designed to be similar to those of the Python threading module, but have a couple of gotchas.
First, they are not thread-safe and hence not suitable for OS thread synchronization; use the threading module in that case.
Second, the methods of these primitives do not accept a timeout argument, so a call such as waiting on a lock or an event can wait indefinitely.
To perform such operations with a timeout, wrap them in the asyncio.wait_for(...) function.
The synchronization primitives are:
1. asyncio.Lock
2. asyncio.Event
3. asyncio.Condition
4. asyncio.Semaphore
5. asyncio.BoundedSemaphore
6. asyncio.Barrier
You can find detailed explanations and sample usage for each in the documentation.
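Two of these ideas can be sketched together: a Semaphore capping how many jobs run at once, and wait_for putting a timeout on Event.wait(). The job function, the limit of 2, and the timings are assumptions chosen for the example.

```python
import asyncio


async def limited_job(sem: asyncio.Semaphore, stats: dict[str, int]) -> None:
    async with sem:  # at most two jobs get past this line at a time
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)
        stats["active"] -= 1


async def primitives_demo() -> tuple[int, bool]:
    sem = asyncio.Semaphore(2)
    stats = {"active": 0, "peak": 0}
    await asyncio.gather(*(limited_job(sem, stats) for _ in range(6)))

    event = asyncio.Event()  # never set in this sketch
    timed_out = False
    try:
        # Event.wait() has no timeout parameter, so wait_for supplies one.
        await asyncio.wait_for(event.wait(), timeout=0.05)
    except asyncio.TimeoutError:
        timed_out = True
    return stats["peak"], timed_out


print(asyncio.run(primitives_demo()))  # (2, True)
```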
runners
These are high-level asyncio primitives provided to run asyncio code.
To borrow from the documentation, they are built on top of an event loop with the aim of simplifying async code usage for common, widespread scenarios.
Usually, the main entry point for an asyncio program is asyncio.run(...), which creates and starts a new event loop specifically for running the given coroutine.
Hence, you will run into a RuntimeError if you call asyncio.run(...) in an environment that already has a running event loop (e.g. Jupyter notebooks).
In such environments, you can await the coroutine directly with the await keyword.
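The sketch below shows both situations: asyncio.run as the entry point, and the error raised when it is called while a loop is already running. The function names are made up for illustration.

```python
import asyncio


async def main() -> str:
    await asyncio.sleep(0)
    return "done"


async def inside_running_loop() -> tuple[bool, str]:
    coro = main()
    try:
        # A loop is already running here, so asyncio.run refuses to start another.
        return False, asyncio.run(coro)
    except RuntimeError:
        # Await the coroutine directly instead, as you would in a notebook.
        return True, await coro


print(asyncio.run(main()))                 # done
print(asyncio.run(inside_running_loop()))  # (True, 'done')
```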
advantages of async programming
Some advantages of async programming include:
1. improved I/O performance
2. better resource utilization
3. reduced CPU and memory usage for specific workloads
4. scalable applications
common use cases for asyncio
Some common use cases for asyncio are:
1. asynchronous HTTP server/client
2. making and managing concurrent API requests
3. web socket communication
4. async data connectors (e.g. database, message broker connectors)
5. custom async applications such as real-time dashboards
challenges and considerations
The following are some of the challenges and considerations regarding the use of the asyncio module:
- Debugging: async programs may have a non-linear or non-deterministic execution sequence and unexpected side effects, so they can be trickier to debug
- Blocking calls: by default, asyncio runs a single event loop on a single thread, so any blocking operation in that loop stalls every other task on it until the operation finishes
- Thread safety: asyncio primitives are not thread-safe and one has to fall back to the threading module for OS thread synchronization
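One common way around the blocking-call problem, not covered above, is to hand the blocking function to a worker thread with asyncio.to_thread (available since Python 3.9). In this sketch, blocking_io stands in for any blocking library call and heartbeat is a made-up task used to show that the loop keeps running in the meantime.

```python
import asyncio
import time


def blocking_io() -> str:
    time.sleep(0.2)  # stands in for a blocking library call
    return "blocking result"


async def heartbeat(ticks: list[float]) -> None:
    for _ in range(3):
        await asyncio.sleep(0.05)
        ticks.append(time.monotonic())


async def offload() -> tuple[str, int]:
    ticks: list[float] = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so the loop keeps ticking.
    result = await asyncio.to_thread(blocking_io)
    ticks_during_call = len(ticks)
    await beat
    return result, ticks_during_call


result, ticks_during_call = asyncio.run(offload())
print(result, ticks_during_call)
```

Had blocking_io been called directly inside the coroutine, no heartbeat tick would have been recorded until it returned.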
popular Python libraries built on asyncio
This section lists some popular Python frameworks and packages built on asyncio.
1. aiohttp: Async HTTP client and server for asyncio and Python
2. quart: A fast Python web microframework
3. FastAPI: A high-performance async web framework
4. asyncpg: Async PostgreSQL database library
5. aiormq: A pure python AMQP 0.9.1 asynchronous client library
6. websockets: A library for building WebSocket servers and clients in Python
Of course, the list is not exhaustive; there are many more frameworks and packages built on asyncio or for use with it.
These packages help you make the most of the async paradigm in your code.
best practices for writing async Python code
Some best practices for writing async Python code include:
1. Leverage async third-party libraries and data connectors
2. Ensure blocking code doesn’t interfere with async code (i.e. block the event loop)
3. Use timeouts for long-running tasks using asyncio.wait_for(...)
4. Write tests using async-compatible test libraries, e.g. pytest-asyncio
conclusion
Naturally, this article only scratches the surface of what asyncio has to offer; the official documentation has more in-depth information on the package and provides code examples for reference. The package provides high-level APIs for immediate use without worrying too much about the workings underneath, and also exposes low-level APIs to build upon (e.g. for library and framework development). Be sure to use it for the operations it fits best, i.e. I/O-bound operations and high-level structured network code.
To wrap up, as lifted from the official Python docs, “If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing or concurrent.futures.ProcessPoolExecutor”. It also advises the use of the threading module for OS thread synchronization.