Introduction to Asynchronous Operations
Welcome back, future AI architect! In our journey with any-llm, we’ve learned how to connect to various LLM providers and get intelligent responses. So far, our interactions have been synchronous, meaning one operation completes entirely before the next one begins. While this is straightforward, it’s not always the most efficient, especially when dealing with tasks that involve waiting.
Think about ordering coffee. If you order, then wait for your coffee to be made, then order a pastry, then wait for that to be ready, that’s synchronous. What if you could order both at once, and while the coffee is brewing, the barista starts preparing your pastry? That’s closer to asynchronous!
In this chapter, we’ll dive into the exciting world of asynchronous programming with Python’s asyncio and how any-llm leverages it. This is crucial for building high-performance, responsive applications that interact with large language models, especially when you need to make multiple LLM calls concurrently. By the end, you’ll understand why and how to make your any-llm interactions non-blocking and super efficient!
Before we begin, make sure you’re comfortable with the basics of any-llm from previous chapters, including setting up providers and making basic completion calls. A foundational understanding of Python functions and variables will also be helpful.
Core Concepts: Why Go Asynchronous?
Why should we bother with async and await? The main reason is efficiency, particularly when your program spends a lot of time “waiting.” LLM API calls, like many network requests, are inherently I/O-bound tasks. This means your program sends a request to a server (e.g., OpenAI, Mistral, Ollama) and then waits for that server to process the request and send back a response. During this waiting period, your program isn’t doing much useful work.
Synchronous vs. Asynchronous
The difference: in synchronous execution, each request blocks the program until its response arrives. In asynchronous execution, we can initiate multiple requests and then await their completion concurrently. This doesn’t mean they run in parallel on separate CPU cores (that’s multi-threading/multi-processing), but rather that Python can switch to doing other useful work (like initiating another request) while waiting for an I/O operation to complete. This is often called “concurrent” execution.
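The coffee-shop analogy can be made concrete with plain sleeps, no LLM needed. This is a minimal, standalone sketch contrasting the two modes; it has nothing to do with any-llm itself:

```python
import asyncio
import time

def sync_wait(delay: float) -> None:
    time.sleep(delay)  # blocks the whole program while waiting

async def async_wait(delay: float) -> None:
    await asyncio.sleep(delay)  # yields control to other tasks while waiting

# Synchronous: the waits add up (0.1 + 0.1 + 0.1 ~= 0.3s)
start = time.perf_counter()
for _ in range(3):
    sync_wait(0.1)
sync_elapsed = time.perf_counter() - start

# Asynchronous: the waits overlap (total ~= 0.1s)
async def main() -> None:
    await asyncio.gather(*(async_wait(0.1) for _ in range(3)))

start = time.perf_counter()
asyncio.run(main())
async_elapsed = time.perf_counter() - start

print(f"sync: {sync_elapsed:.2f}s, async: {async_elapsed:.2f}s")
assert async_elapsed < sync_elapsed  # concurrent waiting is faster
```

The absolute numbers will vary by machine, but the asynchronous total should always be close to the single longest wait rather than the sum of all waits.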
Python’s asyncio Basics
Python’s standard library provides asyncio for writing concurrent code using the async/await syntax. This allows a single-threaded program to manage multiple I/O-bound tasks efficiently.
- `async def`: This keyword defines a coroutine. A coroutine is a special kind of function that can be paused and resumed. It doesn’t run immediately when called; instead, it returns a coroutine object.
- `await`: This keyword can only be used inside an `async def` function. When `await` is used before an awaitable (like another coroutine or an I/O operation), it tells Python: “Hey, I’m going to wait here for this task to finish. While I’m waiting, you can go run other `async` tasks!”
Let’s look at a simple asyncio example:
```python
import asyncio
import time

# Define an asynchronous function (coroutine)
async def say_hello(delay, message):
    print(f"[{time.time():.2f}] Starting: {message}")
    await asyncio.sleep(delay)  # Simulate an I/O-bound task (like an LLM call)
    print(f"[{time.time():.2f}] Finished: {message}")
    return f"Result for {message}"

# This is the entry point for running async code
async def main():
    print(f"[{time.time():.2f}] Main function started synchronously.")

    # Call coroutines - they don't run immediately, they return coroutine objects
    task1 = say_hello(2, "Task 1")
    task2 = say_hello(1, "Task 2")

    # Await them concurrently using asyncio.gather
    # This means both tasks start, and we wait for both to complete
    results = await asyncio.gather(task1, task2)
    print(f"[{time.time():.2f}] All tasks finished. Results: {results}")

# Run the main asynchronous function
if __name__ == "__main__":
    # Python 3.7+ uses asyncio.run() to execute the top-level async function
    asyncio.run(main())
```
Explanation:
- We define `say_hello` as an `async def` function. Inside it, `await asyncio.sleep(delay)` simulates a network call where the program waits for `delay` seconds.
- The `main` function is also an `async def` function.
- `task1 = say_hello(2, "Task 1")` and `task2 = say_hello(1, "Task 2")` don’t execute the `say_hello` function immediately. Instead, they create coroutine objects.
- `await asyncio.gather(task1, task2)` is where the magic happens. It schedules both `task1` and `task2` to run concurrently. Python will start `task1`, then `task2`, and whenever one hits an `await` (like `asyncio.sleep`), it can switch to the other.
- `asyncio.run(main())` is the standard way to run the top-level `async` function in a Python script.
If you run this code, you’ll notice that “Task 1” and “Task 2” start almost at the same time, and the total execution time is closer to the longest task (2 seconds) rather than the sum of both (3 seconds). This is the power of asyncio!
Step-by-Step Implementation: Asynchronous any-llm Calls
Now, let’s apply this to any-llm. The any-llm library is designed with asyncio in mind, offering asynchronous versions of its core functions. Typically, the main completion or chat interface will have async methods or directly support await.
1. Setting Up Your Environment (Revisit)
First, ensure you have any-llm installed with the providers you intend to use. For this example, let’s assume we’re using mistral and ollama.
```shell
pip install 'any-llm-sdk[mistral,ollama]'
```
Remember to set your API keys as environment variables. For local models like Ollama, ensure it’s running.
```shell
export MISTRAL_API_KEY="YOUR_MISTRAL_API_KEY"
# For Ollama, no specific API key is needed, just ensure the server is running.
```
2. Making a Single Asynchronous Call
any-llm’s completion function can often be awaited directly if you’re within an async context.
Let’s create a new Python file, say async_llm_example.py:
```python
import asyncio
import os

from any_llm import completion
from any_llm.schemas import CompletionRequest, ProviderConfig

# Ensure environment variables are set for the provider you want to use
# For Mistral: MISTRAL_API_KEY
# For Ollama: no API key, but ensure the Ollama server is running
# For OpenAI: OPENAI_API_KEY
# ...and so on for other providers.

async def get_llm_response_async(prompt: str, provider: str) -> str:
    """
    Makes an asynchronous call to any-llm for a completion.
    """
    print(f"[{asyncio.current_task().get_name()}] Requesting from {provider} with prompt: '{prompt[:30]}...'")
    try:
        # The completion function from any_llm is designed to be awaitable
        response = await completion(
            CompletionRequest(
                prompt=prompt,
                model="mistral/mistral-large-latest" if provider == "mistral" else "ollama/llama3",  # Example models
                temperature=0.7,
                max_tokens=50
            ),
            provider=provider
        )
        print(f"[{asyncio.current_task().get_name()}] Received response from {provider}.")
        return response.choices[0].text.strip()
    except Exception as e:
        print(f"[{asyncio.current_task().get_name()}] Error from {provider}: {e}")
        return f"Error getting response from {provider}"

async def main():
    print("Starting asynchronous LLM calls...")

    # A single asynchronous call
    prompt1 = "Explain the concept of quantum entanglement in a sentence."
    response1 = await get_llm_response_async(prompt1, "mistral")
    print(f"\nMistral Response: {response1}")

    print("\nAll asynchronous calls completed.")

if __name__ == "__main__":
    asyncio.run(main())
```
Explanation:
- We import `asyncio`, `os`, `completion`, `CompletionRequest`, and `ProviderConfig` from `any_llm`.
- `get_llm_response_async` is an `async def` function. Inside it, `await completion(...)` pauses the execution of this coroutine until the LLM provider responds, allowing other `async` tasks to run.
- We use `asyncio.current_task().get_name()` for clearer output, showing which task is printing.
- In `main()`, we call `get_llm_response_async` with `await` to get a single response.
- `asyncio.run(main())` kicks off our `asyncio` event loop.
Run this script. You’ll see the output for the single request. This is the foundation.
3. Making Multiple Concurrent Asynchronous Calls
Now, let’s make multiple calls concurrently using asyncio.gather(). This is where the performance benefits truly shine.
Modify your async_llm_example.py main function to look like this:
```python
# ... (imports and get_llm_response_async function remain the same) ...

async def main():
    print("Starting multiple concurrent asynchronous LLM calls...")

    prompts = [
        ("Explain the concept of quantum entanglement in a sentence.", "mistral"),
        ("Write a short, encouraging haiku about learning Python.", "ollama"),
        ("What is the capital of France?", "mistral"),
        ("Generate a random number between 1 and 100.", "ollama")
    ]

    # Create a list of coroutine objects
    tasks = [
        get_llm_response_async(prompt, provider)
        for prompt, provider in prompts
    ]

    # Run all tasks concurrently and await their results
    # asyncio.gather preserves the order of results corresponding to the tasks list
    responses = await asyncio.gather(*tasks)

    print("\n--- All concurrent calls completed ---")
    for i, (prompt, provider) in enumerate(prompts):
        print(f"Prompt {i+1} ({provider}): '{prompt[:30]}...'")
        print(f"Response {i+1}: {responses[i]}\n")

if __name__ == "__main__":
    asyncio.run(main())
```
Explanation:
- We define a list of `prompts`, each paired with a `provider`.
- We then create a `tasks` list, where each element is a coroutine object returned by calling `get_llm_response_async`. Remember, these functions don’t run yet.
- `await asyncio.gather(*tasks)` is the key. It takes all the coroutine objects in `tasks`, schedules them to run concurrently, and waits until all of them have completed. The `*` unpacks the list of tasks into individual arguments for `gather`.
- The `responses` list will contain the results from each coroutine, in the same order as the `tasks` list.
Run this updated script. You should observe that the “Starting” messages for different prompts appear very quickly, and the “Finished” messages arrive as responses come back, likely overlapping significantly. The total time taken will be closer to the longest individual LLM call, rather than the sum of all of them, demonstrating the power of concurrency!
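If you’d rather handle each response as soon as it arrives instead of waiting for the whole batch, `asyncio.as_completed` is an alternative to `gather`. Here is a minimal sketch with a stand-in coroutine (`fake_reply` is a hypothetical placeholder for an awaitable any-llm call, not part of the library):

```python
import asyncio

async def fake_reply(name: str, delay: float) -> str:
    # Stand-in for an LLM call; swap in your awaitable any-llm call here
    await asyncio.sleep(delay)
    return f"{name} done"

async def main() -> None:
    tasks = [fake_reply("slow", 0.3), fake_reply("fast", 0.1)]
    # as_completed yields awaitables in completion order, not submission order
    for fut in asyncio.as_completed(tasks):
        result = await fut
        print(result)  # "fast done" prints before "slow done"

asyncio.run(main())
```

The trade-off: `gather` gives you results in the order you submitted them, while `as_completed` lets you start processing the fastest responses immediately.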
4. Handling Timeouts
LLM providers can sometimes be slow or unresponsive. It’s good practice to add timeouts to your asynchronous calls to prevent your application from hanging indefinitely. asyncio.wait_for is perfect for this.
Let’s wrap our completion call with asyncio.wait_for.
Modify get_llm_response_async slightly:
```python
# ... (imports remain the same) ...

async def get_llm_response_async(prompt: str, provider: str, timeout_seconds: int = 15) -> str:
    """
    Makes an asynchronous call to any-llm for a completion with a timeout.
    """
    task_name = asyncio.current_task().get_name()
    print(f"[{task_name}] Requesting from {provider} with prompt: '{prompt[:30]}...' (Timeout: {timeout_seconds}s)")
    try:
        response_task = completion(
            CompletionRequest(
                prompt=prompt,
                model="mistral/mistral-large-latest" if provider == "mistral" else "ollama/llama3",
                temperature=0.7,
                max_tokens=50
            ),
            provider=provider
        )
        # Await the response with a timeout
        response = await asyncio.wait_for(response_task, timeout=timeout_seconds)
        print(f"[{task_name}] Received response from {provider}.")
        return response.choices[0].text.strip()
    except asyncio.TimeoutError:
        print(f"[{task_name}] Timeout occurred after {timeout_seconds} seconds for {provider}.")
        return f"Error: Request to {provider} timed out."
    except Exception as e:
        print(f"[{task_name}] Error from {provider}: {e}")
        return f"Error getting response from {provider}"

# ... (main function remains the same, but you can add timeout_seconds to your calls if desired) ...
```
Explanation:
- We added a `timeout_seconds` parameter to `get_llm_response_async`.
- `response_task = completion(...)` now creates the coroutine object for the `any-llm` call without immediately awaiting it.
- `await asyncio.wait_for(response_task, timeout=timeout_seconds)` then awaits this specific coroutine, but it will raise an `asyncio.TimeoutError` if it doesn’t complete within the specified time.
- We added a `try...except asyncio.TimeoutError` block to gracefully handle these timeouts.
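The timeout behavior itself is easy to verify without calling a real provider. Here is a self-contained sketch using a deliberately slow stand-in coroutine:

```python
import asyncio

async def slow_operation() -> str:
    # Stand-in for an unresponsive provider: far slower than our timeout
    await asyncio.sleep(10)
    return "never reached"

async def main() -> str:
    try:
        # wait_for cancels slow_operation and raises TimeoutError after 0.1s
        return await asyncio.wait_for(slow_operation(), timeout=0.1)
    except asyncio.TimeoutError:
        return "timed out"

print(asyncio.run(main()))  # timed out
```

Note that on timeout, `asyncio.wait_for` also cancels the inner task, so the abandoned request doesn’t keep running in the background.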
This is a critical best practice for robust production systems!
Mini-Challenge: Asynchronous Provider Comparison
Let’s put your new async skills to the test!
Challenge:
Write an async Python script that performs the following:
- Define a single, challenging prompt (e.g., “Summarize the history of artificial intelligence in 100 words.”).
- Make concurrent `any-llm` calls to at least three different providers (e.g., `mistral`, `ollama`, `openai` if you have keys for all).
- Each call should have a reasonable timeout (e.g., 20 seconds).
- Print the response from each provider, clearly indicating which provider gave which response. Also, note if any provider timed out.
- Observe the execution time. Does it feel faster than doing them sequentially?
Hint:
- You’ll need `asyncio.gather()` for concurrent execution.
- Remember to set up environment variables for all your chosen providers.
- You can reuse the `get_llm_response_async` function we just modified!
What to Observe/Learn:
- How efficiently `asyncio` can handle multiple I/O-bound tasks.
- The potential differences in response times and quality across various LLM providers for the same prompt when called concurrently.
- The importance of timeouts in managing external API dependencies.
Common Pitfalls & Troubleshooting
Working with asyncio can sometimes be tricky. Here are some common issues and how to resolve them:
- **Forgetting `await`:** This is probably the most common mistake. If you call an `async def` function without `await`ing it, you’re not actually running the coroutine; you’re just creating a coroutine object.
  - Symptom: Your `async` function doesn’t seem to do anything, or the output appears instantly without waiting.
  - Fix: Ensure you `await` all coroutine calls within an `async def` function (e.g., `response = await get_llm_response_async(...)`).
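A quick, runnable illustration of what the un-awaited call actually produces:

```python
import asyncio

async def greet() -> str:
    return "hello"

async def main() -> None:
    coro = greet()               # no await: nothing has run yet
    print(type(coro).__name__)   # coroutine
    result = await coro          # now it actually executes
    print(result)                # hello

asyncio.run(main())
```

If you drop a coroutine object without ever awaiting it, Python also emits a `RuntimeWarning: coroutine ... was never awaited`, which is a good clue that this mistake has crept in.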
- **Mixing `async` and `sync` incorrectly:** You cannot directly `await` an `async def` function from a regular `def` function. Similarly, you shouldn’t call blocking `sync` code directly inside a busy `async` loop, as it will block the entire event loop.
  - Symptom: `RuntimeError: 'coroutine' object is not awaitable`, or your `async` program becomes unresponsive.
  - Fix:
    - Use `asyncio.run()` as the entry point for your top-level `async` function.
    - If you must run blocking code in an `async` context, use `await asyncio.to_thread(blocking_function, *args)`.
    - If you need to call `async` code from a `sync` context (e.g., in a web framework that’s not fully async), you might use `asyncio.run()` or `asyncio.get_event_loop().run_until_complete()` cautiously, but it’s generally better to make the entire call stack asynchronous.
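The `asyncio.to_thread` escape hatch (available since Python 3.9) can be demonstrated with a blocking `time.sleep`, which stands in for any blocking library call:

```python
import asyncio
import time

def blocking_io() -> str:
    time.sleep(0.2)  # a blocking call that would otherwise stall the event loop
    return "done"

async def main() -> None:
    start = time.perf_counter()
    # Run the blocking function in a worker thread so the event loop stays
    # free; the asyncio.sleep runs concurrently with it
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(0.2),
    )
    elapsed = time.perf_counter() - start
    print(result, f"{elapsed:.2f}s")  # total ~0.2s, not 0.4s

asyncio.run(main())
```

Because the blocking call runs in a thread, both waits overlap; without `to_thread`, the event loop would be frozen for the full duration of `blocking_io`.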
- **Incorrect `asyncio.run()` usage:** `asyncio.run()` should typically only be called once, at the very top level of your application. Calling it multiple times can lead to `RuntimeError: Event loop is already running`.
  - Symptom: `RuntimeError: Event loop is already running`.
  - Fix: Structure your application so that a single `asyncio.run(main_async_function())` call starts everything. Inside `main_async_function`, you then `await` all other coroutines.
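You can reproduce this error safely in a few lines; attempting `asyncio.run()` from inside an already-running loop raises immediately:

```python
import asyncio

async def main() -> None:
    coro = asyncio.sleep(0)
    try:
        # Calling asyncio.run() while a loop is already running raises RuntimeError
        asyncio.run(coro)
    except RuntimeError as e:
        print(f"RuntimeError: {e}")
    await coro  # await it properly instead (and avoid a "never awaited" warning)

asyncio.run(main())  # the single, top-level entry point
```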
- **Unhandled Exceptions in `asyncio.gather`:** If one of the coroutines passed to `asyncio.gather` raises an exception, by default, `gather` will immediately raise that exception, potentially leaving other tasks incomplete.
  - Symptom: Your application crashes if one LLM call fails, even if others might succeed.
  - Fix: Use the `return_exceptions=True` parameter in `asyncio.gather`. This will make `gather` return the exception objects themselves instead of raising them, allowing you to process results and errors for all tasks.
```python
# Example for return_exceptions=True
results_or_exceptions = await asyncio.gather(*tasks, return_exceptions=True)
for res in results_or_exceptions:
    if isinstance(res, Exception):
        print(f"Task failed with: {res}")
    else:
        print(f"Task succeeded with: {res}")
```
Summary
Phew! You’ve just unlocked a superpower for building responsive and efficient AI applications. Here’s what we covered:
- Synchronous vs. Asynchronous: Understood the fundamental difference and why asynchronous execution is vital for I/O-bound tasks like LLM API calls.
- Python `asyncio` Core: Learned about `async def` for defining coroutines and `await` for pausing execution to allow other tasks to run.
- `any-llm` Asynchronous API: Saw how `any-llm` integrates seamlessly with `asyncio`, allowing you to `await` its core functions like `completion`.
- Concurrent Calls with `asyncio.gather`: Mastered making multiple LLM calls simultaneously, significantly speeding up applications that require many interactions.
- Robustness with `asyncio.wait_for`: Learned to implement timeouts to prevent indefinite hangs, making your applications more resilient.
- Common Pitfalls: Explored typical mistakes and how to troubleshoot them effectively.
By leveraging asynchronous operations, you can build any-llm powered applications that are faster, more scalable, and provide a much better user experience. In the next chapter, we’ll delve deeper into performance tuning and advanced integration patterns, building on these asynchronous foundations. Get ready to optimize!
References
- Mozilla any-llm GitHub Repository
- Mozilla any-llm Documentation
- Python `asyncio` Documentation
- Python `asyncio.gather` Documentation
- Python `asyncio.wait_for` Documentation
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.