Extra - AsyncOpenAI - Calling OpenAI APIs Asynchronously with Pandas

  • ✦ The AsyncOpenAI class is the asynchronous client in the official OpenAI Python library. It mirrors the interface of the synchronous OpenAI client, but its request methods are coroutines built on the asyncio library, so calls can run concurrently. The AsyncOpenAI class provides the following benefits:

    • Faster performance: Users can send multiple requests to the API in parallel and wait for the results without blocking the main thread. This can improve the efficiency and responsiveness of the application.
    • Easier error handling: Users can use the try/except syntax to catch and handle any exceptions raised by the API requests. Pending requests can also be cancelled through asyncio's standard task cancellation.
    • Simpler code: Users can use the async/await syntax to write concise and readable code that works with the API. The class also supports the async with context manager to automatically close the session when done.
  • ✦ Here is a brief example from the README.md of OpenAI’s official GitHub repository, openai-python (openai/openai-python: The official Python library for the OpenAI API, github.com):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key="My API Key",
)


async def main() -> None:
    chat_completion = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Say this is a test",
            }
        ],
        model="gpt-4o-mini",
    )


asyncio.run(main())

Integrating with Pandas (with Example)

Pandas DataFrames are a staple for data manipulation and analysis. However, when it comes to making API calls for each row in a DataFrame, things can get a bit tricky. Traditional methods of looping through a DataFrame and making synchronous API calls can be time-consuming, especially when dealing with large datasets. This is where asynchronous operations come into play.

Asynchronous operations allow multiple tasks to be executed concurrently, rather than sequentially. This means that while one task is waiting for a response (such as an API call), other tasks can continue to execute. This can significantly reduce the overall time required to process large datasets.
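As a toy illustration of that difference, the sketch below fakes an "API call" with asyncio.sleep (the function names are illustrative, not from any library): because the calls are awaited together via asyncio.gather, twenty 0.1-second calls finish in roughly 0.1 seconds rather than two seconds.

```python
import asyncio
import time

async def fake_api_call(i: int) -> int:
    # Stand-in for a network request that takes ~0.1 s.
    await asyncio.sleep(0.1)
    return i * 2

async def run_concurrently(n: int) -> list:
    # All n "calls" are in flight at once, so the total wall-clock
    # time is about 0.1 s, not n * 0.1 s.
    return await asyncio.gather(*(fake_api_call(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(run_concurrently(20))
elapsed = time.perf_counter() - start
```

The same pattern — schedule many coroutines, then gather them — is what we will apply to DataFrame rows below.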

Now let’s put all of this together into working code.

  • ✦ Install the packages
!pip install openai  
!pip install nest_asyncio

  • ✦ Import the packages
import pandas as pd
import seaborn as sns
from openai import AsyncOpenAI
import asyncio
import nest_asyncio
  • ✦ Import and Preview the Dataset
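The original dataset is not shown here, so the sketch below builds a small stand-in DataFrame. The only requirement implied by the later code is a "Review" text column (get_embedding reads row["Review"]); a real dataset would be loaded the same way with pd.read_csv or sns.load_dataset.

```python
import pandas as pd

# Hypothetical stand-in dataset with the "Review" column the later
# code expects; replace with pd.read_csv(...) for a real dataset.
df = pd.DataFrame(
    {"Review": [
        "Great product, works exactly as advertised.",
        "Arrived late and the packaging was damaged.",
    ]}
)
print(df.head())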
  • ✦ Set up the “AsyncOpenAI” client object
  • ✦ Define the Async Functions
nest_asyncio.apply()

EMBEDDING_ENGINE = 'text-embedding-3-small'

async def get_embedding(text):
    response = await client.embeddings.create(input=text, model=EMBEDDING_ENGINE)
    return response.data[0].embedding

def apply_async_get_embedding(dfi):
    loop = asyncio.get_event_loop()
    tasks = [loop.create_task(get_embedding(row['Review'])) for _, row in dfi.iterrows()]
    return loop.run_until_complete(asyncio.gather(*tasks))
  • ✦ Apply the Async Function to the DataFrame (df)
    • After applying the function, for example with df['embedding'] = apply_async_get_embedding(df), the returned embeddings are stored in the last column, embedding, of the DataFrame df.
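The sketch below shows that apply step end to end. To keep it runnable without an API key, a dummy get_embedding stands in for the real one, and asyncio.run replaces the nest_asyncio/run_until_complete pattern (which is only needed inside an already-running event loop, such as a notebook); with real credentials you would use the get_embedding and apply_async_get_embedding defined earlier.

```python
import asyncio
import pandas as pd

# Dummy stand-in for the real get_embedding(): it "embeds" a review as a
# one-element vector so the pattern runs end to end without an API key.
async def get_embedding(text: str) -> list:
    await asyncio.sleep(0)  # yield control, as a real awaited request would
    return [float(len(text))]

async def embed_reviews(dfi: pd.DataFrame) -> list:
    # One concurrent task per row; asyncio.gather preserves row order.
    return await asyncio.gather(*(get_embedding(t) for t in dfi["Review"]))

df = pd.DataFrame({"Review": ["Great phone.", "Battery died fast."]})
df["embedding"] = asyncio.run(embed_reviews(df))
print(df.head())
```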