Welcome back, aspiring data extractor! In Chapter 1, you successfully set up your development environment and installed LangExtract. That’s a fantastic first step! But right now, LangExtract is like a powerful car without an engine. It has the structure, but it can’t do anything until we give it the “brain” – a Large Language Model (LLM).
In this chapter, we’re going to connect LangExtract to a real LLM provider. This is where the magic happens! You’ll learn how to securely manage your API keys, configure LangExtract to use different LLM services (like Google’s Gemini or OpenAI’s GPT models), and understand why these steps are absolutely crucial for your extraction tasks. By the end of this chapter, LangExtract will be ready to tap into the intelligence of cutting-edge AI models, setting the stage for some truly amazing data extraction.
The Brain Behind the Extraction: LLM Providers
Think of LangExtract as a sophisticated orchestrator. It doesn’t contain the massive intelligence of a Large Language Model itself. Instead, it acts as a skilled conductor, sending your text and instructions to external LLM services (the “providers”) and then interpreting their responses. These LLM providers host powerful AI models that can understand, reason, and generate human-like text, making them perfect for structured data extraction.
Why Do We Need API Keys? Your Access Pass to AI Power!
When you use an LLM provider like Google or OpenAI, you’re accessing their cloud-based services. To ensure security, track usage, and manage billing, these providers require an API Key. An API Key is essentially a secret token that authenticates your requests, proving that you have permission to use their services.
Why is it a secret? Just like your house key, you wouldn’t leave your API key lying around for anyone to find! If someone gets hold of your API key, they could use your account, potentially incurring significant costs or misusing the AI services under your name. Therefore, managing your API keys securely is paramount.
Environment Variables: The Gold Standard for Secrets
So, how do we use API keys in our code without exposing them directly in our scripts (which would be a massive security risk if you ever shared your code online)? The answer is environment variables.
Environment variables are dynamic named values that can affect the way running processes behave on a computer. They live outside your code, making them a secure and flexible way to store sensitive information like API keys. LangExtract, like many modern Python libraries, is designed to look for these keys in your environment.
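To see what this looks like in practice, here's a short, self-contained Python snippet (the variable names are just for demonstration) showing how environment variables are read, and what happens when one isn't set:

```python
import os

# Simulate a variable being set (normally your shell or a .env file does this).
os.environ["DEMO_API_KEY"] = "sk-demo-123"

# os.getenv returns the value if the variable exists...
print(os.getenv("DEMO_API_KEY"))            # sk-demo-123

# ...and None (or a default you supply) if it doesn't -- no KeyError raised.
print(os.getenv("MISSING_KEY", "not set"))  # not set
```

Because `os.getenv` returns `None` instead of raising, your scripts can check for a missing key and fail with a friendly message rather than a traceback.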
Here’s a simplified view of the connection process: your Python script loads the API key from the environment, LangExtract attaches it to each request it sends to the LLM provider, and the provider sends back the model’s response for LangExtract to interpret.
Step-by-Step Implementation: Getting Connected
Let’s get hands-on and connect LangExtract to an LLM provider. We’ll focus on using environment variables for security and demonstrate with both Google’s Gemini and OpenAI’s GPT models, as these are common choices as of early 2026.
Step 1: Install python-dotenv
First, if you haven’t already, install the python-dotenv library. This handy package makes it super easy to load environment variables from a .env file into your Python application.
```bash
pip install python-dotenv==1.0.1
```
(Version 1.0.1 is a stable, widely used version as of 2026-01-05. Always check for the absolute latest if you encounter issues, but this is a reliable starting point.)
Step 2: Obtain Your LLM Provider API Key
You’ll need an API key from your chosen LLM provider.
For Google Gemini:
- Visit the Google AI Studio or Google Cloud Console.
- Create a new project or select an existing one.
- Navigate to the “API keys” section (often under “Credentials” or “API & Services”).
- Generate a new API key. It will usually start with `AIza...`.
- Official Documentation: Google Cloud API Keys (search for “API Keys” on cloud.google.com if the direct link changes).
For OpenAI GPT:
- Go to the OpenAI platform website (platform.openai.com).
- Log in or sign up.
- Navigate to “API keys” under your user settings.
- Create a new secret key. It will usually start with `sk-...`.
- Official Documentation: OpenAI API Keys
Important: Copy your API key immediately after creation, as some platforms (like OpenAI) only show it once.
Step 3: Create a .env File
In the root directory of your LangExtract project (the same folder where your Python scripts will live), create a new file named .env.
Open this .env file and add your API key(s) like this:
```
# .env file content
GOOGLE_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY_HERE"
OPENAI_API_KEY="YOUR_OPENAI_GPT_API_KEY_HERE"
```
Replace "YOUR_GOOGLE_GEMINI_API_KEY_HERE" and "YOUR_OPENAI_GPT_API_KEY_HERE" with your actual keys! If you’re only using one provider, you only need to include that provider’s key.
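One more safety step worth taking right away: since your `.env` file now holds real secrets, make sure it never gets committed to version control. If you use git, add it to your project’s `.gitignore`:

```
# .gitignore
.env
```

This is a standard convention, and it’s exactly why we keep keys in `.env` instead of hard-coding them in scripts you might share.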
Step 4: Load Environment Variables and Initialize LangExtract
Now, let’s write some Python code to load these environment variables and prepare LangExtract. Create a new Python file, for example, app.py.
```python
# app.py
from dotenv import load_dotenv
import os

import langextract as lx  # confirms the install; we'll use it in the next chapter

# --- Step 1: Load environment variables from the .env file ---
# load_dotenv() looks for a .env file in the current directory and loads
# any key-value pairs found there into this process's environment.
load_dotenv()
print("Environment variables loaded.")

# --- Step 2: Access API keys from environment variables ---
google_api_key = os.getenv("GOOGLE_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

if not google_api_key and not openai_api_key:
    print("WARNING: No LLM API keys found in environment variables. Extraction will fail.")
else:
    if google_api_key:
        print("Google API Key found.")
    if openai_api_key:
        print("OpenAI API Key found.")

# --- Step 3: LangExtract picks up the keys automatically ---
# LangExtract is designed to be provider-agnostic: there is no global init()
# call for providers. Instead, you name the model you want when you call
# lx.extract(), and LangExtract looks up the corresponding API key in your
# environment (GOOGLE_API_KEY for Google models, OPENAI_API_KEY for OpenAI
# models). The actual extraction call is covered in the next chapter, so for
# now we just confirm that our keys are loaded.

print("\nLangExtract is now ready to use LLM providers based on available API keys.")
print("Proceed to the next chapter to define your first extraction task!")
```
Explanation of the code:

- `from dotenv import load_dotenv`: Imports the function we need to load our `.env` file.
- `import os`: Imports the `os` module, which allows us to interact with the operating system’s environment variables.
- `load_dotenv()`: This is the magic line! It reads your `.env` file and makes its contents available as environment variables within your Python script.
- `os.getenv("VARIABLE_NAME")`: This function retrieves the value of an environment variable. We use it to fetch our `GOOGLE_API_KEY` and `OPENAI_API_KEY`.
- The `if`/`else` block simply checks if the keys were successfully loaded and prints a helpful message.
- Important Note on LangExtract Initialization: Unlike some libraries that require an explicit `init()` call for a provider, LangExtract is designed to be highly flexible. It often determines which provider to use based on the model name you provide during the `lx.extract()` call (e.g., a Gemini model name implies Google, a GPT model name implies OpenAI). It then automatically uses the corresponding API key found in your environment variables. This simplifies the setup process!
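If you’re curious what `load_dotenv()` is doing for you behind the scenes, here’s a rough stdlib-only sketch. This is a deliberate simplification (the real `python-dotenv` also handles comments after values, multiline values, and variable interpolation), and `load_env_file` is just an illustrative name:

```python
import os
import tempfile

def load_env_file(path):
    """Minimal .env loader: one KEY=VALUE per line, '#' comment lines ignored."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # Strip whitespace and surrounding double quotes from the value.
            os.environ[key.strip()] = value.strip().strip('"')

# Self-contained demo: write a throwaway .env file, then load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, ".env")
    with open(path, "w") as f:
        f.write('# demo file\nDEMO_KEY="demo-value"\n')
    load_env_file(path)

print(os.getenv("DEMO_KEY"))  # demo-value
```

Understanding this also explains the troubleshooting advice later in the chapter: if the file isn’t where the loader looks, or a variable name is misspelled, nothing lands in `os.environ`.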
Run this script from your terminal:
```bash
python app.py
```
You should see output similar to this (depending on which keys you set):
```
Environment variables loaded.
Google API Key found.
OpenAI API Key found.

LangExtract is now ready to use LLM providers based on available API keys.
Proceed to the next chapter to define your first extraction task!
```
If you only set one key, you’d only see that one reported. If you forgot to set any, you’d see the warning.
Mini-Challenge: Connect to a Specific Provider
Let’s ensure you’ve got the hang of connecting.
Challenge: Modify your app.py file to explicitly check for only the OPENAI_API_KEY and print a success message if found, otherwise print a message indicating it’s missing. Remove the Google API key check for this challenge.
Hint: You’ll only need to change the if/else block in app.py to focus on openai_api_key.
What to observe/learn: This exercise reinforces how os.getenv() works and how you can target specific environment variables. It also helps you confirm that your API key for OpenAI (or your chosen provider) is correctly loaded.
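If you want to check your answer afterwards, here is one way to structure the check, written as a small function so the logic is easy to verify with plain dicts (`check_openai_key` is just an illustrative name, not part of LangExtract):

```python
import os

def check_openai_key(env=None):
    """Report whether OPENAI_API_KEY is present in the given mapping."""
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "OpenAI API Key found."
    return "OPENAI_API_KEY is missing; check your .env file."

# Passing explicit dicts makes the two branches easy to exercise:
print(check_openai_key({"OPENAI_API_KEY": "sk-demo"}))  # OpenAI API Key found.
print(check_openai_key({}))  # OPENAI_API_KEY is missing; check your .env file.

# In app.py you'd simply call it with no argument to read the real environment:
print(check_openai_key())
```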
Common Pitfalls & Troubleshooting
Connecting to external services often comes with a few common hiccups. Here’s how to debug them:
“WARNING: No LLM API keys found…” or `None` from `os.getenv()`:
- Cause: The `.env` file is missing, not in the correct directory (it should be in the same folder you run your Python script from), or the variable name in `.env` doesn’t match what you’re looking for (e.g., `GOOGLE_API_KEY` vs. `GOOGLE_APIKEY`).
- Fix: Double-check the `.env` file’s name and location. Verify variable names are exact (`GOOGLE_API_KEY`, `OPENAI_API_KEY`). Ensure there are no leading/trailing spaces around the variable name or value in the `.env` file. Also, make sure you’ve installed `python-dotenv`.

`KeyError` or `AuthenticationError` (when you start using `lx.extract` later):
- Cause: Your API key is incorrect, expired, revoked, or has insufficient permissions for the model you’re trying to use.
- Fix: Go back to your LLM provider’s console (Google AI Studio/Cloud, OpenAI Platform) and regenerate a new API key. Update your `.env` file with the new key. Also, ensure you have sufficient credits or a valid subscription with the provider.
Network Issues:
- Cause: Your internet connection is down, or there’s a temporary issue with the LLM provider’s service.
- Fix: Check your internet connection. You can also visit the provider’s status page (e.g., Google Cloud Status, OpenAI Status) to see if there are any ongoing outages.
Summary
Phew! You’ve successfully laid the groundwork for powerful AI-driven extraction. Let’s recap what we’ve learned:
- LLM Providers are the brains: LangExtract leverages external LLM services (like Google Gemini and OpenAI GPT) for its intelligence.
- API Keys are your access: These secret tokens authenticate your requests to LLM providers.
- Environment Variables are for security: Storing API keys in `.env` files and loading them with `python-dotenv` is the secure and recommended practice.
- LangExtract’s flexible setup: It automatically uses the appropriate API key from your environment variables based on the LLM model you specify during the `lx.extract()` call.
You’re now ready to move beyond just connecting. In the next chapter, we’ll dive into the exciting world of defining your extraction schema – telling LangExtract exactly what kind of structured data you want it to pull from your unstructured text!
References
- LangExtract GitHub Repository
- Python-dotenv Documentation
- Google Cloud API Keys Documentation
- OpenAI API Keys Documentation
This page is AI-assisted and reviewed. It references official documentation and recognized resources where relevant.