Welcome back, future Applied AI Engineer! In Chapter 1, we laid the groundwork with foundational programming and system thinking. Now, it’s time to dive into the exciting world of Large Language Models (LLMs) – the brainpower behind most modern AI applications, including the sophisticated AI agents we’ll be building.
This chapter will equip you with a solid understanding of what LLMs are, how they work at a high level, and, crucially, how to interact with them programmatically using AI APIs. This isn’t just theory; we’ll get hands-on with Python, making your very first calls to an LLM, setting the stage for building intelligent applications. Understanding this interaction is paramount, as AI agents rely heavily on these models to reason, plan, and execute tasks.
By the end of this chapter, you’ll be able to explain core LLM concepts, confidently set up your environment, and send instructions to an LLM, receiving and processing its responses. Ready to unlock the power of AI? Let’s go!
What are Large Language Models (LLMs)?
Imagine a super-intelligent autocomplete system that has read a significant portion of the internet. That’s a simplified, yet surprisingly accurate, way to think about a Large Language Model.
At their core, LLMs are advanced neural networks trained on colossal datasets of text and code. Their primary function is to predict the next word or token in a sequence, based on the input they receive. This seemingly simple task enables them to perform an astonishing array of language-related functions:
- Generating text: Writing articles, stories, emails, or code snippets.
- Summarizing information: Condensing long documents into concise summaries.
- Translating languages: Converting text from one language to another.
- Answering questions: Providing informative responses based on their training data.
- Reasoning and problem-solving: Engaging in logical thought processes to tackle complex prompts.
The “Large” in LLM refers to the sheer number of parameters (billions, even trillions) in their neural network architecture and the massive scale of their training data. This scale allows them to develop a deep understanding of language patterns, context, and even some aspects of world knowledge.
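To make the next-token idea concrete, here is a deliberately tiny, hand-built "model": a lookup table of next-token probabilities with a greedy decoding loop. Real LLMs learn these distributions with billions of neural-network parameters; the table and its probabilities below are entirely made up for illustration.

```python
# Toy next-token model: a hand-built table of next-token probabilities.
# Real LLMs learn these distributions from massive training data; this
# hypothetical table only illustrates the "predict the next token" loop.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.5, "dog": 0.3, "sky": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "dog": {"ran": 0.7, "sat": 0.3},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}

def generate(start: str, max_tokens: int = 4) -> list[str]:
    """Greedily pick the most probable next token until we run out."""
    tokens = [start]
    for _ in range(max_tokens):
        options = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not options:
            break  # no known continuation for this token
        tokens.append(max(options, key=options.get))
    return tokens

print(generate("the"))  # → ['the', 'cat', 'sat', 'down']
```

Greedy decoding always picks the single most probable token; as we'll see shortly, parameters like temperature and top-p exist precisely to relax that choice.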
How Do We Interact with LLMs? The Role of AI APIs
While the underlying models are incredibly complex, interacting with them is surprisingly straightforward, thanks to AI Application Programming Interfaces (APIs). Most developers don’t host or train LLMs themselves; instead, we access powerful, pre-trained models provided by companies like OpenAI, Anthropic, or Google via their APIs.
Think of an AI API as a well-defined communication channel. You send a request (your instructions or “prompt”) to the API, and the API forwards it to the LLM. The LLM processes your request, generates a response, and sends it back to you through the API.
This interaction typically follows a standard request-response pattern over HTTPS, often using JSON for data exchange. To access these APIs, you usually need an API key for authentication, which identifies you and tracks your usage.
Let’s visualize this interaction as a pipeline: your application → prompt (JSON over HTTPS) → AI API → LLM → generated response → AI API → your application.
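Under the hood, that request is just structured JSON. As a sketch, here is the kind of body a client library builds for a chat request; the field names follow OpenAI's Chat Completions format (other providers differ slightly), and the API key would travel in an `Authorization: Bearer ...` header, never in the body.

```python
import json

# Sketch of the JSON body a chat request carries over HTTPS.
# Field names follow OpenAI's Chat Completions format; other
# providers use similar but not identical schemas.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Explain tokens in one sentence."}
    ],
    "temperature": 0.7,
}

body = json.dumps(payload)
print(body)  # this string is what actually crosses the wire
```

The client library we install later builds and sends exactly this kind of payload for us, which is why we rarely touch raw HTTP ourselves.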
Key Concepts in LLM Interaction
Before we start coding, let’s clarify some fundamental terms you’ll encounter when working with LLMs:
Tokens
What’s a “token”? It’s not always a whole word! LLMs break down text into smaller units called tokens. For English, a token can be a word, a part of a word (like “un” or “ing”), a punctuation mark, or even a space. For example, the phrase “Hello, world!” might be broken into “Hello”, “,”, " world", “!”.
Why are tokens important?
- Cost: Most LLM APIs charge based on the number of tokens processed (both input and output).
- Context Window: LLMs have a limited “memory” or context window, measured in tokens.
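To see that tokens are not simply words, here is a rough pure-Python sketch that splits text into word and punctuation pieces. Real tokenizers (such as OpenAI's tiktoken library) use byte-pair encoding, so actual counts will differ; `rough_token_estimate` is an illustrative helper of our own, not a provider API.

```python
import re

def rough_token_estimate(text: str) -> int:
    """Very rough token estimate: split on words and punctuation.

    Real LLM tokenizers use byte-pair encoding, so actual counts
    differ; a common rule of thumb for English is ~4 characters
    per token.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))

# "Hello, world!" splits into 4 pieces: "Hello", ",", "world", "!"
print(rough_token_estimate("Hello, world!"))  # → 4
```

Even this crude split shows why billing by tokens differs from billing by words: punctuation counts, and longer words may split into several tokens under a real tokenizer.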
Context Window
Every LLM has a maximum number of tokens it can process in a single interaction – this is its context window. It includes both your input prompt and the LLM’s generated response. If your input is too long, or if the conversation history exceeds this limit, the LLM won’t be able to “remember” earlier parts of the conversation, or it might simply reject the input. Modern LLMs are constantly increasing their context windows (e.g., Gemini 1.5 Pro offers 1 million tokens!), but it’s still a critical constraint to manage.
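One common way to stay under the limit is to trim the oldest conversation turns before each call. The sketch below assumes a hypothetical `count_tokens` callable (real code would use the provider's tokenizer) and keeps system messages pinned; it illustrates the "sliding window" idea, not a production memory strategy.

```python
def trim_history(messages: list[dict], max_tokens: int, count_tokens) -> list[dict]:
    """Drop the oldest non-system messages until the history fits.

    `count_tokens` is any callable estimating tokens per message;
    real code would use the provider's tokenizer. This sketches the
    'sliding window' idea only.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(count_tokens(m) for m in system + rest) > max_tokens:
        rest.pop(0)  # forget the oldest turn first
    return system + rest

# Toy counter: 1 "token" per word, purely for demonstration.
count = lambda m: len(m["content"].split())
history = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "first question here"},
    {"role": "user", "content": "second question"},
]
print(trim_history(history, max_tokens=4, count_tokens=count))
```

Note that the budget must cover the expected response too, not just the input; later chapters cover smarter strategies like summarizing the dropped turns.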
Temperature
Temperature is a parameter that controls the “creativity” or “randomness” of the LLM’s output.
- A higher temperature (e.g., 0.8-1.0) leads to more varied, surprising, and potentially less coherent responses. It’s great for creative writing or brainstorming.
- A lower temperature (e.g., 0.0-0.2) makes the output more deterministic, focused, and repeatable. Ideal for tasks requiring accuracy, like summarization or code generation.
- A value of 0.0 often means the model will try to pick the most probable token every time.
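Mechanically, temperature is typically implemented by dividing the model's raw scores (logits) by the temperature before converting them to probabilities with softmax. The sketch below uses made-up logits for three candidate tokens; note that a literal 0.0 would divide by zero, which is why providers treat temperature 0 as "always pick the top token".

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw model scores (logits) into probabilities.

    Dividing by the temperature before softmax is the standard
    trick: low T sharpens the distribution toward the top token,
    high T flattens it.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
print(softmax_with_temperature(logits, 0.2))  # sharp: top token dominates
print(softmax_with_temperature(logits, 1.0))  # softer spread across tokens
```

Run it and compare the two outputs: at T=0.2 the first token takes nearly all the probability mass, while at T=1.0 the other tokens stay plausible, which is exactly the "creativity" effect you observe in responses.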
Top-P (Nucleus Sampling)
Similar to temperature, top_p also influences the diversity of the output. Rather than rescaling the whole probability distribution (as temperature does), top_p restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches the threshold p. For example, if p=0.9, the model considers only the smallest set of tokens whose probabilities sum to at least 0.9. This offers a different way to control randomness, often used in conjunction with or as an alternative to temperature.
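The cutoff itself is simple to sketch: sort candidate tokens by probability and keep the smallest prefix whose cumulative probability reaches p. The probabilities below are made up, and a real implementation would then renormalize and sample from the surviving set.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the smallest set of tokens whose cumulative probability >= p.

    This is the nucleus-sampling cutoff; real implementations then
    renormalize the survivors and sample among them.
    """
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete
    return kept

probs = {"cat": 0.5, "dog": 0.3, "sky": 0.15, "car": 0.05}
print(top_p_filter(probs, 0.9))  # → {'cat': 0.5, 'dog': 0.3, 'sky': 0.15}
```

Notice that the low-probability "car" token is cut entirely: top_p prunes the unlikely tail, whereas temperature merely makes it less likely.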
Model Selection
AI providers offer various LLM models, each with different capabilities, performance characteristics, and costs.
- General-purpose models: (e.g., OpenAI’s GPT-4o, Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro) are powerful and versatile.
- Faster/cheaper models: (e.g., OpenAI’s GPT-3.5 Turbo) are good for quick tasks or when cost is a major concern.
- Specialized models: Some providers might offer models fine-tuned for specific tasks like coding or translation.
Choosing the right model depends on your application’s requirements for intelligence, speed, and budget.
Step-by-Step Implementation: Your First LLM API Call
Let’s get practical! We’ll use Python and the OpenAI API as a concrete example, as it’s widely adopted and offers robust capabilities. The principles learned here are transferable to other AI providers.
1. Setting Up Your Environment
First, ensure you have a recent version of Python installed; Python 3.10 or newer is a safe baseline. You can download the current stable release from python.org. pip, Python’s package installer, comes bundled with it.
Next, we need to install the OpenAI Python client library.
pip install openai
(Tip: for reproducible environments, pin the version you tested with, e.g. pip install openai==<version>. Always check the official OpenAI Python library documentation for the latest stable release.)
2. Get Your API Key
To make API calls, you’ll need an API key from your chosen AI provider. For OpenAI, you can generate one by visiting the OpenAI API Keys page after creating an account.
CRITICAL SECURITY NOTE: Never hardcode your API key directly into your scripts! This is a major security risk. Instead, store it as an environment variable.
How to set an environment variable:
On Linux/macOS (for your current terminal session):
export OPENAI_API_KEY="your_actual_api_key_here"
On Windows (Command Prompt):
set OPENAI_API_KEY=your_actual_api_key_here
On Windows (PowerShell):
$env:OPENAI_API_KEY="your_actual_api_key_here"
For a more permanent solution, you’d add this to your shell’s profile file (.bashrc, .zshrc on Linux/macOS, or system environment variables on Windows).
3. Making Your First API Call
Now, let’s write some Python code! Create a new file named first_llm_call.py.
We’ll build this step-by-step.
Step 3.1: Import necessary libraries and load your API key.
import os
from openai import OpenAI
# 1. Load your API key from an environment variable
# It's crucial for security not to hardcode keys.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError("OPENAI_API_KEY environment variable not set.")
# 2. Initialize the OpenAI client
# The client automatically picks up the API key if set as OPENAI_API_KEY
# in your environment, but explicit passing is also possible.
client = OpenAI(api_key=api_key)
print("OpenAI client initialized successfully!")
Explanation:
- `import os`: Allows us to interact with the operating system, specifically to read environment variables.
- `from openai import OpenAI`: Imports the `OpenAI` class from the installed library.
- `os.environ.get("OPENAI_API_KEY")`: Safely retrieves the API key. `.get()` returns `None` if the variable isn’t set, preventing errors.
- The `if not api_key:` block is a good practice to ensure the key is present before proceeding.
- `client = OpenAI(api_key=api_key)`: Creates an instance of the OpenAI client, which we’ll use to make API calls.
Step 3.2: Define your prompt and make the API call.
Now, let’s add the code to actually send a message to the LLM. Append this to your first_llm_call.py file:
# ... (previous code for imports and client initialization) ...
# 3. Define the message(s) to send to the LLM
# LLMs often use a 'chat' format, where you specify roles:
# - 'system': Sets the overall behavior/persona of the assistant.
# - 'user': The user's input/question.
# - 'assistant': The LLM's previous responses (for conversational context).
messages = [
{"role": "system", "content": "You are a helpful assistant that explains complex concepts simply."},
{"role": "user", "content": "Explain the concept of 'tokens' in Large Language Models in one sentence."}
]
# 4. Make the API call to create a chat completion
# We use 'gpt-4o' as a powerful, general-purpose model as of Jan 2026.
# Temperature controls creativity (0.0 for deterministic, 1.0 for creative).
try:
    response = client.chat.completions.create(
        model="gpt-4o",       # Model choice (e.g., gpt-4o, gpt-3.5-turbo)
        messages=messages,
        temperature=0.7,      # Adjust for more (higher) or less (lower) randomness/creativity
        max_tokens=50         # Optional: Limit the length of the response in tokens
    )

    # 5. Extract and print the LLM's response
    llm_response_content = response.choices[0].message.content
    print("\nLLM's Response:")
    print(llm_response_content)

except Exception as e:
    print(f"An error occurred: {e}")
Explanation:
- `messages` list: This is how you provide your input to chat-based LLMs. It’s a list of dictionaries, where each dictionary represents a message with a `role` and `content`.
- `"role": "system"`: This message helps prime the model, giving it instructions on how to behave throughout the conversation. It’s like setting the persona or guidelines.
- `"role": "user"`: This is your actual question or instruction to the LLM.
- `client.chat.completions.create(...)`: This is the core function call to the OpenAI API.
- `model="gpt-4o"`: Specifies which LLM model to use. `gpt-4o` is OpenAI’s flagship model as of early 2026, known for its advanced capabilities. You could also try `gpt-3.5-turbo` for a faster, cheaper alternative.
- `messages=messages`: Passes our defined list of messages.
- `temperature=0.7`: Sets the creativity level. We chose `0.7` for a balanced, slightly creative response.
- `max_tokens=50`: An optional parameter to limit the length of the LLM’s response. This can help control cost and ensure conciseness.
- `response.choices[0].message.content`: The API response is an object; we navigate through it to get the actual text content generated by the LLM. `choices` is a list (usually one choice unless you request more), and `message` contains the `role` (`assistant`) and `content` of the LLM’s reply.
- `try...except` block: Good practice for handling potential network issues or API errors.
To run your script:
- Save the `first_llm_call.py` file.
- Ensure your `OPENAI_API_KEY` environment variable is set in your terminal.
- Navigate to the directory where you saved the file in your terminal.
- Run: `python first_llm_call.py`
You should see the LLM’s explanation of tokens! How cool is that? You just had your first conversation with a powerful AI.
Mini-Challenge: Get Creative with Temperature!
Now that you’ve made your first call, let’s experiment a bit.
Challenge: Modify the first_llm_call.py script to ask the LLM to write a short, whimsical poem about a coding bug. Run it twice:
- With `temperature=0.2` (low creativity).
- With `temperature=0.9` (high creativity).
Observe the differences in the poems generated.
Hint:
- Change the `content` of the `"user"` message to your new prompt.
- Change the `temperature` parameter in the `client.chat.completions.create()` call.
- You might want to increase `max_tokens` (e.g., to `100` or `150`) to allow for a longer poem.
What to observe/learn: Pay close attention to the vocabulary, sentence structure, and overall “feel” of the poems. Does the higher temperature result in more imaginative language, perhaps even some unexpected turns of phrase? Does the lower temperature produce something more predictable or structured? This exercise helps you intuitively grasp how temperature influences LLM output.
Common Pitfalls & Troubleshooting
Working with APIs can sometimes throw curveballs. Here are a few common issues and how to tackle them:
`ValueError: OPENAI_API_KEY environment variable not set.`
- Problem: Your script can’t find your API key.
- Solution: Double-check that you’ve set the environment variable correctly in your terminal before running the script. Remember, `export` (Linux/macOS) or `set` (Windows) only applies to the current terminal session. If you open a new terminal, you’ll need to set it again or make it permanent in your shell profile.
- Official Docs: OpenAI Authentication
`openai.AuthenticationError: Incorrect API key provided...`
- Problem: Your API key is incorrect or revoked.
- Solution: Go back to your OpenAI API Keys page and generate a new key, then update your environment variable. Ensure there are no extra spaces or characters.
`openai.BadRequestError: This model's maximum context length is X tokens...`
- Problem: Your input `messages` (and potentially the expected output) exceed the model’s context window limit.
- Solution: Shorten your input prompt. For longer interactions, you’ll need strategies like summarization or “sliding windows” (which we’ll cover in later chapters when discussing memory).
`openai.RateLimitError: Rate limit exceeded...`
- Problem: You’re sending too many requests too quickly.
- Solution: Wait a bit and try again. For production applications, you’ll need to implement retry logic with exponential backoff. Check your API provider’s usage limits.
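The retry-with-exponential-backoff pattern can be sketched as follows. Both the helper and the flaky demo function here are hypothetical; production code would catch the provider's specific exception (e.g. `openai.RateLimitError`) rather than a bare `Exception`.

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `fn` with exponential backoff plus jitter.

    Sketch of the standard pattern for transient errors like rate
    limits; real code would catch the provider's specific exception
    (e.g. openai.RateLimitError) instead of bare Exception.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)  # wait 1x, 2x, 4x... the base delay

# Demo with a function that fails twice before succeeding.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # → ok
```

The jitter (the small random addition to each delay) matters in practice: without it, many clients that were rate-limited together retry together and collide again.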
Unexpected or Incoherent Responses:
- Problem: The LLM isn’t giving you what you want.
- Solution: This is often a prompt engineering issue (the art of crafting effective prompts), which is what we’ll dive into next! Also, check your `temperature` and `top_p` settings.
Summary
You’ve taken a significant leap forward! In this chapter, you’ve learned:
- What LLMs are: Powerful neural networks predicting text, capable of generation, summarization, translation, and more.
- The role of AI APIs: How we interact with LLMs remotely through well-defined interfaces.
- Core LLM concepts:
- Tokens: The fundamental units of text LLMs process.
- Context Window: The limited “memory” or input/output capacity of an LLM.
- Temperature & Top-P: Parameters to control the creativity and diversity of LLM outputs.
- Model Selection: Choosing the right LLM for your task based on capability, speed, and cost.
- Hands-on API interaction: You successfully set up your environment, secured your API key, and made your first programmatic call to an LLM using Python!
This foundational understanding of LLMs and API interaction is critical. You now have the basic tools to make an AI model respond to your commands. But how do you get it to respond exactly how you want? That’s the art and science of Prompt Engineering, which is precisely what we’ll explore in the next chapter! Get ready to become a master communicator with AI.
References
- OpenAI Platform Documentation
- OpenAI Python Library GitHub
- Python Official Website
- Google Gemini 1.5 Pro Overview
- Anthropic Claude 3.5 Sonnet