## Introduction

Welcome back, intrepid command-line explorer! In previous chapters, we’ve journeyed into the exciting world of CLI-first AI systems, understanding how a single AI agent can perceive, reason, and act directly within your terminal. We’ve seen how these agents can automate tasks, interact with shell tools, and even generate code. Pretty cool, right?

But what if a task is too big, too complex, or requires different specializations that a single agent can’t easily handle alone? Imagine a team of highly skilled individuals, each with their own expertise, collaborating to achieve a grander goal. This is precisely the power of multi-agent workflows. In this chapter, we’ll dive into how to orchestrate multiple AI agents to tackle more intricate challenges, turning your terminal into a collaborative AI hub.

We’ll also explore a crucial concept that empowers agents to truly understand and utilize the vast ecosystem of existing CLI tools: AI-discoverable skills. These are structured definitions that help AI agents know what a command does, how to use it, and what kind of input/output it expects. By the end of this chapter, you’ll have a solid grasp of how to move beyond isolated agents and build more sophisticated, cooperative AI systems right from your command line.

Ready to level up your CLI AI game? Let’s go!

## Core Concepts

### What are Multi-Agent Workflows?

Think of a multi-agent workflow like a project team. Instead of one person trying to do everything, you have specialists: a researcher, a writer, an editor, a designer. Each focuses on their expertise, and together, they produce a high-quality outcome.

In the context of CLI-first AI systems, a multi-agent workflow involves coordinating two or more AI agents to collaboratively complete a complex task. Each agent might have a specific role, a set of tools it’s proficient with, or a particular domain of knowledge.

Why are they powerful in CLI environments?

  • Tackling Complexity: Break down a massive problem into smaller, manageable sub-problems, each handled by a specialized agent.
  • Parallelization: Independent sub-tasks can be executed concurrently by different agents, speeding up overall execution.
  • Specialization: Agents can be optimized for specific functions (e.g., one agent for data parsing, another for code generation, a third for cloud infrastructure management).
  • Robustness: If one agent fails, others might still proceed or the orchestrator can reassign the task, leading to more resilient systems.

The Challenges (And Why We Need Orchestration):

While powerful, multi-agent systems introduce complexity. They need careful design to address challenges such as:

  • Communication Breakdown: How do agents share information?
  • Conflict: What if two agents try to modify the same resource simultaneously?
  • Redundancy: How do we avoid agents doing the same work twice?
  • Immaturity: The field of multi-agent AI is still rapidly evolving, meaning robust solutions often require significant architectural iteration.

This is where orchestration comes in!

### Orchestration in CLI-First AI Systems

Orchestration is the art and science of coordinating multiple agents to work together harmoniously. It defines the flow, communication, and decision-making processes within a multi-agent system. In a CLI-first context, this often means managing agent sessions, input/output streams, and execution order.

#### How Agents Communicate

Agents in a CLI environment typically communicate through:

  • Shared Files: One agent writes to a file, another reads from it.
  • Pipes (|) and Redirects (>, <): Standard Unix-like communication for passing output from one command (or agent) as input to another.
  • Environment Variables: Agents can set and read environment variables to share configuration or intermediate results.
  • Dedicated Message Queues/APIs: For more advanced setups, agents might communicate via a lightweight message broker or a local API.
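These channels are easy to drive programmatically, too. Here's a minimal Python sketch of a pipe-style hand-off and an environment variable; the "agents" are stand-in shell commands (`echo`, `tr`), purely for illustration:

```python
import os
import subprocess

# "Agent A" produces output that "Agent B" consumes via a pipe-style hand-off.
# Both agents are stand-in shell commands here, purely for illustration.
agent_a = subprocess.run(["echo", "raw log line"], capture_output=True, text=True)
agent_b = subprocess.run(["tr", "a-z", "A-Z"], input=agent_a.stdout,
                         capture_output=True, text=True)
print(agent_b.stdout.strip())  # RAW LOG LINE

# Sharing an intermediate result through an environment variable.
env = dict(os.environ, LOG_LEVEL="ERROR")
agent_c = subprocess.run(["sh", "-c", "echo pattern=$LOG_LEVEL"],
                         capture_output=True, text=True, env=env)
print(agent_c.stdout.strip())  # pattern=ERROR
```

The same mechanics apply when each "agent" is a long-running AI session instead of a one-shot command.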

#### The Role of an Orchestrator

An orchestrator acts as the conductor of your AI agent symphony. It’s responsible for:

  • Task Assignment: Deciding which agent handles which part of the task.
  • Workflow Management: Defining the sequence or parallel execution of agent actions.
  • Resource Allocation: Ensuring agents have the necessary resources (e.g., access to specific tools).
  • Error Handling: Managing failures and deciding on recovery strategies.
  • Output Aggregation: Collecting and presenting the final results from multiple agents.

A great example of a CLI-first orchestrator is the CLI Agent Orchestrator (CAO) from AWS Labs (see its GitHub repository: https://github.com/awslabs/cli-agent-orchestrator). CAO leverages tmux, a powerful terminal multiplexer, to manage multiple agent sessions, allowing them to run concurrently and interact within a structured terminal environment.

#### Orchestration Techniques

  1. Sequential Orchestration: Agents execute tasks one after another. Agent A completes its task, passes its output to Agent B, which then proceeds.

    • Example: Agent A fetches data -> Agent B processes data -> Agent C generates a report.
  2. Parallel Orchestration: Agents work on independent sub-tasks concurrently.

    • Example: Agent A analyzes logs, Agent B monitors system metrics, both report their findings simultaneously to the orchestrator.
  3. Hierarchical Orchestration: A “manager” agent delegates tasks to “worker” agents and oversees their progress.

    • Example: A “Project Manager” agent receives a high-level goal, breaks it down, assigns sub-goals to “Code Generator” and “Test Creator” agents, and then integrates their results.
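The first two patterns can be sketched in a few lines of Python, with plain functions standing in for real agent sessions (a hierarchical workflow would simply compose these: a manager function delegating to workers and integrating their results):

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder "agents": plain functions standing in for real agent sessions.
def agent_fetch():
    return ["ERROR: disk full", "INFO: all good"]

def agent_process(lines):
    return [line for line in lines if "ERROR" in line]

def agent_report(errors):
    return f"{len(errors)} error(s) found"

# Sequential orchestration: each agent consumes the previous agent's output.
report = agent_report(agent_process(agent_fetch()))
print(report)  # 1 error(s) found

# Parallel orchestration: independent agents run concurrently, then join.
def agent_logs():
    return "logs analyzed"

def agent_metrics():
    return "metrics nominal"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(agent_logs), pool.submit(agent_metrics)]
    results = [f.result() for f in futures]
print(results)  # ['logs analyzed', 'metrics nominal']
```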

Let’s visualize these basic orchestration patterns with a simple Mermaid diagram.

```mermaid
flowchart TD
    subgraph Sequential Workflow
        Start_Seq[Start Task] --> AgentA_Seq[Agent A: Fetch Data]
        AgentA_Seq --> AgentB_Seq[Agent B: Process Data]
        AgentB_Seq --> AgentC_Seq[Agent C: Generate Report]
        AgentC_Seq --> End_Seq[End Task]
    end
    subgraph Parallel Workflow
        Start_Par[Start Task] --> Fork_Par{Fork Tasks}
        Fork_Par --> AgentA_Par[Agent A: Analyze Logs]
        Fork_Par --> AgentB_Par[Agent B: Monitor Metrics]
        AgentA_Par --> Join_Par{Join Results}
        AgentB_Par --> Join_Par
        Join_Par --> End_Par[End Task]
    end
    subgraph Hierarchical Workflow
        Start_Hier[Start Task] --> ManagerAgent[Manager Agent: Decompose Task]
        ManagerAgent --> AssignCode[Assign: Code Generation]
        ManagerAgent --> AssignTest[Assign: Test Creation]
        AssignCode --> CodeAgent[Worker Agent: Generate Code]
        AssignTest --> TestAgent[Worker Agent: Create Tests]
        CodeAgent --> ReportCode[Report Code]
        TestAgent --> ReportTest[Report Tests]
        ReportCode --> ManagerAgent_Integrate[Manager Agent: Integrate Results]
        ReportTest --> ManagerAgent_Integrate
        ManagerAgent_Integrate --> End_Hier[End Task]
    end
```

### AI-Discoverable Skills

For an AI agent to be truly useful in a CLI-first environment, it needs to understand the vast array of existing command-line tools. How can it know what grep does, or how to use jq to parse JSON, or the parameters for aws s3 cp? This is where AI-discoverable skills come into play.

An AI-discoverable skill is essentially a machine-readable description of a tool’s capabilities. It tells an AI agent:

  • What the tool does: Its purpose and high-level function.
  • How to invoke it: The command name and basic syntax.
  • Its arguments/options: What parameters it accepts, their types, and descriptions.
  • Expected input: What kind of data it processes.
  • Expected output: What kind of data it produces.
  • Examples: Concrete usage scenarios.

The concept often manifests as a structured markdown file, like SKILL.md (as seen in projects like AI-Starter-Kit [https://github.com/richardh8/AI-Starter-Kit] or proflead/how-to-build-ai-agent [https://github.com/proflead/how-to-build-ai-agent]). When an AI agent needs to perform a task, it can “read” these SKILL.md files (or similar structured definitions) to dynamically understand and select the appropriate CLI tool.

Why are they crucial?

  1. Agent Autonomy: Agents can independently learn and utilize new tools without being explicitly programmed for each one.
  2. Flexibility: Easily extend an agent’s capabilities by simply adding new SKILL.md files for different tools.
  3. Standardization: Provides a consistent way to describe tool interfaces for AI consumption.
  4. Reduced Hallucination: By providing explicit tool definitions, agents are less likely to “invent” non-existent commands or misuse existing ones.
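As a rough sketch of the "reading" step, assuming the `## Description` / `## Usage` / `## Options` section layout used in this chapter, an agent could split a SKILL.md into named sections before reasoning over them:

```python
import re

# A small SKILL.md sample (hypothetical tool) in the layout used in this chapter.
skill_md = """# file_stats

## Description
A command-line tool to calculate statistics for a given text file.

## Usage
`file_stats <filepath> [options]`

## Options
- `--json` (boolean, optional): Output results in JSON format.
"""

# Split the markdown into {section name: body} pairs on '## ' headings.
sections = {}
for match in re.finditer(r"^## (.+?)\n(.*?)(?=^## |\Z)", skill_md, re.M | re.S):
    sections[match.group(1).strip()] = match.group(2).strip()

print(sections["Usage"])  # `file_stats <filepath> [options]`
```

A production agent would usually hand these sections to an LLM rather than regex-matching them further, but the split alone is often enough to route the right skill to the right prompt.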

#### Example Structure of a SKILL.md

Let’s imagine a simple CLI tool called file_stats that calculates lines, words, and characters in a file. Here’s how its SKILL.md might look:

````markdown
# file_stats

## Description
A command-line tool to calculate statistics (lines, words, characters) for a given text file.

## Usage
`file_stats <filepath> [options]`

## Arguments
- `filepath` (string, required): The path to the text file to analyze.

## Options
- `--lines` (boolean, optional): If present, only display the number of lines.
- `--words` (boolean, optional): If present, only display the number of words.
- `--chars` (boolean, optional): If present, only display the number of characters.
- `--json` (boolean, optional): Output results in JSON format.

## Output
By default, prints lines, words, and characters to standard output. If `--json` is used, outputs a JSON object.

## Examples
### Example 1: Basic usage
```bash
file_stats my_document.txt
```
Expected Output:
```
Lines: 10
Words: 150
Chars: 800
```

### Example 2: Get only word count
```bash
file_stats my_document.txt --words
```
Expected Output:
```
Words: 150
```

### Example 3: Get JSON output
```bash
file_stats my_document.txt --json
```
Expected Output:
```json
{
  "file": "my_document.txt",
  "lines": 10,
  "words": 150,
  "chars": 800
}
```
````

An AI agent, when tasked with "count the words in `report.txt` and output as JSON", could parse this `SKILL.md` for `file_stats`, understand its capabilities, and construct the command `file_stats report.txt --words --json`. Pretty neat, right?

## Step-by-Step Implementation: Conceptualizing an Orchestrated Workflow

While building a full-fledged multi-agent orchestrator is beyond a single chapter, we can conceptualize how one might operate and how `SKILL.md` files enable it. We'll use a simplified Python script to illustrate the orchestrator's role and a Bash script to simulate an agent using a discovered skill.

### Scenario: Automated Log Analysis and Reporting

Imagine we want to:
1.  **Fetch logs:** A `LogFetcher` agent retrieves log files from a specific directory.
2.  **Analyze logs:** An `Analyzer` agent processes these logs, perhaps counting specific error messages using a `log_parser` tool.
3.  **Generate Report:** A `Reporter` agent summarizes the findings and saves them to a file.

We'll focus on how the `Analyzer` agent might use an AI-discoverable skill for a fictional `log_parser` tool.

#### Step 1: Define the `log_parser` Skill

First, let's create a `SKILL.md` for our hypothetical `log_parser` tool. Create a file named `skills/log_parser.md`:

```bash
# Create a directory for skills if it doesn't exist
mkdir -p skills
```

Now, create the skills/log_parser.md file with the following content:

````markdown
# log_parser

## Description
A command-line tool to parse log files and count occurrences of specific patterns.

## Usage
`log_parser <filepath> --pattern <regex_pattern> [options]`

## Arguments
- `filepath` (string, required): The path to the log file to analyze.

## Options
- `--pattern` (string, required): The regular expression pattern to search for.
- `--case-insensitive` (boolean, optional): Perform a case-insensitive search.
- `--output-json` (boolean, optional): Output results in JSON format.

## Output
By default, prints the count of occurrences to standard output. If `--output-json` is used, outputs a JSON object with the pattern and count.

## Examples
### Example 1: Count errors in a log file
```bash
log_parser app.log --pattern "ERROR"
```
Expected Output:
```
Pattern 'ERROR' found 15 times.
```

### Example 2: Count warnings (case-insensitive)
```bash
log_parser server.log --pattern "warning" --case-insensitive
```
Expected Output:
```
Pattern 'warning' found 7 times.
```

### Example 3: Count critical messages and output as JSON
```bash
log_parser system.log --pattern "CRITICAL" --output-json
```
Expected Output:
```json
{
  "pattern": "CRITICAL",
  "count": 3
}
```
````

This `SKILL.md` provides all the necessary information for an AI agent to understand and use `log_parser`.

#### Step 2: Simulate the `log_parser` Tool

For our example, we don't need a full `log_parser` implementation. A simple Python script can simulate its behavior. Create a file named `tools/log_parser.py`:

```bash
mkdir -p tools
```

Then, add the following Python code to tools/log_parser.py:

```python
#!/usr/bin/env python3
# tools/log_parser.py
import argparse
import re
import json
import sys

def main():
    parser = argparse.ArgumentParser(description="Parse log files for patterns.")
    parser.add_argument("filepath", help="Path to the log file.")
    parser.add_argument("--pattern", required=True, help="Regex pattern to search for.")
    parser.add_argument("--case-insensitive", action="store_true", help="Perform case-insensitive search.")
    parser.add_argument("--output-json", action="store_true", help="Output results in JSON format.")

    args = parser.parse_args()

    try:
        with open(args.filepath, 'r') as f:
            content = f.read()
    except FileNotFoundError:
        print(f"Error: File not found at {args.filepath}", file=sys.stderr)
        sys.exit(1)

    flags = re.IGNORECASE if args.case_insensitive else 0
    matches = re.findall(args.pattern, content, flags)
    count = len(matches)

    if args.output_json:
        print(json.dumps({"pattern": args.pattern, "count": count}))
    else:
        print(f"Pattern '{args.pattern}' found {count} times.")

if __name__ == "__main__":
    main()
```

Note the shebang line at the top: it lets the script run directly (as `./tools/log_parser.py`) once it's executable.

Make it executable:

```bash
chmod +x tools/log_parser.py
```

Now, create a dummy log file to test:

```bash
echo "INFO: Application started." > app.log
echo "ERROR: Database connection failed." >> app.log
echo "WARNING: Low disk space." >> app.log
echo "ERROR: API endpoint not found." >> app.log
echo "info: User logged in." >> app.log
```

Test our simulated tool:

```bash
./tools/log_parser.py app.log --pattern "ERROR"
./tools/log_parser.py app.log --pattern "info" --case-insensitive --output-json
```

You should see output similar to:

```
Pattern 'ERROR' found 2 times.
{"pattern": "info", "count": 2}
```

Great! Our tool and its skill definition are ready.

#### Step 3: Orchestrator (Conceptual Python Script)

Now, let’s create a conceptual Python script that acts as an orchestrator. This script will:

  1. Load the SKILL.md file.
  2. Pretend to receive a task (e.g., “count errors in app.log”).
  3. “Reason” about which tool to use (by parsing SKILL.md).
  4. Construct and execute the appropriate command.

Create a file named orchestrator.py:

```python
# orchestrator.py
import os
import subprocess
import sys

class Orchestrator:
    def __init__(self, skill_dir="skills", tool_dir="tools"):
        self.skill_dir = skill_dir
        self.tool_dir = tool_dir
        self.skills = self._load_skills()

    def _load_skills(self):
        """Loads all SKILL.md files from the skill directory."""
        loaded_skills = {}
        for skill_file in os.listdir(self.skill_dir):
            if skill_file.endswith(".md"):
                skill_name = os.path.splitext(skill_file)[0]
                filepath = os.path.join(self.skill_dir, skill_file)
                with open(filepath, 'r') as f:
                    # A real agent would parse this markdown more robustly;
                    # for simplicity, we just store the raw content for now.
                    loaded_skills[skill_name] = f.read()
        print(f"Loaded skills for: {', '.join(loaded_skills.keys())}")
        return loaded_skills

    def _find_tool_for_task(self, task_description):
        """
        Simulates an AI agent finding the right tool based on task description.
        In a real scenario, this would involve LLM reasoning over skill
        descriptions. For this example, we do a simple keyword match.
        """
        task = task_description.lower()
        if ("log" in task or "parse" in task) and "log_parser" in self.skills:
            print("Orchestrator identified 'log_parser' tool.")
            return "log_parser"
        return None

    def _construct_command(self, tool_name, task_description, log_file):
        """
        Simulates an AI agent constructing a command based on task and skill.
        Again, a real agent would use an LLM to parse the SKILL.md and task.
        """
        if tool_name == "log_parser":
            # Based on log_parser.md, we know the tool needs a filepath and a
            # --pattern; we infer the pattern and options from the task.
            command = [os.path.join(self.tool_dir, "log_parser.py"), log_file]
            task = task_description.lower()
            if "error" in task:
                command.extend(["--pattern", "ERROR"])
            elif "warning" in task:
                command.extend(["--pattern", "WARNING"])
            elif "critical" in task:
                command.extend(["--pattern", "CRITICAL"])
            else:
                return None  # could not infer a pattern from the task
            if "case-insensitive" in task:
                command.append("--case-insensitive")
            if "json" in task or "report" in task:
                command.append("--output-json")
            return command
        return None

    def execute_task(self, task_description, context=None):
        """Orchestrates the execution of a task."""
        context = context or {}
        print(f"\nOrchestrator received task: '{task_description}'")
        log_file = context.get("log_file", "app.log")  # default log file

        tool_to_use = self._find_tool_for_task(task_description)
        if not tool_to_use:
            print(f"No suitable tool found for task: '{task_description}'.", file=sys.stderr)
            return None

        command = self._construct_command(tool_to_use, task_description, log_file)
        if not command:
            print(f"Could not construct command for tool '{tool_to_use}'.", file=sys.stderr)
            return None

        print(f"Executing command: {' '.join(command)}")
        try:
            result = subprocess.run(command, capture_output=True, text=True, check=True)
            print("Tool output:")
            print(result.stdout.strip())
            return result.stdout.strip()
        except subprocess.CalledProcessError as e:
            print(f"Error executing tool: {e}", file=sys.stderr)
            print(f"Stderr: {e.stderr.strip()}", file=sys.stderr)
            return None

if __name__ == "__main__":
    orchestrator = Orchestrator()

    # --- Scenario 1: Simple log analysis ---
    orchestrator.execute_task("count errors in app.log")

    # --- Scenario 2: More complex request with JSON output ---
    orchestrator.execute_task("find case-insensitive warnings in app.log and report as JSON")

    # --- Scenario 3: Task using a different log file (passing context) ---
    # Imagine another agent fetched 'server.log' and put it in context.
    with open("server.log", "w") as f:
        f.write("DEBUG: User connected.\n")
        f.write("WARNING: High CPU usage.\n")
        f.write("ERROR: Disk full.\n")
    orchestrator.execute_task("count critical issues in server.log and output json",
                              context={"log_file": "server.log"})

    # --- Scenario 4: Task for which no skill is defined (conceptual) ---
    orchestrator.execute_task("generate a complex neural network model")
```

Run the orchestrator:

```bash
python3 orchestrator.py
```

Observe how the orchestrator “discovers” the log_parser tool based on the task description and uses the SKILL.md (conceptually, through our simplified _construct_command logic) to build the correct command.

This simplified example demonstrates the core idea:

  1. AI-Discoverable Skills: The SKILL.md provides the blueprint for the tool.
  2. Orchestration Logic: The Orchestrator script acts as a central brain, interpreting tasks and delegating to tools based on their skills.
  3. CLI Interaction: The orchestrator executes actual CLI commands (log_parser.py) and captures their output.

In a real multi-agent system, the Orchestrator might be a more sophisticated framework like cli-agent-orchestrator (which uses tmux sessions), and the “reasoning” part (_find_tool_for_task, _construct_command) would be powered by a Large Language Model (LLM) that can genuinely understand natural language tasks and skill definitions.
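To make that last point concrete, here is a hedged sketch of how a skill might be surfaced to an LLM. The schema below is hand-written to mirror common function-calling APIs (the exact shape varies by provider), and no actual model call is made:

```python
import json

# Hand-written schema mirroring skills/log_parser.md; a real system might
# generate this by parsing the SKILL.md, or author it alongside the skill.
log_parser_tool = {
    "name": "log_parser",
    "description": "Parse log files and count occurrences of specific patterns.",
    "parameters": {
        "type": "object",
        "properties": {
            "filepath": {"type": "string", "description": "Path to the log file."},
            "pattern": {"type": "string", "description": "Regex pattern to search for."},
            "case_insensitive": {"type": "boolean"},
            "output_json": {"type": "boolean"},
        },
        "required": ["filepath", "pattern"],
    },
}

# The orchestrator would send the task plus the tool schemas to the LLM and
# execute whatever tool call the model returns; here we only build the payload.
payload = {"task": "count errors in app.log", "tools": [log_parser_tool]}
print(json.dumps(payload, indent=2))
```

The LLM replaces our brittle keyword matching: given this payload, it can decide that `log_parser` fits the task and return concrete argument values.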

## Mini-Challenge: Extend a Skill and an Agent’s Capabilities

Now it’s your turn to get hands-on!

Challenge:

  1. Enhance log_parser.md: Add a new option --top-n <number> to the log_parser tool’s SKILL.md that, when used, displays the top N most frequent patterns found (instead of just counting a single specified pattern). You don’t need to implement this in log_parser.py, just define it in the SKILL.md and add an example.
  2. Update orchestrator.py (Conceptually): Modify the _construct_command method in orchestrator.py to conceptually handle a task like “find the top 5 most frequent patterns in app.log”. You don’t need to make the command actually work; just add an if statement that would recognize such a task and attempt to construct a command using the new --top-n option you defined in SKILL.md. Print the command it would try to execute.

Hint:

  • For the SKILL.md update, think about how to describe a new option, its type (integer), and an example.
  • For orchestrator.py, you’ll need to add another if condition within _construct_command to detect keywords like “top N frequent patterns” and then append the --top-n option to the command list.

What to Observe/Learn:

  • How adding a new skill definition conceptually extends the capabilities of your AI system.
  • The clear mapping between a task description, a skill definition, and the resulting CLI command.
  • The iterative process of defining tools and enabling agents to use them.

## Common Pitfalls & Troubleshooting

  1. Underestimating Multi-Agent Complexity:

    • Pitfall: Thinking that just launching multiple agents will automatically lead to synergy. Multi-agent systems are inherently complex, especially regarding coordination, state management, and debugging.
    • Troubleshooting: Start simple. Design clear roles for each agent. Implement robust communication protocols. Utilize hierarchical orchestration for complex tasks. Be prepared for significant architectural iteration.
  2. Vague SKILL.md Definitions:

    • Pitfall: Providing ambiguous or incomplete descriptions of tool capabilities, leading to agents misusing tools or “hallucinating” incorrect arguments.
    • Troubleshooting: Be precise in your SKILL.md. Define all arguments, options, input/output formats, and provide clear examples. Treat SKILL.md as a contract for the tool.
  3. Lack of Robust Error Handling:

    • Pitfall: Agents crashing or producing cryptic errors when a CLI tool fails, making debugging difficult in a terminal environment.
    • Troubleshooting: Ensure your agents and orchestrator capture stderr from executed commands. Implement retry mechanisms, fallback strategies, and clear logging. A good terminal UX for AI will highlight errors effectively.
  4. Security Risks with Broad Permissions:

    • Pitfall: Granting AI agents extensive permissions (e.g., sudo access, broad file system access) without proper safeguards. An autonomous agent could execute malicious or unintended commands.
    • Troubleshooting: Follow the principle of least privilege. Agents should only have the permissions necessary for their specific tasks. Consider sandboxing agents (e.g., using containers or restricted user accounts) for sensitive operations. Carefully review any generated commands before execution, especially in production environments.
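Several of these safeguards can live in the execution layer itself. A minimal sketch combining stderr capture, a timeout, retries, and a stripped-down environment (the retry count, timeout, and PATH value are illustrative choices, not prescriptions):

```python
import subprocess
import time

def run_tool(command, retries=2, timeout=10):
    """Run a CLI tool with a timeout, a minimal environment, and retries."""
    minimal_env = {"PATH": "/usr/bin:/bin"}  # don't inherit the full environment
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(command, capture_output=True, text=True,
                                    timeout=timeout, env=minimal_env)
            if result.returncode == 0:
                return result.stdout
            print(f"attempt {attempt} failed: {result.stderr.strip()}")
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt} timed out")
        if attempt < retries:
            time.sleep(1)  # simple fixed backoff before retrying
    return None

print(run_tool(["echo", "ok"]), end="")  # ok
```

Stronger isolation (containers, restricted user accounts) sits outside this wrapper, but even this much prevents a hung or failing tool from silently derailing a workflow.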

## Summary

Phew! We’ve covered a lot in this chapter, moving beyond the individual brilliance of a single AI agent to the collaborative power of multi-agent workflows.

Here are the key takeaways:

  • Multi-agent workflows enable AI systems to tackle more complex, multi-faceted tasks by distributing work among specialized agents.
  • Orchestration is crucial for coordinating these agents, managing communication, task assignment, and workflow execution, often leveraging tools like tmux (as demonstrated by cli-agent-orchestrator).
  • AI-discoverable skills, often defined in structured markdown files like SKILL.md, provide agents with the necessary understanding of how to use existing CLI tools.
  • These skill definitions empower agents with autonomy and flexibility, allowing them to dynamically select and invoke the right tools for a given task.
  • Designing robust multi-agent systems requires careful consideration of communication, conflict resolution, error handling, and security.

As you continue your journey, remember that the CLI-first AI paradigm is still evolving. The concepts of orchestration and AI-discoverable skills are fundamental building blocks for creating increasingly intelligent and autonomous terminal-based assistants.

In our next chapter, we’ll delve deeper into advanced topics like security considerations, testing, and creating more intuitive terminal user experiences for these powerful AI systems.
