What is an AI agent?
An AI agent is a program built around a large language model (LLM) that can take actions, not just generate text. A regular chatbot receives a prompt and returns a response. An agent receives a goal, decides what steps are needed, and executes those steps by calling external tools, reading data, or modifying systems.
The key difference: a chatbot answers questions. An agent does work.
Tool calling
Agents interact with the outside world through tool calls. A tool is any function the agent can invoke: a web search, a database query, a code interpreter, an API request, a file write.
The LLM does not execute the tool directly. Instead, it outputs a structured request (typically JSON) describing which tool to call and with what arguments. The host program executes the tool and feeds the result back to the LLM, which then decides what to do next.
Example flow:
- Agent decides it needs to look up a user's order status
- Agent outputs: {"tool": "get_order", "args": {"order_id": "12345"}}
- Host program runs the function and returns the result
- Agent reads the result and formulates a response or takes another action
Tools are defined ahead of time. The agent picks from a fixed set. It cannot invent new tools on the fly.
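The host-side half of this exchange can be sketched in a few lines of Python. The tool name and lookup logic here are hypothetical stand-ins; real frameworks wrap this pattern in more machinery, but the shape is the same:

```python
import json

# Registry of tools the agent may call. Defined ahead of time;
# the model can only pick from this fixed set.
def get_order(order_id: str) -> dict:
    # Hypothetical lookup; a real implementation would query a database.
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order": get_order}

def dispatch(tool_request: str) -> str:
    """Parse the model's structured request, run the tool, return the result."""
    request = json.loads(tool_request)
    tool = TOOLS[request["tool"]]   # KeyError if the model invents a tool
    result = tool(**request["args"])
    return json.dumps(result)       # this string is fed back into the model's context

reply = dispatch('{"tool": "get_order", "args": {"order_id": "12345"}}')
```

Note that the model never touches `get_order` directly; it only emits the JSON request, and the host decides whether and how to run it.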
The agent loop
The core execution pattern of an agent is a loop:
1. Prompt -- The agent receives a goal or instruction
2. Think -- The LLM reasons about what to do next
3. Act -- The agent calls a tool or produces output
4. Observe -- The agent reads the result of its action
5. Repeat -- Back to step 2 until the goal is met or the agent decides to stop
This loop runs until the agent reaches a terminal condition: the task is done, a maximum number of steps is hit, or an error stops execution. Without a termination condition, agents can loop indefinitely.
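The loop and its termination conditions can be sketched as follows. `llm` and `execute_tool` are stand-ins for a real model client and tool runner, and the message format is illustrative:

```python
MAX_STEPS = 10  # termination guard: without this the loop can run forever

def run_agent(goal: str, llm, execute_tool) -> str:
    """Minimal agent loop: think, act, observe, repeat."""
    history = [{"role": "user", "content": goal}]
    for _ in range(MAX_STEPS):
        decision = llm(history)                 # Think
        if decision["type"] == "final":         # terminal condition: task done
            return decision["content"]
        result = execute_tool(decision)         # Act
        history.append({"role": "tool", "content": result})  # Observe
    return "stopped: step budget exhausted"     # terminal condition: max steps hit

# Stand-in model that makes one tool call and then finishes (demo behavior only).
def demo_llm(history):
    if any(m["role"] == "tool" for m in history):
        return {"type": "final", "content": "goal met"}
    return {"type": "tool", "tool": "noop", "args": {}}

answer = run_agent("check status", demo_llm, lambda decision: "ok")
```

The step budget is the simplest guardrail against infinite loops; production systems usually also cap tokens and wall-clock time.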
Multi-agent systems
A multi-agent system uses more than one agent to complete a task. Each agent may have a different role, set of tools, or area of expertise.
Examples:
- A research agent gathers information, then hands it to a writing agent that drafts a report
- A planning agent breaks a task into subtasks and delegates each to a worker agent
- Multiple agents work in parallel on independent parts of a problem, then a synthesis agent combines the results
Agents in a multi-agent system communicate by passing messages or sharing a common workspace. The main benefit is specialization: each agent can have a focused set of tools and instructions, which tends to produce better results than one agent trying to do everything.
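The research-then-write handoff above reduces to function composition. In this sketch each "agent" is a plain function; in a real system each would be its own agent loop with its own tools and prompt:

```python
# Two specialized agents passing a message: a hedged sketch, not a framework API.
def research_agent(topic: str) -> str:
    # Would run an agent loop with search tools; here it returns placeholder findings.
    return f"findings about {topic}"

def writing_agent(findings: str) -> str:
    # Would run an agent loop with a drafting prompt; here it wraps the findings.
    return f"Report: {findings}"

# Sequential handoff: the first agent's output is the second agent's input.
report = writing_agent(research_agent("context windows"))
```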
MCP (Model Context Protocol)
MCP is a standard protocol for connecting AI agents to external tools and data sources. It was created by Anthropic and is an open specification.
Without MCP, every agent framework defines its own way of describing tools, passing arguments, and returning results. MCP provides a common interface so that a tool built once can be used by any agent that supports the protocol.
MCP defines three core primitives:
- Tools -- Functions the agent can call (e.g., search a database, send an email)
- Resources -- Data the agent can read (e.g., files, documentation, API responses)
- Prompts -- Reusable prompt templates the server can expose to the agent
An MCP server exposes tools and resources. An MCP client (the agent) connects to one or more servers and discovers what is available. This is similar to how a web browser connects to any web server using HTTP.
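MCP messages travel as JSON-RPC 2.0. The sketch below illustrates the rough shape of a tool-discovery exchange; the field set is abbreviated and should not be treated as a complete rendering of the specification:

```python
import json

# Simplified illustration of an MCP-style discovery exchange (JSON-RPC 2.0).
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A server might answer with the tools it exposes, each described by a name,
# a human-readable description, and a JSON Schema for its arguments.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [{
            "name": "get_order",
            "description": "Look up an order by id",
            "inputSchema": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
            },
        }]
    },
}

wire = json.dumps(list_request)  # what actually crosses the client/server boundary
```

The point of the schema is that any compliant client can discover and call `get_order` without code written specifically for that server.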
Context windows as a constraint
Every LLM has a context window: a maximum amount of text it can process at once. This includes the system prompt, the conversation history, tool results, and the agent's own reasoning.
When an agent runs a long task, the context window fills up. Once full, older information gets dropped. The agent effectively forgets what it did earlier.
Strategies for managing this:
- Summarization -- Periodically compress the conversation history into a shorter summary
- Retrieval -- Store information externally and fetch only what is relevant for the current step
- Scratchpads -- Write intermediate results to a file or database instead of keeping them in context
- Windowing -- Keep only the most recent N messages and a summary of everything before
Context management is one of the hardest practical problems in building agents. An agent that forgets a constraint mentioned 50 messages ago will violate that constraint.
Orchestration
Orchestration is the layer that controls what agents do and in what order. In a single-agent system, orchestration is the agent loop itself. In a multi-agent system, orchestration decides:
- Which agent runs next
- What information each agent receives
- When to hand off between agents
- When the overall task is complete
Orchestration can be explicit (a fixed pipeline where agent A always runs before agent B) or dynamic (an orchestrator agent that decides at runtime which agent to call). Common patterns include sequential chains, parallel fan-out/fan-in, and hierarchical delegation.
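Two of these patterns, the sequential chain and parallel fan-out/fan-in, can be sketched with agents as stand-in callables:

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(agents, task):
    """Fixed pipeline: each agent's output feeds the next."""
    for agent in agents:
        task = agent(task)
    return task

def fan_out_fan_in(workers, synthesizer, subtasks):
    """Parallel fan-out over independent subtasks, then a synthesis step."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda pair: pair[0](pair[1]),
                                zip(workers, subtasks)))
    return synthesizer(results)

# Demo agents (trivial string transforms standing in for real agent loops).
upper = lambda s: s.upper()
excl = lambda s: s + "!"

chained = sequential([upper, excl], "go")
combined = fan_out_fan_in([upper, upper], " ".join, ["a", "b"])
```

A dynamic orchestrator replaces the fixed `agents` list with an LLM call that chooses the next agent at runtime; the control flow is otherwise the same.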
Agentic drift
When multiple agents work in parallel, they can make conflicting decisions. This is called agentic drift (sometimes called divergence).
Example: two agents are both editing the same codebase. Agent A refactors a function. Agent B, working from the original code, writes new code that calls the old version of that function. When their work is merged, things break.
Drift happens because parallel agents do not share real-time state. Each operates on its own snapshot of the world. Mitigations include locking shared resources, frequent synchronization points, and having a reviewer agent check for conflicts before merging results.
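The synchronization-point mitigation is essentially optimistic concurrency control. In this sketch, each agent records the version of the shared state it read, and a merge from a stale snapshot is rejected:

```python
# Optimistic-concurrency sketch: a merge is rejected if the shared state
# changed underneath the agent that produced it.
class SharedState:
    def __init__(self, content: str):
        self.content = content
        self.version = 0

    def read(self):
        return self.content, self.version

    def merge(self, new_content: str, based_on_version: int) -> bool:
        if based_on_version != self.version:
            return False          # stale snapshot: caller must re-read and retry
        self.content = new_content
        self.version += 1
        return True

state = SharedState("original function")
snapshot, ver = state.read()       # both agents read the same version
state.merge("agent A's refactor", ver)       # succeeds, bumps version to 1
ok = state.merge("agent B's change", ver)    # fails: B worked from a stale snapshot
```

In the codebase example above, this is what forces agent B to re-read agent A's refactor instead of silently calling the old function.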
Guardrails
Guardrails are constraints that prevent agents from doing harmful, unauthorized, or unintended things. Without guardrails, an agent with access to a production database could delete data, an agent with email access could send messages to the wrong people, and an agent with code execution could run destructive commands.
Types of guardrails:
- Input validation -- Reject or sanitize prompts that attempt to override the agent's instructions (prompt injection defense)
- Output filtering -- Check the agent's responses for harmful content, PII leakage, or policy violations before delivering them
- Tool restrictions -- Limit which tools an agent can call, or require human approval for high-risk actions
- Budget limits -- Cap the number of steps, API calls, or tokens an agent can consume
- Sandboxing -- Run code execution tools in isolated environments with no network access or filesystem permissions
Guardrails are not optional. They are a required part of any production agent system.
Observability
Observability means being able to see what an agent is doing and why. Since agents make autonomous decisions, you need logs and traces to understand their behavior after the fact.
Key things to observe:
- Trace of actions -- Every tool call, its arguments, and its result
- Reasoning -- The LLM's chain-of-thought at each step (if available)
- Token usage -- How much context is being consumed and how much each step costs
- Latency -- How long each step takes
- Errors -- Failed tool calls, timeouts, rate limits
- Drift detection -- Whether the agent is staying on task or going off-track
Without observability, debugging an agent that produces wrong results is nearly impossible. You cannot fix what you cannot see.
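A minimal trace can be built by wrapping every tool call, capturing its arguments, outcome, and latency. This is a sketch of the idea, not any particular tracing library's API:

```python
import time

# Trace sketch: record every tool call with arguments, result, and latency.
TRACE = []

def traced(tool_name, fn, **args):
    start = time.monotonic()
    try:
        result = fn(**args)
        status = "ok"
    except Exception as exc:
        result, status = str(exc), "error"   # failed calls are logged, not lost
    TRACE.append({
        "tool": tool_name,
        "args": args,
        "status": status,
        "latency_s": round(time.monotonic() - start, 3),
        "result": result,
    })
    return result

traced("get_order", lambda order_id: {"status": "shipped"}, order_id="12345")
```

After a run, `TRACE` is the trace of actions described above; production systems ship the same records to a logging or tracing backend instead of a list.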
Durable execution
Agents can run for minutes or hours. Servers crash. Networks drop. If an agent loses its state mid-task, it has to start over unless the system supports durable execution.
Durable execution means persisting the agent's state (conversation history, tool results, current step) so that it can resume from where it left off after a crash or restart. This is the same concept as durable workflows in backend engineering (think Temporal, AWS Step Functions, or Vercel Workflow).
Key requirements:
- Checkpoint the agent's state after each step
- Store state in a persistent backend (database, object storage)
- On restart, reload state and continue from the last checkpoint
- Handle idempotency: if a tool call was made but the result was not recorded, decide whether to retry or skip
Without durable execution, long-running agents are fragile. Any interruption means lost work.
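The checkpoint-and-resume cycle can be sketched with a JSON file as the persistent backend. A real system would use a database or object store and handle the idempotency question noted above; the file path here is illustrative:

```python
import json
import os
import tempfile

# Checkpointing sketch: persist agent state after each step, reload on restart.
def save_checkpoint(path, state):
    with open(path, "w") as f:
        json.dump(state, f)

def load_checkpoint(path):
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)        # resume from the last checkpoint
    return {"step": 0, "history": []}  # no checkpoint: fresh start

path = os.path.join(tempfile.gettempdir(), "agent_checkpoint.json")
state = load_checkpoint(path)
state["step"] += 1
state["history"].append("called get_order")  # record the completed step
save_checkpoint(path, state)

resumed = load_checkpoint(path)  # what a restarted process would see
```

Because the checkpoint is written after each step, a crash loses at most the step in flight, and that is exactly where the retry-or-skip idempotency decision applies.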
Chatbot vs. copilot vs. agent
These three terms describe different levels of autonomy:
Chatbot -- Responds to messages. Has no tools. Cannot take actions. It generates text based on a prompt and conversation history. Examples: a customer support bot that answers FAQs, a basic ChatGPT conversation.
Copilot -- Suggests actions but requires human approval. Has access to tools and context (your code, your documents, your email) but operates in an advisory role. The human decides what to accept. Examples: GitHub Copilot suggesting code completions, an AI assistant that drafts emails for you to review and send.
Agent -- Acts autonomously toward a goal. Has tools, makes decisions, and executes actions without asking for permission at every step. The human sets the goal and constraints, then the agent works independently. Examples: an agent that researches a topic and writes a report, an agent that triages and responds to support tickets.
The boundaries are not sharp. Many real systems blend these modes: an agent that acts autonomously on low-risk tasks but escalates to a human for high-risk decisions.
Current limitations
AI agents are useful but far from reliable. Key limitations as of early 2025:
- Hallucination -- Agents still fabricate facts, invent tool arguments that do not exist, and confidently produce wrong answers. This is an inherent property of LLMs, not a bug that will be patched soon.
- Planning failures -- Agents struggle with tasks that require long-horizon planning. They can miss steps, go in circles, or pursue dead-end strategies.
- Fragile tool use -- Small changes in tool descriptions or argument formats can cause agents to misuse tools or fail to call them at all.
- Cost -- Each step in the agent loop costs tokens. Complex tasks with many steps can get expensive fast.
- Latency -- Each LLM call takes time. An agent that needs 20 steps to complete a task means 20 round trips to the model.
- Security -- Agents that process untrusted input are vulnerable to prompt injection, where adversarial text in the input hijacks the agent's behavior.
- Human oversight is still required -- For any task where errors have real consequences (financial transactions, medical advice, legal documents, production deployments), a human needs to review the agent's work before it takes effect.
Agents are a tool for augmenting human work, not replacing human judgment. Build accordingly.
