AI Agent Failure Modes

A field guide to common AI agent failures, the controls that reduce them, and the evidence reviewers need before launch, rollout, or incident review.

AI agent failures are rarely mysterious after the trace is visible. The system chose a bad tool, trusted stale context, skipped a permission boundary, retried the same broken step, or produced a final answer without evidence. The harder part is catching those patterns before the agent is connected to real customers, production data, or irreversible actions.

A practical failure-mode review starts from the agent contract described in "what is an AI agent." Once a system can pursue a goal through state and tools, it needs controls that are stronger than prompt wording. The review should map each predictable failure to a prevention control, a detection signal, and a recovery path.

The problem with agent reliability

One-shot LLM errors are usually visible in the response. Agent errors can be distributed across several steps. A bad retrieval result in step two can produce a confident action in step six. A missing permission check can look like a successful completion until the wrong record is updated. A retry loop can create duplicate tickets, emails, or draft files.

This is why “it worked in the demo” is weak evidence. The team needs repeatable fixtures, logs, and review criteria. Use the LLM evaluation framework to create a small but representative test set before the agent handles live work.

Common failure modes

Wrong tool selection happens when the agent calls a tool that is plausible but wrong for the task or not authorized for it. For example, it may search public web results when the task requires internal documentation, or write a draft to a live system when it should create a reviewable artifact.

Bad arguments happen when the tool is correct but the parameters are wrong. The agent may search the wrong date range, use an overly broad customer filter, or omit a tenant constraint. These failures are common when the model builds tool calls from prose without schema checks.

Stale retrieval happens when the agent uses old or incomplete source material. The output may be polished but no longer accurate. RAG-heavy agents need the same discipline as the RAG evaluation checklist: freshness, source selection, citation quality, and no-answer behavior.
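A freshness check can be sketched as a filter that runs before retrieved sources reach the model. This is a minimal sketch: the `RetrievedDoc` type, its `updated_at` field, and the 90-day cutoff are illustrative assumptions, not part of any particular retrieval library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RetrievedDoc:
    # Hypothetical shape for a retrieved source; field names are assumptions.
    doc_id: str
    updated_at: datetime
    text: str


def split_by_freshness(docs, now, max_age=timedelta(days=90)):
    """Return (fresh, stale) lists so stale sources are surfaced, not silently used."""
    fresh, stale = [], []
    for doc in docs:
        if now - doc.updated_at <= max_age:
            fresh.append(doc)
        else:
            stale.append(doc)
    return fresh, stale
```

Returning the stale list instead of dropping it lets the trace record which sources were excluded. If nothing fresh remains, that is a natural trigger for a no-answer response.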

Prompt injection happens when retrieved content tells the agent to ignore instructions, reveal private context, or call unsafe tools. Any agent that reads external or user-generated content needs isolation between source text and system policy.
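One possible isolation rule, sketched below, is to mark every piece of context as trusted or untrusted and shrink the available tools whenever untrusted text is present. This does not detect injection; it only limits what injected text can cause. The `ContextItem` type and the tool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    text: str
    trusted: bool  # True only for system policy and operator-written instructions


# Hypothetical tool names; assume these have no side effects.
READ_ONLY_TOOLS = frozenset({"search_docs", "read_ticket"})


def allowed_tools(context, all_tools):
    """Restrict the agent to read-only tools while untrusted text is in context."""
    if any(not item.trusted for item in context):
        return frozenset(all_tools) & READ_ONLY_TOOLS
    return frozenset(all_tools)
```

The decision is made by code from the `trusted` flag, so no wording inside the retrieved text can change it.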

Unsupported conclusions happen when the agent reaches beyond the evidence. This is especially risky in research, code review, legal, security, and operational workflows. The final answer should separate observed facts, inferences, and open questions.

Controls that reduce risk

Use narrow tool allowlists. Give the agent only the tools needed for the current job. Split read, draft, and publish actions into separate permission classes. The agent permission design pattern keeps a tool granted for convenience from becoming standing authority to act.
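The read, draft, and publish split can be sketched as a registry that maps each tool to a permission class, plus a check that runs before any call. The tool names and the `TOOL_CLASSES` registry are hypothetical; a real system would load them from its own configuration.

```python
from enum import Enum


class PermissionClass(Enum):
    READ = "read"
    DRAFT = "draft"
    PUBLISH = "publish"


# Hypothetical registry: every tool belongs to exactly one permission class.
TOOL_CLASSES = {
    "search_internal_docs": PermissionClass.READ,
    "create_draft": PermissionClass.DRAFT,
    "publish_article": PermissionClass.PUBLISH,
}


def tools_for_job(granted):
    """Return the tool names exposed to the agent for the granted classes."""
    return sorted(name for name, cls in TOOL_CLASSES.items() if cls in granted)


def authorize(tool_name, granted):
    """Raise PermissionError for unknown tools or tools outside the grant."""
    cls = TOOL_CLASSES.get(tool_name)
    if cls is None or cls not in granted:
        raise PermissionError(f"tool {tool_name!r} is not allowed for this job")
```

Unknown tools are rejected by default, so adding a tool to the system does not expose it to any job until it has been assigned a class.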

Validate tool calls before execution. Required fields, enum values, tenant IDs, maximum date ranges, and write scopes should be checked by code, not by the model. Reject invalid calls with a clear error and let the agent recover within a bounded retry limit.
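As a sketch of that rule, the validator below checks one hypothetical tool, a ticket search, and a small loop feeds errors back to the agent a bounded number of times. The field names, the 31-day limit, and the `propose_call` and `execute` callables are assumptions made for illustration.

```python
from datetime import date

MAX_RANGE_DAYS = 31                    # assumed limit for this example
ALLOWED_STATUSES = {"open", "closed"}  # assumed enum values


def validate_ticket_search(args, session_tenant_id):
    """Return a list of error strings; an empty list means the call may run."""
    missing = [f for f in ("tenant_id", "status", "start", "end") if f not in args]
    if missing:
        return [f"missing required field: {f}" for f in missing]
    errors = []
    if args["tenant_id"] != session_tenant_id:
        errors.append("tenant_id does not match the session tenant")
    if args["status"] not in ALLOWED_STATUSES:
        errors.append(f"status must be one of {sorted(ALLOWED_STATUSES)}")
    if type(args["start"]) is not date or type(args["end"]) is not date:
        errors.append("start and end must be dates")
    elif args["end"] < args["start"]:
        errors.append("end precedes start")
    elif (args["end"] - args["start"]).days > MAX_RANGE_DAYS:
        errors.append(f"date range exceeds {MAX_RANGE_DAYS} days")
    return errors


def run_with_bounded_retries(propose_call, execute, session_tenant_id, max_attempts=3):
    """Ask the agent for a call, validate it in code, and stop after max_attempts."""
    feedback = []
    for _ in range(max_attempts):
        args = propose_call(feedback)  # the agent sees the previous errors
        feedback = validate_ticket_search(args, session_tenant_id)
        if not feedback:
            return execute(args)
    raise RuntimeError(f"tool call still invalid after {max_attempts} attempts: {feedback}")
```

The tenant check compares against a value taken from the session, not from the model's output, which is what keeps an omitted or wrong tenant constraint from reaching the tool.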

Require evidence in the final answer. If the agent used sources, show them. If it made an inference, label it. If the task cannot be completed, return a no-answer response instead of filling the gap. The RAG no-answer testing method applies to agents as well as retrieval systems.
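One way to enforce this is to make the final answer a structured object instead of free text. The sketch below assumes a hypothetical `AgentAnswer` type in which every observed fact carries a source identifier; an answer with no sourced facts renders as a no-answer response.

```python
from dataclasses import dataclass, field

NO_ANSWER = "No answer: the available sources do not support a conclusion."


@dataclass
class AgentAnswer:
    # Each observed fact is a (claim, source_id) pair; names are illustrative.
    observed_facts: list = field(default_factory=list)
    inferences: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)


def render(answer):
    """Render the answer with labeled sections, or a no-answer response."""
    if not answer.observed_facts:
        return NO_ANSWER
    lines = ["Observed facts:"]
    lines += [f"- {claim} [source: {source_id}]" for claim, source_id in answer.observed_facts]
    if answer.inferences:
        lines.append("Inferences (not stated directly in the sources):")
        lines += [f"- {text}" for text in answer.inferences]
    if answer.open_questions:
        lines.append("Open questions:")
        lines += [f"- {text}" for text in answer.open_questions]
    return "\n".join(lines)
```

Because inferences alone never produce an answer, the agent cannot fill a gap in the evidence with a confident guess.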

Add review gates for high-impact actions. Publishing, emailing, deleting, purchasing, changing permissions, or updating customer-visible records should require human approval until production traces justify a narrower exception.
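A review gate can be sketched as a wrapper that refuses high-impact actions unless an approver is recorded. The action names mirror the list above; `run_tool` and the `approved_by` parameter are hypothetical stand-ins for whatever dispatch and approval mechanism the system uses.

```python
HIGH_IMPACT_ACTIONS = frozenset({
    "publish", "email", "delete", "purchase",
    "change_permissions", "update_customer_record",
})


class ApprovalRequired(Exception):
    """Raised when a high-impact action is attempted without human approval."""


def execute_action(action, payload, run_tool, approved_by=None):
    """Run the action, but block high-impact actions that lack an approver."""
    if action in HIGH_IMPACT_ACTIONS and approved_by is None:
        raise ApprovalRequired(f"{action!r} needs human approval before it runs")
    return run_tool(action, payload)
```

Narrowing the gate later means removing an entry from `HIGH_IMPACT_ACTIONS`, which is a reviewable code change and not a prompt edit.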

Failure review checklist

Before launch, test at least one fixture for each failure class: irrelevant retrieval, stale retrieval, malicious instruction in source text, missing permission, tool timeout, duplicate action, ambiguous user request, and no-answer case. Each fixture should have an expected behavior and an expected evidence trail.
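The coverage rule above can be checked mechanically. In this sketch the failure classes are taken from the list in the previous paragraph, while the `Fixture` type and its field names are assumptions.

```python
from dataclasses import dataclass

FAILURE_CLASSES = (
    "irrelevant_retrieval", "stale_retrieval", "malicious_instruction",
    "missing_permission", "tool_timeout", "duplicate_action",
    "ambiguous_request", "no_answer",
)


@dataclass(frozen=True)
class Fixture:
    failure_class: str
    task: str
    expected_behavior: str
    expected_evidence: str


def missing_coverage(fixtures):
    """Return failure classes that have no fixture with both expectations filled in."""
    covered = {
        f.failure_class
        for f in fixtures
        if f.expected_behavior.strip() and f.expected_evidence.strip()
    }
    return [c for c in FAILURE_CLASSES if c not in covered]
```

A launch check could fail whenever `missing_coverage` returns a non-empty list.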

During operation, sample traces. Look for hidden retries, ungrounded claims, unusually long runs, tool errors that were summarized away, and outputs that do not match the cited evidence. When a failure occurs, run the agent incident review process and change a control, not only the prompt.
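Some of these signals can be flagged automatically before a human reads the trace. The sketch below assumes each trace step is a dict with `tool`, `args`, and `error` keys, which is an illustrative format and not a standard one; the 20-step threshold is also an assumption.

```python
import json
from collections import Counter


def flag_trace(steps, max_steps=20):
    """Return review flags for one trace: repeated calls, long runs, tool errors."""
    flags = []
    # The same tool called with identical arguments suggests a hidden retry.
    calls = Counter((s["tool"], json.dumps(s["args"], sort_keys=True)) for s in steps)
    if any(count > 1 for count in calls.values()):
        flags.append("repeated_call")
    if len(steps) > max_steps:
        flags.append("long_run")
    # Any tool error should be compared against what the final answer reported.
    if any(s.get("error") for s in steps):
        flags.append("tool_error")
    return flags
```

Flags only prioritize traces for review. Checking whether the final answer matches its cited evidence still needs a reviewer or a separate evaluation step.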

Frequently asked questions

What is the most common AI agent failure?

The most common operational failure is not one bug; it is an unsupported final answer that hides weak retrieval, wrong tool use, or missing evidence.

Can prompt changes fix agent failures?

Prompt changes help only when the failure comes from unclear or missing instructions. Tool permissions, logging, fixtures, and approval gates are needed for systemic agent failures.

Next step

Create a failure-mode table before connecting an agent to production systems. For each row, write the fixture, prevention control, detection signal, and recovery action. If any row has only “make the prompt better” as the control, the design is not ready.
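The table and its readiness rule can be sketched as data plus one check. The `FailureModeRow` type and the `control_kind` labels are hypothetical; the point is that a row whose only control is the prompt, or that leaves a column empty, is reported as not ready.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureModeRow:
    failure_mode: str
    fixture: str
    prevention_control: str
    control_kind: str  # assumed labels: "code", "permission", "human_review", "prompt"
    detection_signal: str
    recovery_action: str


def not_ready(rows):
    """Return failure modes whose row is incomplete or relies on the prompt alone."""
    problems = []
    for row in rows:
        columns = (row.fixture, row.prevention_control,
                   row.detection_signal, row.recovery_action)
        incomplete = not all(c.strip() for c in columns)
        if incomplete or row.control_kind == "prompt":
            problems.append(row.failure_mode)
    return problems
```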