Every production AI agent hits the same wall: the model is stateless, but the task isn't. Managed Agents gives you a runtime — containerized execution environments, append-only event logs, and session lifecycle management — so you can build agents that run for minutes or hours without hacking around API limits.
−60%
p50 TTFT drop
−90%
p95 TTFT drop
4
core primitives
$0.08
per session-hour
Monthly Cost Calculator — Adjust Sliders to Estimate Your Cost
Sessions per day: 50
Avg task duration (min): 5
Avg tokens per session: 10k
Hover a component to see what it does inside a Managed Agent.
The key insight: Traditional agents stuff everything into one massive prompt and wait for the model to respond. Managed Agents decouples the model (brain) from execution (hands) so the model can start streaming before all tools have run.
The Stateless Problem
Traditional LLM calls are stateless round-trips. Long tasks require re-sending the entire conversation each turn, so the prompt grows with every action and the total tokens sent grow quadratically with turn count.
What Changes
Sessions persist state server-side. The model receives only what's new. Tool results stream in as events rather than blocking the next model call.
What Stays the Same
The Claude API, prompt engineering, and tool-use patterns you already know. Managed Agents is an orchestration layer on top — not a different model.
You Build the Recipe. Anthropic Manages the Kitchen.
The word "Managed" means Anthropic runs the infrastructure underneath — containers, session state, security, scaling. You only write the agent logic. Here is exactly what that difference looks like.
The Problem: The Growing Prompt Wall
Every time your agent takes a new action, you must re-send the entire conversation history to the model — because it has no memory between API calls. The prompt grows with every turn. Eventually it hits the context limit and the task dies.
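The arithmetic behind the wall is simple. Here is a rough sketch with illustrative numbers (not measured figures): if each turn adds about 500 tokens of new content, re-sending the full history makes total tokens sent grow quadratically with turn count, while sending only new events grows linearly.

```python
def tokens_sent_traditional(turns, per_turn=500):
    # Turn t re-sends the entire history so far: per_turn * t tokens.
    return sum(per_turn * t for t in range(1, turns + 1))

def tokens_sent_managed(turns, per_turn=500):
    # Each turn sends only the new events.
    return per_turn * turns

# After 20 turns the traditional agent has sent 105,000 prompt tokens;
# the event-based agent has sent 10,000.
print(tokens_sent_traditional(20), tokens_sent_managed(20))
```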
Click "Watch it grow" — see how traditional agents re-send the full history every turn, while Managed Agents sends only new events.
What "Managed" Actually Means — Click a Row to Compare
Click any row to see what building it yourself looks like vs what Anthropic handles for you.
The restaurant analogy: A self-managed agent is like owning the restaurant building — you buy the equipment, hire the staff, handle maintenance, pay the utilities. Managed Agents is a managed kitchen space — you just bring your recipes (agent logic). The kitchen, staff, and utilities are handled for you. You focus on the food, not the plumbing.
Without Managed Agents — You Build
📄 Session state storage (database)
💻 Tool execution server (VM or container)
🔄 Retry + timeout logic (custom code)
🤖 Multi-agent orchestration (custom scheduler)
🔒 Credential isolation (your own vault/proxy)
📈 Scaling + monitoring (DevOps work)
Managed Agents is not an abstract framework — it's built for tasks that take minutes, not milliseconds. Here are four real patterns, each showing which tools fire and how the session unfolds.
Everything in Managed Agents reduces to four nouns: Agent, Environment, Session, Event. Understand these and the entire API makes sense.
Click a node to explore its role in the runtime.
Agent
The blueprint. Defines which model, system prompt, tool list, and MCP servers an agent will use. Creating an Agent does nothing — it just registers configuration. Think of it as a class definition.
Environment
The container template. Specifies what execution resources a session gets — filesystem, network access, installed binaries, memory limits. Environments are reusable across agents.
Session
The running instance. Combines an Agent with an Environment and starts executing. Sessions have lifecycle states: created → running → paused → complete → failed. This is where computation happens.
Event
The message unit. Everything that happens in a session is an event: user messages, assistant turns, tool invocations, tool results, status changes. The event log is append-only and queryable.
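A minimal sketch of how the four nouns relate, using hypothetical Python classes — the names mirror the API, but the fields are illustrative, not the real schema:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:                       # the blueprint: configuration only
    model: str
    system: str
    tools: list

@dataclass
class Environment:                 # the container template, reusable
    container_template: str
    memory_mb: int

@dataclass
class Session:                     # Agent + Environment, running
    agent: Agent
    environment: Environment
    state: str = "created"
    events: list = field(default_factory=list)   # append-only log

    def append(self, type, **payload):
        self.events.append({"type": type, **payload})

reviewer = Agent("claude-opus-4-6", "You review code.", ["bash", "files"])
env = Environment("python3.11", memory_mb=2048)

# One Agent, many Sessions: each isolated, each with its own event log.
s1, s2 = Session(reviewer, env), Session(reviewer, env)
s1.append("user_message", content="Review PR #1")
```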
Session lifecycle — click any state to trigger a transition
Agent vs Session: An Agent is like a Docker image — a recipe. A Session is like a running container. You create many Sessions from one Agent, each isolated, each with its own event log.
What is an event log and why is it append-only? ▼
The event log is a sequential record of everything that happened in a session. Append-only means events are never modified or deleted — only new events are added. This gives you a complete audit trail, enables time-travel debugging, and allows resuming a session exactly where it left off after a crash. The model always sees the log as its context.
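The append-only property is what makes replay and crash recovery cheap. A toy sketch of the idea (the event shape is illustrative):

```python
class EventLog:
    """Append-only: events get a monotonically increasing id and are
    never mutated, so any point in the session can be replayed exactly."""
    def __init__(self):
        self._events = []

    def append(self, type, **payload):
        event = {"id": len(self._events), "type": type, **payload}
        self._events.append(event)
        return event

    def replay(self, after_id=-1):
        # Resume after a crash: everything past the last id you saw.
        return [e for e in self._events if e["id"] > after_id]

log = EventLog()
log.append("user_message", content="Analyze sales_data.csv")
log.append("tool_use", tool="bash", command="head sales_data.csv")
checkpoint = log.append("tool_result", output="csv preview...")["id"]
log.append("assistant_turn", content="Top products are...")

# A crashed consumer resumes from its checkpoint with no data loss.
print(len(log.replay(after_id=checkpoint)))
```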
Can I run multiple agents in one session? ▼
Not directly — a session belongs to one agent. But you can have multiple sessions communicate via the multi-agent API (callable_agents). One session acts as orchestrator and spawns sub-sessions, each with their own agent, environment, and event log. Results flow back as events to the orchestrator.
How does pausing a session work? ▼
A session can be paused mid-execution — for example when it needs human approval before proceeding, or when waiting for an async external event. The environment is preserved (filesystem, process state). When you resume, the agent picks up from the exact event it paused on, as if nothing happened.
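The lifecycle above can be sketched as a small transition table. The states come from this section; the exact set of legal transitions is an assumption for illustration:

```python
TRANSITIONS = {
    "created":  {"running"},
    "running":  {"paused", "complete", "failed"},
    "paused":   {"running", "failed"},   # resume picks up at the paused event
    "complete": set(),                   # terminal
    "failed":   set(),                   # terminal
}

def transition(state, to):
    if to not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {to}")
    return to

# A pause-for-approval flow: run, pause, resume, finish.
state = "created"
for step in ("running", "paused", "running", "complete"):
    state = transition(state, step)
```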
Agent → Session → Event: The One-to-Many Relationships
Click any node to understand the cardinality: how one Agent spawns many Sessions, one Environment serves many Sessions, and each Session produces many Events.
Managed Agents achieves its latency gains through a clean three-way split: a stateless model harness (Brain), isolated execution containers (Hands), and a persistent event log (Session). Each can scale independently.
Click a component to see its responsibilities and scaling properties.
🧠 Brain — Stateless Harness
The model layer. Receives events, generates the next assistant turn, emits tool calls. Completely stateless — it reads the event log, produces output, done. Can be scaled horizontally with zero coordination. No session state stored here.
👔 Hands — Execution Env
The container layer. Runs tools: shell commands, file I/O, web fetches, MCP servers. Each session gets its own isolated environment. Results are written back to the event log. The Brain never directly touches the environment.
📄 Session — Event Log
The memory layer. An append-only ledger of every event. Both Brain and Hands read from it; both write to it. Durable, queryable, resumable. This decoupling is what enables the -60%/-90% TTFT improvements.
// Why TTFT drops so dramatically:
//
// BEFORE (traditional agents):
// [collect all tool results] → [build full prompt] → [model starts streaming]
// p50 first token: ~4.2s p95 first token: ~18s
//
// AFTER (managed agents):
// [model reads partial event log] → [starts streaming immediately]
// [tool results arrive as new events while model is running]
// p50 first token: ~1.7s (-60%) p95 first token: ~1.8s (-90%)
Why is statelessness in the Brain an advantage? ▼
Stateless services are trivially scalable — you can add more Brain instances without coordination, routing, or sticky sessions. If a Brain instance crashes mid-generation, the next one picks up from the event log with no data loss. It also simplifies reasoning about correctness: the model always operates on the canonical log.
How are containers isolated between sessions? ▼
Each session gets a fresh container from the Environment template. Filesystem namespaces, network policies, and resource limits (CPU/memory) are applied per-session. Sessions cannot see each other's filesystems. When a session ends, the container is destroyed (though you can snapshot the filesystem before teardown).
The event loop is the heartbeat of a Managed Agent session. User messages, model turns, tool invocations, tool results, and status changes all flow as typed events through the same append-only log.
Session Replay — Drag to Scrub Through a Real Agent Session
Task: "Analyze sales_data.csv and find the top 5 products by revenue." Drag the scrubber to move through every event as it happened.
Drag the scrubber to step through each event in the session.
Click an event type to see its JSON schema and an example payload.
How do I stream events in real time? ▼
Use the GET /v1/beta/sessions/{id}/events?stream=true endpoint with Server-Sent Events (SSE). Each event is delivered as a JSON line as soon as it's written to the log. You can also poll with after_event_id for simpler integrations that don't need sub-second delivery.
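The cursor logic behind after_event_id polling can be sketched locally, without a server (the event shape is illustrative):

```python
def poll(log, cursor):
    """One poll cycle: return events newer than cursor, plus the new cursor."""
    new = [e for e in log if e["id"] > cursor]
    return new, (new[-1]["id"] if new else cursor)

log = [{"id": 0, "type": "user_message"},
       {"id": 1, "type": "assistant_turn"}]
cursor = -1

events, cursor = poll(log, cursor)            # first poll sees both events
log.append({"id": 2, "type": "tool_result"})  # a new event arrives
events, cursor = poll(log, cursor)            # second poll sees only the new one
```

Because the log is append-only and ids are monotonic, polling is idempotent: repeating a poll with the same cursor never drops or duplicates events.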
Can I inject events from outside the session? ▼
Yes — POST a user_message event to a running session to send a new human turn. You can also post structured tool_result events for tools that run outside the managed environment (e.g., calling your own API). This is how human-in-the-loop approval flows work.
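Only certain event types can be posted from outside the session; a sketch of that guard, with the allowed set taken from the answer above:

```python
INJECTABLE = {"user_message", "tool_result"}

def make_injectable_event(type, **payload):
    # Assistant turns, tool invocations, and status changes are produced
    # by the runtime itself and cannot be posted externally.
    if type not in INJECTABLE:
        raise ValueError(f"cannot inject event type: {type}")
    return {"type": type, **payload}

# Human-in-the-loop approval: an external actor supplies the verdict
# as a tool_result for a tool that ran outside the managed environment.
approval = make_injectable_event("tool_result",
                                 tool="request_approval", output="approved")
```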
Complex tasks decompose naturally into parallel workstreams. An orchestrator session can spawn multiple subagent sessions, each with its own agent definition, isolated environment, and independent event log. Results flow back as events.
// Multi-agent API pattern (TypeScript SDK; assumes the client is set up):
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();

const agent = await client.beta.agents.create({
model: "claude-opus-4-6",
system: "You are a research orchestrator.",
callable_agents: [
{ agent_id: "web-researcher", alias: "search" },
{ agent_id: "code-analyst", alias: "analyze" },
{ agent_id: "summarizer", alias: "summarize" }
]
});
// The model calls sub-agents like tools:
// { type: "agent_use", agent: "search", input: { query: "..." } }
Orchestrator Pattern
One session acts as coordinator. It decomposes the task, dispatches subtasks to specialist subagents via callable_agents, collects results as agent_result events, and synthesizes the final response.
Parallel Isolation
Subagent sessions run concurrently in fully isolated containers. A crash in one subagent doesn't affect others. The orchestrator receives a failed event and can retry, skip, or escalate.
Session Threading
Each subagent call creates a child session with a full event log. You can inspect, replay, and debug the subagent's reasoning independently — without re-running the entire orchestration.
Rate limits: 60 session create/min · 600 event read/min. For high-throughput orchestration, batch subagent calls and use event streaming rather than polling individual sessions.
How does the orchestrator wait for all subagents? ▼
The orchestrator model emits multiple agent_use calls in a single turn (like parallel tool calls). The runtime dispatches all of them concurrently and writes the results back as agent_result events once each subagent completes. The orchestrator's next turn only begins when all pending agent calls have resolved.
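The dispatch-and-join behavior can be sketched with a thread pool, with plain functions standing in for subagent sessions:

```python
from concurrent.futures import ThreadPoolExecutor

def run_turn(agent_calls, log):
    """Dispatch all agent_use calls concurrently; the next orchestrator
    turn begins only after every agent_result has been appended."""
    with ThreadPoolExecutor() as pool:
        futures = {alias: pool.submit(fn, task)
                   for alias, fn, task in agent_calls}
        for alias, fut in futures.items():
            log.append({"type": "agent_result",
                        "agent": alias, "output": fut.result()})
    return log

log = []
calls = [
    ("search",  lambda q: f"results for {q}", "llm agents"),
    ("analyze", lambda p: f"analysis of {p}", "repo"),
]
run_turn(calls, log)   # both results are in the log before the turn ends
```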
Can subagents call other subagents? ▼
Yes — the tree can be arbitrarily deep. A subagent can itself have callable_agents, spawning grandchildren. Anthropic recommends limiting depth to 3 levels for practical observability. Deep trees are fully inspectable — each node has its own queryable event log.
Managed Agents provides a rich tool set out of the box. Security is enforced at the container level — tools run inside the isolated execution environment, not the model harness.
💻
Bash
Run shell commands in the container
📄
File I/O
Read, write, create, delete files
🔍
Web Search
Query the web, return structured results
🌎
Web Fetch
Fetch and parse any URL's content
🔌
MCP Servers
Connect to any MCP-compatible service
🛠
Code Run
Execute Python, JS, and more in sandbox
📷
Screenshot
Capture browser or desktop screenshots
🤖
Agents
Call subagent sessions as tools
Click a tool to see its capabilities, input schema, and security boundaries.
Click a security layer to learn how it protects the host system.
Resource-bound auth: Credentials are never injected into the container environment directly. Instead, a Vault+MCP proxy mediates all auth flows — the agent requests access, the proxy evaluates the resource policy, and issues a scoped token. The container never sees the master credentials.
How do I restrict which tools an agent can use? ▼
Define the tools array in your Agent config — only listed tools are available in sessions. For finer control, add a tool_policy to the Environment: specify allowed paths for file operations, allowed domains for web fetch, or a denylist of shell commands. Policies are enforced at the container level, not by the model.
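A sketch of what container-level enforcement of such a policy might look like. The policy keys mirror the description above, but the exact field names are assumptions:

```python
from urllib.parse import urlparse

POLICY = {
    "allowed_paths": ("/workspace",),
    "allowed_domains": {"docs.python.org"},
    "denied_commands": {"rm -rf /", "shutdown"},
}

def check_write(path):
    # File operations confined to allowed path prefixes.
    return path.startswith(POLICY["allowed_paths"])

def check_fetch(url):
    # Web fetch restricted to an allowlist of domains.
    return urlparse(url).hostname in POLICY["allowed_domains"]

def check_shell(command):
    # Shell commands screened against a denylist.
    return command not in POLICY["denied_commands"]
```

Because these checks run in the container runtime rather than the model, a prompt-injected instruction cannot talk its way past them.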
What happens if a tool exceeds its resource limit? ▼
The container's cgroup enforces CPU, memory, and disk limits. If a tool call exceeds them (e.g., a runaway process), the container runtime sends SIGKILL to the offending process. A tool_result event with error: "resource_limit_exceeded" is written to the event log. The session continues — the model receives the error and can decide how to proceed.
The Managed Agents API follows the same conventions as the Anthropic Messages API. Beta header required during preview. Five steps from setup to a running session.
Click "Run Steps" to walk through the API flow
import anthropic
client = anthropic.Anthropic()
# Step 1: Create an agent blueprint
agent = client.beta.agents.create(
model="claude-opus-4-6",
name="code-reviewer",
system="You are an expert code reviewer.",
tools=["bash", "files", "web_search"],
betas=["managed-agents-2026-04-01"]
)
# Step 2: Create an execution environment
env = client.beta.environments.create(
name="review-env",
container_template="python3.11",
memory_mb=2048,
betas=["managed-agents-2026-04-01"]
)
# Step 3: Start a session
session = client.beta.sessions.create(
agent_id=agent.id,
environment_id=env.id,
betas=["managed-agents-2026-04-01"]
)
# Step 4: Send a message
client.beta.sessions.events.create(
session_id=session.id,
type="user_message",
content="Review this PR: [paste diff here]",
betas=["managed-agents-2026-04-01"]
)
# Step 5: Stream events
for event in client.beta.sessions.events.stream(
session_id=session.id,
betas=["managed-agents-2026-04-01"]
):
if event.type == "assistant_turn":
print(event.content)
elif event.type == "status_change" and event.status == "complete":
break
$0.08 per session-hour of active execution + standard model token pricing. Sessions that are paused or idle do not count toward the hourly rate. Environments are free to create; you only pay when sessions are running.
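The cost calculator at the top of the page reduces to this arithmetic. Only the $0.08 session-hour rate comes from the pricing above; the $15/Mtok blended token rate is an assumption for illustration:

```python
def monthly_cost(sessions_per_day, minutes_per_session, tokens_per_session,
                 runtime_rate=0.08, blended_token_rate=15.0):
    """runtime_rate: $/session-hour; blended_token_rate: assumed $/Mtok."""
    sessions = sessions_per_day * 30
    runtime_hours = sessions * minutes_per_session / 60
    runtime_cost = runtime_hours * runtime_rate
    token_cost = sessions * tokens_per_session / 1_000_000 * blended_token_rate
    return runtime_cost + token_cost

# Defaults matching the calculator sliders: 50 sessions/day, 5 min, 10k tokens.
print(round(monthly_cost(50, 5, 10_000), 2))
```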
Rate Limits
60 session create/min · 600 event read/min · 20 concurrent active sessions per workspace (beta). Orchestration patterns that dispatch many subagents in parallel should fan-out through a single orchestrator session.
Beta Access
Available on the Anthropic Console under "Managed Agents (Beta)". Requires including anthropic-beta: managed-agents-2026-04-01 in every request. All endpoints are under /v1/beta/ during the preview period.
What this enables: Long-running research agents, automated code review pipelines, multi-step data analysis, parallelized content generation — all without managing your own container infrastructure or building custom session state.
If you're already using one of these frameworks, here's exactly where Managed Agents fits and where it differs. Click any cell for the full explanation.
Click any cell in the matrix to see a detailed comparison for that feature and framework.
When to choose Managed Agents
Tasks that run for minutes, need real tool execution (shell, files, web), require session persistence, or involve parallel subagents. You want Anthropic to manage the infrastructure so your team can focus on agent logic.
When to stick with DIY / LangChain
Tasks that complete in a single round-trip, teams with existing container infrastructure, or cases where you need fine-grained control over every layer of the stack. LangChain excels at rapid prototyping across many model providers.