Concepts
Modern agents work best when they can operate on real files in a filesystem. Sandbox Agents can make use of specialized tools and shell commands to search over and manipulate large document sets, edit files, generate artifacts, and run commands. The sandbox provides the model with a persistent workspace that the agent can use to do work on your behalf. Sandbox Agents in the Agents SDK help you run agents paired with a sandbox environment, making it easy to get the right files on the filesystem and orchestrate sandboxes to start, stop, and resume tasks at scale.
You define the workspace around the data the agent needs. It can start from GitHub repos, local files and directories, synthetic task files, remote filesystems such as S3 or Azure Blob Storage, and other sandbox inputs you provide.

SandboxAgent extends Agent, so it is still an Agent. It keeps the usual agent surface such as instructions, tools, handoffs, mcpServers, modelSettings, output types, guardrails, and hooks, and it still runs through the normal run() and Runner APIs. What changes is the execution boundary:
- `SandboxAgent` defines the agent itself: the usual agent configuration plus sandbox-specific defaults like `defaultManifest`, `baseInstructions`, `runAs`, and capabilities such as filesystem tools, shell access, skills, memory, or compaction.
- `Manifest` declares the desired starting contents and layout for a fresh sandbox workspace, including files, repos, mounts, and environment.
- A sandbox session is the live execution environment where commands run and files change.
- The `sandbox` run option decides how the run gets that sandbox session, for example by injecting one directly, reconnecting from serialized sandbox session state, or creating a fresh sandbox session through a sandbox client.
- Saved sandbox state and snapshots let later runs reconnect to prior work or seed a fresh sandbox session from saved contents.
Manifest defines the starting contents for a new sandbox workspace. It does not describe the current files in every live sandbox, because reused sessions, serialized session state, and snapshots can all provide or change the workspace at run time.
Throughout this page, “sandbox session” means the live execution environment managed by a sandbox client. The exact boundary depends on the client: Unix-local sessions run in a local workspace on the host, while Docker and hosted clients provide stronger environment isolation. This is different from the SDK’s conversational Session interfaces described in Sessions.
The outer runtime still owns approvals, tracing, handoffs, and resume bookkeeping. The sandbox session owns commands, file changes, and environment isolation. That split is a core part of the model.
How the pieces fit together
A sandbox run combines an agent definition with per-run sandbox configuration. The runner prepares the agent, binds it to a live sandbox session, and can save state for later runs.
Sandbox-specific defaults stay on SandboxAgent. Per-run sandbox-session choices stay in the sandbox run option.
Think about the lifecycle in three phases:
- Define the agent and starting workspace contents with `SandboxAgent`, `Manifest`, and capabilities.
- Execute a run by giving `run()` or `Runner` a `sandbox` run option that injects, resumes, or creates the sandbox session.
- Continue later from runner-managed `RunState`, explicit sandbox `sessionState`, or a saved workspace snapshot.
If shell access is only one occasional tool, start with hosted shell in the Tools guide. Reach for sandbox agents when workspace isolation, sandbox client choice, or sandbox-session resume behavior are part of the design.
When to use them
Sandbox agents are a good fit for workspace-centric workflows, for example:
- Coding and debugging: orchestrate automated fixes for issue reports in a GitHub repo and run targeted tests.
- Document processing and editing: extract information from a user’s financial documents and create a completed tax-form draft.
- File-grounded review or analysis: check onboarding packets, generated reports, or artifact bundles before answering.
- Isolated multi-agent patterns: give each reviewer or coding sub-agent its own workspace.
- Multi-step workspace tasks: fix a bug in one run and add a regression test later, or resume from snapshot or sandbox session state.
If you do not need access to files or a living filesystem, keep using Agent. If shell access is just one occasional capability, add hosted shell; if the workspace boundary itself is part of the feature, use sandbox agents.
Choose a sandbox client
Start with UnixLocalSandboxClient for local development. Move to DockerSandboxClient when you need container isolation or image parity. Move to a hosted provider when you need provider-managed execution.
In most cases, the SandboxAgent definition stays the same while the sandbox client and its options change in the sandbox run option. See Sandbox clients for local, Docker, hosted, and remote-mount options.
Core pieces
| Layer | Main SDK pieces | What it answers |
|---|---|---|
| Agent definition | SandboxAgent, Manifest, capabilities | What agent will run, and what fresh-session workspace contract should it start from? |
| Sandbox execution | sandbox run option, the sandbox client, and the live sandbox session | How does this run get a live sandbox session, and where does the work execute? |
| Saved sandbox state | RunState sandbox payload, sessionState, and snapshots | How does this workflow reconnect to prior sandbox work or seed a fresh sandbox session from saved contents? |
The main SDK pieces map onto those layers like this:
| Piece | What it owns | Ask this question |
|---|---|---|
| `SandboxAgent` | The agent definition | What should this agent do, and which defaults should travel with it? |
| `Manifest` | Fresh-session workspace files and folders | What files and folders should be present on the filesystem when the run starts? |
| `Capability` | Sandbox-native behavior | Which tools, instruction fragments, or runtime behavior should attach to this agent? |
| `sandbox` run option | Per-run sandbox client and sandbox-session source | Should this run inject, resume, or create a sandbox session? |
| `RunState` | Runner-managed saved sandbox state | Am I resuming a prior runner-managed workflow and carrying its sandbox state forward automatically? |
| `sandbox.sessionState` | Explicit serialized sandbox session state | Do I want to resume from sandbox state I already serialized outside RunState? |
| `sandbox.snapshot` | Saved workspace contents for fresh sandbox sessions | Should a new sandbox session start from saved files and artifacts? |
A practical design order is:
- Define the fresh-session workspace contract with `Manifest`.
- Define the agent with `SandboxAgent`.
- Add built-in or custom capabilities.
- Decide how each run should obtain its sandbox session in `run(agent, input, { sandbox: ... })` or `new Runner({ sandbox: ... })`.
How a sandbox run is prepared
At run time, the runner turns that definition into a concrete sandbox-backed run:
- It resolves the sandbox session from the `sandbox` run option.
- It determines the effective workspace inputs for the run.
- It lets capabilities process the resulting manifest.
- It builds the final instructions in a fixed order: the SDK’s default sandbox prompt, or `baseInstructions` if you explicitly override it, then `instructions`, then capability instruction fragments, then any remote-mount policy text, then a rendered filesystem tree.
- It binds capability tools to the live sandbox session and runs the prepared agent through the normal `run()` and `Runner` APIs.
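The fixed instruction order described above can be sketched as a plain function. This is only an illustration of the layering, not the SDK's implementation: `InstructionParts`, `buildInstructions`, and every field name other than `baseInstructions` and `instructions` are hypothetical.

```typescript
// Hypothetical model of the instruction layers; not an SDK type.
interface InstructionParts {
  defaultSandboxPrompt: string;
  baseInstructions?: string; // replaces the default sandbox prompt when set
  instructions?: string;
  capabilityFragments: string[];
  remoteMountPolicy?: string;
  filesystemTree?: string;
}

// Joins the layers in the documented order and skips any layer that is absent.
function buildInstructions(parts: InstructionParts): string {
  const layers: Array<string | undefined> = [
    parts.baseInstructions ?? parts.defaultSandboxPrompt,
    parts.instructions,
    ...parts.capabilityFragments,
    parts.remoteMountPolicy,
    parts.filesystemTree,
  ];
  return layers
    .filter((layer): layer is string => Boolean(layer))
    .join('\n\n');
}

const prompt = buildInstructions({
  defaultSandboxPrompt: 'SDK sandbox prompt',
  instructions: 'Write final files into output/.',
  capabilityFragments: ['shell usage notes'],
  filesystemTree: 'repo/\n  task.md',
});
console.log(prompt);
```

In this sketch, setting `baseInstructions` swaps out only the first layer; everything after it keeps its position.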
Sandboxing does not change what a turn means. A turn is still a model step, not a single shell command or sandbox action. There is no fixed 1:1 mapping between sandbox-side operations and turns. As a practical rule, another turn is consumed only when the agent runtime needs another model response after sandbox work has happened.
SandboxAgent options
These are the sandbox-specific options on top of the usual Agent fields:
| Option | Best use |
|---|---|
| `defaultManifest` | The default workspace for fresh sandbox sessions created by the runner. |
| `instructions` | Additional role, workflow, and success criteria appended after the SDK sandbox prompt. |
| `baseInstructions` | Advanced escape hatch that replaces the SDK sandbox prompt. |
| `capabilities` | Sandbox-native tools and behavior that should travel with this agent. |
| `runAs` | User identity for model-facing sandbox tools such as shell commands, file reads, and patches. |
Sandbox client choice, sandbox-session reuse, manifest override, and snapshot selection belong in the sandbox run option, not on the agent.
defaultManifest
defaultManifest is the default Manifest used when the runner creates a fresh sandbox session for this agent. Use it for the files, repos, helper material, output directories, and mounts the agent should usually start with.
This is only the default. A run can override it with sandbox.manifest, and a reused or resumed sandbox session keeps its existing workspace state.
```typescript
import { file, gitRepo, Manifest } from '@openai/agents/sandbox';

const manifest = new Manifest({
  root: '/workspace',
  entries: {
    'task.md': file({
      content: 'Fix the failing test and summarize the change.',
    }),
    repo: gitRepo({
      repo: 'openai/openai-agents-js',
      ref: 'main',
    }),
  },
  environment: {
    NODE_ENV: 'test',
  },
});
```
instructions and baseInstructions
Use instructions for short rules that should survive different prompts. In a SandboxAgent, these instructions are appended after the SDK’s sandbox base prompt, so you keep the built-in sandbox guidance and add your own role, workflow, and success criteria.
Use baseInstructions only when you want to replace the SDK sandbox base prompt. Most agents should not set it.
| Put it in… | Use it for | Examples |
|---|---|---|
| `instructions` | Stable role, workflow rules, and success criteria for the agent. | “Inspect onboarding documents, then hand off.”, “Write final files into output/.” |
| `baseInstructions` | A full replacement for the SDK sandbox base prompt. | Custom low-level sandbox wrapper prompts. |
| the user prompt | The one-off request for this run. | “Summarize this workspace.” |
| workspace files in the manifest | Longer task specs, repo-local instructions, or bounded reference material. | repo/task.md, document bundles, sample packets. |
Avoid copying the user’s one-off task into instructions, embedding long reference material that belongs in the manifest, restating tool docs that built-in capabilities already inject, or mixing in local installation notes the model does not need at run time.
capabilities
Capabilities attach sandbox-native behavior to a SandboxAgent. They can shape the workspace before a run starts, append sandbox-specific instructions, expose tools that bind to the live sandbox session, and adjust model behavior or input handling for that agent.
Built-in capabilities include:
| Capability | Add it when | Notes |
|---|---|---|
| `shell()` | The agent needs shell access. | Adds `exec_command`, plus `write_stdin` when the sandbox client supports PTY interaction. |
| `filesystem()` | The agent needs to edit files or inspect local images. | Adds `apply_patch` and `view_image`; patch paths are workspace-root-relative. |
| `skills()` | You want skill discovery and materialization in the sandbox. | Prefer this over mounting `.agents` or `.agents/skills` manually for sandbox-local `SKILL.md` skills. |
| `memory()` | Follow-on runs should read or generate memory artifacts. | Requires `shell()`; live updates also require `filesystem()`. |
| `compaction()` | Long-running flows need context trimming after compaction items. | Adjusts model sampling and input handling. |
By default, SandboxAgent.capabilities uses Capabilities.default(), which includes filesystem(), shell(), and compaction(). If you pass capabilities: [...], that list replaces the default, so include any default capabilities you still want.
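The replace-not-merge behavior is easy to trip over, so here is a minimal sketch that models capability lists as plain strings. The names `CapabilityName`, `effectiveCapabilities`, and `missingRequirements` are hypothetical helpers for illustration, not SDK exports; only the default set and the `memory()` requirements come from the text above.

```typescript
type CapabilityName = 'filesystem' | 'shell' | 'compaction' | 'skills' | 'memory';

// Mirrors the documented default set: filesystem(), shell(), and compaction().
const DEFAULT_CAPABILITIES: CapabilityName[] = ['filesystem', 'shell', 'compaction'];

// An explicit list replaces the default; it is not merged with it.
function effectiveCapabilities(explicit?: CapabilityName[]): CapabilityName[] {
  return explicit ?? [...DEFAULT_CAPABILITIES];
}

// memory() requires shell(); live memory updates also require filesystem().
function missingRequirements(
  list: CapabilityName[],
  liveUpdates = false,
): CapabilityName[] {
  const missing: CapabilityName[] = [];
  if (!list.includes('memory')) return missing;
  if (!list.includes('shell')) missing.push('shell');
  if (liveUpdates && !list.includes('filesystem')) missing.push('filesystem');
  return missing;
}

// Passing only memory drops the defaults, so its requirements are unmet.
const replaced = effectiveCapabilities(['memory']);
// Spreading the defaults first keeps them and adds memory on top.
const extended = effectiveCapabilities([...DEFAULT_CAPABILITIES, 'memory']);

console.log(missingRequirements(replaced, true)); // requirements that were dropped
console.log(missingRequirements(extended, true)); // nothing missing
```

Spreading the default list before adding your own entries is the same pattern the full coding example on this page uses with `...Capabilities.default()`.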
Concepts
Manifest
A Manifest describes the workspace for a fresh sandbox session. It can set the workspace root, declare files and directories, copy in local files, clone Git repos, attach remote storage mounts, set environment variables, define users or groups, and grant access to specific absolute paths outside the workspace.
Manifest environment values are persisted by default. Use ephemeral entries such as { value: "...", ephemeral: true } for API keys, access tokens, or other short-lived credentials that should not be saved with sandbox state.
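To make the persisted-versus-ephemeral distinction concrete, the sketch below filters an environment map the way a save step might. It assumes values are either plain strings or objects of the `{ value, ephemeral }` shape mentioned above; `EnvValue` and `persistableEnvironment` are hypothetical names, and the SDK's real serialization logic is not shown here.

```typescript
// Assumed shape: a plain string, or an object that can be marked ephemeral.
type EnvValue = string | { value: string; ephemeral?: boolean };

// Returns only the entries that would be safe to save with sandbox state.
function persistableEnvironment(
  env: Record<string, EnvValue>,
): Record<string, string> {
  const saved: Record<string, string> = {};
  for (const [name, entry] of Object.entries(env)) {
    if (typeof entry === 'string') {
      saved[name] = entry; // plain values persist by default
    } else if (!entry.ephemeral) {
      saved[name] = entry.value;
    }
    // Ephemeral entries are intentionally skipped.
  }
  return saved;
}

const savedEnv = persistableEnvironment({
  NODE_ENV: 'test',
  API_KEY: { value: 'placeholder-secret', ephemeral: true },
});
console.log(savedEnv);
```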
Manifest entry paths are workspace-relative. They cannot be absolute paths or escape the workspace with .., which keeps the workspace contract portable across local, Docker, and hosted clients.
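A rough way to pre-check entry paths in your own code is sketched below. The function name `isPortableEntryPath` is hypothetical, and the SDK's actual validation may be stricter (for example, it might reject any `..` segment); this version only rejects absolute paths and paths that climb above the workspace root.

```typescript
// Returns true when a path stays inside the workspace root.
function isPortableEntryPath(path: string): boolean {
  if (path.length === 0) return false;
  // Reject POSIX-absolute paths and Windows drive-letter paths.
  if (path.startsWith('/') || /^[A-Za-z]:[\\/]/.test(path)) return false;

  let depth = 0;
  for (const segment of path.split(/[\\/]/)) {
    if (segment === '' || segment === '.') continue;
    if (segment === '..') {
      depth -= 1;
      if (depth < 0) return false; // climbed above the workspace root
    } else {
      depth += 1;
    }
  }
  return true;
}

console.log(isPortableEntryPath('repo/task.md'));
console.log(isPortableEntryPath('../secrets.txt'));
```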
Use manifest entries for the material the agent needs before work begins:
| Manifest entry | Use it for |
|---|---|
| `file()`, `dir()` | Small synthetic inputs, helper files, or output directories. |
| `localFile()`, `localDir()` | Host files or directories that should be materialized into the sandbox. |
| `gitRepo()` | A repository that should be fetched into the workspace. |
| mounts such as `s3Mount()`, `gcsMount()`, `r2Mount()`, `azureBlobMount()`, `s3FilesMount()` | External storage that should appear inside the sandbox. |
Mount entries describe what storage to expose; mount strategies describe how a sandbox backend attaches that storage. See Sandbox clients for mount options and provider support.
Permissions
Permissions controls filesystem permissions for manifest entries. It is about the files the sandbox materializes, not model permissions, approval policy, or API credentials.
Users are the sandbox identities that can execute work. Add a user to the manifest when you want that identity to exist in the sandbox, then set SandboxAgent.runAs when model-facing sandbox tools such as shell commands, file reads, and patches should run as that user.
If you also need file-level sharing rules, combine users with manifest groups and entry group metadata. The runAs user controls who executes sandbox-native actions; Permissions controls which files that user can read, write, or execute once the sandbox has materialized the workspace.
SnapshotSpec
SnapshotSpec tells a fresh sandbox session where saved workspace contents should be restored from and persisted back to. It is the snapshot policy for the sandbox workspace, while sessionState is the serialized connection state for resuming a specific sandbox backend.
Use local snapshots for local durable snapshots and remote snapshots when your app provides a remote snapshot client. Mounted and ephemeral paths are not copied into snapshots as durable workspace contents.
Sandbox lifecycle
There are two lifecycle modes: SDK-owned and developer-owned.
SDK-owned:
- Pass `sandbox.client`.
- Runner creates or resumes a sandbox session.
- Agent runs and snapshot-backed workspace state can persist.
- Runner closes runner-owned resources.

Developer-owned:
- Create a session.
- Pass `sandbox.session` into the run.
- Agent uses the existing workspace.
- Inspect, reuse, then close the session yourself.
Use SDK-owned lifecycle when the sandbox only needs to live for one run. Pass a client, optional manifest, optional snapshot, and client options; the runner creates or resumes the sandbox, runs the agent, persists snapshot-backed workspace state, and lets the client clean up runner-owned resources.
```typescript
import { run } from '@openai/agents';
import { SandboxAgent } from '@openai/agents/sandbox';
import { UnixLocalSandboxClient } from '@openai/agents/sandbox/local';

const agent = new SandboxAgent({
  name: 'Workspace reviewer',
  model: 'gpt-5.5',
  instructions: 'Inspect the sandbox workspace before answering.',
});

const result = await run(agent, 'Inspect the workspace.', {
  sandbox: {
    client: new UnixLocalSandboxClient(),
  },
});

console.log(result.finalOutput);
```
Use developer-owned lifecycle when you want to eagerly create a sandbox, reuse one live sandbox across multiple runs, inspect files after a run, stream over a sandbox you created yourself, or decide exactly when cleanup happens. Passing session tells the runner to use that live sandbox, but not to close it for you.
```typescript
import { run } from '@openai/agents';
import { Manifest, SandboxAgent } from '@openai/agents/sandbox';
import { UnixLocalSandboxClient } from '@openai/agents/sandbox/local';

const manifest = new Manifest();
const agent = new SandboxAgent({
  name: 'Workspace reviewer',
  model: 'gpt-5.5',
  instructions: 'Inspect the sandbox workspace before answering.',
});

const client = new UnixLocalSandboxClient();
const session = await client.create({ manifest });

try {
  await run(agent, 'First task.', { sandbox: { session } });
  await run(agent, 'Follow-up task.', { sandbox: { session } });
} finally {
  await session.close?.();
}
```
sandbox run options
The sandbox run option holds the per-run options that decide where the sandbox session comes from and how a fresh session should be initialized.
Sandbox source
These options decide whether the runner should reuse, resume, or create the sandbox session:
| Option | Use it when | Notes |
|---|---|---|
| `client` | You want the runner to create, resume, and clean up sandbox sessions for you. | Required unless you provide a live sandbox session. |
| `session` | You already created a live sandbox session yourself. | The caller owns lifecycle; the runner reuses that live sandbox session. |
| `sessionState` | You have serialized sandbox session state but not a live sandbox session object. | Requires `client`; the runner resumes from that explicit state as an owning session. |
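The table above amounts to a small decision rule, sketched here as a standalone function. `chooseSandboxSource` and `SandboxSource` are hypothetical names, and the precedence when both `session` and `sessionState` are supplied is an assumption of this sketch; the source does not state how the runner handles that combination.

```typescript
type SandboxSource = 'inject' | 'resume' | 'create';

// Only the presence of each option matters for this sketch.
interface SandboxSourceOptions {
  client?: unknown;
  session?: unknown;
  sessionState?: unknown;
}

function chooseSandboxSource(option: SandboxSourceOptions): SandboxSource {
  // A live session is reused as-is; the caller keeps ownership of its lifecycle.
  // Assumption: a live session wins if sessionState is also present.
  if (option.session !== undefined) return 'inject';

  // Without a live session, a client is required to resume or create one.
  if (option.client === undefined) {
    throw new Error('client is required unless a live sandbox session is provided');
  }
  return option.sessionState !== undefined ? 'resume' : 'create';
}

console.log(chooseSandboxSource({ session: {} }));
console.log(chooseSandboxSource({ client: {}, sessionState: {} }));
console.log(chooseSandboxSource({ client: {} }));
```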
Fresh-session inputs
These options only matter when the runner is creating a fresh sandbox session:
| Option | Use it when | Notes |
|---|---|---|
| `manifest` | You want a one-off fresh-session workspace override. | Falls back to `agent.defaultManifest` when omitted. |
| `snapshot` | A fresh sandbox session should be seeded from a snapshot. | Useful for resume-like flows or remote snapshot clients. |
| `options` | The sandbox client needs creation-time options. | Common for Docker images, provider timeouts, and similar client-specific settings. |
concurrencyLimits controls how much sandbox materialization work can run in parallel. Use manifestEntries and localDirFiles when large manifests or local directory copies need tighter resource control.
Materialization controls
Materialization controls are intentionally per-run. Keep them near the sandbox run option so the same SandboxAgent can use conservative limits for large local directory copies and looser limits for small manifests.
Use concurrencyLimits.manifestEntries when a manifest has many independent entries such as files, directories, repos, and mounts. Use concurrencyLimits.localDirFiles when localDir() entries contain many files and local copy pressure needs to be capped.
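For intuition about what a concurrency cap does, here is a generic bounded-concurrency mapper. It is not the SDK's materialization code, and `mapWithLimit` is a hypothetical helper; it only illustrates the idea that a limit such as `manifestEntries` or `localDirFiles` bounds how many units of work are in flight at once.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight at a time.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until the queue is empty.
  async function runLane(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Track the peak number of simultaneous workers to show the cap holding.
let active = 0;
let peak = 0;
const copied = mapWithLimit(['a.txt', 'b.txt', 'c.txt', 'd.txt'], 2, async (name) => {
  active += 1;
  peak = Math.max(peak, active);
  await Promise.resolve(); // stand-in for real copy work
  active -= 1;
  return `copied ${name}`;
});

copied.then((names) => console.log(names.length, 'entries, peak concurrency', peak));
```

A lower limit trades throughput for less pressure on disk, network, or the sandbox backend, which is why the text suggests conservative values for large local directory copies.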
Full example: coding task
This coding-style example is a good default starting point:
```typescript
import { run } from '@openai/agents';
import {
  Capabilities,
  Manifest,
  SandboxAgent,
  localDir,
  skills,
} from '@openai/agents/sandbox';
import {
  UnixLocalSandboxClient,
  localDirLazySkillSource,
} from '@openai/agents/sandbox/local';
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';

const exampleDir = dirname(fileURLToPath(import.meta.url));
const hostRepoDir = join(exampleDir, 'repo');
const hostSkillsDir = join(exampleDir, 'skills');

const manifest = new Manifest({
  entries: {
    repo: localDir({ src: hostRepoDir }),
  },
});

const agent = new SandboxAgent({
  name: 'Sandbox engineer',
  model: 'gpt-5.5',
  instructions:
    'Read `repo/task.md` before editing files. Load the `$invoice-total-fixer` skill before changing code. Stay grounded in the repository, preserve existing behavior, and mention the exact verification command you ran. If you edit files with apply_patch, paths are relative to the sandbox workspace root.',
  defaultManifest: manifest,
  capabilities: [
    ...Capabilities.default(),
    skills({
      lazyFrom: localDirLazySkillSource(hostSkillsDir),
    }),
  ],
});

const result = await run(
  agent,
  'Open `repo/task.md`, fix the issue, run the targeted test, and summarize the change.',
  {
    sandbox: {
      client: new UnixLocalSandboxClient(),
    },
  },
);

console.log(result.finalOutput);
```
Common patterns
Start from the full example above. In many cases, the same SandboxAgent can stay intact while only the sandbox client, sandbox-session source, or workspace source changes.
Switch sandbox clients
Keep the agent definition the same and change only the run config. Use Docker when you want container isolation or image parity, or a hosted provider when you want provider-managed execution. See Sandbox clients for examples and provider options.
Override the workspace
Keep the agent definition the same and swap only the fresh-session manifest with sandbox: { client, manifest }. Use this when the same agent role should run against different repos, packets, or task bundles without rebuilding the agent.
Inject a sandbox session
Inject a live sandbox session when you need explicit lifecycle control, post-run inspection, or output copying. Use sandbox: { session } for that run, and close the session in your application code.
Resume from session state
If you already serialized sandbox state outside RunState, let the runner reconnect from that state with sandbox: { client, sessionState }. Use this when sandbox state lives in your own storage or job system and you want Runner to resume from it directly.
Start from a snapshot
Seed a new sandbox from saved files and artifacts with sandbox: { client, snapshot }. Use this when a fresh run should start from saved workspace contents rather than only agent.defaultManifest.
Load skills from Git
Swap the local skill source for a repository-backed one with skills({ from: gitRepo(...) }). Use this when the skills bundle has its own release cadence or should be shared across sandboxes.
Expose as tools
Tool-agents can either get their own sandbox boundary or reuse a live sandbox from the parent run. Reuse is useful for a fast read-only explorer agent: it can inspect the exact workspace the parent is using without paying to create, hydrate, or snapshot another sandbox.
When a tool-agent needs real isolation instead, give it its own runConfig through sandboxAgent.asTool(...). Use a separate sandbox when the tool-agent should mutate freely, run untrusted commands, or use a different backend or image.
Combine with local tools and MCP
Keep the sandbox workspace while still using ordinary tools on the same agent. Sandbox capabilities can coexist with tools, mcpServers, handoffs, model settings, and output configuration.
Memory
Use the memory() capability when future sandbox-agent runs should learn from prior runs. Memory is separate from the SDK’s conversational Session memory: it distills lessons into files inside the sandbox workspace, then later runs can read those files.
See Agent memory for setup, read/generate behavior, multi-turn conversations, and layout isolation.
Composition patterns
Once the single-agent pattern is clear, the next design question is where the sandbox boundary belongs in a larger system.
Sandbox agents still compose with the rest of the SDK:
- Handoffs: hand document-heavy work from a non-sandbox intake agent into a sandbox reviewer.
- Agents as tools: expose multiple sandbox agents as tools, usually by passing a sandbox run config on each `asTool(...)` call so each tool gets its own sandbox boundary.
- MCP and normal function tools: sandbox capabilities can coexist with `mcpServers` and ordinary tools.
- Running agents: sandbox runs still use the normal `run()` and `Runner` APIs.
With a handoff, there is still one top-level run and one top-level turn loop. The active agent changes, but the run does not become nested.
With asTool(...), the relationship is different. The outer orchestrator uses one outer turn to decide to call the tool, and that tool call starts a nested run for the sandbox agent. The nested run has its own turn loop, maxTurns, approvals, and usually its own sandbox run config. From the outer orchestrator’s point of view, all of that work still sits behind one tool invocation, so the nested turns do not increment the outer run’s turn counter.
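The difference in turn accounting between a handoff and `asTool(...)` can be modeled with two counters. This is a toy simulation, not SDK code: `TurnCounter`, `takeTurn`, `simulateHandoffRun`, and `simulateToolRun` are hypothetical, and the turn counts are made up for illustration.

```typescript
interface TurnCounter {
  label: string;
  turns: number;
  maxTurns: number;
}

// Consumes one model step and enforces that run's own maxTurns.
function takeTurn(counter: TurnCounter): void {
  if (counter.turns >= counter.maxTurns) {
    throw new Error(`${counter.label} exceeded maxTurns`);
  }
  counter.turns += 1;
}

// Handoff: the active agent changes, but every turn lands on the same run.
function simulateHandoffRun(): TurnCounter {
  const run: TurnCounter = { label: 'top-level run', turns: 0, maxTurns: 10 };
  takeTurn(run); // intake agent decides to hand off
  takeTurn(run); // sandbox reviewer inspects the workspace
  takeTurn(run); // sandbox reviewer produces the final answer
  return run;
}

// asTool: the nested run keeps its own counter behind one outer tool call.
function simulateToolRun(): { outer: TurnCounter; nested: TurnCounter } {
  const outer: TurnCounter = { label: 'outer run', turns: 0, maxTurns: 10 };
  const nested: TurnCounter = { label: 'nested sandbox run', turns: 0, maxTurns: 5 };
  takeTurn(outer); // orchestrator decides to call the tool
  for (let step = 0; step < 3; step++) takeTurn(nested); // nested turn loop
  takeTurn(outer); // orchestrator reads the tool result and answers
  return { outer, nested };
}

const handoffRun = simulateHandoffRun();
const toolRun = simulateToolRun();
console.log(handoffRun.turns, toolRun.outer.turns, toolRun.nested.turns);
```

In the tool case, the three nested turns count only against the nested run's `maxTurns`, which matches the description above of nested work sitting behind a single tool invocation.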
Further reading
- Quickstart: get one sandbox agent running.
- Sandbox clients: choose local, Docker, hosted, and mount options.
- Agent memory: preserve and reuse lessons from prior sandbox runs.