Agent safety · Checklist · 6 min
Add guardrails to multi-step agents
How to constrain agent behavior around data access, tool use, user intent, and final actions.
Real workflow example
A multi-step agent drafts a proposal response and has access to email tooling. Sending the draft without review would create business risk.
Classify the email tool as external-send and require human approval. The agent can prepare the draft and checklist, but the send action remains disabled until a reviewer approves.
The agent accelerates work without crossing the boundary from recommendation to irreversible action.
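The example above can be sketched as a small server-side tool registry, assuming each tool is tagged with a risk class when it is registered. `ToolRisk`, `ToolSpec`, `Approval`, and `canExecute` are hypothetical names, not part of any SDK. The point of the sketch is that the send gate lives in code the model cannot talk its way around.

```typescript
// Hypothetical risk classes; the four gated ones mirror the approval list in this guide.
type ToolRisk = "read_only" | "draft" | "external_send" | "destructive" | "financial" | "production_write";

interface ToolSpec {
  name: string;
  risk: ToolRisk;
}

// A reviewer's decision about one specific pending tool call.
interface Approval {
  toolCallId: string;
  reviewer: string;
  approved: boolean;
}

// Risk classes that always need a human approval before execution.
const APPROVAL_REQUIRED: ReadonlySet<ToolRisk> = new Set<ToolRisk>([
  "external_send",
  "destructive",
  "financial",
  "production_write",
]);

// Server-side check that runs before any tool executes.
function canExecute(tool: ToolSpec, toolCallId: string, approval?: Approval): boolean {
  if (!APPROVAL_REQUIRED.has(tool.risk)) {
    return true; // low-risk tools such as drafting run without review
  }
  // Gated tools need an explicit, positive approval tied to this exact call.
  return approval !== undefined && approval.approved && approval.toolCallId === toolCallId;
}

const draftEmail: ToolSpec = { name: "draft_email", risk: "draft" };
const sendEmail: ToolSpec = { name: "send_email", risk: "external_send" };

console.log(canExecute(draftEmail, "call-1")); // true
console.log(canExecute(sendEmail, "call-2")); // false: no reviewer yet
console.log(canExecute(sendEmail, "call-2", { toolCallId: "call-2", reviewer: "dana", approved: true })); // true
```

Binding the approval to a call ID is a design choice: it keeps a reviewer's approval of one draft from authorizing a different send later in the run.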
Implementation approach
This guide is anchored in the OpenAI tools guide. Use the official API behavior as the boundary, then design the surrounding product state so the feature can be reviewed, retried, and improved.
- Classify incoming requests as allowed, needs clarification, restricted, or unsafe.
- Validate every tool call against user permissions and workflow state.
- Require approval before destructive, financial, external-send, or production-write actions.
- Validate final outputs for format, evidence, and disallowed claims.
- Route blocked work to a human path with enough context to continue.
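A minimal sketch of the first and fourth steps follows. It assumes a keyword-based classifier purely as a placeholder; a real system would more likely use a policy model or rules engine. The term lists, disallowed-claim patterns, route names, `classifyRequest`, and `validateOutput` are all hypothetical. What the sketch tries to show is that every request class maps to a concrete route, and that output validation returns reasons instead of a bare pass or fail.

```typescript
type RequestClass = "allowed" | "needs_clarification" | "restricted" | "unsafe";

// Placeholder rules for illustration only.
const UNSAFE_TERMS = ["delete all", "wire transfer"];
const RESTRICTED_TERMS = ["payroll", "customer list"];

function classifyRequest(text: string): RequestClass {
  const t = text.toLowerCase();
  if (UNSAFE_TERMS.some((term) => t.includes(term))) return "unsafe";
  if (RESTRICTED_TERMS.some((term) => t.includes(term))) return "restricted";
  if (t.trim().split(/\s+/).length < 3) return "needs_clarification"; // too short to act on
  return "allowed";
}

// Each class maps to an explicit route instead of a silent refusal.
const ROUTES: Record<RequestClass, string> = {
  allowed: "run_agent",
  needs_clarification: "ask_user",
  restricted: "human_review",
  unsafe: "block_with_explanation",
};

interface DraftOutput {
  body: string;
  evidence: Array<{ source: string; summary: string }>;
}

// Hypothetical claims a proposal response must never make.
const DISALLOWED_CLAIMS = [/guaranteed? (results|savings)/i, /100% (secure|uptime)/i];

// Returns a list of problems; an empty list means the output passed.
function validateOutput(out: DraftOutput): string[] {
  const problems: string[] = [];
  if (out.body.trim().length === 0) problems.push("empty body"); // format check
  if (out.evidence.length === 0) problems.push("no supporting evidence");
  for (const pattern of DISALLOWED_CLAIMS) {
    if (pattern.test(out.body)) problems.push(`disallowed claim: ${pattern.source}`);
  }
  return problems;
}

console.log(classifyRequest("Draft a reply to the Acme proposal")); // allowed
console.log(ROUTES[classifyRequest("help")]); // ask_user
console.log(validateOutput({ body: "We offer guaranteed savings.", evidence: [] }));
```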
Code or config snippet
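```typescript
// State the product stores for each agent work item so it can be reviewed, retried, and improved.
type GuardrailsWorkflowState = {
  sourceId: string; // request or document that started this work item
  status: "draft" | "needs_review" | "approved" | "blocked";
  evidence: Array<{ source: string; summary: string }>; // sources that back the agent's output
  nextAction: string; // safe next step, shown to a reviewer when work is blocked
};
```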
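One way to keep the `status` field honest is to allow only specific transitions and enforce them on the server, so an agent cannot move its own work from `draft` straight to `approved`. The transition table below is an assumed policy for illustration, and the status union is restated so the sketch compiles on its own.

```typescript
type WorkflowStatus = "draft" | "needs_review" | "approved" | "blocked";

// Assumed policy: which statuses each status may move to.
const TRANSITIONS: Record<WorkflowStatus, readonly WorkflowStatus[]> = {
  draft: ["needs_review", "blocked"],
  needs_review: ["approved", "blocked", "draft"], // reviewer approves, blocks, or sends back
  approved: [], // terminal in this sketch: final actions run only from here
  blocked: ["draft"], // a human can reopen blocked work
};

// Throws instead of silently accepting an illegal move.
function transition(from: WorkflowStatus, to: WorkflowStatus): WorkflowStatus {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition ${from} -> ${to}`);
  }
  return to;
}

// Legal path: draft -> needs_review -> approved.
// transition("draft", "approved") would throw, so review cannot be skipped.
let current: WorkflowStatus = "draft";
current = transition(current, "needs_review");
current = transition(current, "approved");
console.log(current); // approved
```

Because `approved` has no outgoing moves here, changing work after approval would mean creating a new item and going through review again; that is a policy choice, not a requirement.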
Field notes
- Guardrails are most useful at boundaries: input, tool call, output, and final action.
- The system should reject unsafe actions before a tool executes, not after.
- A good fallback is explicit and useful, not a generic apology.
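These notes can be combined at the tool-call boundary: check user permissions and workflow state before the tool function runs, and return a refusal as a structured result with a reason and a concrete next step. This is a sketch under assumptions; `Caller`, `GuardedAction`, `GuardResult`, `guardedCall`, and the `email:*` permission strings are hypothetical names.

```typescript
type WorkflowStatus = "draft" | "needs_review" | "approved" | "blocked";

interface Caller {
  userId: string;
  permissions: ReadonlySet<string>;
}

interface GuardedAction {
  permission: string; // permission the user must hold
  requiresApproval: boolean; // true for destructive, financial, external-send, production-write
}

// A refusal is data with a reason and next step, not an exception or a generic apology.
type GuardResult<T> =
  | { ok: true; value: T }
  | { ok: false; reason: string; nextAction: string; handoff: "human_review" };

// All checks run before `run` is called; a blocked call never reaches the tool.
function guardedCall<T>(
  caller: Caller,
  action: GuardedAction,
  workflowStatus: WorkflowStatus,
  run: () => T,
): GuardResult<T> {
  if (!caller.permissions.has(action.permission)) {
    return {
      ok: false,
      reason: `User ${caller.userId} lacks the "${action.permission}" permission.`,
      nextAction: "Hand the task to a teammate who holds this permission.",
      handoff: "human_review",
    };
  }
  if (action.requiresApproval && workflowStatus !== "approved") {
    return {
      ok: false,
      reason: `Action needs approval; workflow is "${workflowStatus}".`,
      nextAction: "Send the prepared draft to a reviewer for approval.",
      handoff: "human_review",
    };
  }
  return { ok: true, value: run() };
}

const SEND_EMAIL: GuardedAction = { permission: "email:send", requiresApproval: true };
const calls = { count: 0 }; // counts how often the underlying tool actually ran
const sender: Caller = { userId: "u2", permissions: new Set(["email:draft", "email:send"]) };

const early = guardedCall(sender, SEND_EMAIL, "needs_review", () => ++calls.count);
console.log(early.ok, calls.count); // false 0
const late = guardedCall(sender, SEND_EMAIL, "approved", () => ++calls.count);
console.log(late.ok, calls.count); // true 1
```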
Mistakes to avoid
- Do not put approval rules only in prompt text.
- Do not give agents destructive tools without server-side gates.
- Do not make blocked actions disappear; explain the safe next step.
Ready checklist
- Input classification
- Tool authorization
- Approval gates
- Output validation
- Human fallback path
