Trace agent workflows for debugging
A guide to making multi-step AI behavior inspectable enough to fix when something goes wrong.
Real workflow example
A proposal manager disputes an AI recommendation. The output says low fit, but the team believes the tender is relevant.
Open the workflow trace: source document, extraction output, tool calls, qualification criteria, model response ID, validation result, and reviewer notes. The trace shows the model missed one certificate because it sat in an appendix that document parsing did not pick up.
The fix is clear: improve document parsing for appendices. Without the trace, the team might have rewritten the prompt blindly.
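One way to read a trace like this is to walk the steps in order and find where an expected piece of evidence drops out. The sketch below assumes each step records which evidence items it saw. The `TraceStep` shape, the step names, the evidence identifiers, and the `firstStepMissing` helper are all hypothetical and only illustrate the idea.

```typescript
// Hypothetical trace step: each stage records the evidence items it actually saw.
type TraceStep = {
  name: string;           // e.g. "source_document", "extraction", "model_response"
  evidenceSeen: string[]; // identifiers of evidence items present at this step
};

// Returns the name of the first step where `expected` is absent after having
// been present in an earlier step, or null if that never happens.
function firstStepMissing(steps: TraceStep[], expected: string): string | null {
  let seenBefore = false;
  for (const step of steps) {
    const present = step.evidenceSeen.includes(expected);
    if (seenBefore && !present) return step.name;
    if (present) seenBefore = true;
  }
  return null;
}

// Illustrative data only: the appendix certificate is in the source
// but no longer present after extraction.
const steps: TraceStep[] = [
  { name: "source_document", evidenceSeen: ["cert_main", "cert_appendix"] },
  { name: "extraction", evidenceSeen: ["cert_main"] },
  { name: "model_response", evidenceSeen: ["cert_main"] },
];

console.log(firstStepMissing(steps, "cert_appendix")); // prints: extraction
```

A result of `extraction` points at parsing rather than at the prompt, which is the kind of conclusion the team reached in the example.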
Implementation approach
This guide is anchored in the OpenAI Responses API reference. Treat the documented API behavior as the boundary, then design the surrounding product state so that each AI action can be reviewed, retried, and improved.
- Create a workflow run ID before the first model request.
- Attach every response, tool call, validation result, and approval decision to that run.
- Store safe summaries and IDs instead of full sensitive payloads where possible.
- Render an internal timeline for failed or disputed AI actions.
- Use traces to choose whether to change prompts, tools, data, or UX.
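The first two steps above can be sketched as a small in-memory recorder. This is a minimal sketch under stated assumptions: `TraceEvent`, `WorkflowRun`, `startRun`, and `attach` are hypothetical names, the ID generator is a placeholder, and a real system would persist events rather than hold them in an array.

```typescript
// All names in this sketch are hypothetical and not part of any API.
type TraceEventKind = "response" | "tool_call" | "validation" | "approval";

type TraceEvent = {
  runId: string;
  kind: TraceEventKind;
  refId: string;   // e.g. the ID of a model response or tool call
  summary: string; // safe summary, not the raw payload
  at: string;      // ISO 8601 timestamp
};

type WorkflowRun = { runId: string; startedAt: string; events: TraceEvent[] };

// Placeholder ID generator; a real system would more likely use a UUID.
const defaultId = (): string =>
  `run_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 10)}`;

// Create the run before the first model request so every later event has a home.
function startRun(newId: () => string = defaultId): WorkflowRun {
  return { runId: newId(), startedAt: new Date().toISOString(), events: [] };
}

// Attach one event to the run and return it.
function attach(
  run: WorkflowRun,
  kind: TraceEventKind,
  refId: string,
  summary: string,
): TraceEvent {
  const event: TraceEvent = {
    runId: run.runId,
    kind,
    refId,
    summary,
    at: new Date().toISOString(),
  };
  run.events.push(event);
  return event;
}

// Illustrative usage with made-up reference IDs.
const run = startRun();
attach(run, "response", "resp_example_1", "Model rated the tender as low fit");
attach(run, "validation", "val_example_1", "Output matched the expected schema");
console.log(run.events.length); // prints: 2
```

Passing the ID generator as a parameter keeps the sketch free of dependencies and makes run IDs deterministic in tests.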
Code snippet
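// Reviewable state for one traced workflow run.
type trace_agent_workflows_debugging_workflow_state = {
  sourceId: string; // identifier of the source this run is based on
  status: "draft" | "needs_review" | "approved" | "blocked";
  evidence: Array<{ source: string; summary: string }>; // where each item came from, plus a short summary
  nextAction: string; // next step for the run
};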
Field notes
- Debugging needs the request, tool calls, intermediate outputs, and final user-visible result.
- Logs should explain behavior without exposing secrets or unnecessary personal data.
- A trace is useful only if a support agent or engineer can replay the decision path from it.
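The note about logs that explain behavior without exposing secrets can be illustrated with a small summarizing helper. This is a sketch, not a complete redaction strategy: `toSafeSummary` and the key names are hypothetical, and it only inspects top-level keys.

```typescript
// Hypothetical helper: builds a log-safe summary of a payload.
// Keys listed in `sensitiveKeys` are replaced entirely; long strings are truncated.
// Only top-level keys are inspected; nested objects are copied as-is.
function toSafeSummary(
  payload: Record<string, unknown>,
  sensitiveKeys: ReadonlySet<string>,
  maxLength = 80,
): Record<string, unknown> {
  const summary: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(payload)) {
    if (sensitiveKeys.has(key)) {
      summary[key] = "[redacted]";
    } else if (typeof value === "string" && value.length > maxLength) {
      // Keep a short preview plus the original length for debugging.
      summary[key] = `${value.slice(0, maxLength)}... (${value.length} chars)`;
    } else {
      summary[key] = value;
    }
  }
  return summary;
}

// Illustrative payload with made-up values.
const logged = toSafeSummary(
  { documentId: "doc_123", contactEmail: "person@example.com", body: "x".repeat(500) },
  new Set(["contactEmail"]),
);
console.log(logged.contactEmail); // prints: [redacted]
```

Keeping the document ID while dropping the raw body lets a reviewer fetch the original through normal access controls when they need it.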
Mistakes to avoid
- Do not log raw sensitive data everywhere just to make debugging easier.
- Do not keep only the final answer.
- Do not merge multiple workflow runs under one vague log entry.
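To keep runs separate and still give reviewers a readable view, log entries can be grouped by run ID and rendered as a plain-text timeline. The sketch below assumes a hypothetical `LogEntry` shape and UTC ISO 8601 timestamps in one consistent format. `groupByRun` and `renderTimeline` are illustrative names.

```typescript
// Hypothetical log entry; `runId` ties each entry to exactly one workflow run.
type LogEntry = { runId: string; at: string; step: string; summary: string };

// Group entries by run ID and sort each run's entries by timestamp.
function groupByRun(entries: LogEntry[]): Map<string, LogEntry[]> {
  const runs = new Map<string, LogEntry[]>();
  for (const entry of entries) {
    const list = runs.get(entry.runId) ?? [];
    list.push(entry);
    runs.set(entry.runId, list);
  }
  // Assumes UTC ISO 8601 timestamps in the same format, which sort lexicographically.
  runs.forEach((list) => {
    list.sort((a, b) => (a.at < b.at ? -1 : a.at > b.at ? 1 : 0));
  });
  return runs;
}

// Render one run as a plain-text timeline for internal review.
function renderTimeline(runId: string, entries: LogEntry[]): string {
  const lines = entries.map((e) => `${e.at}  ${e.step}: ${e.summary}`);
  return [`Run ${runId}`, ...lines].join("\n");
}

// Illustrative entries from two runs, deliberately out of order.
const entries: LogEntry[] = [
  { runId: "run_b", at: "2024-01-01T10:00:05Z", step: "validation", summary: "Schema check passed" },
  { runId: "run_a", at: "2024-01-01T09:00:02Z", step: "model_response", summary: "Rated low fit" },
  { runId: "run_a", at: "2024-01-01T09:00:01Z", step: "extraction", summary: "Parsed main document" },
];

const timelines = groupByRun(entries);
console.log(renderTimeline("run_a", timelines.get("run_a") ?? []));
```

Each run gets its own timeline, so a disputed result can be traced without reading entries from unrelated runs.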
Ready checklist
- Workflow run ID
- Tool calls attached
- Validation results stored
- Sensitive data minimized
- Debug timeline available
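The checklist can also be expressed as a check that runs against a stored trace. This is a sketch only: the `TraceForReview` shape and its flags are assumptions about what a system might record, and `missingChecklistItems` is a hypothetical helper.

```typescript
// Hypothetical trace shape covering the five checklist items.
type TraceForReview = {
  runId?: string;
  toolCalls: Array<{ id: string; runId: string }>;
  validationResults: Array<{ passed: boolean }>;
  containsRawSensitiveData: boolean; // set by whatever redaction step the system uses
  timelineRendered: boolean;         // true once the internal timeline exists
};

// Returns the checklist items that are not yet satisfied; an empty array means ready.
function missingChecklistItems(trace: TraceForReview): string[] {
  const missing: string[] = [];
  if (!trace.runId) missing.push("Workflow run ID");
  // A run with no tool calls passes this check; any tool call must carry the run's ID.
  if (trace.toolCalls.some((call) => call.runId !== trace.runId)) {
    missing.push("Tool calls attached");
  }
  if (trace.validationResults.length === 0) missing.push("Validation results stored");
  if (trace.containsRawSensitiveData) missing.push("Sensitive data minimized");
  if (!trace.timelineRendered) missing.push("Debug timeline available");
  return missing;
}

// Illustrative trace that is missing only its validation results.
const gaps = missingChecklistItems({
  runId: "run_example",
  toolCalls: [{ id: "call_1", runId: "run_example" }],
  validationResults: [],
  containsRawSensitiveData: false,
  timelineRendered: true,
});
console.log(gaps.join(", ")); // prints: Validation results stored
```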
