Observability · Guide · 5 min

Trace agent workflows for debugging

A guide to making multi-step AI behavior inspectable enough to fix when something goes wrong.

Real workflow example

A proposal manager disputes an AI recommendation. The output says low fit, but the team believes the tender is relevant.

Open the workflow trace: source document, extraction output, tool calls, qualification criteria, model response ID, validation result, and reviewer notes. The trace shows the model missed one certificate from an appendix.

The fix is clear: improve document parsing for appendices. Without the trace, the team might have rewritten the prompt blindly.

Implementation approach

This guide is anchored in the OpenAI Responses API reference. Treat the official API behavior as the boundary, then design the surrounding product state so that the feature can be reviewed, retried, and improved.

Create a workflow run ID before the first model request.
Attach every response, tool call, validation result, and approval decision to that run.
Store safe summaries and IDs instead of full sensitive payloads where possible.
Render an internal timeline for failed or disputed AI actions.
Use traces to choose whether to change prompts, tools, data, or UX.
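
The steps above can be sketched as a small in-memory recorder. This is a minimal illustration, not a prescribed design: the `WorkflowTrace` class, the event kinds, and IDs such as `resp_123` are hypothetical names chosen for the example, and a real system would persist events rather than hold them in memory.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical event kinds; adjust to the steps your workflow actually has.
type TraceEventKind = "model_response" | "tool_call" | "validation" | "approval";

interface TraceEvent {
  runId: string;   // ties the event to exactly one workflow run
  kind: TraceEventKind;
  refId: string;   // an ID (e.g. a model response ID), not the payload itself
  summary: string; // safe, human-readable summary
  ok: boolean;
  at: string;      // ISO 8601 timestamp
}

class WorkflowTrace {
  // The run ID exists before the first model request is made.
  readonly runId: string = randomUUID();
  private readonly events: TraceEvent[] = [];

  // Attach one step of the workflow to this run.
  record(kind: TraceEventKind, refId: string, summary: string, ok = true): TraceEvent {
    const event: TraceEvent = {
      runId: this.runId,
      kind,
      refId,
      summary,
      ok,
      at: new Date().toISOString(),
    };
    this.events.push(event);
    return event;
  }

  // Render a plain-text timeline in the order events were recorded.
  timeline(): string[] {
    return this.events.map(
      (e, i) => `${i + 1}. [${e.ok ? "ok" : "FAIL"}] ${e.kind} ${e.refId}: ${e.summary}`
    );
  }
}

// Example usage with placeholder IDs and summaries.
const trace = new WorkflowTrace();
trace.record("model_response", "resp_123", "Extracted 4 certificates from main body");
trace.record("validation", "check_certificates", "Required certificate not found", false);
console.log(trace.timeline().join("\n"));
```

Keeping `refId` and `summary` separate from any raw payload means the timeline can be shown to support staff without exposing the underlying documents.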

Code snippet

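// State of one traced workflow as a reviewer sees it.
type TraceWorkflowState = {
  sourceId: string; // ID of the source document the run started from
  status: "draft" | "needs_review" | "approved" | "blocked";
  evidence: Array<{ source: string; summary: string }>; // summaries, not raw payloads
  nextAction: string; // what should happen next for this run
};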

Field notes

Debugging needs the request, tool calls, intermediate outputs, and final user-visible result.
Logs should explain behavior without exposing secrets or unnecessary personal data.
A trace is useful when a support or engineering person can replay the decision path.
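
One way to log enough to explain behavior without storing the raw payload is to keep a fingerprint, a length, and a redacted preview. The sketch below assumes a hypothetical `toSafeRecord` helper and a deliberately simple redaction rule that masks only email-like strings. Real redaction needs rules that match your own data, and a hash does not protect short or guessable values.

```typescript
import { createHash } from "node:crypto";

interface SafeRecord {
  sha256: string;  // fingerprint for matching against the original in its secure store
  length: number;  // size of the raw payload in UTF-16 code units
  preview: string; // redacted and truncated text that is safe to display
}

// Hypothetical redaction rule: masks email-like strings only.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/g;

function toSafeRecord(raw: string, previewChars = 80): SafeRecord {
  // Redact before truncating so a cut-off address cannot leak into the preview.
  const redacted = raw.replace(EMAIL_PATTERN, "[email]");
  return {
    sha256: createHash("sha256").update(raw, "utf8").digest("hex"),
    length: raw.length,
    preview: redacted.slice(0, previewChars),
  };
}

// Example usage with made-up input.
console.log(toSafeRecord("Contact jane@example.com about ISO 27001").preview);
```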

Mistakes to avoid

Do not log sensitive raw data everywhere just to debug.
Do not keep only the final answer.
Do not merge multiple workflow runs under one vague log entry.
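
To keep runs separate and retain more than the final answer, log entries can be grouped by run ID and each run inspected for its first failing step. This is a sketch under assumptions: the `LogEntry` shape and the `groupByRun` and `firstFailure` helpers are hypothetical names, not part of any library.

```typescript
// Hypothetical log entry shape; each entry belongs to exactly one run.
interface LogEntry {
  runId: string;
  step: string;
  ok: boolean;
}

// Group entries by run ID, preserving the order in which they were logged.
function groupByRun(entries: LogEntry[]): Map<string, LogEntry[]> {
  const runs = new Map<string, LogEntry[]>();
  for (const entry of entries) {
    const steps = runs.get(entry.runId) ?? [];
    steps.push(entry);
    runs.set(entry.runId, steps);
  }
  return runs;
}

// Return the earliest failing step of a single run, or undefined if none failed.
function firstFailure(steps: LogEntry[]): LogEntry | undefined {
  return steps.find((s) => !s.ok);
}
```

Starting the investigation at the first failing step of a single run is usually faster than reading the final answer backwards.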

Ready checklist

Workflow run ID
Tool calls attached
Validation results stored
Sensitive data minimized
Debug timeline available

Practical note

A good trace helps you decide whether the bug is prompt, data, tool, model, or UI.

Treat this as an implementation constraint, not just advice: the interface, server code, and validation path should all enforce the same behavior.
