Realtime voice · Tutorial · 7 min

Build a Realtime voice agent intake flow

How to design a low-latency voice workflow that captures the right facts and hands off cleanly.

Real example

Example: voice intake for a service appointment

A home-services company wants a voice agent to capture appointment requests after hours. The agent needs to collect the caller's name, address, issue type, urgency, and availability, and it needs to recognize the triggers that require escalation to a human.

Start a Realtime session with a narrow call objective. During the call, collect required fields, confirm spelling for critical data, and summarize the appointment request for dispatch review.

The voice agent becomes an intake worker with a bounded job instead of an open-ended conversational demo.
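The bounded job described above can be written down as data before any audio is involved. The following is a minimal sketch, not part of any SDK: `CallSpec`, `IntakeRecord`, `missingFields`, `shouldEscalate`, and the example escalation phrases are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical call specification for the service-appointment example.
// None of these types come from the Realtime API; they only describe the job.
type FieldName = "name" | "address" | "issueType" | "urgency" | "availability";

interface CallSpec {
  objective: string;
  requiredFields: FieldName[];
  confirmSpelling: FieldName[]; // critical data that is read back to the caller
  escalationPhrases: string[]; // illustrative triggers only (assumption)
}

const intakeSpec: CallSpec = {
  objective: "Capture an after-hours appointment request for dispatch review",
  requiredFields: ["name", "address", "issueType", "urgency", "availability"],
  confirmSpelling: ["name", "address"],
  escalationPhrases: ["gas leak", "flooding", "sparks"],
};

// Fields are optional until the caller has actually provided them.
type IntakeRecord = Partial<Record<FieldName, string>>;

// Returns the required fields that are still absent or blank.
function missingFields(spec: CallSpec, record: IntakeRecord): FieldName[] {
  return spec.requiredFields.filter((field) => !record[field]?.trim());
}

// Naive substring match; a real system would likely use something more robust.
function shouldEscalate(spec: CallSpec, utterance: string): boolean {
  const text = utterance.toLowerCase();
  return spec.escalationPhrases.some((phrase) => text.includes(phrase));
}
```

Keeping the specification as plain data means the same object can drive the agent's instructions, the post-call validation, and the tests for escalation triggers.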

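Server-created Realtime session secret

```ts
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment; the key stays on the server.
const openai = new OpenAI();

// Route handler that issues a short-lived client secret to the caller's device.
export async function POST() {
  // The exact method for creating a session may differ between SDK versions.
  const session = await openai.realtime.sessions.create({
    model: "gpt-realtime",
    audio: { output: { voice: "alloy" } },
  });

  // Return only the ephemeral secret and its expiry.
  return Response.json({
    client_secret: session.client_secret.value,
    expires_at: session.client_secret.expires_at,
  });
}
```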
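On the client, the `expires_at` value returned by the route can be used to decide when to request a fresh secret before connecting. This is a small sketch under two assumptions: that `expires_at` is a Unix timestamp in seconds, and that a ten-second safety margin is acceptable. `ClientSecret` and `needsRefresh` are hypothetical names.

```typescript
// Shape of the JSON returned by the route above.
interface ClientSecret {
  client_secret: string;
  expires_at: number; // assumed: Unix time in seconds
}

// True when the secret has expired or will expire within the safety margin.
function needsRefresh(
  secret: ClientSecret,
  nowMs: number = Date.now(),
  marginSeconds: number = 10,
): boolean {
  const nowSeconds = Math.floor(nowMs / 1000);
  return secret.expires_at - nowSeconds <= marginSeconds;
}
```

Passing the current time as a parameter keeps the check deterministic in tests.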
Tutorial path

How to implement it

Step 01
Define the call goal, required fields, disallowed decisions, and escalation triggers.
Step 02
Create a browser or phone session with a server-issued short-lived credential.
Step 03
Stream conversation events to the UI and backend transcript store.
Step 04
Extract call outcomes into structured fields after the call or at controlled checkpoints.
Step 05
Show the operator a call summary, missing fields, and recommended next action.
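Steps 03 to 05 can be sketched as a transcript store plus a summary builder. This is an illustration, not a definitive implementation: `TranscriptStore`, `TranscriptTurn`, `OperatorSummary`, and `summarize` are hypothetical, the store is in-memory only, and the extraction step itself (turning transcript text into fields) is assumed to happen elsewhere and arrive here as the `extracted` argument.

```typescript
type Role = "caller" | "agent";

interface TranscriptTurn {
  role: Role;
  text: string;
  at: number; // timestamp in milliseconds
}

// In-memory stand-in for a backend transcript store (Step 03).
class TranscriptStore {
  private turns: TranscriptTurn[] = [];

  append(turn: TranscriptTurn): void {
    this.turns.push(turn);
  }

  all(): readonly TranscriptTurn[] {
    return this.turns;
  }
}

interface OperatorSummary {
  captured: Record<string, string>;
  missing: string[];
  nextAction: "dispatch_review" | "call_back" | "escalate";
}

// Builds the operator view (Step 05) from fields extracted at a checkpoint (Step 04).
function summarize(
  required: string[],
  extracted: Record<string, string>,
  escalated: boolean,
): OperatorSummary {
  const captured: Record<string, string> = {};
  const missing: string[] = [];
  for (const field of required) {
    const value = extracted[field]?.trim();
    if (value) captured[field] = value;
    else missing.push(field);
  }
  // Escalation wins over everything; otherwise missing data means a call back.
  const nextAction: OperatorSummary["nextAction"] = escalated
    ? "escalate"
    : missing.length > 0
      ? "call_back"
      : "dispatch_review";
  return { captured, missing, nextAction };
}
```

The recommended action is derived from the structured fields rather than from the raw transcript, so the operator sees the same result however the conversation wandered.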
Checklist

Ready when these are true

Call objective documented
Short-lived client credential
Transcript stored securely
Escalation triggers tested
Post-call summary reviewed
Field notes

What matters in practice

01
Voice agents need a clear call objective before they need a persona.
02
Interruption, uncertainty, and handoff are core product states.
03
The transcript should become structured intake data, not disappear after the call.
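One way to treat interruption, uncertainty, and handoff as product states is to model the call as a small state machine. The sketch below is hypothetical: the state and event names are invented for illustration and are not Realtime API event types.

```typescript
type CallState = "listening" | "speaking" | "clarifying" | "handoff" | "ended";

type CallEvent =
  | "agent_started_speaking"
  | "agent_finished_speaking"
  | "caller_interrupted"
  | "low_confidence"
  | "caller_clarified"
  | "escalation_triggered"
  | "caller_hung_up";

// Pure transition function: the same state and event always give the same result.
function nextState(state: CallState, event: CallEvent): CallState {
  if (state === "ended") return state; // terminal
  if (event === "caller_hung_up") return "ended";
  if (state === "handoff") return state; // once handed off, only a hang-up ends it
  if (event === "escalation_triggered") return "handoff";

  switch (event) {
    case "agent_started_speaking":
      return "speaking";
    case "agent_finished_speaking":
    case "caller_interrupted":
      // An interruption stops agent speech and returns the floor to the caller.
      return state === "speaking" ? "listening" : state;
    case "low_confidence":
      return "clarifying";
    case "caller_clarified":
      return state === "clarifying" ? "listening" : state;
    default:
      return state;
  }
}
```

Because the function is pure, each interruption and handoff path can be covered by a unit test before any noisy-audio testing begins.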
Avoid these mistakes

Common failure modes

01
Do not let the voice agent make commitments the business cannot keep.
02
Do not skip interruption and noisy-audio testing.
03
Do not treat the transcript as the final structured record.
Practical tip
Voice UX improves when the agent confirms only high-risk fields, not every detail.
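A confirmation policy along these lines might look like the following sketch. `CapturedField`, the `risk` label, and the wording of the prompt are assumptions made for illustration; which fields count as high-risk is a product decision.

```typescript
type Risk = "high" | "low";

interface CapturedField {
  name: string;
  value: string;
  risk: Risk;
}

// Builds one read-back prompt covering only the high-risk fields.
// Returns null when nothing needs confirming, so the agent can move on.
function confirmationPrompt(fields: CapturedField[]): string | null {
  const highRisk = fields.filter((field) => field.risk === "high");
  if (highRisk.length === 0) return null;
  const readBack = highRisk
    .map((field) => `${field.name}: ${field.value}`)
    .join("; ");
  return `Let me confirm. ${readBack}. Is that right?`;
}
```

For example, with a high-risk `address` and a low-risk `issueType`, only the address is read back to the caller.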