Add rate-limit and retry behavior around OpenAI calls
How to keep user workflows stable when model calls are slow, limited, or temporarily unavailable.
Real workflow example
A proposal tool lets a user upload a tender document and generate a compliance matrix. During a busy morning, several users upload large files at once. Some model calls take longer than usual, and a few requests hit rate limits.
Without retry and queue behavior, users see random failures and click the button again, which creates duplicate runs and higher cost. With a stable workflow, each upload creates one job, the job moves through clear states, retries only safe steps, and the UI tells the user whether the work is queued, running, retrying, or needs manual attention.
The goal is not to hide every delay. The goal is to prevent duplicate side effects and keep the user oriented.
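One way to keep those states honest is a small transition table that rejects illegal moves, such as a duplicate click setting a completed run back to running. This is a minimal sketch. The status names follow the ones used in this doc, with `needs_review` standing for "needs manual attention". `canTransition` and `transition` are hypothetical helpers, not part of any library.

```typescript
// Statuses a workflow run can be in. "needs_review" means retries are exhausted
// or the error is not retryable, so a person has to look at it.
export type RunStatus = "queued" | "running" | "retrying" | "needs_review" | "completed";

// Allowed moves between statuses. Anything not listed is rejected, so a
// completed run can never be restarted by a stray retry or a second click.
const allowedTransitions: Record<RunStatus, readonly RunStatus[]> = {
  queued: ["running"],
  running: ["completed", "retrying", "needs_review"],
  retrying: ["running", "needs_review"],
  needs_review: ["queued"], // a person can re-queue after fixing the cause
  completed: [],
};

export function canTransition(from: RunStatus, to: RunStatus): boolean {
  return allowedTransitions[from].includes(to);
}

// Returns the new status, or throws if the move is not allowed.
export function transition(from: RunStatus, to: RunStatus): RunStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal workflow transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table in one place also gives the UI a single list of states to render, instead of inferring them from scattered updates.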
Implementation approach
Separate user-facing requests from background processing when the work can wait. A short chat answer may stay synchronous, but document extraction, report generation, and large enrichment jobs usually belong in a queue.
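A minimal sketch of that split, assuming an in-memory queue standing in for a real job system such as a database table, Redis, or a managed queue. The request handler only records the run and returns its ID; a separate worker does the slow model call. `InMemoryQueue`, `submitDocumentJob`, and `processNext` are hypothetical names for illustration.

```typescript
export type Job = { runId: string; documentId: string };
export type JobStatus = "queued" | "running" | "completed" | "needs_review";

// Stand-in for a durable queue (hypothetical). A real system persists jobs so
// they survive restarts; this in-memory version only shows the control flow.
export class InMemoryQueue {
  private jobs: Job[] = [];
  enqueue(job: Job): void {
    this.jobs.push(job);
  }
  dequeue(): Job | undefined {
    return this.jobs.shift();
  }
  get size(): number {
    return this.jobs.length;
  }
}

// Run statuses the UI polls. Stand-in for the workflow run table.
export const statuses = new Map<string, JobStatus>();
let nextId = 1;

// Request path: fast. Record the run, enqueue it, and return at once so the
// UI can show "queued" instead of holding the HTTP request open.
export function submitDocumentJob(queue: InMemoryQueue, documentId: string) {
  const runId = `run_${nextId++}`;
  statuses.set(runId, "queued");
  queue.enqueue({ runId, documentId });
  return { runId, status: "queued" as const };
}

// Worker path: slow. Takes one job and runs the long model call outside any
// user request. A thrown error leaves the run in "needs_review", not "running".
export async function processNext(
  queue: InMemoryQueue,
  work: (job: Job) => Promise<void>,
): Promise<boolean> {
  const job = queue.dequeue();
  if (!job) return false;
  statuses.set(job.runId, "running");
  try {
    await work(job);
    statuses.set(job.runId, "completed");
  } catch {
    statuses.set(job.runId, "needs_review");
  }
  return true;
}
```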
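Use bounded retries. A retry policy needs a maximum attempt count, exponential backoff, and an explicit rule for when to stop early, such as an error that retrying cannot fix. Retrying forever turns a temporary service limit into a cost problem. Client SDKs may also retry on their own (the official OpenAI SDKs retry some errors by default), so either disable those retries or count them toward the same limit; otherwise the two policies multiply.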
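A generic helper can make the parts of such a policy explicit: a maximum attempt count, exponential backoff with a cap, and a `shouldRetry` predicate that decides when to stop early. This is a sketch, not a library API. `retryWithBackoff` and its options are hypothetical, and the "full jitter" (a random delay between zero and the computed backoff) is an added choice that keeps many clients from retrying in lockstep. `sleep` and `random` are injectable so the policy can be tested without real waiting.

```typescript
export interface RetryOptions {
  maxAttempts: number; // hard stop, counting the first try
  baseDelayMs: number; // backoff before the second attempt, before jitter
  maxDelayMs: number; // cap so late attempts do not wait for minutes
  shouldRetry: (error: unknown) => boolean; // false = stop now, retrying cannot help
  sleep?: (ms: number) => Promise<void>;
  random?: () => number; // returns a value in [0, 1)
}

const defaultSleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function retryWithBackoff<T>(
  operation: (attempt: number) => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  const random = options.random ?? Math.random;

  for (let attempt = 1; ; attempt += 1) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Give up on the last attempt, or on errors that retrying cannot fix.
      if (attempt >= options.maxAttempts || !options.shouldRetry(error)) {
        throw error;
      }
      // Exponential backoff (base, 2x base, 4x base, ...) capped at maxDelayMs,
      // then full jitter picks a random delay up to that value.
      const backoff = Math.min(options.maxDelayMs, options.baseDelayMs * 2 ** (attempt - 1));
      await sleep(Math.floor(random() * backoff));
    }
  }
}
```

The loop has no exit other than `return` or `throw`, so every call ends in a result or in the last error, never in silent exhaustion.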
Add idempotency at the workflow level. If the user refreshes or clicks again, the app should find the existing run instead of creating another one.
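A sketch of that lookup, assuming the client sends an idempotency key that identifies the user action (for example, the user ID plus an upload ID). The in-memory `Map` stands in for a table with a unique constraint on the key. `findOrCreateRun`, `WorkflowRun`, and the key format are hypothetical.

```typescript
export interface WorkflowRun {
  id: string;
  idempotencyKey: string;
  status: "queued" | "running" | "retrying" | "needs_review" | "completed";
}

// Stand-in for a table with a UNIQUE constraint on idempotencyKey.
const runsByKey = new Map<string, WorkflowRun>();
let runCounter = 0;

// Returns the existing run for this key, or creates exactly one new run.
// `created` tells the caller whether to enqueue work or just show the current status.
export function findOrCreateRun(idempotencyKey: string): { run: WorkflowRun; created: boolean } {
  const existing = runsByKey.get(idempotencyKey);
  if (existing) return { run: existing, created: false };

  runCounter += 1;
  const run: WorkflowRun = { id: `run_${runCounter}`, idempotencyKey, status: "queued" };
  runsByKey.set(idempotencyKey, run);
  return { run, created: true };
}
```

Because `findOrCreateRun` is synchronous, two calls in the same Node.js process cannot interleave between the lookup and the insert. Across several server instances that guarantee disappears, so the database's unique constraint has to be the real guard, with the losing request reading back the row that won.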
Code or config snippet
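```typescript
import OpenAI from "openai";
// Hypothetical app modules: a Prisma-style `db` client and the error helpers used below.
import { db } from "./db";
import { isRetryableOpenAiError, normalizeErrorCode } from "./errors";

// The SDK retries some errors on its own by default. Turn that off so this loop is
// the only retry policy and MAX_ATTEMPTS really bounds the number of model calls.
const openai = new OpenAI({ maxRetries: 0 });
const MAX_ATTEMPTS = 3;
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function runOpenAiStep(workflowRunId: string) {
  const run = await db.workflowRun.findUniqueOrThrow({
    where: { id: workflowRunId },
  });
  // A run that already finished is returned as-is, so re-entry is harmless.
  if (run.status === "completed") return run;

  for (let attempt = run.attempts + 1; attempt <= MAX_ATTEMPTS; attempt += 1) {
    try {
      // Persist the attempt before calling the model so a crash mid-call
      // still counts against the retry budget.
      await db.workflowRun.update({
        where: { id: workflowRunId },
        data: { status: "running", attempts: attempt },
      });
      const response = await openai.responses.create({
        model: "gpt-4.1-mini",
        input: run.input,
        metadata: { workflowRunId },
      });
      return db.workflowRun.update({
        where: { id: workflowRunId },
        data: { status: "completed", responseId: response.id },
      });
    } catch (error) {
      // Stop on errors that retrying cannot fix, or when the budget is spent.
      if (!isRetryableOpenAiError(error) || attempt === MAX_ATTEMPTS) {
        return db.workflowRun.update({
          where: { id: workflowRunId },
          data: { status: "needs_review", failureCode: normalizeErrorCode(error) },
        });
      }
      // Let the UI show "retrying", then back off: 500 ms, 1000 ms, ...
      await db.workflowRun.update({
        where: { id: workflowRunId },
        data: { status: "retrying" },
      });
      await sleep(500 * 2 ** (attempt - 1));
    }
  }
  // Reached only if earlier invocations already used every attempt.
  return db.workflowRun.update({
    where: { id: workflowRunId },
    data: { status: "needs_review" },
  });
}
```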
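The snippet relies on `isRetryableOpenAiError` and `normalizeErrorCode` without defining them. One possible sketch classifies errors by HTTP status so that timeouts, rate limits, validation failures, and authorization failures stay separate. It assumes the error object exposes a numeric `status` and an optional string `code`, which matches how the official `openai` Node SDK shapes its `APIError`, and it assumes an error with no status came from the transport layer (a dropped connection or client-side timeout). It also assumes OpenAI reports an exhausted quota as a 429 with the code `insufficient_quota`, which is not a temporary limit and is therefore not retried. Which categories to retry is a policy choice.

```typescript
export type ErrorCategory =
  | "rate_limited"
  | "quota_exhausted"
  | "timeout"
  | "server_error"
  | "network"
  | "validation"
  | "auth"
  | "unknown";

// Minimal error shape this sketch relies on (assumption).
interface ApiLikeError {
  status?: number;
  code?: string | null;
}

export function normalizeErrorCode(error: unknown): ErrorCategory {
  if (typeof error !== "object" || error === null) return "unknown";
  const { status, code } = error as ApiLikeError;
  // Assumption: no numeric status means no HTTP response was received.
  // A stricter version would check the SDK's connection error classes instead.
  if (typeof status !== "number") return "network";
  if (status === 429) return code === "insufficient_quota" ? "quota_exhausted" : "rate_limited";
  if (status === 408) return "timeout";
  if (status === 401 || status === 403) return "auth";
  if (status === 400 || status === 404 || status === 422) return "validation";
  if (status >= 500) return "server_error";
  return "unknown";
}

// Only transient categories are worth another attempt.
const RETRYABLE: ReadonlySet<ErrorCategory> = new Set<ErrorCategory>([
  "rate_limited",
  "timeout",
  "server_error",
  "network",
]);

export function isRetryableOpenAiError(error: unknown): boolean {
  return RETRYABLE.has(normalizeErrorCode(error));
}
```

Storing the category rather than the raw error also gives support a short, searchable `failureCode`.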
Mistakes to avoid
- Retrying model calls that already triggered an external send or database write.
- Treating all errors as rate limits.
- Letting users create duplicate workflow runs by refreshing the page.
- Hiding queued work behind a spinner with no status.
- Ignoring cost when retries multiply large-context requests.
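For the first and last mistakes above, one approach is to split a run into named steps and record each completed step's result, so a retry resumes at the step that failed instead of replaying a model call or an external send that already happened. A minimal sketch: `runStepOnce`, the `checkpoints` map, and the step names are hypothetical, and a real implementation would store checkpoints in the same database as the run.

```typescript
// Completed step results, keyed by run ID and step name. Stand-in for a table
// with a unique constraint on (runId, stepName).
const checkpoints = new Map<string, unknown>();

// Runs `step` at most once per (runId, stepName) once it has succeeded. If an
// earlier attempt finished this step, the stored result is returned instead.
export async function runStepOnce<T>(
  runId: string,
  stepName: string,
  step: () => Promise<T>,
): Promise<T> {
  const key = `${runId}:${stepName}`;
  if (checkpoints.has(key)) {
    return checkpoints.get(key) as T;
  }
  const result = await step();
  checkpoints.set(key, result); // recorded only after the step succeeds
  return result;
}
```

This narrows the window but does not close it: if the process dies after an email is sent but before its checkpoint is written, the step runs again. External sends should therefore also pass an idempotency key when the receiving service supports one.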
Ready checklist
- User action creates a stable workflow run ID.
- Timeouts, rate limits, validation failures, and authorization failures are handled separately.
- Retries are bounded and only used for safe operations.
- Background jobs expose queued, running, retrying, failed (needs review), and completed states.
- The UI prevents duplicate submissions for the same run.
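- Failed runs keep enough metadata (error category, HTTP status, request ID, attempt count) for support to diagnose the failure without exposing user documents or credentials.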
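A sketch of a failure record for the last checklist item: it keeps what support needs to diagnose a failed run and leaves out the uploaded document and the prompt. `FailureRecord`, `toFailureRecord`, the 200-character message limit, and the `sk-` key pattern used for redaction are assumptions for illustration. The `requestId` field assumes the provider returns a request identifier (OpenAI API responses carry an `x-request-id` header).

```typescript
export interface FailureRecord {
  runId: string;
  category: string; // e.g. the value returned by an error classifier
  httpStatus?: number;
  requestId?: string; // provider request ID to quote when escalating
  attempts: number;
  failedAt: string; // ISO-8601 timestamp
  message: string; // redacted and truncated error message
}

const MAX_MESSAGE_LENGTH = 200;

// Masks strings that look like API keys (assumed "sk-" prefix) and truncates
// long messages to limit how much raw text from an error is kept.
function sanitizeMessage(message: string): string {
  const redacted = message.replace(/\bsk-[A-Za-z0-9_-]{8,}/g, "sk-***");
  return redacted.length > MAX_MESSAGE_LENGTH
    ? `${redacted.slice(0, MAX_MESSAGE_LENGTH)}...`
    : redacted;
}

// Builds the record stored on a failed run. The document and prompt are
// deliberately not fields; support can look them up by runId if access allows.
export function toFailureRecord(input: {
  runId: string;
  category: string;
  error: unknown;
  attempts: number;
  requestId?: string;
  now?: Date;
}): FailureRecord {
  const { error } = input;
  const rawStatus =
    typeof error === "object" && error !== null ? (error as { status?: unknown }).status : undefined;
  const message = error instanceof Error ? error.message : String(error);
  return {
    runId: input.runId,
    category: input.category,
    httpStatus: typeof rawStatus === "number" ? rawStatus : undefined,
    requestId: input.requestId,
    attempts: input.attempts,
    failedAt: (input.now ?? new Date()).toISOString(),
    message: sanitizeMessage(message),
  };
}
```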
