Reliability · Checklist · 5 min

Add rate-limit and retry behavior around OpenAI calls

How to keep user workflows stable when model calls are slow, limited, or temporarily unavailable.


Real workflow example

A proposal tool lets a user upload a tender document and generate a compliance matrix. During a busy morning, several users upload large files at once. Some model calls take longer than usual, and a few requests hit rate limits.

Without retry and queue behavior, users see what look like random failures and click the button again, which creates duplicate runs and extra cost. With a stable workflow, each upload creates exactly one job. The job moves through explicit states and retries only the steps that are safe to repeat. The UI tells the user whether the work is queued, running, retrying, or needs manual attention.

The goal is not to hide every delay. The goal is to prevent duplicate side effects and keep the user oriented.

Implementation approach

Separate user-facing requests from background processing when the work can wait. A short chat answer may stay synchronous, but document extraction, report generation, and large enrichment jobs usually belong in a queue.
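
Below is a minimal in-process sketch of the background side, built on a hypothetical `JobQueue` class. It caps how many model calls run at once, so a burst of uploads waits in line instead of hitting the rate limit together. A production system would put a durable queue (a jobs table or a job library) behind the same idea, so that queued work survives a restart.

```typescript
type Job<T> = () => Promise<T>;

// Hypothetical in-memory queue: runs at most `concurrency` jobs at once and
// starts waiting jobs in FIFO order as running ones settle.
class JobQueue {
  private readonly concurrency: number;
  private readonly waiting: Array<() => void> = [];
  private active = 0;

  constructor(concurrency: number) {
    this.concurrency = concurrency;
  }

  get runningCount(): number {
    return this.active;
  }

  get queuedCount(): number {
    return this.waiting.length;
  }

  enqueue<T>(job: Job<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.active += 1;
        // Frees the slot and hands it to the next waiting job, if any.
        const release = () => {
          this.active -= 1;
          this.waiting.shift()?.();
        };
        // Promise.resolve().then(job) turns a synchronous throw into a rejection.
        Promise.resolve()
          .then(job)
          .then(
            (value) => {
              release();
              resolve(value);
            },
            (error) => {
              release();
              reject(error);
            },
          );
      };
      if (this.active < this.concurrency) start();
      else this.waiting.push(start);
    });
  }
}
```

A request handler would create the run, enqueue the work, and return the run ID right away. The UI then polls the run's status instead of holding the HTTP request open.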

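Use bounded retries. A retry policy needs a maximum attempt count, exponential backoff between attempts, and explicit stop conditions. It should stop when the attempt budget is spent, or when the error is one that retrying will not fix, such as a validation or authorization failure. Retrying forever turns a temporary service limit into a cost problem.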
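
The sketch below shows one way to encode those rules, using a hypothetical `RetryPolicy` shape. It adds two things the paragraph does not mention. The first is a cap on the delay. The second is "full jitter," a random delay up to the exponential ceiling, so users who failed together do not retry together. The optional `retryAfterMs` assumes the caller has already parsed a wait hint sent by the server.

```typescript
// Hypothetical policy shape; the numbers below are illustrative, not tuned.
interface RetryPolicy {
  maxAttempts: number; // total calls, including the first one
  baseDelayMs: number;
  maxDelayMs: number; // cap so late attempts do not wait for minutes
}

const DEFAULT_POLICY: RetryPolicy = { maxAttempts: 3, baseDelayMs: 500, maxDelayMs: 8_000 };

// Exponential ceiling for a 1-based attempt number: base * 2^(attempt - 1), capped.
function backoffCeilingMs(attempt: number, policy: RetryPolicy = DEFAULT_POLICY): number {
  return Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (attempt - 1));
}

// Called after `attempt` has failed. Returns how long to wait before the next
// attempt, or null when the run should stop and be surfaced to a person.
function nextRetryDelayMs(
  attempt: number,
  retryable: boolean,
  policy: RetryPolicy = DEFAULT_POLICY,
  random: () => number = Math.random,
  retryAfterMs?: number, // optional server hint, e.g. parsed from a Retry-After header
): number | null {
  if (!retryable) return null; // stop: retrying will not fix this error
  if (attempt >= policy.maxAttempts) return null; // stop: attempt budget spent
  // Full jitter: a random delay in [0, ceiling) so runs that failed together
  // do not all retry at the same instant.
  const jittered = Math.floor(random() * backoffCeilingMs(attempt, policy));
  // Never come back sooner than the server asked.
  return Math.max(jittered, retryAfterMs ?? 0);
}
```

With these values, a run makes at most three calls. The two waits between them add up to less than 1.5 seconds unless the server asks for longer.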

Add idempotency at the workflow level. If the user refreshes or clicks again, the app should find the existing run instead of creating another one.
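
Here is a sketch of workflow-level idempotency. It assumes a hypothetical key built from the user, the action, and a SHA-256 hash of the uploaded bytes. The `RunStore` class is an in-memory stand-in for a runs table.

```typescript
import { createHash, randomUUID } from "node:crypto";

type RunStatus = "queued" | "running" | "retrying" | "failed" | "completed";

interface WorkflowRun {
  id: string;
  idempotencyKey: string;
  status: RunStatus;
}

// Hypothetical key scheme: the same user starting the same action on the same
// document bytes always maps to the same key, however many times they click.
function idempotencyKey(userId: string, action: string, document: string | Uint8Array): string {
  const digest = createHash("sha256").update(document).digest("hex");
  return `${userId}:${action}:${digest}`;
}

// In-memory stand-in for a runs table with a UNIQUE constraint on idempotencyKey.
class RunStore {
  private readonly byKey = new Map<string, WorkflowRun>();

  getOrCreate(key: string): { run: WorkflowRun; created: boolean } {
    const existing = this.byKey.get(key);
    if (existing) return { run: existing, created: false };
    const run: WorkflowRun = { id: randomUUID(), idempotencyKey: key, status: "queued" };
    this.byKey.set(key, run);
    return { run, created: true };
  }
}
```

In a database, the lookup and the insert must not race. With a unique constraint on the key, the second of two concurrent clicks fails to insert, and that request can then read and return the existing run. Uploading a changed document produces a different hash, so it correctly starts a new run.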

Code or config snippet

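```typescript
import OpenAI from "openai";
// Assumption: `db` is a Prisma client whose `workflowRun` model has the fields used below.
import { db } from "./db";

// The SDK retries 429s, timeouts, and 5xx errors itself by default (maxRetries: 2);
// turn that off so this loop is the only retry budget.
const openai = new OpenAI({ maxRetries: 0 });

const MAX_ATTEMPTS = 3;

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Transient failures only. Validation (400/422) and auth (401/403) errors are final.
function isRetryableOpenAiError(error: unknown): boolean {
  return (
    error instanceof OpenAI.RateLimitError ||
    error instanceof OpenAI.APIConnectionError || // includes APIConnectionTimeoutError
    error instanceof OpenAI.InternalServerError
  );
}

function normalizeErrorCode(error: unknown): string {
  if (error instanceof OpenAI.APIError) return `openai_${error.status ?? "connection"}`;
  return "internal_error";
}

export async function runOpenAiStep(workflowRunId: string) {
  const run = await db.workflowRun.findUniqueOrThrow({
    where: { id: workflowRunId },
  });

  // Idempotent re-entry: a finished run is returned without another model call.
  if (run.status === "completed") return run;

  for (let attempt = run.attempts + 1; attempt <= MAX_ATTEMPTS; attempt += 1) {
    try {
      await db.workflowRun.update({
        where: { id: workflowRunId },
        data: { status: "running", attempts: attempt },
      });

      const response = await openai.responses.create({
        model: "gpt-4.1-mini",
        input: run.input,
        metadata: { workflowRunId },
      });

      return db.workflowRun.update({
        where: { id: workflowRunId },
        data: { status: "completed", responseId: response.id },
      });
    } catch (error) {
      // Database errors are not retryable, so a failed final write never re-bills the model call.
      if (!isRetryableOpenAiError(error) || attempt === MAX_ATTEMPTS) {
        return db.workflowRun.update({
          where: { id: workflowRunId },
          data: { status: "needs_review", failureCode: normalizeErrorCode(error) },
        });
      }

      // Make the wait visible to the UI, then back off: 500 ms, 1 s, 2 s, ...
      await db.workflowRun.update({ where: { id: workflowRunId }, data: { status: "retrying" } });
      await sleep(500 * 2 ** (attempt - 1));
    }
  }

  // An earlier invocation already used up the attempt budget.
  return db.workflowRun.update({
    where: { id: workflowRunId },
    data: { status: "needs_review", failureCode: "max_attempts_exceeded" },
  });
}
```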
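
The snippet retries a single model call. When a workflow chains several steps, one way to retry only the safe steps is to checkpoint each finished step and skip it on the next attempt. The minimal sketch below assumes hypothetical `Step` objects and an in-memory checkpoint map, which a real system would store with the run.

```typescript
// Hypothetical step shape: each step receives the previous step's output.
interface Step {
  name: string;
  hasSideEffects: boolean; // e.g. sends an email or writes to another system
  run: (input: unknown) => Promise<unknown>;
}

type StepsResult =
  | { ok: true; output: unknown }
  | { ok: false; failedStep: string; safeToRetry: boolean };

// `checkpoints` maps step name -> output. In a real system it would be stored
// with the workflow run so a retry (or a new worker) can see finished steps.
async function runSteps(
  steps: Step[],
  input: unknown,
  checkpoints: Map<string, unknown>,
): Promise<StepsResult> {
  let value = input;
  for (const step of steps) {
    if (checkpoints.has(step.name)) {
      value = checkpoints.get(step.name); // finished earlier: reuse, never re-run
      continue;
    }
    try {
      value = await step.run(value);
      checkpoints.set(step.name, value);
    } catch {
      // A side-effecting step may have partly happened, so it goes to a person
      // instead of being retried automatically.
      return { ok: false, failedStep: step.name, safeToRetry: !step.hasSideEffects };
    }
  }
  return { ok: true, output: value };
}
```

If a later step fails, calling `runSteps` again with the same checkpoints reuses the earlier outputs. A notification step therefore runs once even though the run was retried. A crash between a side effect and its checkpoint write can still leave doubt about whether the side effect happened. For that reason, a failed side-effecting step is reported as not safe to retry.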

Mistakes to avoid

  • Retrying a step after its model call has already triggered an external send or a database write, which repeats the side effect.
  • Treating all errors as rate limits.
  • Letting users create duplicate workflow runs by refreshing the page.
  • Hiding queued work behind a spinner with no status.
  • Ignoring cost when retries multiply large-context requests.
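
To avoid treating every error as a rate limit, the failure can be classified before the retry decision is made. The sketch below is keyed on HTTP status. It assumes the caller can read the status (the official `openai` Node SDK exposes it as `status` on its API error classes) and, when present, the provider's error code. The category names are hypothetical. One case is worth checking against the provider's documentation: OpenAI can return 429 with the code `insufficient_quota`, and waiting will not fix that.

```typescript
type FailureKind =
  | "rate_limit"
  | "quota"
  | "timeout"
  | "server"
  | "validation"
  | "authorization"
  | "unknown";

interface Classification {
  kind: FailureKind;
  retryable: boolean;
}

// `status` is the HTTP status of the failed call (undefined if no response
// arrived); `code` is the provider's error code string when one is available.
function classifyFailure(status: number | undefined, code?: string): Classification {
  // No response at all: client-side timeout or dropped connection.
  if (status === undefined) return { kind: "timeout", retryable: true };
  if (status === 429) {
    // A 429 can mean "slow down" or "out of quota"; only the first clears by waiting.
    return code === "insufficient_quota"
      ? { kind: "quota", retryable: false }
      : { kind: "rate_limit", retryable: true };
  }
  if (status === 408) return { kind: "timeout", retryable: true };
  if (status >= 500) return { kind: "server", retryable: true };
  if (status === 401 || status === 403) return { kind: "authorization", retryable: false };
  if (status === 400 || status === 422) return { kind: "validation", retryable: false };
  return { kind: "unknown", retryable: false };
}
```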

Ready checklist

  • User action creates a stable workflow run ID.
  • Timeouts, rate limits, validation failures, and authorization failures are handled separately.
  • Retries are bounded and only used for safe operations.
  • Background jobs expose queued, running, retrying, failed, and completed states.
  • The UI prevents duplicate submissions for the same run.
  • Failed runs keep enough metadata for support to diagnose safely.
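
The states in the checklist can be enforced as an explicit transition table. That way, a duplicate worker or a stale retry cannot, for example, move a completed run back to running. This is a minimal sketch, and the table itself is an assumption about which moves a given product allows. Here, a failed run returns to the queue only through a deliberate re-queue, and the snippet's `needs_review` status corresponds to `failed`.

```typescript
type RunStatus = "queued" | "running" | "retrying" | "failed" | "completed";

// Allowed moves; anything else is a bug (or a duplicate worker) and is rejected.
const TRANSITIONS: Record<RunStatus, readonly RunStatus[]> = {
  queued: ["running"],
  running: ["retrying", "failed", "completed"],
  retrying: ["running", "failed"],
  failed: ["queued"], // a person or support tool re-queues after review
  completed: [], // terminal: never re-run a finished workflow
};

function canTransition(from: RunStatus, to: RunStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

function transition(from: RunStatus, to: RunStatus): RunStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal workflow transition: ${from} -> ${to}`);
  }
  return to;
}
```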

Practical note

Make the retry state visible in the product, not just in worker logs. Users tolerate waiting much better when the system names the current state and preserves their place in the workflow.
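
The following small sketch names the state in the UI. It assumes a hypothetical status payload with the run's status, its attempt count, and, if the queue reports it, its queue position. It also covers the duplicate-submission item from the checklist by disabling the submit control while a run is in flight or waiting for review. The message wording is illustrative.

```typescript
type RunStatus = "queued" | "running" | "retrying" | "failed" | "completed";

// Hypothetical shape of what the status endpoint returns to the UI.
interface RunView {
  status: RunStatus;
  attempts: number;
  maxAttempts: number;
  queuePosition?: number;
}

interface StatusBanner {
  message: string;
  // While a run is in flight or under review, a second submit would only start a duplicate.
  submitDisabled: boolean;
}

function describeRun(run: RunView): StatusBanner {
  switch (run.status) {
    case "queued": {
      const where = run.queuePosition === undefined ? "" : ` (position ${run.queuePosition})`;
      return { message: `Queued${where}.`, submitDisabled: true };
    }
    case "running":
      return { message: "Working on your document.", submitDisabled: true };
    case "retrying":
      return {
        message: `Temporarily delayed. Retrying (attempt ${run.attempts + 1} of ${run.maxAttempts}).`,
        submitDisabled: true,
      };
    case "failed":
      return { message: "This run needs attention. Support can see what went wrong.", submitDisabled: true };
    case "completed":
      return { message: "Done.", submitDisabled: false };
  }
}
```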

Treat this as an implementation constraint, not just advice: the interface, the server code, and the validation path should all enforce the same behavior.
