Reliability · Checklist · 5 min

Add rate-limit and retry behavior around OpenAI calls

How to keep user workflows stable when model calls are slow, rate-limited, or temporarily unavailable.

Real example

Example: background proposal extraction under temporary rate pressure

A batch of 200 tenders arrives after a portal scrape. Running all 200 extractions at once creates rate-limit pressure, and user-facing pages slow down.

Put batch extraction in a queue with bounded concurrency. Use retries only for retryable errors, idempotency keys for each document, and a visible processing state in the UI.

Urgent user actions remain responsive while background AI work completes safely.
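
The bounded-concurrency part of this example can be sketched as a small worker pool. This is an illustration under assumptions, not the queue used in the original build: `mapWithConcurrency` and the `Settled` type are hypothetical names, and a production system would more likely use a durable job queue than an in-process loop.

```ts
// Hypothetical result type: one entry per input item, success or failure.
type Settled<R> = { ok: true; value: R } | { ok: false; error: unknown };

// Runs `worker` over `items` with at most `limit` calls in flight at once.
// A failure in one item is recorded and does not stop the rest of the batch.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  const queue = items.map((item, index) => ({ item, index }));

  // Each lane pulls the next job until the queue is empty.
  async function lane(): Promise<void> {
    for (let job = queue.shift(); job !== undefined; job = queue.shift()) {
      try {
        results[job.index] = { ok: true, value: await worker(job.item) };
      } catch (error) {
        results[job.index] = { ok: false, error };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes: Promise<void>[] = [];
  for (let i = 0; i < laneCount; i += 1) lanes.push(lane());
  await Promise.all(lanes);
  return results;
}
```

With a limit of 3, a batch of 200 documents never has more than three extraction calls in flight, which leaves rate-limit headroom for user-facing requests.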

ts
Bounded retry wrapper
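// Runs `operation` up to 3 times. `isRetryable` is assumed to be defined elsewhere.
async function runWithRetry<T>(operation: () => Promise<T>): Promise<T> {
  const maxAttempts = 3;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      // Give up on the last attempt or on errors that a retry cannot fix.
      if (attempt === maxAttempts || !isRetryable(error)) throw error;
      // Backoff grows with the attempt number: 250 ms, then 1000 ms.
      await new Promise((resolve) => setTimeout(resolve, 250 * attempt ** 2));
    }
  }
  // Unreachable: the loop always returns or throws.
  throw new Error("runWithRetry: exhausted attempts");
}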
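
The wrapper calls `isRetryable`, which is not shown. A minimal sketch follows, assuming failed calls throw an object with a numeric HTTP `status` field and that timeouts surface as errors named `AbortError` or `TimeoutError`. Both are assumptions about the client in use, so check them against the errors your SDK actually throws.

```ts
// Assumption: errors expose an HTTP `status` and/or a `name`, as many HTTP
// clients do. Anything that does not match is treated as not retryable.
function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { status, name } = error as { status?: unknown; name?: unknown };

  // Timeouts and aborted requests may succeed on a second try.
  if (name === "AbortError" || name === "TimeoutError") return true;
  if (typeof status !== "number") return false;

  // 408 request timeout, 429 rate limit, and 5xx server errors are transient.
  // Validation (400, 422) and authorization (401, 403) failures are not.
  return status === 408 || status === 429 || status >= 500;
}
```

Defaulting to "not retryable" for unrecognized errors keeps the retry count, and therefore cost, bounded when something unexpected fails.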
Tutorial path

How to implement it

Step 01
Set request timeouts and distinguish timeout, validation, authorization, and rate-limit failures.
Step 02
Retry only safe operations with exponential backoff and a small maximum attempt count.
Step 03
Use an idempotency key for workflow runs that may be retried.
Step 04
Queue non-urgent background work instead of blocking the user interface.
Step 05
Show a clear pending or retryable state to users.
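
Step 01 can be sketched as a timeout wrapper plus a small classifier. This is an illustration only: `TimeoutError`, `withTimeout`, `classifyFailure`, and the status-code mapping are hypothetical, and the sketch assumes non-timeout failures carry a numeric HTTP `status` field.

```ts
// Hypothetical error type so callers can tell a timeout from other failures.
class TimeoutError extends Error {
  constructor(ms: number) {
    super(`operation timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Rejects with TimeoutError if `operation` has not settled within `ms`.
// The AbortSignal lets the operation cancel its own request if it supports one.
function withTimeout<T>(
  operation: (signal: AbortSignal) => Promise<T>,
  ms: number,
): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new TimeoutError(ms));
    }, ms);
    operation(controller.signal).then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

type FailureKind = "timeout" | "validation" | "authorization" | "rate_limit" | "other";

// Assumption: the status codes below match how your client reports failures.
function classifyFailure(error: unknown): FailureKind {
  if (typeof error !== "object" || error === null) return "other";
  const { name, status } = error as { name?: unknown; status?: unknown };
  if (name === "TimeoutError") return "timeout";
  if (status === 400 || status === 422) return "validation";
  if (status === 401 || status === 403) return "authorization";
  if (status === 429) return "rate_limit";
  return "other";
}
```

The failure kind can then drive both the retry decision and the state shown to the user, for example "pending" for a rate limit and an actionable error for a validation failure.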
Checklist

Ready when these are true

Timeouts configured
Retry policy bounded
Idempotency keys used
Background queue for non-urgent work
User sees pending state
Field notes

What matters in practice

01
Retries should improve reliability without multiplying cost or duplicate side effects.
02
Rate limits need user-facing feedback and backend queueing where work can wait.
03
Idempotency matters when a model call triggers downstream actions.
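
To make the idempotency point concrete, here is a minimal sketch in which a workflow run is keyed so that a retry or a duplicate trigger reuses the first run instead of repeating its side effects. The `runOnce` helper and the key format are hypothetical, and the in-memory `Map` stands in for a durable store such as a database table.

```ts
// In-memory stand-in for a durable store. It is lost on restart, so treat
// this as an illustration of the pattern rather than a production design.
const runs = new Map<string, Promise<unknown>>();

// Runs `operation` at most once per key while a run is in flight or has
// succeeded. A failed run is forgotten so a later retry can try again.
function runOnce<T>(key: string, operation: () => Promise<T>): Promise<T> {
  const existing = runs.get(key);
  if (existing !== undefined) return existing as Promise<T>;

  const run = operation().catch((error: unknown) => {
    runs.delete(key);
    throw error;
  });
  runs.set(key, run);
  return run;
}
```

A key such as `tender-123:extract` (document ID plus step name, an assumed format) means two workers that pick up the same document share one model call and one set of downstream actions.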
Avoid these mistakes

Common failure modes

01
Do not retry validation failures.
02
Do not run background batches with the same priority as user-facing requests.
03
Do not forget idempotency for retried workflow runs.
Practical tip
Separate user-facing latency from background throughput. They need different retry and queue policies.
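
One way to express that separation is a policy object per traffic class. The field names and every number below are illustrative assumptions, not recommended values. Tune them against your own rate limits and latency targets.

```ts
// Hypothetical policy shape shared by both traffic classes.
interface CallPolicy {
  maxAttempts: number;    // total tries, including the first
  baseDelayMs: number;    // wait before the first retry
  timeoutMs: number;      // per-request timeout
  maxConcurrency: number; // calls allowed in flight at once
}

// Illustrative values only: interactive calls fail fast, background calls
// wait longer and run with low concurrency to leave headroom for users.
const policies: { interactive: CallPolicy; background: CallPolicy } = {
  interactive: { maxAttempts: 2, baseDelayMs: 200, timeoutMs: 10000, maxConcurrency: 8 },
  background: { maxAttempts: 5, baseDelayMs: 1000, timeoutMs: 60000, maxConcurrency: 2 },
};

// Exponential backoff: the delay doubles after each failed attempt.
// `attempt` is 1-based and is the attempt that just failed.
function backoffDelayMs(policy: CallPolicy, attempt: number): number {
  return policy.baseDelayMs * 2 ** (attempt - 1);
}
```

For example, `backoffDelayMs(policies.background, 3)` is 4000 ms, while an interactive call never waits longer than 200 ms before its single retry.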