Cost · Guide · 5 min

Estimate cost before adding more AI calls

A practical cost model for deciding whether a workflow needs one model call, several calls, or a cheaper deterministic step.


Real workflow example

A team builds an AI intake tool for procurement notices. The first version sends the full notice to a model to classify category, extract deadlines, summarize requirements, score fit, draft next steps, and generate a reviewer note.

It works, but every small UI refresh triggers another large call. The team then splits the workflow: deterministic parsing handles dates and file metadata, one model call extracts structured fields, a smaller model scores fit, and cached extraction results are reused when the reviewer opens the page again.

The workflow becomes cheaper because the team prices the completed task rather than the individual prompt, which exposes the repeated and oversized calls.

Implementation approach

List every model call in the workflow. For each call, estimate input tokens, output tokens, expected frequency, retry rate, and the percentage of runs that reach that step. Then include reviewer time. An expensive call that saves ten minutes of expert review may be cheap. A cheap call repeated on every page load may be expensive.
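The per-call estimate and the reviewer-time comparison above can be sketched as a cost-per-completed-task calculation. This is a minimal sketch: the token prices, the hourly rate, the review minutes, and every name in the block are hypothetical placeholders, not real vendor pricing.

```typescript
// All prices and rates here are illustrative assumptions, not real vendor pricing.
type StepCost = {
  name: string;
  inputTokens: number;        // tokens sent per attempt
  outputTokens: number;       // tokens returned per attempt
  retryRate: number;          // fraction of attempts retried once (0 to 1)
  reachesStepRate: number;    // fraction of tasks that reach this step (0 to 1)
  inputPricePerMTok: number;  // hypothetical USD per million input tokens
  outputPricePerMTok: number; // hypothetical USD per million output tokens
};

// Expected model spend for one completed task at this step.
function modelCostPerTask(step: StepCost): number {
  const attempts = step.reachesStepRate * (1 + step.retryRate);
  const inputCost = (step.inputTokens / 1_000_000) * step.inputPricePerMTok;
  const outputCost = (step.outputTokens / 1_000_000) * step.outputPricePerMTok;
  return attempts * (inputCost + outputCost);
}

// Cost of a task = model spend across all steps + human review time.
function costPerTask(
  steps: StepCost[],
  reviewMinutes: number,
  reviewerHourlyRate: number,
): number {
  const modelCost = steps.reduce((sum, s) => sum + modelCostPerTask(s), 0);
  const reviewCost = (reviewMinutes / 60) * reviewerHourlyRate;
  return modelCost + reviewCost;
}

const extractionStep: StepCost = {
  name: "notice extraction",
  inputTokens: 12000,
  outputTokens: 900,
  retryRate: 0.05,
  reachesStepRate: 1,
  inputPricePerMTok: 3,   // assumed
  outputPricePerMTok: 15, // assumed
};

// Compare: no model call with 15 minutes of review,
// versus one extraction call with 5 minutes of review (assumed figures).
const manualOnly = costPerTask([], 15, 90);
const withExtraction = costPerTask([extractionStep], 5, 90);
```

Under these assumed numbers the model call costs a few cents per task while the ten minutes of saved review is worth far more, which is the sense in which an expensive call can still be cheap.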

Move deterministic work into code. Date normalization, permission checks, formatting, deduplication, and basic routing rules should not require a model call.
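As one illustration of deterministic work, date normalization can be handled with a few explicit patterns. The sketch below is an assumption about what the notices might contain: the function name `normalizeDeadline`, the three accepted formats, and the day-first reading of slash dates are all hypothetical. It returns `null` for anything it does not recognize, so only the leftovers need a model call or a human.

```typescript
const MONTHS = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Builds an ISO date string, rejecting impossible dates such as 30 February
// by round-tripping through a UTC Date.
function toIso(year: number, month: number, day: number): string | null {
  const d = new Date(Date.UTC(year, month - 1, day));
  const valid =
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day;
  return valid ? `${year}-${pad2(month)}-${pad2(day)}` : null;
}

// Normalizes a deadline string to YYYY-MM-DD, or returns null if unrecognized.
function normalizeDeadline(raw: string): string | null {
  const text = raw.trim();

  // Already ISO: 2025-03-14
  let m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(text);
  if (m) return toIso(Number(m[1]), Number(m[2]), Number(m[3]));

  // Day-first slashes (assumed convention): 14/03/2025
  m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(text);
  if (m) return toIso(Number(m[3]), Number(m[2]), Number(m[1]));

  // Month name: March 14, 2025
  m = /^([A-Za-z]+) (\d{1,2}), (\d{4})$/.exec(text);
  if (m) {
    const month = MONTHS.indexOf(String(m[1]).toLowerCase()) + 1;
    return month > 0 ? toIso(Number(m[3]), month, Number(m[2])) : null;
  }

  return null; // unknown format: escalate instead of guessing
}
```

Returning `null` rather than guessing keeps the deterministic step honest: the caller can count how often it falls through and decide whether a model fallback is worth paying for.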

Cache stable results. Extraction from a specific uploaded document can often be reused across scoring, drafting, and dashboard views.
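A minimal sketch of that reuse, assuming each uploaded document has an id and a version string that changes whenever the file changes. `ExtractedFields`, `Extractor`, and `ExtractionCache` are hypothetical names, and the in-memory `Map` stands in for whatever store the real system would use.

```typescript
// Hypothetical shape of the extraction result and of the model call wrapper.
type ExtractedFields = { category: string; deadline: string | null };
type Extractor = (documentText: string) => Promise<ExtractedFields>;

class ExtractionCache {
  private store = new Map<string, Promise<ExtractedFields>>();
  private extract: Extractor;
  modelCalls = 0; // counts how many times the extractor was actually invoked

  constructor(extract: Extractor) {
    this.extract = extract;
  }

  // Keyed on document id + version, so a re-upload gets a fresh extraction
  // and an unchanged file never triggers a second model call.
  get(documentId: string, version: string, documentText: string): Promise<ExtractedFields> {
    const key = `${documentId}@${version}`;
    let cached = this.store.get(key);
    if (!cached) {
      this.modelCalls += 1;
      cached = this.extract(documentText);
      // Do not keep failed extractions; the next request should try again.
      cached.catch(() => {
        this.store.delete(key);
      });
      this.store.set(key, cached);
    }
    return cached;
  }
}
```

Storing the promise rather than the resolved value means that scoring, drafting, and a dashboard view requesting the same document at the same time still share a single call.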

Code or config snippet

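// Inputs for estimating one model call in the workflow.
type AiStepEstimate = {
  name: string;
  runsPerMonth: number;     // workflow runs started per month
  inputTokens: number;      // input tokens per attempt
  outputTokens: number;     // output tokens per attempt
  retryRate: number;        // fraction of attempts retried once (0 to 1)
  reachesStepRate: number;  // fraction of runs that reach this step (0 to 1)
};

// Returns the expected monthly input and output tokens for one step.
function estimateMonthlyTokens(step: AiStepEstimate) {
  // Attempts = runs that reach this step, plus one retry for the retried share.
  const attempts = step.runsPerMonth * step.reachesStepRate * (1 + step.retryRate);

  return {
    name: step.name,
    inputTokens: Math.round(attempts * step.inputTokens),
    outputTokens: Math.round(attempts * step.outputTokens),
  };
}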

const extraction = estimateMonthlyTokens({
  name: "notice extraction",
  runsPerMonth: 2000,
  inputTokens: 12000,
  outputTokens: 900,
  retryRate: 0.05,
  reachesStepRate: 1,
});
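As a usage sketch, the same formula can be summed over a whole workflow to compare the refresh-triggered design with the split design from the example. The block restates the formula so it runs on its own. The six page views per notice, the output size of the single large call, and the scoring step's sizes and rates are hypothetical figures chosen for illustration.

```typescript
type StepEstimate = {
  name: string;
  runsPerMonth: number;
  inputTokens: number;
  outputTokens: number;
  retryRate: number;
  reachesStepRate: number;
};

// Same formula as the snippet above, summed over every step in a workflow.
function workflowMonthlyTokens(steps: StepEstimate[]) {
  let inputTokens = 0;
  let outputTokens = 0;
  for (const step of steps) {
    const attempts = step.runsPerMonth * step.reachesStepRate * (1 + step.retryRate);
    inputTokens += Math.round(attempts * step.inputTokens);
    outputTokens += Math.round(attempts * step.outputTokens);
  }
  return { inputTokens, outputTokens };
}

// Before: one large call on every page view (assumed 2000 notices x 6 views).
const beforeSplit = workflowMonthlyTokens([
  { name: "do everything", runsPerMonth: 12000, inputTokens: 12000,
    outputTokens: 2500, retryRate: 0.05, reachesStepRate: 1 },
]);

// After: one cached extraction per notice, plus a small scoring call
// that only 80% of notices reach (assumed).
const afterSplit = workflowMonthlyTokens([
  { name: "notice extraction", runsPerMonth: 2000, inputTokens: 12000,
    outputTokens: 900, retryRate: 0.05, reachesStepRate: 1 },
  { name: "fit scoring", runsPerMonth: 2000, inputTokens: 1200,
    outputTokens: 150, retryRate: 0.02, reachesStepRate: 0.8 },
]);
```

Token totals from different models are not directly comparable in cost, so a fuller version would keep totals per model and price each separately.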

Mistakes to avoid

  • Estimating cost from one prompt run instead of monthly workflow volume.
  • Forgetting retries, failed validations, and reprocessing.
  • Using a large model for simple formatting or classification.
  • Recomputing document extraction when the source file has not changed.
  • Ignoring the cost of human review time and support escalations.
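On the retry point: a multiplier of `1 + retryRate` corresponds to at most one retry per run. When a retry can itself fail and be retried, the expected number of attempts is higher. The sketch below assumes each attempt fails independently with the same probability, which is a simplification; `expectedAttempts` is a hypothetical helper name.

```typescript
// Expected attempts per run when each attempt fails with probability
// `failureRate` and is retried up to `maxRetries` times.
// Assumes failures are independent, which real validation errors may not be.
function expectedAttempts(failureRate: number, maxRetries: number): number {
  let attempts = 0;
  let probabilityAttemptHappens = 1; // the first attempt always happens
  for (let k = 0; k <= maxRetries; k++) {
    attempts += probabilityAttemptHappens;
    // The next attempt only happens if this one failed.
    probabilityAttemptHappens *= failureRate;
  }
  return attempts;
}

const lowFailure = expectedAttempts(0.05, 1);  // matches 1 + retryRate
const highFailure = expectedAttempts(0.3, 3);  // 1 + 0.3 + 0.09 + 0.027
```

At a 5% failure rate the simple multiplier is close enough. At 30% with three retries allowed, it would understate volume by roughly 9%.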

Ready checklist

  • Every model call in the workflow is listed.
  • Token size, frequency, retry rate, and reach rate are estimated.
  • Deterministic steps are moved into code.
  • Stable outputs are cached by source version.
  • A smaller model has been tested where quality allows.
  • Reviewer time saved is part of the business case.

Practical note

Put a rough cost estimate in the pull request for new AI workflow steps. The number does not need to be perfect; it only needs to force the team to notice volume, retries, and avoidable calls before launch.

Use this as an implementation constraint, not just advice. The interface, server code, and validation path should make the same behavior true.
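One possible way to make the pull request estimate routine is a small formatter that turns the rough numbers into a short note. The fields and wording below are a hypothetical template, not a prescribed format.

```typescript
// Hypothetical fields for a pull request cost note.
type CostNote = {
  step: string;
  runsPerMonth: number;
  retryRate: number;           // 0 to 1
  estimatedMonthlyUsd: number; // rough figure; precision is not the point
  cachedBy: string | null;     // e.g. "document version", or null if uncached
};

// Produces a plain-text note that can be pasted into a PR description.
function formatCostNote(note: CostNote): string {
  return [
    `AI step: ${note.step}`,
    `Volume: ~${note.runsPerMonth} runs/month, retry rate ${(note.retryRate * 100).toFixed(0)}%`,
    `Rough cost: ~$${note.estimatedMonthlyUsd.toFixed(2)}/month`,
    `Caching: ${note.cachedBy ?? "none (explain why)"}`,
  ].join("\n");
}

const exampleNote = formatCostNote({
  step: "notice extraction",
  runsPerMonth: 2000,
  retryRate: 0.05,
  estimatedMonthlyUsd: 110, // assumed figure for illustration
  cachedBy: "document version",
});
```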
