Design file inputs and document workflows for AI
How to make file upload, parsing, extraction, and review feel like one reliable product flow.
Real workflow example
A tender-analysis app asks users to upload PDF specifications, forms, and annexes. The demo version has one upload button and then a long wait. Users do not know whether the file uploaded, whether parsing failed, or whether the AI is still extracting fields.
A production document workflow treats the file as a record with states: uploaded, parsing, parsed, extracting, needs review, approved, failed, or replaced. The reviewer sees extracted requirements beside source references and can correct fields without losing the original document.
That makes the upload feel like part of the product, not a browser handoff.
Implementation approach
Create a database record before heavy processing starts. Store owner, MIME type, size, checksum, source, and status. Validate ownership before every read or write.
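The ownership and type checks described above could be sketched roughly as follows. Everything here is an assumption for illustration: the `DocumentRecord` shape, the helper names `assertOwner` and `validateUpload`, the PDF-only allow list, and the 50 MB limit are hypothetical and not taken from any particular codebase.

```typescript
// Hypothetical record shape mirroring the fields listed above.
interface DocumentRecord {
  id: string;
  ownerId: string;
  mimeType: string;
  byteSize: number;
  checksum: string;
  status: string;
}

// Assumed limits for illustration; choose values that fit your product.
const ALLOWED_MIME_TYPES: readonly string[] = ["application/pdf"];
const MAX_BYTE_SIZE = 50 * 1024 * 1024;

// Throws unless the requesting user owns the document.
// Intended to run before every read or write, not only at upload time.
function assertOwner(doc: DocumentRecord, userId: string): void {
  if (doc.ownerId !== userId) {
    throw new Error(`User ${userId} may not access document ${doc.id}`);
  }
}

// Returns a list of problems; an empty list means processing may start.
function validateUpload(mimeType: string, byteSize: number): string[] {
  const problems: string[] = [];
  if (ALLOWED_MIME_TYPES.indexOf(mimeType) === -1) {
    problems.push(`Unsupported file type: ${mimeType}`);
  }
  if (byteSize <= 0) {
    problems.push("File is empty");
  } else if (byteSize > MAX_BYTE_SIZE) {
    problems.push("File exceeds the size limit");
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets the interface show every issue with the file at once.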
Separate upload progress from processing progress. Browser upload finishing only means the file reached storage. Parsing, extraction, validation, and review are separate states.
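One way to keep these states separate is an explicit transition table, so a document cannot jump from "uploaded" straight to "approved". The sketch below is a minimal illustration; the particular transitions allowed, and the names `allowedTransitions`, `canTransition`, and `transition`, are assumptions rather than a prescribed design.

```typescript
type DocStatus =
  | "uploaded"
  | "parsing"
  | "parsed"
  | "extracting"
  | "needs_review"
  | "approved"
  | "failed"
  | "replaced";

// Hypothetical transition table: each status lists the statuses it may move to.
const allowedTransitions: Record<DocStatus, readonly DocStatus[]> = {
  uploaded: ["parsing", "failed", "replaced"],
  parsing: ["parsed", "failed"],
  parsed: ["extracting", "failed", "replaced"],
  extracting: ["needs_review", "failed"],
  needs_review: ["approved", "extracting", "replaced"],
  approved: ["replaced"],
  failed: ["parsing", "replaced"], // reprocess or replace after a failure
  replaced: [], // terminal: the replacement gets its own record
};

function canTransition(from: DocStatus, to: DocStatus): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Returns the new status, or throws if the move is not allowed.
function transition(from: DocStatus, to: DocStatus): DocStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal status transition: ${from} -> ${to}`);
  }
  return to;
}
```

With a table like this, the background worker and the interface share one definition of what can happen next.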
Preserve source references. AI extraction is much easier to trust when every field can link back to a page, section, or text span in the original file.
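A source reference can be as simple as a page number plus a character span. The sketch below assumes the parser yields one text string per page; the `SourceRef` and `ExtractedField` shapes and both helper functions are hypothetical names chosen for illustration.

```typescript
// Hypothetical pointer back into the original file.
interface SourceRef {
  page: number; // 1-based page number
  section?: string; // optional heading, if the parser provides one
  charStart: number; // start offset within the page text (inclusive)
  charEnd: number; // end offset within the page text (exclusive)
}

interface ExtractedField {
  name: string;
  value: string; // value produced by the AI extraction
  source: SourceRef;
  correctedValue?: string; // set when a reviewer overrides the AI value
}

// Returns the exact text span the field was extracted from.
function quoteSource(pageTexts: string[], ref: SourceRef): string {
  const pageText = pageTexts[ref.page - 1];
  if (pageText === undefined) {
    throw new Error(`Page ${ref.page} does not exist in the parsed document`);
  }
  return pageText.slice(ref.charStart, ref.charEnd);
}

// Records a reviewer correction without discarding the AI value or its source.
function correctField(field: ExtractedField, newValue: string): ExtractedField {
  return { ...field, correctedValue: newValue };
}
```

Keeping the correction in a separate property means the original AI output and its source span stay available for audit.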
Code or config snippet
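// Every state a document record can be in.
const DocumentStatus = {
  Uploaded: "uploaded",
  Parsing: "parsing",
  Parsed: "parsed",
  Extracting: "extracting",
  NeedsReview: "needs_review",
  Approved: "approved",
  Failed: "failed",
  Replaced: "replaced",
} as const;

// Create the record as soon as the file reaches storage, before parsing starts.
// `db` is assumed to be a Prisma-style client; `user` and the file fields come
// from the authenticated upload request.
await db.document.create({
  data: {
    ownerId: user.id,
    fileName,
    mimeType,
    byteSize,
    checksum,
    storageKey,
    status: DocumentStatus.Uploaded,
  },
});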
Mistakes to avoid
- Starting extraction before file ownership and type are validated.
- Showing "uploaded" as if the document has already been processed.
- Storing extracted fields without source references.
- Hiding the original file from the reviewer.
- Making replacement or reprocessing require support intervention.
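To avoid the second mistake in the list above, the interface can map each status to its own wording instead of treating "uploaded" as done. The labels and the helper `hasExtractedFields` below are illustrative assumptions, not required copy.

```typescript
type DocStatus =
  | "uploaded"
  | "parsing"
  | "parsed"
  | "extracting"
  | "needs_review"
  | "approved"
  | "failed"
  | "replaced";

// Hypothetical user-facing labels, one per status.
const statusLabels: Record<DocStatus, string> = {
  uploaded: "File received. Processing has not started yet.",
  parsing: "Reading the document...",
  parsed: "Document read. Waiting for field extraction.",
  extracting: "Extracting fields...",
  needs_review: "Ready for your review.",
  approved: "Approved.",
  failed: "Processing failed. Replace the file or try again.",
  replaced: "Replaced by a newer file.",
};

// True only once extraction has produced fields a reviewer can look at.
function hasExtractedFields(status: DocStatus): boolean {
  return status === "needs_review" || status === "approved";
}
```

Because `statusLabels` is typed as a `Record` over every status, adding a new status without a label becomes a compile-time error.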
Ready checklist
- File owner, MIME type, size, and checksum are stored.
- Upload, parsing, extraction, review, and failure states are separate.
- Source references are preserved for extracted fields.
- Reviewers can correct AI output manually.
- Users can replace or reprocess a file.
- Failed parsing produces a useful next action.
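The last checklist item could be met by mapping each failure reason to a message and a concrete action. The failure reasons, action names, and messages below are hypothetical examples; a real parser will report its own set of errors.

```typescript
// Hypothetical failure reasons a parsing step might report.
type FailureReason =
  | "unsupported_type"
  | "password_protected"
  | "no_text_found"
  | "parser_error";

// Actions the user can take without contacting support.
type NextAction = "replace_file" | "retry_processing";

interface FailureGuidance {
  message: string;
  action: NextAction;
}

// Maps a failure reason to a message and a next action for the user.
function guidanceFor(reason: FailureReason): FailureGuidance {
  switch (reason) {
    case "unsupported_type":
      return { message: "This file type is not supported. Upload a PDF.", action: "replace_file" };
    case "password_protected":
      return { message: "The file is password protected. Upload an unlocked copy.", action: "replace_file" };
    case "no_text_found":
      return { message: "No readable text was found. Upload a searchable version.", action: "replace_file" };
    case "parser_error":
      return { message: "Something went wrong while reading the file. Try again.", action: "retry_processing" };
  }
}
```

Each branch ends in an action the user can take on their own, which keeps replacement and reprocessing out of the support queue.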
