Build a moderation pass for user-generated content
How to add a moderation checkpoint before AI-assisted workflows publish, send, or store risky content.
Real workflow example
A customer success product lets users draft public knowledge-base answers with AI help. The user can paste customer text, ask for a rewrite, and publish the result. That workflow handles user content, model-generated content, and public publishing in one path.
A moderation pass protects the irreversible step. Before publishing, the app checks both the source text and the generated answer, blocks severe issues, routes ambiguous cases to review, and explains what the user can edit. The moderation pass is not a vague background filter; it is a product checkpoint with states the user can understand.
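Because the checkpoint looks at two inputs, the pasted source text and the generated answer, the app needs a rule for combining their results. A minimal sketch, assuming each input gets its own decision, keeps whichever outcome is most severe, so a clean source can never vouch for a risky generated answer. The severity ordering and the combineDecisions helper are hypothetical names for illustration, not part of any library.

```typescript
// Policy outcomes, ordered from least to most severe.
// The ordering is an assumption for this sketch.
type Outcome = "allow" | "request_edits" | "needs_review" | "block";

const SEVERITY: Record<Outcome, number> = {
  allow: 0,
  request_edits: 1,
  needs_review: 2,
  block: 3,
};

interface Decision {
  outcome: Outcome;
  categories: string[];
}

// Hypothetical helper: combine the decision for the pasted source text with the
// decision for the model-generated answer. The stricter outcome wins, and the
// categories from both checks are kept (deduplicated) for the operator.
function combineDecisions(source: Decision, generated: Decision): Decision {
  const outcome =
    SEVERITY[generated.outcome] >= SEVERITY[source.outcome]
      ? generated.outcome
      : source.outcome;
  const categories = Array.from(
    new Set([...source.categories, ...generated.categories]),
  );
  return { outcome, categories };
}

const combined = combineDecisions(
  { outcome: "allow", categories: [] },
  { outcome: "needs_review", categories: ["personal_data"] },
);
console.log(combined.outcome); // prints needs_review
```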
Implementation approach
Find the moments where content leaves the private workspace: publishing, emailing, notifying, exporting, or making content visible to another account. Put the moderation checkpoint immediately before those actions.
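One way to keep the checkpoint immediately before every irreversible action is to send all of those actions through a single wrapper. The sketch below assumes a moderation function that returns an outcome plus a user message, and a review-queue callback; guardIrreversible, its parameters, and the result shape are hypothetical names for illustration.

```typescript
type Outcome = "allow" | "request_edits" | "needs_review" | "block";

interface ModerationResult {
  outcome: Outcome;
  userMessage: string;
}

type GuardResult<T> =
  | { status: "done"; value: T }
  | { status: "held"; outcome: Exclude<Outcome, "allow">; userMessage: string };

// Hypothetical wrapper: publish, email, notify, and export paths call this
// instead of performing the action directly, so moderation always runs first.
async function guardIrreversible<T>(
  content: string,
  moderate: (content: string) => Promise<ModerationResult>,
  action: () => Promise<T>,
  enqueueForReview: (content: string) => Promise<void>,
): Promise<GuardResult<T>> {
  const result = await moderate(content);
  const outcome = result.outcome;
  if (outcome === "allow") {
    // Only an explicit "allow" lets the irreversible step run.
    return { status: "done", value: await action() };
  }
  if (outcome === "needs_review") {
    // Ambiguous cases wait for a human instead of being published or dropped.
    await enqueueForReview(content);
  }
  // request_edits, needs_review, and block all stop the action here.
  return { status: "held", outcome, userMessage: result.userMessage };
}
```

Returning a held result instead of throwing leaves the caller free to show the user message and keep the draft editable when the outcome is request_edits.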
Define policy outcomes that map to product behavior. For example: allow, request_edits, needs_review, and block. Each outcome should have a safe user message and an operator reason.
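A lookup table keyed by outcome makes the mapping explicit and lets the compiler flag any outcome that has no product behavior. The field names, messages, and reasons below are placeholders for this sketch, not recommended policy wording.

```typescript
type Outcome = "allow" | "request_edits" | "needs_review" | "block";

interface OutcomeBehavior {
  canPublish: boolean;    // whether the irreversible action may proceed
  userCanEdit: boolean;   // whether the user is offered an edit path
  userMessage: string;    // safe to show the author; reveals no policy internals
  operatorReason: string; // for logs and reviewers only
}

// Record<Outcome, ...> requires one entry per outcome: adding a new outcome to
// the union without a behavior here is a compile-time error.
const OUTCOME_BEHAVIOR: Record<Outcome, OutcomeBehavior> = {
  allow: {
    canPublish: true,
    userCanEdit: true,
    userMessage: "Published.",
    operatorReason: "No policy issues detected.",
  },
  request_edits: {
    canPublish: false,
    userCanEdit: true,
    userMessage: "Some passages need changes before this can be published. Edit them and try again.",
    operatorReason: "Fixable issue; the author can resolve it by editing.",
  },
  needs_review: {
    canPublish: false,
    userCanEdit: true,
    userMessage: "This answer is waiting for review before it can be published.",
    operatorReason: "Ambiguous case; routed to manual review.",
  },
  block: {
    canPublish: false,
    userCanEdit: false,
    userMessage: "This content can't be published under the content policy.",
    operatorReason: "Severe policy violation; publishing blocked.",
  },
};
```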
Store only what is needed. Keep audit metadata and decision codes, but avoid retaining sensitive raw text longer than the product requires.
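A minimal sketch of an audit record that keeps decision codes and metadata but drops the raw text, assuming the only reason to retain content is to let a reviewer see a needs_review case, and only for a fixed window. The 14-day window, the reviewCopy field, and buildAuditRecord are hypothetical; the right retention period depends on the product's own requirements.

```typescript
type Outcome = "allow" | "request_edits" | "needs_review" | "block";

// What gets persisted: decision codes and metadata. Raw text appears only in
// reviewCopy, only for cases a human must look at, and with a deletion date.
interface ModerationAuditRecord {
  articleId: string;
  outcome: Outcome;
  categories: string[];
  decidedAt: Date;
  reviewCopy?: { text: string; deleteAfter: Date };
}

const DAY_MS = 24 * 60 * 60 * 1000;
const REVIEW_COPY_DAYS = 14; // placeholder retention window

function buildAuditRecord(
  articleId: string,
  content: string,
  decision: { outcome: Outcome; categories: string[] },
  now: Date = new Date(),
): ModerationAuditRecord {
  const record: ModerationAuditRecord = {
    articleId,
    outcome: decision.outcome,
    categories: [...decision.categories],
    decidedAt: now,
  };
  if (decision.outcome === "needs_review") {
    // A cleanup job is assumed to delete reviewCopy once deleteAfter passes.
    record.reviewCopy = {
      text: content,
      deleteAfter: new Date(now.getTime() + REVIEW_COPY_DAYS * DAY_MS),
    };
  }
  return record;
}
```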
Code or config snippet
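```typescript
import OpenAI from "openai";
import { zodTextFormat } from "openai/helpers/zod";
import { z } from "zod";
import { db } from "./db"; // hypothetical module: assumed to export a Prisma client with an articleModeration model

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Strict structured outputs require every field to be present, so the
// optional reviewer note is expressed as nullable rather than .optional().
const moderationSchema = z.object({
  outcome: z.enum(["allow", "request_edits", "needs_review", "block"]),
  categories: z.array(z.string()),
  userMessage: z.string(),
  reviewerNote: z.string().nullable(),
});

type ModerationDecision = z.infer<typeof moderationSchema>;

async function moderateBeforePublish(
  articleId: string,
  content: string,
): Promise<ModerationDecision> {
  // responses.parse (unlike responses.create) populates output_parsed.
  const response = await openai.responses.parse({
    model: "gpt-4.1-mini",
    input: [
      { role: "system", content: "Classify content risk before publishing." },
      { role: "user", content },
    ],
    text: {
      format: zodTextFormat(moderationSchema, "moderation_decision"),
    },
  });

  // output_parsed is null on a refusal; parse() then throws, so the caller
  // must treat an error as "do not publish".
  const decision = moderationSchema.parse(response.output_parsed);

  // Persist decision codes only; the raw content is not stored.
  await db.articleModeration.create({
    data: { articleId, outcome: decision.outcome, categories: decision.categories },
  });
  return decision;
}
```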
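If the moderation call itself fails (network error, refusal, or a response that does not match the schema), the publish step still needs a safe answer. A minimal sketch, assuming the design choice of failing closed, wraps a call such as moderateBeforePublish and turns any error into a needs_review decision. moderateOrHold and the moderation_unavailable category are hypothetical names for illustration.

```typescript
type Outcome = "allow" | "request_edits" | "needs_review" | "block";

interface ModerationDecision {
  outcome: Outcome;
  categories: string[];
  userMessage: string;
  reviewerNote: string | null;
}

// Hypothetical wrapper: if the moderation call throws for any reason, the
// content is held for manual review instead of being published unchecked.
async function moderateOrHold(
  moderate: () => Promise<ModerationDecision>,
): Promise<ModerationDecision> {
  try {
    return await moderate();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return {
      outcome: "needs_review",
      categories: ["moderation_unavailable"],
      userMessage: "This answer is waiting for review before it can be published.",
      reviewerNote: `Automatic moderation failed: ${reason}`,
    };
  }
}
```

A caller would pass a closure, for example moderateOrHold(() => moderateBeforePublish(articleId, content)), and then act on the returned outcome as usual.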
Mistakes to avoid
- Running moderation after the content has already been sent or published.
- Using one generic blocked message for every policy outcome.
- Storing sensitive user text in logs without a retention reason.
- Treating model-generated text as safe because the original user input looked safe.
- Blocking users without giving a clear edit path where edits are allowed.
Ready checklist
- Irreversible content actions are identified.
- Moderation runs before publish, send, export, or notification.
- Outcomes map to clear product states.
- User-facing messages are safe and actionable.
- Sensitive logs are minimized.
- Manual review exists for ambiguous cases.
