Matt 7d72ee271f fix(security): route ai-shortlist through canonical anonymization pipeline
ai-shortlist was sending raw project.description, raw juror feedback
text (feedbackGeneral / feedbackText), and full extracted file text
content directly to OpenAI as part of the user prompt. Its only
"anonymization" was renaming `id` to `anonymousId`. This bypassed the
GDPR contract documented in the file's own header comment ("All project
data is anonymized before AI processing — No personal identifiers in
prompts") and in CLAUDE.md ("All AI calls anonymize data before sending
to OpenAI").

A juror writing "Contact applicant Jane Doe at jane@example.com" in
feedback would ship that PII to OpenAI verbatim every time an admin
generated a shortlist. Same for any names / emails / phone numbers
embedded in extracted PDF text.

generateCategoryShortlist now mirrors the pattern used by ai-filtering /
ai-tagging / ai-award-eligibility:

- toProjectWithRelations + anonymizeProjectsForAI(_, 'FILTERING')
- validateAnonymizedProjects gate that aborts on detected PII
- Aggregates (avgScore, evaluationCount, feedbackSamples) computed
  separately and merged onto the anonymized projects; each feedback
  sample passes through sanitizeText (strips email/phone/url/ssn) and
  is truncated to 1000 chars.

Defense-in-depth fix in the shared helper: anonymizeProjectForAI now
also runs sanitizeText over each file's text_content before emitting it
to AI services. Previously the helper passed extracted file text
through unchanged, which would have leaked PII from PDF body text via
ai-filtering / ai-tagging / ai-award-eligibility too if those services
turn on aiParseFiles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 04:14:58 +02:00
2026-04-28 18:55:12 +02:00
2026-04-28 18:55:12 +02:00
Description
No description provided
25 MiB
Languages
TypeScript 99.5%
JavaScript 0.2%
Shell 0.2%
CSS 0.1%