7d72ee271f7f608f479709483bef2a17e7769a5d
ai-shortlist was sending raw project.description, raw juror feedback
text (feedbackGeneral / feedbackText), and full extracted file text
content directly to OpenAI as part of the user prompt. Its only
"anonymization" was renaming `id` to `anonymousId`. This bypassed the
GDPR contract documented in the file's own header comment ("All project
data is anonymized before AI processing — No personal identifiers in
prompts") and in CLAUDE.md ("All AI calls anonymize data before sending
to OpenAI").
A juror writing "Contact applicant Jane Doe at jane@example.com" in
feedback would ship that PII to OpenAI verbatim every time an admin
generated a shortlist. Same for any names / emails / phone numbers
embedded in extracted PDF text.
generateCategoryShortlist now mirrors the pattern used by ai-filtering /
ai-tagging / ai-award-eligibility:
- toProjectWithRelations + anonymizeProjectsForAI(_, 'FILTERING')
- validateAnonymizedProjects gate that aborts on detected PII
- Aggregates (avgScore, evaluationCount, feedbackSamples) computed
separately and merged onto the anonymized projects; each feedback
sample passes through sanitizeText (strips email/phone/url/ssn) and
is truncated to 1000 chars.
Defense-in-depth fix in the shared helper: anonymizeProjectForAI now
also runs sanitizeText over each file's text_content before emitting it
to AI services. Previously the helper passed extracted file text
through unchanged, which would have leaked PII from PDF body text via
ai-filtering / ai-tagging / ai-award-eligibility too if those services
turn on aiParseFiles.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
No description provided
Languages
TypeScript
99.5%
JavaScript
0.2%
Shell
0.2%
CSS
0.1%