Optimize AI system with batching, token tracking, and GDPR compliance

- Add AIUsageLog model for persistent token/cost tracking
- Implement batched processing for all AI services:
  - Assignment: 15 projects/batch
  - Filtering: 20 projects/batch
  - Award eligibility: 20 projects/batch
  - Mentor matching: 15 projects/batch
- Create unified error classification (ai-errors.ts)
- Enhance anonymization with comprehensive project data
- Add AI usage dashboard to Settings page
- Add usage stats endpoints to settings router
- Create AI system documentation (5 files)
- Create GDPR compliance documentation (2 files)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 11:58:12 +01:00
parent a72e815d3a
commit 928b1c65dc
19 changed files with 4103 additions and 601 deletions
--- a/docs/gdpr/ai-data-processing.md
+++ b/docs/gdpr/ai-data-processing.md
@@ -0,0 +1,217 @@
+# AI Data Processing - GDPR Compliance Documentation
+
+## Overview
+
+This document describes how project data is processed by AI services in the MOPC Platform, ensuring compliance with GDPR Articles 5, 6, 13-14, 25, and 32.
+
+## Legal Basis
+
+| Processing Activity | Legal Basis | GDPR Article |
+|---------------------|-------------|--------------|
+| AI-powered project filtering | Legitimate interest | Art. 6(1)(f) |
+| AI-powered jury assignment | Legitimate interest | Art. 6(1)(f) |
+| AI-powered award eligibility | Legitimate interest | Art. 6(1)(f) |
+| AI-powered mentor matching | Legitimate interest | Art. 6(1)(f) |
+
+**Legitimate Interest Justification:** AI processing is used to efficiently evaluate ocean conservation projects and match appropriate reviewers, directly serving the platform's purpose of managing the Monaco Ocean Protection Challenge.
+
+## Data Minimization (Article 5(1)(c))
+
+The AI system applies strict data minimization:
+
+- **Only necessary fields** sent to AI (no names, emails, phone numbers)
+- **Descriptions truncated** to 300-500 characters maximum
+- **Team size** sent as count only (no member details)
+- **Dates** sent as year-only or ISO date (no timestamps)
+- **IDs replaced** with sequential anonymous identifiers (P1, P2, etc.)
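+
+The rules above can be sketched as a single mapping step. This is a minimal illustration, not the platform's actual anonymization code: the `RawProject` shape, the `minimizeProjects` name, and the choice of 500 as the truncation limit are assumptions made for the example. PII stripping of free text is a separate step and is not shown here.
+
+```typescript
+// Hypothetical input shape; field names are assumptions for illustration.
+interface RawProject {
+  id: string
+  title: string
+  description: string
+  teamMembers: { name: string; email: string }[]
+  foundedAt: Date
+}
+
+// Only the fields that are allowed to leave the platform.
+interface MinimizedProject {
+  project_id: string
+  title: string
+  description: string
+  team_size: number
+  founded_year: number
+}
+
+// Upper bound of the documented 300-500 character range (assumed).
+const MAX_DESCRIPTION_LENGTH = 500
+
+function minimizeProjects(projects: RawProject[]): MinimizedProject[] {
+  return projects.map((project, index) => ({
+    project_id: `P${index + 1}`, // sequential ID replaces the real one
+    title: project.title,
+    description: project.description.slice(0, MAX_DESCRIPTION_LENGTH),
+    team_size: project.teamMembers.length, // count only, no member details
+    founded_year: project.foundedAt.getUTCFullYear(), // year only
+  }))
+}
+```
+
+Building a new object from an explicit allow-list of fields, rather than deleting sensitive fields from a copy, means a field added to the project model later is excluded by default.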
+
+## Anonymization Measures
+
+### Data NEVER Sent to AI
+
+| Data Type | Reason |
+|-----------|--------|
+| Personal names | PII - identifying |
+| Email addresses | PII - identifying |
+| Phone numbers | PII - identifying |
+| Physical addresses | PII - identifying |
+| External URLs | Could identify individuals |
+| Internal project/user IDs | Could be cross-referenced |
+| Team member details | PII - identifying |
+| Internal comments | May contain PII |
+| File content | May contain PII |
+
+### Data Sent to AI (Anonymized)
+
+| Field | Type | Purpose | Anonymization |
+|-------|------|---------|---------------|
+| project_id | String | Reference | Replaced with P1, P2, etc. |
+| title | String | Spam detection | PII patterns removed |
+| description | String | Criteria matching | Truncated, PII stripped |
+| category | Enum | Filtering | As-is (no PII) |
+| ocean_issue | Enum | Topic filtering | As-is (no PII) |
+| country | String | Geographic eligibility | As-is (country name only) |
+| region | String | Regional eligibility | As-is (zone name only) |
+| institution | String | Student identification | As-is (institution name only) |
+| tags | Array | Keyword matching | As-is (no PII expected) |
+| founded_year | Number | Age filtering | Year only, not full date |
+| team_size | Number | Team requirements | Count only |
+| file_count | Number | Document checks | Count only |
+| file_types | Array | File requirements | Type names only |
+| wants_mentorship | Boolean | Mentorship filtering | As-is |
+| submission_source | Enum | Source filtering | As-is |
+| submitted_date | String | Deadline checks | Date only, no time |
+
+## Technical Safeguards
+
+### PII Detection and Stripping
+
+```typescript
+// Patterns detected and removed before AI processing
+const PII_PATTERNS = {
+  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
+  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
+  url: /https?:\/\/[^\s]+/g,
+  ssn: /\d{3}-\d{2}-\d{4}/g,
+  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
+}
+```
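+
+How these patterns might be applied to free text can be sketched as follows. The `stripPII` helper and the `[kind removed]` placeholder format are assumptions for illustration; the patterns themselves are copied from the block above.
+
+```typescript
+// Patterns copied from the documentation block above.
+const PII_PATTERNS: Record<string, RegExp> = {
+  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
+  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
+  url: /https?:\/\/[^\s]+/g,
+  ssn: /\d{3}-\d{2}-\d{4}/g,
+  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
+}
+
+// Hypothetical helper: replaces every match of every pattern with a placeholder.
+function stripPII(text: string): string {
+  let result = text
+  for (const [kind, pattern] of Object.entries(PII_PATTERNS)) {
+    // String.prototype.replace with a global regex replaces all matches.
+    result = result.replace(pattern, `[${kind} removed]`)
+  }
+  return result
+}
+
+const cleaned = stripPII(
+  'Contact jane.doe@example.org or +1 555-123-4567, see https://example.org/team'
+)
+console.log(cleaned)
+```
+
+Pattern matching of this kind catches well-formed identifiers only. Names, postal addresses, and unusual phone formats pass through, which is why the field-level exclusions listed earlier remain the primary safeguard.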
+
+### Validation Before Every AI Call
+
+```typescript
+// GDPR compliance enforced before EVERY API call
+export function enforceGDPRCompliance(data: unknown[]): void {
+  for (const item of data) {
+    const { valid, violations } = validateNoPersonalData(item)
+    if (!valid) {
+      throw new Error(`GDPR compliance check failed: ${violations.join(', ')}`)
+    }
+  }
+}
+```
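+
+The snippet above relies on `validateNoPersonalData`, which this document does not show. One possible shape for such a validator is sketched below; the forbidden key list, the recursion strategy, and the violation message format are all assumptions, not the platform's actual implementation. The email regex is declared without the `g` flag because a global regex keeps `lastIndex` between `test()` calls, which would make repeated checks unreliable.
+
+```typescript
+interface ValidationResult {
+  valid: boolean
+  violations: string[]
+}
+
+// Assumed list of field names that must never appear in an AI payload.
+const FORBIDDEN_KEYS = ['name', 'email', 'phone', 'address']
+
+// No `g` flag: test() on a global regex is stateful across calls.
+const EMAIL_PATTERN = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/
+
+function validateNoPersonalData(item: unknown): ValidationResult {
+  const violations: string[] = []
+
+  // Walk the payload recursively, recording the path of each violation.
+  const visit = (value: unknown, path: string): void => {
+    if (typeof value === 'string') {
+      if (EMAIL_PATTERN.test(value)) violations.push(`${path}: email pattern`)
+    } else if (Array.isArray(value)) {
+      value.forEach((entry, i) => visit(entry, `${path}[${i}]`))
+    } else if (value !== null && typeof value === 'object') {
+      for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
+        const childPath = path ? `${path}.${key}` : key
+        if (FORBIDDEN_KEYS.includes(key.toLowerCase())) {
+          violations.push(`${childPath}: forbidden field`)
+        }
+        visit(child, childPath)
+      }
+    }
+  }
+
+  visit(item, '')
+  return { valid: violations.length === 0, violations }
+}
+```
+
+Reporting the path rather than the offending value keeps the PII itself out of error messages and logs.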
+
+### ID Anonymization
+
+Real IDs are never sent to AI. Instead:
+- Projects: `cm1abc123...` → `P1`, `P2`, `P3`
+- Jurors: `cm2def456...` → `juror_001`, `juror_002`
+- Results are mapped back to real IDs using mapping tables that stay on the platform and are never sent to AI
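+
+The mapping described above can be sketched with a pair of in-memory maps. The function names and the `AnonymizedIds` type are hypothetical, chosen for this example; the actual services may structure this differently.
+
+```typescript
+interface AnonymizedIds {
+  toAnonymous: Map<string, string> // real ID -> anonymous ID
+  toReal: Map<string, string> // anonymous ID -> real ID
+}
+
+// Assigns sequential anonymous IDs (P1, P2, ...) in input order.
+function buildIdMapping(realIds: string[], prefix = 'P'): AnonymizedIds {
+  const toAnonymous = new Map<string, string>()
+  const toReal = new Map<string, string>()
+  for (const realId of realIds) {
+    if (toAnonymous.has(realId)) continue // duplicates keep their first ID
+    const anonymousId = `${prefix}${toAnonymous.size + 1}`
+    toAnonymous.set(realId, anonymousId)
+    toReal.set(anonymousId, realId)
+  }
+  return { toAnonymous, toReal }
+}
+
+// Restores real IDs on AI results; rejects IDs the mapping never issued.
+function mapResultsBack<T extends { project_id: string }>(
+  results: T[],
+  mapping: AnonymizedIds
+): T[] {
+  return results.map((result) => {
+    const realId = mapping.toReal.get(result.project_id)
+    if (realId === undefined) {
+      throw new Error(`Unknown anonymous ID: ${result.project_id}`)
+    }
+    return { ...result, project_id: realId }
+  })
+}
+```
+
+Throwing on an unknown ID is a deliberate choice in this sketch: a model can return an identifier that was never in the prompt, and silently dropping or guessing would attach a result to the wrong project.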
+
+## Data Retention
+
+| Data Type | Retention | Deletion Method |
+|-----------|-----------|-----------------|
+| AI usage logs | 12 months | Automatic deletion |
+| Anonymized prompts | Not stored | Sent directly to API |
+| AI responses | Not stored | Parsed and discarded |
+
+**Note:** OpenAI does not use API data for model training (per their API Terms). API data may be retained for up to 30 days for abuse monitoring; a zero-day retention option exists but is available only to eligible customers, subject to OpenAI's approval.
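+
+The 12-month retention rule for usage logs could be enforced by a scheduled job that computes a cutoff date and deletes older rows. The sketch below shows only the cutoff logic; the function names and the `createdAt` field are assumptions, and the Prisma call appears as a comment because the `AIUsageLog` schema is not shown in this document.
+
+```typescript
+const RETENTION_MONTHS = 12
+
+// Returns the date before which usage logs are considered expired.
+function retentionCutoff(now: Date, months: number = RETENTION_MONTHS): Date {
+  const cutoff = new Date(now.getTime())
+  // setUTCMonth handles negative values by rolling back the year.
+  cutoff.setUTCMonth(cutoff.getUTCMonth() - months)
+  return cutoff
+}
+
+// Hypothetical log shape; `createdAt` is an assumed field name.
+function isExpired(log: { createdAt: Date }, now: Date): boolean {
+  return log.createdAt.getTime() < retentionCutoff(now).getTime()
+}
+
+// With Prisma, the deletion itself might look like this (assumed schema):
+// await prisma.aIUsageLog.deleteMany({
+//   where: { createdAt: { lt: retentionCutoff(new Date()) } },
+// })
+```
+
+Month arithmetic on dates rolls over when the target month is shorter (for example, 29 February minus 12 months lands on 1 March), so a log can outlive the cutoff by a day in rare cases.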
+
+## Subprocessor: OpenAI
+
+| Aspect | Details |
+|--------|---------|
+| Subprocessor | OpenAI, Inc. |
+| Location | United States |
+| DPA Status | Data Processing Agreement in place |
+| Safeguards | Standard Contractual Clauses (SCCs) |
+| Compliance | SOC 2 Type II, GDPR-compliant |
+| Data Use | API data NOT used for model training |
+
+**OpenAI DPA:** https://openai.com/policies/data-processing-agreement
+
+## Audit Trail
+
+All AI processing is logged:
+
+```typescript
+// Example values; real calls take model, token counts, and status from the API response
+await prisma.aIUsageLog.create({
+  data: {
+    userId: ctx.user.id,   // Who initiated
+    action: 'FILTERING',   // What type
+    entityType: 'Round',   // What entity
+    entityId: roundId,     // Which entity
+    model: 'gpt-4o',       // What model
+    totalTokens: 1500,     // Resource usage
+    status: 'SUCCESS',     // Outcome
+  },
+})
+```
+
+## Data Subject Rights
+
+### Right of Access (Article 15)
+
+Users can request:
+- What data was processed by AI
+- When AI processing occurred
+- What decisions were made
+
+**Implementation:** Export the AI usage logs for the user's projects.
+
+### Right to Erasure (Article 17)
+
+When a user requests deletion:
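+- AI usage logs for their projects can be deleted
+- No personal data remains at OpenAI: only anonymized data is sent, API data is not used for training, and any copy held for abuse monitoring is retained for at most 30 days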
+
+**Note:** Since only anonymized data is sent to AI, there is no personal data at OpenAI to delete.
+
+### Right to Object (Article 21)
+
+Users can request to opt out of AI processing:
+- Admin can disable AI features per round
+- Manual review fallback available for all AI features
+
+## Risk Assessment
+
+### Risk: PII Leakage to AI Provider
+
+| Factor | Assessment |
+|--------|------------|
+| Likelihood | Very Low |
+| Impact | Medium |
+| Mitigation | Automated PII detection, validation before every call |
+| Residual Risk | Very Low |
+
+### Risk: AI Decision Bias
+
+| Factor | Assessment |
+|--------|------------|
+| Likelihood | Low |
+| Impact | Low |
+| Mitigation | Human review of all AI suggestions, algorithmic fallback |
+| Residual Risk | Very Low |
+
+### Risk: Data Breach at Subprocessor
+
+| Factor | Assessment |
+|--------|------------|
+| Likelihood | Very Low |
+| Impact | Low (only anonymized data) |
+| Mitigation | OpenAI SOC 2 compliance, no PII sent |
+| Residual Risk | Very Low |
+
+## Compliance Checklist
+
+- [x] Data minimization applied (only necessary fields)
+- [x] PII stripped before AI processing
+- [x] Anonymization validated before every API call
+- [x] DPA in place with OpenAI
+- [x] Audit logging of all AI operations
+- [x] Manual review fallback available when AI processing is declined
+- [x] Usage logs retained for 12 months only
+- [x] No personal data stored at subprocessor
+
+## Contact
+
+For questions about AI data processing:
+- Data Protection Officer: [DPO email]
+- Technical Contact: [Tech contact email]
+
+## See Also
+
+- [Platform GDPR Compliance](./platform-gdpr-compliance.md)
+- [AI System Architecture](../architecture/ai-system.md)
+- [AI Services Reference](../architecture/ai-services.md)