Optimize AI system with batching, token tracking, and GDPR compliance

- Add AIUsageLog model for persistent token/cost tracking
- Implement batched processing for all AI services:
  - Assignment: 15 projects/batch
  - Filtering: 20 projects/batch
  - Award eligibility: 20 projects/batch
  - Mentor matching: 15 projects/batch
- Create unified error classification (ai-errors.ts)
- Enhance anonymization with comprehensive project data
- Add AI usage dashboard to Settings page
- Add usage stats endpoints to settings router
- Create AI system documentation (5 files)
- Create GDPR compliance documentation (2 files)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 11:58:12 +01:00
parent a72e815d3a
commit 928b1c65dc
19 changed files with 4103 additions and 601 deletions


@@ -0,0 +1,176 @@
# AI Configuration Guide
## Admin Settings
Navigate to **Settings → AI** to configure AI features.
### Available Settings
| Setting | Description | Default |
|---------|-------------|---------|
| `ai_enabled` | Master switch for AI features | `true` |
| `ai_provider` | AI provider (currently OpenAI only) | `openai` |
| `ai_model` | Model to use | `gpt-4o` |
| `openai_api_key` | API key (encrypted) | - |
| `ai_send_descriptions` | Include project descriptions | `true` |
## Supported Models
### Standard Models (GPT)
| Model | Speed | Quality | Cost | Recommended For |
|-------|-------|---------|------|-----------------|
| `gpt-4o` | Fast | Best | Medium | Production use |
| `gpt-4o-mini` | Very Fast | Good | Low | High-volume, cost-sensitive |
| `gpt-4-turbo` | Medium | Very Good | High | Complex analysis |
| `gpt-3.5-turbo` | Very Fast | Basic | Very Low | Simple tasks only |
### Reasoning Models (o-series)
| Model | Speed | Quality | Cost | Recommended For |
|-------|-------|---------|------|-----------------|
| `o1` | Slow | Excellent | Very High | Complex reasoning tasks |
| `o1-mini` | Medium | Very Good | High | Moderate complexity |
| `o3-mini` | Medium | Good | Medium | Cost-effective reasoning |
**Note:** Reasoning models use different API parameters:
- `max_completion_tokens` instead of `max_tokens`
- No `temperature` parameter
- No `response_format: json_object`
- System messages become "developer" role
The platform automatically handles these differences via `buildCompletionParams()`.
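As an illustration, a minimal sketch of this branching logic. The names `REASONING_PREFIXES` and the exact return shapes are assumptions; the actual `buildCompletionParams()` in `lib/openai.ts` may differ in detail.

```typescript
// Hypothetical sketch of the model-dependent parameter branching described above.
const REASONING_PREFIXES = ['o1', 'o3', 'o4']

function isReasoningModel(model: string): boolean {
  return REASONING_PREFIXES.some(p => model === p || model.startsWith(`${p}-`))
}

function buildCompletionParams(model: string, system: string, user: string, maxTokens: number) {
  if (isReasoningModel(model)) {
    // Reasoning models: "developer" role, max_completion_tokens,
    // no temperature, no response_format
    return {
      model,
      messages: [
        { role: 'developer', content: system },
        { role: 'user', content: user },
      ],
      max_completion_tokens: maxTokens,
    }
  }
  // Standard GPT models: system role, max_tokens, JSON response format
  return {
    model,
    messages: [
      { role: 'system', content: system },
      { role: 'user', content: user },
    ],
    max_tokens: maxTokens,
    temperature: 0.2,
    response_format: { type: 'json_object' },
  }
}
```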
## Cost Estimates
### Per 1M Tokens (USD)
| Model | Input | Output |
|-------|-------|--------|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |
| o1 | $15.00 | $60.00 |
| o1-mini | $3.00 | $12.00 |
| o3-mini | $1.10 | $4.40 |
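These per-token prices translate into cost estimates along the lines of this sketch. The partial price table and the helper name are illustrative; the platform's own `calculateCost()` in `server/utils/ai-usage.ts` may use a different table or rounding.

```typescript
// Illustrative cost helper using the per-1M-token prices above (partial table).
const PRICES_PER_1M_USD: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10.0 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
  'o3-mini': { input: 1.1, output: 4.4 },
}

function estimateCostUSD(model: string, promptTokens: number, completionTokens: number): number {
  const price = PRICES_PER_1M_USD[model]
  if (!price) return 0 // unknown model: no estimate available
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000
}
```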
### Typical Usage Per Operation
| Operation | Projects | Est. Tokens | Est. Cost (gpt-4o) |
|-----------|----------|-------------|-------------------|
| Filter 100 projects | 100 | ~10,000 | ~$0.10 |
| Assign 50 projects | 50 | ~15,000 | ~$0.15 |
| Award eligibility | 100 | ~10,000 | ~$0.10 |
| Mentor matching | 60 | ~12,000 | ~$0.12 |
## Rate Limits
OpenAI enforces rate limits based on your account tier:
| Tier | Requests/Min | Tokens/Min |
|------|--------------|------------|
| Tier 1 | 500 | 30,000 |
| Tier 2 | 5,000 | 450,000 |
| Tier 3+ | Higher | Higher |
The platform handles rate limits with:
- Batch processing (reduces request count)
- Error classification (detects rate limit errors)
- Manual retry guidance in UI
## Environment Variables
```env
# Required for AI features
OPENAI_API_KEY=sk-your-api-key

# Optional overrides (normally set via admin UI)
OPENAI_MODEL=gpt-4o
```
## Testing Connection
1. Go to **Settings → AI**
2. Enter your OpenAI API key
3. Click **Save AI Settings**
4. Click **Test Connection**
The test verifies:
- API key validity
- Model availability
- Basic request/response
## Monitoring Usage
### Admin Dashboard
Navigate to **Settings → AI** to see:
- Current month cost
- Token usage by feature
- Usage by model
- 30-day usage trend
### Database Queries
```sql
-- Current month usage
SELECT
  action,
  SUM(total_tokens) AS tokens,
  SUM(estimated_cost_usd) AS cost
FROM ai_usage_log
WHERE created_at >= date_trunc('month', NOW())
GROUP BY action;

-- Top users by cost
SELECT
  u.email,
  SUM(l.estimated_cost_usd) AS total_cost
FROM ai_usage_log l
JOIN users u ON l.user_id = u.id
GROUP BY u.id
ORDER BY total_cost DESC
LIMIT 10;
```
## Troubleshooting
### "Model not found"
- Verify the model is available with your API key tier
- Some models (o1, o3) require specific API access
- Try a more common model like `gpt-4o-mini`
### "Rate limit exceeded"
- Wait a few minutes before retrying
- Consider using a smaller batch size
- Upgrade your OpenAI account tier
### "All projects flagged"
1. Check **Settings → AI** for correct API key
2. Verify model is available
3. Check console logs for specific error messages
4. Test connection with the button in settings
### "Invalid API key"
1. Verify the key starts with `sk-`
2. Check the key hasn't been revoked in OpenAI dashboard
3. Ensure no extra whitespace in the key
## Best Practices
1. **Use gpt-4o-mini** for high-volume operations (filtering many projects)
2. **Use gpt-4o** for critical decisions (final assignments)
3. **Monitor costs** regularly via the usage dashboard
4. **Test with small batches** before running on full dataset
5. **Keep descriptions enabled** for better matching accuracy
## See Also
- [AI System Architecture](./ai-system.md)
- [AI Services Reference](./ai-services.md)
- [AI Error Handling](./ai-errors.md)


@@ -0,0 +1,208 @@
# AI Error Handling Guide
## Error Types
The AI system classifies errors into these categories:
| Error Type | Cause | User Message | Retryable |
|------------|-------|--------------|-----------|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |
*Context length errors can be retried with smaller batches.
## Error Classification
```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
} catch (error) {
  const classified = classifyAIError(error)
  console.error(`AI Error: ${classified.type} - ${classified.message}`)

  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait and retry
  } else {
    // Fall back to algorithm
  }
}
```
## Graceful Degradation
When AI fails, the platform automatically handles it:
### AI Assignment
1. Logs the error
2. Falls back to algorithmic assignment:
- Matches by expertise tag overlap
- Balances workload across jurors
- Respects constraints (max assignments)
### AI Filtering
1. Logs the error
2. Flags all projects for manual review
3. Returns error message to admin
### Award Eligibility
1. Logs the error
2. Returns all projects as "needs manual review"
3. Admin can apply deterministic rules instead
### Mentor Matching
1. Logs the error
2. Falls back to keyword-based matching
3. Uses availability scoring
## Retry Strategy
| Error Type | Retry Count | Delay |
|------------|-------------|-------|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |
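The table maps to a delay schedule along these lines (an illustrative sketch; the real `getRetryDelay()` in `ai-errors.ts` may differ):

```typescript
// Illustrative backoff schedule matching the retry table above.
// `attempt` is 1-based; unknown types get no delay.
function getRetryDelayMs(errorType: string, attempt: number): number {
  switch (errorType) {
    case 'rate_limit':
    case 'network_error':
      return 1000 * 2 ** (attempt - 1) // 1s, 2s, 4s
    case 'server_error':
      return 2000 * 2 ** (attempt - 1) // 2s, 4s, 8s
    case 'timeout':
      return 5000 // fixed 5s
    default:
      return 0 // e.g. parse_error: single immediate retry
  }
}
```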
## Monitoring
### Error Logging
All AI errors are logged to:
1. Console (development)
2. `AIUsageLog` table with `status: 'ERROR'`
3. `AuditLog` for security-relevant failures
### Checking Errors
```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```
## Troubleshooting
### High Error Rate
1. Check OpenAI status page for outages
2. Verify API key is valid and not rate-limited
3. Review error messages in logs
4. Consider switching to a different model
### Consistent Parse Errors
1. The AI model may be returning malformed JSON
2. Try a more capable model (gpt-4o instead of gpt-3.5)
3. Check if prompts are being truncated
4. Review recent responses in logs
### All Requests Failing
1. Test connection in Settings → AI
2. Verify API key hasn't been revoked
3. Check billing status in OpenAI dashboard
4. Review network connectivity
### Slow Responses
1. Consider using gpt-4o-mini for speed
2. Reduce batch sizes
3. Check for rate limiting (429 errors)
4. Monitor OpenAI latency
## Error Response Format
When errors occur, services return structured responses:
```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```
## Implementing Custom Error Handling
```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error as Error
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)

      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }

      const delay = getRetryDelay(classified.type) * attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }

  throw lastError
}
```
## See Also
- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Services Reference](./ai-services.md)


@@ -0,0 +1,222 @@
# AI Prompts Reference
This document describes the prompts used by each AI service. All prompts are optimized for token efficiency while maintaining accuracy.
## Design Principles
1. **Concise system prompts** - Under 100 tokens where possible
2. **Structured output** - JSON format for reliable parsing
3. **Clear field names** - Consistent naming across services
4. **Score ranges** - 0-1 for confidence, 1-10 for quality
## Filtering Prompt
**Purpose:** Evaluate projects against admin-defined criteria
### System Prompt
```
Project screening assistant. Evaluate each project against the criteria.
Return JSON: {"projects": [{project_id, meets_criteria: bool, confidence: 0-1, reasoning: str, quality_score: 1-10, spam_risk: bool}]}
Assess description quality and relevance objectively.
```
### User Prompt Template
```
CRITERIA: {criteria_text}
PROJECTS: {anonymized_project_array}
Evaluate each project against the criteria. Return JSON.
```
### Example Response
```json
{
  "projects": [
    {
      "project_id": "P1",
      "meets_criteria": true,
      "confidence": 0.9,
      "reasoning": "Project focuses on coral reef restoration, matching ocean conservation criteria",
      "quality_score": 8,
      "spam_risk": false
    }
  ]
}
```
---
## Assignment Prompt
**Purpose:** Match jurors to projects by expertise
### System Prompt
```
Match jurors to projects by expertise. Return JSON assignments.
Each: {juror_id, project_id, confidence_score: 0-1, expertise_match_score: 0-1, reasoning: str (1-2 sentences)}
Distribute workload fairly. Avoid assigning jurors at capacity.
```
### User Prompt Template
```
JURORS: {anonymized_juror_array}
PROJECTS: {anonymized_project_array}
CONSTRAINTS: {N} reviews/project, max {M}/juror
EXISTING: {existing_assignments}
Return JSON: {"assignments": [...]}
```
### Example Response
```json
{
  "assignments": [
    {
      "juror_id": "juror_001",
      "project_id": "project_005",
      "confidence_score": 0.85,
      "expertise_match_score": 0.9,
      "reasoning": "Juror expertise in marine biology aligns with coral restoration project"
    }
  ]
}
```
---
## Award Eligibility Prompt
**Purpose:** Determine project eligibility for special awards
### System Prompt
```
Award eligibility evaluator. Evaluate projects against criteria, return JSON.
Format: {"evaluations": [{project_id, eligible: bool, confidence: 0-1, reasoning: str}]}
Be objective. Base evaluation only on provided data. No personal identifiers in reasoning.
```
### User Prompt Template
```
CRITERIA: {criteria_text}
PROJECTS: {anonymized_project_array}
Evaluate eligibility for each project.
```
### Example Response
```json
{
  "evaluations": [
    {
      "project_id": "P3",
      "eligible": true,
      "confidence": 0.95,
      "reasoning": "Project is based in Italy and focuses on Mediterranean biodiversity"
    }
  ]
}
```
---
## Mentor Matching Prompt
**Purpose:** Recommend mentors for projects
### System Prompt
```
Match mentors to projects by expertise. Return JSON.
Format for each project: {"matches": [{project_id, mentor_matches: [{mentor_index, confidence_score: 0-1, expertise_match_score: 0-1, reasoning: str}]}]}
Rank by suitability. Consider expertise alignment and availability.
```
### User Prompt Template
```
PROJECTS:
P1: Category=STARTUP, Issue=HABITAT_RESTORATION, Tags=[coral, reef], Desc=Project description...
P2: ...
MENTORS:
0: Expertise=[marine biology, coral], Availability=2/5
1: Expertise=[business development], Availability=0/3
...
For each project, rank top {N} mentors.
```
### Example Response
```json
{
  "matches": [
    {
      "project_id": "P1",
      "mentor_matches": [
        {
          "mentor_index": 0,
          "confidence_score": 0.92,
          "expertise_match_score": 0.95,
          "reasoning": "Marine biology expertise directly matches coral restoration focus"
        }
      ]
    }
  ]
}
```
---
## Anonymized Data Structure
All projects sent to AI use this structure:
```typescript
interface AnonymizedProjectForAI {
  project_id: string            // P1, P2, etc.
  title: string                 // Sanitized (PII removed)
  description: string           // Truncated + PII stripped
  category: string | null       // STARTUP | BUSINESS_CONCEPT
  ocean_issue: string | null
  country: string | null
  region: string | null
  institution: string | null
  tags: string[]
  founded_year: number | null
  team_size: number
  has_description: boolean
  file_count: number
  file_types: string[]
  wants_mentorship: boolean
  submission_source: string
  submitted_date: string | null // YYYY-MM-DD
}
```
### What Gets Stripped
- Team/company names
- Email addresses
- Phone numbers
- External URLs
- Real project/user IDs
- Internal comments
---
## Token Optimization Tips
1. **Batch projects** - Process 15-20 per request
2. **Truncate descriptions** - 300-500 chars based on task
3. **Use abbreviated fields** - `desc` vs `description`
4. **Compress constraints** - Inline in prompt
5. **Request specific fields** - Only what you need
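Tip 1 amounts to chunking the project list before each request, as in this small helper (illustrative, not the platform's actual implementation):

```typescript
// Split a list into fixed-size batches (tip 1: 15-20 items per request).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize))
  }
  return batches
}
```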
## Prompt Versioning
| Service | Version | Last Updated |
|---------|---------|--------------|
| Filtering | 2.0 | 2025-01 |
| Assignment | 2.0 | 2025-01 |
| Award Eligibility | 2.0 | 2025-01 |
| Mentor Matching | 2.0 | 2025-01 |
## See Also
- [AI System Architecture](./ai-system.md)
- [AI Services Reference](./ai-services.md)
- [AI Configuration Guide](./ai-configuration.md)


@@ -0,0 +1,249 @@
# AI Services Reference
## 1. AI Filtering Service
**File:** `src/server/services/ai-filtering.ts`
**Purpose:** Evaluate projects against admin-defined criteria text
### Input
- List of projects (anonymized)
- Criteria text (e.g., "Projects must be based in Mediterranean region")
- Rule configuration (PASS/REJECT/FLAG actions)
### Output
Per-project result:
- `meets_criteria` - Boolean
- `confidence` - 0-1 score
- `reasoning` - Explanation
- `quality_score` - 1-10 rating
- `spam_risk` - Boolean flag
### Configuration
- **Batch Size:** 20 projects per API call
- **Description Limit:** 500 characters
- **Token Usage:** ~1500-2500 tokens per batch
### Example Criteria
- "Filter out any project without a description"
- "Only include projects founded after 2020"
- "Reject projects with fewer than 2 team members"
- "Projects must be based in Mediterranean region"
### Usage
```typescript
import { aiFilterProjects } from '@/server/services/ai-filtering'

const results = await aiFilterProjects(
  projects,
  'Only include projects with ocean conservation focus',
  userId,
  roundId
)
```
---
## 2. AI Assignment Service
**File:** `src/server/services/ai-assignment.ts`
**Purpose:** Match jurors to projects based on expertise alignment
### Input
- List of jurors with expertise tags
- List of projects with tags/category
- Constraints:
- Required reviews per project
- Max assignments per juror
- Existing assignments (to avoid duplicates)
### Output
Suggested assignments:
- `jurorId` - Juror to assign
- `projectId` - Project to assign
- `confidenceScore` - 0-1 match confidence
- `expertiseMatchScore` - 0-1 expertise overlap
- `reasoning` - Explanation
### Configuration
- **Batch Size:** 15 projects per batch (all jurors included)
- **Description Limit:** 300 characters
- **Token Usage:** ~2000-3500 tokens per batch
### Fallback Algorithm
When AI is unavailable, uses:
1. Tag overlap scoring (60% weight)
2. Load balancing (40% weight)
3. Constraint satisfaction
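A hedged sketch of that weighted scoring (names and details are illustrative; see `ai-assignment.ts` for the actual algorithm):

```typescript
// Fallback score: 60% expertise tag overlap, 40% remaining juror capacity.
function fallbackScore(
  jurorTags: string[],
  projectTags: string[],
  currentAssignments: number,
  maxAssignments: number
): number {
  const overlap = projectTags.filter(t => jurorTags.includes(t)).length
  const tagScore = projectTags.length > 0 ? overlap / projectTags.length : 0
  const loadScore = maxAssignments > 0
    ? Math.max(0, 1 - currentAssignments / maxAssignments)
    : 0
  return 0.6 * tagScore + 0.4 * loadScore
}
```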
### Usage
```typescript
import { generateAIAssignments } from '@/server/services/ai-assignment'

const result = await generateAIAssignments(
  jurors,
  projects,
  {
    requiredReviewsPerProject: 3,
    maxAssignmentsPerJuror: 10,
    existingAssignments: [],
  },
  userId,
  roundId
)
```
---
## 3. Award Eligibility Service
**File:** `src/server/services/ai-award-eligibility.ts`
**Purpose:** Determine which projects qualify for special awards
### Input
- Award criteria text (plain language)
- List of projects (anonymized)
- Optional: Auto-tag rules (field-based matching)
### Output
Per-project:
- `eligible` - Boolean
- `confidence` - 0-1 score
- `reasoning` - Explanation
- `method` - 'AI' or 'AUTO'
### Configuration
- **Batch Size:** 20 projects per API call
- **Description Limit:** 400 characters
- **Token Usage:** ~1500-2500 tokens per batch
### Auto-Tag Rules
Deterministic rules can be combined with AI:
```typescript
const rules: AutoTagRule[] = [
  { field: 'country', operator: 'equals', value: 'Italy' },
  { field: 'competitionCategory', operator: 'equals', value: 'STARTUP' },
]
```
### Usage
```typescript
import { aiInterpretCriteria, applyAutoTagRules } from '@/server/services/ai-award-eligibility'

// Deterministic matching
const autoResults = applyAutoTagRules(rules, projects)

// AI-based criteria interpretation
const aiResults = await aiInterpretCriteria(
  'Projects focusing on marine biodiversity',
  projects,
  userId,
  awardId
)
```
---
## 4. Mentor Matching Service
**File:** `src/server/services/mentor-matching.ts`
**Purpose:** Recommend mentors for projects based on expertise
### Input
- Project details (single or batch)
- Available mentors with expertise tags and availability
### Output
Ranked list of mentor matches:
- `mentorId` - Mentor ID
- `confidenceScore` - 0-1 overall match
- `expertiseMatchScore` - 0-1 expertise overlap
- `reasoning` - Explanation
### Configuration
- **Batch Size:** 15 projects per batch
- **Description Limit:** 350 characters
- **Token Usage:** ~1500-2500 tokens per batch
### Fallback Algorithm
Keyword-based matching when AI unavailable:
1. Extract keywords from project tags/description
2. Match against mentor expertise tags
3. Factor in availability (assignments vs max)
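That keyword fallback can be sketched as follows. The 0.7/0.3 weights and function name are assumptions for illustration; the production logic in `mentor-matching.ts` may differ.

```typescript
// Illustrative keyword fallback: expertise hits weighted against availability.
function keywordFallbackScore(
  projectKeywords: string[],
  mentorExpertise: string[],
  currentAssignments: number,
  maxAssignments: number
): number {
  const expertise = mentorExpertise.map(e => e.toLowerCase())
  const hits = projectKeywords.filter(k =>
    expertise.some(e => e.includes(k.toLowerCase()))
  ).length
  const expertiseScore = projectKeywords.length > 0 ? hits / projectKeywords.length : 0
  const availability = maxAssignments > 0
    ? Math.max(0, 1 - currentAssignments / maxAssignments)
    : 0
  return 0.7 * expertiseScore + 0.3 * availability
}
```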
### Usage
```typescript
import {
  getAIMentorSuggestions,
  getAIMentorSuggestionsBatch,
} from '@/server/services/mentor-matching'

// Single project
const matches = await getAIMentorSuggestions(prisma, projectId, 5, userId)

// Batch processing
const batchResults = await getAIMentorSuggestionsBatch(
  prisma,
  projectIds,
  5,
  userId
)
```
---
## Common Patterns
### Token Logging
All services log usage to `AIUsageLog`:
```typescript
await logAIUsage({
  userId,
  action: 'FILTERING',
  entityType: 'Round',
  entityId: roundId,
  model,
  promptTokens: usage.promptTokens,
  completionTokens: usage.completionTokens,
  totalTokens: usage.totalTokens,
  batchSize: projects.length,
  itemsProcessed: projects.length,
  status: 'SUCCESS',
})
```
### Error Handling
All services use unified error classification:
```typescript
try {
  // AI call
} catch (error) {
  const classified = classifyAIError(error)
  logAIError('ServiceName', 'functionName', classified)

  if (classified.retryable) {
    // Retry logic
  } else {
    // Fall back to algorithm
  }
}
```
### Anonymization
All services anonymize before sending to AI:
```typescript
const { anonymized, mappings } = anonymizeProjectsForAI(projects, 'FILTERING')

if (!validateAnonymizedProjects(anonymized)) {
  throw new Error('GDPR compliance check failed')
}
```
## See Also
- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Error Handling](./ai-errors.md)


@@ -0,0 +1,143 @@
# MOPC AI System Architecture
## Overview
The MOPC platform uses AI (OpenAI GPT models) for four core functions:
1. **Project Filtering** - Automated eligibility screening against admin-defined criteria
2. **Jury Assignment** - Smart juror-project matching based on expertise alignment
3. **Award Eligibility** - Special award qualification determination
4. **Mentor Matching** - Mentor-project recommendations based on expertise
## System Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ADMIN INTERFACE │
│ (Rounds, Filtering, Awards, Assignments, Mentor Assignment) │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ tRPC ROUTERS │
│ filtering.ts │ assignment.ts │ specialAward.ts │ mentor.ts │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ AI SERVICES │
│ ai-filtering.ts │ ai-assignment.ts │ ai-award-eligibility.ts │
│ │ mentor-matching.ts │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ ANONYMIZATION LAYER │
│ anonymization.ts │
│ - PII stripping - ID replacement - Text sanitization │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ OPENAI CLIENT │
│ lib/openai.ts │
│ - Model detection - Parameter building - Token tracking │
└─────────────────────────┬───────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ OPENAI API │
│ GPT-4o │ GPT-4o-mini │ o1 │ o3-mini (configurable) │
└─────────────────────────────────────────────────────────────────┘
```
## Data Flow
1. **Admin triggers AI action** (filter projects, suggest assignments)
2. **Router validates permissions** and fetches data from database
3. **AI Service prepares data** for processing
4. **Anonymization Layer strips PII**, replaces IDs, sanitizes text
5. **OpenAI Client builds request** with correct parameters for model type
6. **Request sent to OpenAI API**
7. **Response parsed and de-anonymized**
8. **Results stored in database**, usage logged
9. **UI updated** with results
## Key Components
### OpenAI Client (`lib/openai.ts`)
Handles communication with OpenAI API:
- `getOpenAI()` - Get configured OpenAI client
- `getConfiguredModel()` - Get the admin-selected model
- `buildCompletionParams()` - Build API parameters (handles reasoning vs standard models)
- `isReasoningModel()` - Detect o1/o3/o4 series models
### Anonymization Service (`server/services/anonymization.ts`)
GDPR-compliant data preparation:
- `anonymizeForAI()` - Basic anonymization for assignment
- `anonymizeProjectsForAI()` - Comprehensive project anonymization for filtering/awards
- `validateAnonymization()` - Verify no PII in anonymized data
- `deanonymizeResults()` - Map AI results back to real IDs
### Token Tracking (`server/utils/ai-usage.ts`)
Cost and usage monitoring:
- `logAIUsage()` - Log API calls to database
- `calculateCost()` - Compute estimated cost by model
- `getAIUsageStats()` - Retrieve usage statistics
- `getCurrentMonthCost()` - Get current billing period totals
### Error Handling (`server/services/ai-errors.ts`)
Unified error classification:
- `classifyAIError()` - Categorize API errors
- `shouldRetry()` - Determine if error is retryable
- `getUserFriendlyMessage()` - Get human-readable error messages
## Batching Strategy
All AI services process data in batches to avoid token limits:
| Service | Batch Size | Reason |
|---------|------------|--------|
| AI Assignment | 15 projects | Include all jurors per batch |
| AI Filtering | 20 projects | Balance throughput and cost |
| Award Eligibility | 20 projects | Consistent with filtering |
| Mentor Matching | 15 projects | All mentors per batch |
## Fallback Behavior
All AI services have algorithmic fallbacks when AI is unavailable:
1. **Assignment** - Expertise tag matching + load balancing
2. **Filtering** - Flag all projects for manual review
3. **Award Eligibility** - Flag all for manual review
4. **Mentor Matching** - Keyword-based matching algorithm
## Security Considerations
1. **API keys** stored encrypted in database
2. **No PII** sent to OpenAI (enforced by anonymization)
3. **Audit logging** of all AI operations
4. **Role-based access** to AI features (admin only)
## Files Reference
| File | Purpose |
|------|---------|
| `lib/openai.ts` | OpenAI client configuration |
| `server/services/ai-filtering.ts` | Project filtering service |
| `server/services/ai-assignment.ts` | Jury assignment service |
| `server/services/ai-award-eligibility.ts` | Award eligibility service |
| `server/services/mentor-matching.ts` | Mentor matching service |
| `server/services/anonymization.ts` | Data anonymization |
| `server/services/ai-errors.ts` | Error classification |
| `server/utils/ai-usage.ts` | Token tracking |
## See Also
- [AI Services Reference](./ai-services.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Error Handling](./ai-errors.md)
- [AI Prompts Reference](./ai-prompts.md)


@@ -0,0 +1,217 @@
# AI Data Processing - GDPR Compliance Documentation
## Overview
This document describes how project data is processed by AI services in the MOPC Platform, ensuring compliance with GDPR Articles 5, 6, 13-14, 25, and 32.
## Legal Basis
| Processing Activity | Legal Basis | GDPR Article |
|---------------------|-------------|--------------|
| AI-powered project filtering | Legitimate interest | Art. 6(1)(f) |
| AI-powered jury assignment | Legitimate interest | Art. 6(1)(f) |
| AI-powered award eligibility | Legitimate interest | Art. 6(1)(f) |
| AI-powered mentor matching | Legitimate interest | Art. 6(1)(f) |
**Legitimate Interest Justification:** AI processing is used to efficiently evaluate ocean conservation projects and match appropriate reviewers, directly serving the platform's purpose of managing the Monaco Ocean Protection Challenge.
## Data Minimization (Article 5(1)(c))
The AI system applies strict data minimization:
- **Only necessary fields** sent to AI (no names, emails, phone numbers)
- **Descriptions truncated** to 300-500 characters maximum
- **Team size** sent as count only (no member details)
- **Dates** sent as year-only or ISO date (no timestamps)
- **IDs replaced** with sequential anonymous identifiers (P1, P2, etc.)
## Anonymization Measures
### Data NEVER Sent to AI
| Data Type | Reason |
|-----------|--------|
| Personal names | PII - identifying |
| Email addresses | PII - identifying |
| Phone numbers | PII - identifying |
| Physical addresses | PII - identifying |
| External URLs | Could identify individuals |
| Internal project/user IDs | Could be cross-referenced |
| Team member details | PII - identifying |
| Internal comments | May contain PII |
| File content | May contain PII |
### Data Sent to AI (Anonymized)
| Field | Type | Purpose | Anonymization |
|-------|------|---------|---------------|
| project_id | String | Reference | Replaced with P1, P2, etc. |
| title | String | Spam detection | PII patterns removed |
| description | String | Criteria matching | Truncated, PII stripped |
| category | Enum | Filtering | As-is (no PII) |
| ocean_issue | Enum | Topic filtering | As-is (no PII) |
| country | String | Geographic eligibility | As-is (country name only) |
| region | String | Regional eligibility | As-is (zone name only) |
| institution | String | Student identification | As-is (institution name only) |
| tags | Array | Keyword matching | As-is (no PII expected) |
| founded_year | Number | Age filtering | Year only, not full date |
| team_size | Number | Team requirements | Count only |
| file_count | Number | Document checks | Count only |
| file_types | Array | File requirements | Type names only |
| wants_mentorship | Boolean | Mentorship filtering | As-is |
| submission_source | Enum | Source filtering | As-is |
| submitted_date | String | Deadline checks | Date only, no time |
## Technical Safeguards
### PII Detection and Stripping
```typescript
// Patterns detected and removed before AI processing
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g,
  phone: /(\+?\d{1,3}[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}/g,
  url: /https?:\/\/[^\s]+/g,
  ssn: /\d{3}-\d{2}-\d{4}/g,
  ipv4: /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/g,
}
```
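Applying these patterns to free text before it reaches a prompt looks roughly like this. This is a minimal sketch with only two patterns inlined for brevity; the production sanitizer covers the full set above.

```typescript
// Redact PII matches before text is included in an AI prompt.
// Only two of the documented patterns are inlined here.
const SANITIZE_PATTERNS: RegExp[] = [
  /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g, // email
  /https?:\/\/[^\s]+/g,                               // url
]

function stripPII(text: string): string {
  return SANITIZE_PATTERNS.reduce((acc, p) => acc.replace(p, '[REDACTED]'), text)
}
```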
### Validation Before Every AI Call
```typescript
// GDPR compliance enforced before EVERY API call
export function enforceGDPRCompliance(data: unknown[]): void {
  for (const item of data) {
    const { valid, violations } = validateNoPersonalData(item)
    if (!valid) {
      throw new Error(`GDPR compliance check failed: ${violations.join(', ')}`)
    }
  }
}
```
### ID Anonymization
Real IDs are never sent to AI. Instead:
- Projects: `cm1abc123...` → `P1`, `P2`, `P3`
- Jurors: `cm2def456...` → `juror_001`, `juror_002`
- Results mapped back using secure mapping tables
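A sketch of the mapping scheme (names are illustrative and zero-padding of juror aliases is omitted for brevity; the real mapping lives in `anonymization.ts`):

```typescript
// Build anonymous aliases and a reverse map for de-anonymizing AI results.
function buildIdMapping(realIds: string[], prefix: string) {
  const toReal = new Map<string, string>()
  const aliases = realIds.map((id, i) => {
    const alias = `${prefix}${i + 1}`
    toReal.set(alias, id)
    return alias
  })
  return { aliases, toReal }
}
```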
## Data Retention
| Data Type | Retention | Deletion Method |
|-----------|-----------|-----------------|
| AI usage logs | 12 months | Automatic deletion |
| Anonymized prompts | Not stored | Sent directly to API |
| AI responses | Not stored | Parsed and discarded |
**Note:** Per OpenAI's API terms, API data is not used to train models. Inputs may be retained for up to 30 days for abuse monitoring; zero-day retention is available for eligible use cases.
## Subprocessor: OpenAI
| Aspect | Details |
|--------|---------|
| Subprocessor | OpenAI, Inc. |
| Location | United States |
| DPA Status | Data Processing Agreement in place |
| Safeguards | Standard Contractual Clauses (SCCs) |
| Compliance | SOC 2 Type II, GDPR-compliant |
| Data Use | API data NOT used for model training |
**OpenAI DPA:** https://openai.com/policies/data-processing-agreement
## Audit Trail
All AI processing is logged:
```typescript
await prisma.aIUsageLog.create({
  data: {
    userId: ctx.user.id,  // Who initiated
    action: 'FILTERING',  // What type
    entityType: 'Round',  // What entity
    entityId: roundId,    // Which entity
    model: 'gpt-4o',      // What model
    totalTokens: 1500,    // Resource usage
    status: 'SUCCESS',    // Outcome
  },
})
```
## Data Subject Rights
### Right of Access (Article 15)
Users can request:
- What data was processed by AI
- When AI processing occurred
- What decisions were made
**Implementation:** Export AI usage logs for user's projects.
### Right to Erasure (Article 17)
When a user requests deletion:
- AI usage logs for their projects can be deleted
- OpenAI retains API inputs for at most 30 days (abuse monitoring) and does not use them for training
**Note:** Since only anonymized data is sent to AI, there is no personal data at OpenAI to delete.
### Right to Object (Article 21)
Users can request to opt out of AI processing:
- Admin can disable AI features per round
- Manual review fallback available for all AI features
## Risk Assessment
### Risk: PII Leakage to AI Provider
| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Medium |
| Mitigation | Automated PII detection, validation before every call |
| Residual Risk | Very Low |
### Risk: AI Decision Bias
| Factor | Assessment |
|--------|------------|
| Likelihood | Low |
| Impact | Low |
| Mitigation | Human review of all AI suggestions, algorithmic fallback |
| Residual Risk | Very Low |
### Risk: Data Breach at Subprocessor
| Factor | Assessment |
|--------|------------|
| Likelihood | Very Low |
| Impact | Low (only anonymized data) |
| Mitigation | OpenAI SOC 2 compliance, no PII sent |
| Residual Risk | Very Low |
## Compliance Checklist
- [x] Data minimization applied (only necessary fields)
- [x] PII stripped before AI processing
- [x] Anonymization validated before every API call
- [x] DPA in place with OpenAI
- [x] Audit logging of all AI operations
- [x] Fallback available when AI is disabled or unavailable
- [x] Usage logs retained for 12 months only
- [x] No personal data stored at subprocessor
## Contact
For questions about AI data processing:
- Data Protection Officer: [DPO email]
- Technical Contact: [Tech contact email]
## See Also
- [Platform GDPR Compliance](./platform-gdpr-compliance.md)
- [AI System Architecture](../architecture/ai-system.md)
- [AI Services Reference](../architecture/ai-services.md)

---
# MOPC Platform - GDPR Compliance Documentation
## 1. Data Controller Information
| Field | Value |
|-------|-------|
| **Data Controller** | Monaco Ocean Protection Challenge |
| **Contact** | [Data Protection Officer email] |
| **Platform** | monaco-opc.com |
| **Jurisdiction** | Monaco |
---
## 2. Personal Data Collected
### 2.1 User Account Data
| Data Type | Purpose | Legal Basis | Retention |
|-----------|---------|-------------|-----------|
| Email address | Account identification, notifications | Contract performance | Account lifetime + 2 years |
| Name | Display in platform, certificates | Contract performance | Account lifetime + 2 years |
| Phone number (optional) | WhatsApp notifications | Consent | Until consent withdrawn |
| Profile photo (optional) | Platform personalization | Consent | Until deleted by user |
| Role | Access control | Contract performance | Account lifetime |
| IP address | Security, audit logging | Legitimate interest | 12 months |
| User agent | Security, debugging | Legitimate interest | 12 months |
### 2.2 Project/Application Data
| Data Type | Purpose | Legal Basis | Retention |
|-----------|---------|-------------|-----------|
| Project title | Competition entry | Contract performance | Program lifetime + 5 years |
| Project description | Evaluation | Contract performance | Program lifetime + 5 years |
| Team information | Contact, evaluation | Contract performance | Program lifetime + 5 years |
| Uploaded files | Evaluation | Contract performance | Program lifetime + 5 years |
| Country/Region | Geographic eligibility | Contract performance | Program lifetime + 5 years |
### 2.3 Evaluation Data
| Data Type | Purpose | Legal Basis | Retention |
|-----------|---------|-------------|-----------|
| Jury evaluations | Competition judging | Contract performance | Program lifetime + 5 years |
| Scores and comments | Competition judging | Contract performance | Program lifetime + 5 years |
| Evaluation timestamps | Audit trail | Legitimate interest | Program lifetime + 5 years |
### 2.4 Technical Data
| Data Type | Purpose | Legal Basis | Retention |
|-----------|---------|-------------|-----------|
| Session tokens | Authentication | Contract performance | Session duration |
| Magic link tokens | Passwordless login | Contract performance | 15 minutes |
| Audit logs | Security, compliance | Legitimate interest | 12 months |
| AI usage logs | Cost tracking, debugging | Legitimate interest | 12 months |
---
## 3. Data Processing Purposes
### 3.1 Primary Purposes
1. **Competition Management** - Managing project submissions, evaluations, and results
2. **User Authentication** - Secure access to the platform
3. **Communication** - Sending notifications about evaluations, deadlines, results
### 3.2 Secondary Purposes
1. **Analytics** - Understanding platform usage (aggregated, anonymized)
2. **Security** - Detecting and preventing unauthorized access
3. **AI Processing** - Automated filtering and matching (anonymized data only)
---
## 4. Third-Party Data Sharing
### 4.1 Subprocessors
| Subprocessor | Purpose | Data Shared | Location | DPA |
|--------------|---------|-------------|----------|-----|
| OpenAI | AI processing | Anonymized project data only | USA | Yes |
| MinIO/S3 | File storage | Uploaded files | [Location] | Yes |
| Poste.io | Email delivery | Email addresses, notification content | [Location] | Yes |
### 4.2 Data Shared with OpenAI
**Sent to OpenAI:**
- Anonymized project titles (PII sanitized)
- Truncated descriptions (500 chars max)
- Project category, tags, country
- Team size (count only)
- Founded year (year only)
**NEVER sent to OpenAI:**
- Names of any individuals
- Email addresses
- Phone numbers
- Physical addresses
- External URLs
- Internal database IDs
- File contents
For full details, see [AI Data Processing](./ai-data-processing.md).
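For illustration, the allow-list above could be expressed as a type plus a truncation helper (the names are assumptions, not the platform's actual schema):

```typescript
// Shape of an anonymized project payload, per the allow-list above.
interface AnonymizedProject {
  id: string          // opaque label such as "P1", never a database ID
  title: string       // PII-sanitized
  description: string // truncated to 500 characters
  category: string
  tags: string[]
  country: string
  teamSize: number    // count only, no member data
  foundedYear: number // year only
}

// Enforce the 500-character cap on descriptions before sending.
function truncateDescription(text: string, max = 500): string {
  return text.length <= max ? text : text.slice(0, max)
}
```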
---
## 5. Data Subject Rights
### 5.1 Right of Access (Article 15)
Users can request a copy of their personal data via:
- Profile → Settings → Download My Data
- Email to [DPO email]
**Response Time:** Within 30 days
### 5.2 Right to Rectification (Article 16)
Users can update their data via:
- Profile → Settings → Edit Profile
- Contact support for assistance
**Response Time:** Immediately for self-service, 72 hours for support
### 5.3 Right to Erasure (Article 17)
Users can request deletion via:
- Profile → Settings → Delete Account
- Email to [DPO email]
**Exceptions:** Data required for legal obligations or ongoing competitions
**Response Time:** Within 30 days
### 5.4 Right to Restrict Processing (Article 18)
Users can request processing restrictions by contacting [DPO email]
**Response Time:** Within 72 hours
### 5.5 Right to Data Portability (Article 20)
Users can export their data in machine-readable format (JSON) via:
- Profile → Settings → Export Data
**Format:** JSON file containing all user data
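A minimal sketch of such an export, assuming the section names shown here (they are illustrative, not the real schema):

```typescript
// Assemble one self-describing JSON document covering the user's data,
// suitable for download as a machine-readable export (Article 20).
function buildDataExport(
  profile: object,
  projects: object[],
  evaluations: object[]
): string {
  return JSON.stringify(
    {
      exportedAt: new Date().toISOString(),
      profile,
      projects,
      evaluations,
    },
    null,
    2
  )
}
```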
### 5.6 Right to Object (Article 21)
Users can object to processing based on legitimate interests by contacting [DPO email]
**Response Time:** Within 72 hours
---
## 6. Security Measures (Article 32)
### 6.1 Technical Measures
| Measure | Implementation |
|---------|----------------|
| Encryption in transit | TLS 1.3 for all connections |
| Encryption at rest | AES-256 for sensitive data |
| Authentication | Magic link (passwordless) or OAuth |
| Rate limiting | 100 requests/minute per IP |
| Session management | Secure cookies, automatic expiry |
| Input validation | Zod schema validation on all inputs |
### 6.2 Access Controls
| Control | Implementation |
|---------|----------------|
| RBAC | Role-based permissions (SUPER_ADMIN, PROGRAM_ADMIN, JURY_MEMBER, etc.) |
| Least privilege | Users only see assigned projects/programs |
| Session expiry | Configurable timeout (default 24 hours) |
| Audit logging | All sensitive actions logged |
### 6.3 Infrastructure Security
| Measure | Implementation |
|---------|----------------|
| Firewall | iptables rules on VPS |
| DDoS protection | Cloudflare (if configured) |
| Updates | Regular security patches |
| Backups | Daily encrypted backups, 90-day retention |
| Monitoring | Error logging, performance monitoring |
---
## 7. Data Retention Policy
| Data Category | Retention Period | Deletion Method |
|---------------|------------------|-----------------|
| Active user accounts | Account lifetime | Soft delete → hard delete after 30 days |
| Inactive accounts | 2 years after last login | Automatic anonymization |
| Project data | Program lifetime + 5 years | Archived, then anonymized |
| Audit logs | 12 months | Automatic deletion |
| AI usage logs | 12 months | Automatic deletion |
| Session data | Session duration | Automatic expiration |
| Backup data | 90 days | Automatic rotation |
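The 12-month cutoff used for audit and AI usage logs can be sketched as follows (only the date arithmetic is shown; the real job would run as a scheduled task against the database):

```typescript
// A record is eligible for deletion once it is older than the retention
// window, measured in whole months back from "now".
function isExpired(createdAt: Date, now: Date, retentionMonths = 12): boolean {
  const cutoff = new Date(now)
  cutoff.setMonth(cutoff.getMonth() - retentionMonths)
  return createdAt < cutoff
}
```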
---
## 8. International Data Transfers
### 8.1 OpenAI (USA)
| Aspect | Details |
|--------|---------|
| Transfer Mechanism | Standard Contractual Clauses (SCCs) |
| DPA | OpenAI Data Processing Agreement |
| Data Minimization | Only anonymized data transferred |
| Risk Assessment | Low (no PII transferred) |
### 8.2 Data Localization
| Service | Location |
|---------|----------|
| Primary database | [EU location] |
| File storage | [Location] |
| Email service | [Location] |
---
## 9. Cookies and Tracking
### 9.1 Essential Cookies
| Cookie | Purpose | Duration |
|--------|---------|----------|
| `session_token` | User authentication | Session |
| `csrf_token` | CSRF protection | Session |
### 9.2 Optional Cookies
The platform does **not** use:
- Marketing cookies
- Analytics cookies that track individuals
- Third-party tracking
---
## 10. Data Protection Impact Assessment (DPIA)
### 10.1 AI Processing DPIA
| Factor | Assessment |
|--------|------------|
| **Risk** | Personal data sent to third-party AI |
| **Mitigation** | Strict anonymization before processing |
| **Residual Risk** | Low (no PII transferred) |
### 10.2 File Upload DPIA
| Factor | Assessment |
|--------|------------|
| **Risk** | Sensitive documents uploaded |
| **Mitigation** | Pre-signed URLs, access controls, virus scanning |
| **Residual Risk** | Medium (users control uploads) |
### 10.3 Evaluation Data DPIA
| Factor | Assessment |
|--------|------------|
| **Risk** | Subjective opinions about projects/teams |
| **Mitigation** | Access controls, audit logging |
| **Residual Risk** | Low |
---
## 11. Breach Notification Procedure
### 11.1 Detection (Within 24 hours)
1. Automated monitoring alerts
2. User reports
3. Security audit findings
### 11.2 Assessment (Within 48 hours)
1. Identify affected data and individuals
2. Assess severity and risk
3. Document incident details
### 11.3 Notification (Within 72 hours)
**Supervisory Authority:**
- Notify if risk to individuals
- Include: nature of breach, categories of data, number affected, consequences, measures taken
**Affected Individuals:**
- Notify without undue delay if high risk
- Include: nature of breach, likely consequences, measures taken, contact for information
### 11.4 Documentation
All breaches documented regardless of notification requirement.
---
## 12. Contact Information
| Role | Contact |
|------|---------|
| **Data Protection Officer** | [DPO name] |
| **Email** | [DPO email] |
| **Address** | [Physical address] |
**Supervisory Authority:**
Commission de Contrôle des Informations Nominatives (CCIN)
[Address in Monaco]
---
## 13. Document History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2025-01 | Initial version |
---
## See Also
- [AI Data Processing](./ai-data-processing.md)
- [AI System Architecture](../architecture/ai-system.md)