Optimize AI system with batching, token tracking, and GDPR compliance
- Add AIUsageLog model for persistent token/cost tracking
- Implement batched processing for all AI services:
  - Assignment: 15 projects/batch
  - Filtering: 20 projects/batch
  - Award eligibility: 20 projects/batch
  - Mentor matching: 15 projects/batch
- Create unified error classification (ai-errors.ts)
- Enhance anonymization with comprehensive project data
- Add AI usage dashboard to Settings page
- Add usage stats endpoints to settings router
- Create AI system documentation (5 files)
- Create GDPR compliance documentation (2 files)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/architecture/ai-errors.md · 208 lines · new file

# AI Error Handling Guide

## Error Types

The AI system classifies errors into these categories:

| Error Type | Cause | User Message | Retryable |
|------------|-------|--------------|-----------|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |

\*Context length errors can be retried with smaller batches.

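The taxonomy above can be mirrored as a table-driven sketch. This is illustrative only: the actual type and function shapes in `ai-errors.ts` may differ.

```typescript
// Hypothetical sketch of the error taxonomy; ai-errors.ts may define these differently.
type AIErrorType =
  | 'rate_limit' | 'quota_exceeded' | 'model_not_found' | 'invalid_api_key'
  | 'context_length' | 'parse_error' | 'timeout' | 'network_error'
  | 'content_filter' | 'server_error'

// Error types worth retrying, per the "Retryable" column above
// (context_length counts, since it can be retried with smaller batches).
const RETRYABLE: ReadonlySet<AIErrorType> = new Set<AIErrorType>([
  'rate_limit', 'context_length', 'parse_error', 'timeout',
  'network_error', 'server_error',
])

function shouldRetry(type: AIErrorType): boolean {
  return RETRYABLE.has(type)
}
```

Keeping retryability in one set (rather than scattered `if` checks) means the table in this document and the code can be diffed against each other directly.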
## Error Classification

```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
} catch (error) {
  const classified = classifyAIError(error)

  console.error(`AI Error: ${classified.type} - ${classified.message}`)

  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait and retry
  } else {
    // Fall back to algorithm
  }
}
```

## Graceful Degradation

When AI fails, the platform automatically handles it:

### AI Assignment
1. Logs the error
2. Falls back to algorithmic assignment:
   - Matches by expertise tag overlap
   - Balances workload across jurors
   - Respects constraints (max assignments)

### AI Filtering
1. Logs the error
2. Flags all projects for manual review
3. Returns error message to admin

### Award Eligibility
1. Logs the error
2. Returns all projects as "needs manual review"
3. Admin can apply deterministic rules instead

### Mentor Matching
1. Logs the error
2. Falls back to keyword-based matching
3. Uses availability scoring

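The algorithmic assignment fallback can be sketched as tag-overlap scoring with a workload penalty. The interfaces and the penalty weight below are hypothetical, not the platform's actual implementation:

```typescript
// Hypothetical shapes; the real models live elsewhere in the codebase.
interface Juror { id: string; tags: string[]; assignedCount: number; maxAssignments: number }
interface Project { id: string; tags: string[] }

// Score each juror by expertise tag overlap, minus a small workload penalty,
// skipping jurors already at their max-assignments constraint.
function pickJuror(project: Project, jurors: Juror[]): Juror | null {
  let best: Juror | null = null
  let bestScore = -Infinity
  for (const juror of jurors) {
    if (juror.assignedCount >= juror.maxAssignments) continue // respect constraints
    const overlap = juror.tags.filter(t => project.tags.includes(t)).length
    const score = overlap - juror.assignedCount * 0.1 // balance workload across jurors
    if (score > bestScore) { bestScore = score; best = juror }
  }
  return best
}
```

The workload term keeps ties from always landing on the same juror; the exact weighting is a tuning choice.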
## Retry Strategy

| Error Type | Retry Count | Delay |
|------------|-------------|-------|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |

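The table above translates naturally into a policy map. A minimal sketch, assuming the real values live in `ai-errors.ts` (the names here are illustrative):

```typescript
// Hypothetical retry policy mirroring the table above.
const RETRY_POLICY: Record<string, { retries: number; baseDelayMs: number; exponential: boolean }> = {
  rate_limit:    { retries: 3, baseDelayMs: 1000, exponential: true },  // 1s, 2s, 4s
  timeout:       { retries: 2, baseDelayMs: 5000, exponential: false }, // fixed 5s
  network_error: { retries: 3, baseDelayMs: 1000, exponential: true },  // 1s, 2s, 4s
  server_error:  { retries: 3, baseDelayMs: 2000, exponential: true },  // 2s, 4s, 8s
  parse_error:   { retries: 1, baseDelayMs: 0,    exponential: false }, // no delay
}

// Delay before the given attempt (1-based): doubles each attempt for exponential policies.
function retryDelayMs(type: string, attempt: number): number {
  const policy = RETRY_POLICY[type]
  if (!policy) return 0
  return policy.exponential ? policy.baseDelayMs * 2 ** (attempt - 1) : policy.baseDelayMs
}
```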
## Monitoring

### Error Logging

All AI errors are logged to:
1. Console (development)
2. `AIUsageLog` table with `status: 'ERROR'`
3. `AuditLog` for security-relevant failures

### Checking Errors

```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```

## Troubleshooting

### High Error Rate

1. Check OpenAI status page for outages
2. Verify API key is valid and not rate-limited
3. Review error messages in logs
4. Consider switching to a different model

### Consistent Parse Errors

1. The AI model may be returning malformed JSON
2. Try a more capable model (gpt-4o instead of gpt-3.5)
3. Check if prompts are being truncated
4. Review recent responses in logs

### All Requests Failing

1. Test connection in Settings → AI
2. Verify API key hasn't been revoked
3. Check billing status in OpenAI dashboard
4. Review network connectivity

### Slow Responses

1. Consider using gpt-4o-mini for speed
2. Reduce batch sizes
3. Check for rate limiting (429 errors)
4. Monitor OpenAI latency

## Error Response Format

When errors occur, services return structured responses:

```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```

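Callers can detect degraded responses with a small helper. This is a hypothetical convenience, not part of the platform's API; the field names simply mirror the examples above:

```typescript
// Hypothetical union of the degradation-relevant fields from both response shapes above.
interface AIServiceError {
  success?: boolean
  error?: string
  fallbackUsed?: boolean
  flagForReview?: boolean
}

// True when a response signals that the AI path failed but graceful degradation
// kicked in (algorithmic fallback, or projects flagged for manual review).
function degradedGracefully(res: AIServiceError): boolean {
  return res.fallbackUsed === true || res.flagForReview === true
}
```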
## Implementing Custom Error Handling

```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)
      lastError = error as Error

      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }

      // Back off before the next attempt
      const delay = getRetryDelay(classified.type) * attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }

  throw lastError ?? new Error('AI operation failed')
}
```

## See Also

- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Services Reference](./ai-services.md)