Optimize AI system with batching, token tracking, and GDPR compliance
- Add AIUsageLog model for persistent token/cost tracking
- Implement batched processing for all AI services:
  - Assignment: 15 projects/batch
  - Filtering: 20 projects/batch
  - Award eligibility: 20 projects/batch
  - Mentor matching: 15 projects/batch
- Create unified error classification (ai-errors.ts)
- Enhance anonymization with comprehensive project data
- Add AI usage dashboard to Settings page
- Add usage stats endpoints to settings router
- Create AI system documentation (5 files)
- Create GDPR compliance documentation (2 files)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
docs/architecture/ai-errors.md · 208 lines · new file

# AI Error Handling Guide

## Error Types

The AI system classifies errors into these categories:

| Error Type | Cause | User Message | Retryable |
|------------|-------|--------------|-----------|
| `rate_limit` | Too many requests | "Rate limit exceeded. Wait a few minutes." | Yes |
| `quota_exceeded` | Billing limit | "API quota exceeded. Check billing." | No |
| `model_not_found` | Invalid model | "Model not available. Check settings." | No |
| `invalid_api_key` | Bad API key | "Invalid API key. Check settings." | No |
| `context_length` | Prompt too large | "Request too large. Try fewer items." | Yes* |
| `parse_error` | AI returned invalid JSON | "Response parse error. Flagged for review." | Yes |
| `timeout` | Request took too long | "Request timed out. Try again." | Yes |
| `network_error` | Connection issue | "Network error. Check connection." | Yes |
| `content_filter` | Content blocked | "Content filtered. Check input data." | No |
| `server_error` | OpenAI server issue | "Server error. Try again later." | Yes |

\*Context length errors can be retried with smaller batches.

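The taxonomy above can be mirrored as a table-driven sketch. This is illustrative only: the actual type and function shapes in `ai-errors.ts` may differ.

```typescript
// Hypothetical sketch of the error taxonomy; ai-errors.ts may define these differently.
type AIErrorType =
  | 'rate_limit' | 'quota_exceeded' | 'model_not_found' | 'invalid_api_key'
  | 'context_length' | 'parse_error' | 'timeout' | 'network_error'
  | 'content_filter' | 'server_error'

// Error types worth retrying, per the "Retryable" column above
// (context_length counts, since it can be retried with smaller batches).
const RETRYABLE: ReadonlySet<AIErrorType> = new Set<AIErrorType>([
  'rate_limit', 'context_length', 'parse_error', 'timeout',
  'network_error', 'server_error',
])

function shouldRetry(type: AIErrorType): boolean {
  return RETRYABLE.has(type)
}
```

Keeping retryability in one set (rather than scattered `if` checks) means the table in this document and the code can be diffed against each other directly.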
## Error Classification

```typescript
import { classifyAIError, shouldRetry, getRetryDelay } from '@/server/services/ai-errors'

try {
  const response = await openai.chat.completions.create(params)
} catch (error) {
  const classified = classifyAIError(error)

  console.error(`AI Error: ${classified.type} - ${classified.message}`)

  if (shouldRetry(classified.type)) {
    const delay = getRetryDelay(classified.type)
    // Wait and retry
  } else {
    // Fall back to algorithm
  }
}
```

## Graceful Degradation

When AI fails, the platform automatically handles it:

### AI Assignment
1. Logs the error
2. Falls back to algorithmic assignment:
   - Matches by expertise tag overlap
   - Balances workload across jurors
   - Respects constraints (max assignments)

### AI Filtering
1. Logs the error
2. Flags all projects for manual review
3. Returns error message to admin

### Award Eligibility
1. Logs the error
2. Returns all projects as "needs manual review"
3. Admin can apply deterministic rules instead

### Mentor Matching
1. Logs the error
2. Falls back to keyword-based matching
3. Uses availability scoring

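The algorithmic assignment fallback can be sketched as tag-overlap scoring with a workload penalty. The interfaces and the penalty weight below are hypothetical, not the platform's actual implementation:

```typescript
// Hypothetical shapes; the real models live elsewhere in the codebase.
interface Juror { id: string; tags: string[]; assignedCount: number; maxAssignments: number }
interface Project { id: string; tags: string[] }

// Score each juror by expertise tag overlap, minus a small workload penalty,
// skipping jurors already at their max-assignments constraint.
function pickJuror(project: Project, jurors: Juror[]): Juror | null {
  let best: Juror | null = null
  let bestScore = -Infinity
  for (const juror of jurors) {
    if (juror.assignedCount >= juror.maxAssignments) continue // respect constraints
    const overlap = juror.tags.filter(t => project.tags.includes(t)).length
    const score = overlap - juror.assignedCount * 0.1 // balance workload across jurors
    if (score > bestScore) { bestScore = score; best = juror }
  }
  return best
}
```

The workload term keeps ties from always landing on the same juror; the exact weighting is a tuning choice.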
## Retry Strategy

| Error Type | Retry Count | Delay |
|------------|-------------|-------|
| `rate_limit` | 3 | Exponential (1s, 2s, 4s) |
| `timeout` | 2 | Fixed 5s |
| `network_error` | 3 | Exponential (1s, 2s, 4s) |
| `server_error` | 3 | Exponential (2s, 4s, 8s) |
| `parse_error` | 1 | None |

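The table above translates naturally into a policy map. A minimal sketch, assuming the real values live in `ai-errors.ts` (the names here are illustrative):

```typescript
// Hypothetical retry policy mirroring the table above.
const RETRY_POLICY: Record<string, { retries: number; baseDelayMs: number; exponential: boolean }> = {
  rate_limit:    { retries: 3, baseDelayMs: 1000, exponential: true },  // 1s, 2s, 4s
  timeout:       { retries: 2, baseDelayMs: 5000, exponential: false }, // fixed 5s
  network_error: { retries: 3, baseDelayMs: 1000, exponential: true },  // 1s, 2s, 4s
  server_error:  { retries: 3, baseDelayMs: 2000, exponential: true },  // 2s, 4s, 8s
  parse_error:   { retries: 1, baseDelayMs: 0,    exponential: false }, // no delay
}

// Delay before the given attempt (1-based): doubles each attempt for exponential policies.
function retryDelayMs(type: string, attempt: number): number {
  const policy = RETRY_POLICY[type]
  if (!policy) return 0
  return policy.exponential ? policy.baseDelayMs * 2 ** (attempt - 1) : policy.baseDelayMs
}
```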
## Monitoring

### Error Logging

All AI errors are logged to:
1. Console (development)
2. `AIUsageLog` table with `status: 'ERROR'`
3. `AuditLog` for security-relevant failures

### Checking Errors

```sql
-- Recent AI errors
SELECT
  created_at,
  action,
  model,
  error_message
FROM ai_usage_log
WHERE status = 'ERROR'
ORDER BY created_at DESC
LIMIT 20;

-- Error rate by action
SELECT
  action,
  COUNT(*) FILTER (WHERE status = 'ERROR') AS errors,
  COUNT(*) AS total,
  ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'ERROR') / COUNT(*), 2) AS error_rate
FROM ai_usage_log
GROUP BY action;
```

## Troubleshooting

### High Error Rate

1. Check OpenAI status page for outages
2. Verify API key is valid and not rate-limited
3. Review error messages in logs
4. Consider switching to a different model

### Consistent Parse Errors

1. The AI model may be returning malformed JSON
2. Try a more capable model (gpt-4o instead of gpt-3.5)
3. Check if prompts are being truncated
4. Review recent responses in logs

### All Requests Failing

1. Test connection in Settings → AI
2. Verify API key hasn't been revoked
3. Check billing status in OpenAI dashboard
4. Review network connectivity

### Slow Responses

1. Consider using gpt-4o-mini for speed
2. Reduce batch sizes
3. Check for rate limiting (429 errors)
4. Monitor OpenAI latency

## Error Response Format

When errors occur, services return structured responses:

```typescript
// AI Assignment error response
{
  success: false,
  suggestions: [],
  error: "Rate limit exceeded. Wait a few minutes and try again.",
  fallbackUsed: true,
}

// AI Filtering error response
{
  projectId: "...",
  meetsCriteria: false,
  confidence: 0,
  reasoning: "AI error: Rate limit exceeded",
  flagForReview: true,
}
```

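Callers can detect degraded responses with a small helper. This is a hypothetical convenience, not part of the platform's API; the field names simply mirror the examples above:

```typescript
// Hypothetical union of the degradation-relevant fields from both response shapes above.
interface AIServiceError {
  success?: boolean
  error?: string
  fallbackUsed?: boolean
  flagForReview?: boolean
}

// True when a response signals that the AI path failed but graceful degradation
// kicked in (algorithmic fallback, or projects flagged for manual review).
function degradedGracefully(res: AIServiceError): boolean {
  return res.fallbackUsed === true || res.flagForReview === true
}
```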
## Implementing Custom Error Handling

```typescript
import {
  classifyAIError,
  shouldRetry,
  getRetryDelay,
  getUserFriendlyMessage,
  logAIError,
} from '@/server/services/ai-errors'

async function callAIWithRetry<T>(
  operation: () => Promise<T>,
  serviceName: string,
  maxRetries: number = 3
): Promise<T> {
  let lastError: Error | null = null

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error) {
      const classified = classifyAIError(error)
      logAIError(serviceName, 'operation', classified)
      lastError = error as Error

      if (!shouldRetry(classified.type) || attempt === maxRetries) {
        throw new Error(getUserFriendlyMessage(classified.type))
      }

      // Back off before the next attempt
      const delay = getRetryDelay(classified.type) * attempt
      await new Promise(resolve => setTimeout(resolve, delay))
    }
  }

  throw lastError ?? new Error('AI operation failed')
}
```

## See Also

- [AI System Architecture](./ai-system.md)
- [AI Configuration Guide](./ai-configuration.md)
- [AI Services Reference](./ai-services.md)