Files
MOPC-Portal/.planning/codebase/CONCERNS.md
Matt 8cc86bae20 docs: map existing codebase
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 23:14:08 +01:00

218 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Codebase Concerns
**Analysis Date:** 2026-02-26
---
## Tech Debt
**Award Router is a Stub:**
- Issue: Entire award router is commented out and non-functional. All award procedures were deleted when the Pipeline/Track models were removed and have not been reimplemented.
- Files: `src/server/routers/award.ts`
- Impact: Any UI that references award management procedures will fail at runtime. The `SpecialAward` model still exists in the schema but has no tRPC exposure via this router.
- Fix approach: Reimplement the router procedures against the current `SpecialAward``Competition` FK relationship. See the TODO comment at line 9 for the list of procedures to reimplement.
**Deliberation Page Has Incomplete Implementation:**
- Issue: The jury-facing deliberation page has two hardcoded stub values that break actual vote submission.
- `juryMemberId: ''` — submitted vote will have an empty juror ID.
- `const hasVoted = false` — the "already voted" guard never fires, allowing duplicate vote submissions.
- Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` lines 34 and 66
- Impact: Jury members can submit blank/duplicate votes in deliberation sessions. The submitted vote will be associated with an empty string `juryMemberId`, which will likely fail at the Prisma level or silently create bad data.
- Fix approach: Derive `juryMemberId` from `session.participants` by matching `ctx.user.id`. Derive `hasVoted` by checking if a `DeliberationVote` with the current user's jury member ID already exists in `session.votes`.
**Audit Middleware is a No-Op:**
- Issue: The `withAuditLog` middleware in `src/server/trpc.ts` (lines 99114) identifies mutation calls by path pattern but does nothing with them — the body contains only a comment: `// We'll implement this in the audit service`.
- Files: `src/server/trpc.ts`
- Impact: Automatic centralised audit logging for all admin mutations does not occur through this middleware. Manual `logAudit()` / `DecisionAuditLog.create()` calls are present in many routers but coverage is inconsistent.
- Fix approach: Implement the middleware body to call `logAudit()` with `ctx.user.id`, `path`, and serialized input/output. This provides a safety net for any procedure that doesn't call `logAudit` manually.
**In-Memory Rate Limiter Not Suitable for Multi-Instance Deployment:**
- Issue: `src/lib/rate-limit.ts` uses a module-level `Map` for rate limit state. This works in a single process but does not share state across multiple Node.js instances or after a process restart.
- Files: `src/lib/rate-limit.ts`
- Impact: Rate limits can be trivially bypassed by hitting different server instances. Auth brute-force protection (5-attempt lockout) also uses this same in-memory store (`src/lib/auth.ts` line 12).
- Fix approach: Replace with Redis-based rate limiting (e.g., `@upstash/ratelimit` or `ioredis`). The comment at line 5 already acknowledges this: "For production with multiple instances, replace with Redis-based solution."
**`configJson` Widely Cast Without Validation:**
- Issue: `configJson` (a Prisma `Json` field) is cast directly to `Record<string, unknown>` in 65 locations across server routers and services without running it through the Zod validators. The validators (`safeValidateRoundConfig`, `EvaluationConfigSchema.safeParse`) are only called in 4 locations.
- Files: `src/server/routers/assignment.ts`, `src/server/routers/evaluation.ts`, `src/server/routers/filtering.ts`, `src/server/services/round-engine.ts`, and many others.
- Impact: Stale or malformed config JSON stored in the database can cause silent runtime failures deep in business logic (e.g., missing criteria, wrong field names) without a clear validation error.
- Fix approach: Extract a typed `parseRoundConfig(roundType, configJson)` utility that returns a typed config or throws a `TRPCError`. Replace bare `as Record<string, unknown>` casts with this utility at query boundaries.
---
## Known Bugs
**Tag Rename Performs N+1 Database Writes:**
- Symptoms: Renaming a tag iterates over every user and every project that has the tag, issuing one `UPDATE` per record.
- Files: `src/server/routers/tag.ts` lines 361389 and 421438
- Trigger: Admin renames any tag that is widely used.
- Workaround: None. Will time out for large datasets.
- Fix approach: Use a raw SQL `UPDATE ... SET tags = array_replace(tags, $old, $new)` or a Prisma `$executeRaw` to perform the rename in a single query.
**Jury Deliberation Vote: `juryMemberId` Is Hardcoded Empty String:**
- Symptoms: Votes submitted via the jury deliberation page will have `juryMemberId: ''`.
- Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` line 34
- Trigger: Any jury member visits a deliberation session and submits a vote.
- Workaround: None — the vote will silently pass or fail depending on Prisma validation.
---
## Security Considerations
**IP Header Spoofing on Rate Limiter:**
- Risk: All rate limiters extract the client IP from `x-forwarded-for` without validating that the header originates from a trusted proxy. A client can set this header to any value, bypassing per-IP rate limits.
- Files: `src/app/api/trpc/[trpc]/route.ts` lines 1318, `src/app/api/auth/[...nextauth]/route.ts` lines 710, `src/app/api/email/change-password/route.ts` line 36, `src/app/api/email/verify-credentials/route.ts` line 20, `src/server/context.ts` line 13.
- Current mitigation: Nginx passes `X-Forwarded-For` from upstream; in single-proxy deployment this reduces (but does not eliminate) risk.
- Recommendations: Pin IP extraction to `req.headers.get('x-real-ip')` set by Nginx only, or validate the forwarded-for chain against a trusted proxy list.
**Cron Secret Compared with `!==` (NonTiming-Safe):**
- Risk: String equality check `cronSecret !== process.env.CRON_SECRET` is vulnerable to timing side-channel attacks on the secret value.
- Files: `src/app/api/cron/audit-cleanup/route.ts`, `src/app/api/cron/digest/route.ts`, `src/app/api/cron/draft-cleanup/route.ts`, `src/app/api/cron/reminders/route.ts` — all at line 8.
- Current mitigation: Cron endpoints are not user-facing and rate limited at Nginx level.
- Recommendations: Replace with `timingSafeEqual(Buffer.from(cronSecret), Buffer.from(process.env.CRON_SECRET))` — the same approach already used in `src/lib/storage/local-provider.ts` line 75.
**No Content-Security-Policy Header:**
- Risk: No CSP is set in `next.config.ts` or via middleware headers. If an XSS vector exists, there is no second line of defence to limit script execution.
- Files: `next.config.ts` (missing `headers()` function), `docker/nginx/mopc-platform.conf` (missing CSP directive).
- Current mitigation: Nginx sets `X-Frame-Options`, `X-Content-Type-Options`, and `X-XSS-Protection`, but these are legacy headers. No HSTS header is configured in Nginx either (only set post-certbot).
- Recommendations: Add `Content-Security-Policy` and `Strict-Transport-Security` via the Next.js `headers()` config function.
**MinIO Fallback Credentials Hardcoded:**
- Risk: When `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` are not set in non-production environments, the client defaults to `minioadmin`/`minioadmin`.
- Files: `src/lib/minio.ts` lines 2829
- Current mitigation: Production throws an error if credentials are missing (line 2022). The fallback only applies in development.
- Recommendations: Remove the hardcoded fallback entirely; require credentials in all environments to prevent accidental exposure of a non-production MinIO instance.
**`NEXT_PUBLIC_MINIO_ENDPOINT` Undefined in Production:**
- Risk: Two admin learning pages read `process.env.NEXT_PUBLIC_MINIO_ENDPOINT` at runtime. This variable is not defined in `docker-compose.yml` and has no `NEXT_PUBLIC_` entry. Next.js requires public env vars to be present at build time; at runtime this will always resolve to the fallback `http://localhost:9000`, making file previews broken in production.
- Files: `src/app/(admin)/admin/learning/new/page.tsx` line 112, `src/app/(admin)/admin/learning/[id]/page.tsx` line 165.
- Fix approach: Add `NEXT_PUBLIC_MINIO_ENDPOINT` to `docker-compose.yml` env section, or use the server-side `MINIO_PUBLIC_ENDPOINT` via a tRPC query rather than client-side env.
---
## Performance Bottlenecks
**Unbounded `findMany` Queries in Analytics Router:**
- Problem: `src/server/routers/analytics.ts` contains approximately 25 `findMany` calls with no `take` limit. With large competitions (hundreds of projects, thousands of evaluations) these will perform full table scans filtered only by `roundId` or `competitionId`.
- Files: `src/server/routers/analytics.ts` — queries at lines 38, 80, 265, 421, 539, 582, 649, 749, 794, 1207, 1227, 1346, 1406, 1481, 1498, 1654, 1677, 1700.
- Cause: Analytics queries are built for correctness, not scale. They load entire result sets into Node.js memory before aggregation.
- Improvement path: Move aggregation into the database using Prisma `groupBy` and `_count`/`_avg` aggregations, or write `$queryRaw` SQL for complex analytics. Add pagination or date-range limits to the procedure inputs.
**Tag Rename N+1 Pattern:**
- Problem: Renaming a tag issues one DB write per entity (user or project) that carries the tag rather than a single bulk update.
- Files: `src/server/routers/tag.ts` lines 355390
- Cause: Prisma does not support `array_replace` natively; the current implementation works around this with a loop.
- Improvement path: Use `prisma.$executeRaw` with PostgreSQL's `array_replace` function.
**`assignment.ts` Router is 3,337 Lines:**
- Problem: The assignment router is the single largest file and handles jury assignments, AI assignment, manual overrides, transfer, COI, and coverage checks in one module.
- Files: `src/server/routers/assignment.ts`
- Cause: Organic growth without module splitting.
- Improvement path: Extract into separate files: `src/server/routers/assignment/manual.ts`, `assignment/ai.ts`, `assignment/coverage.ts`. This will also improve IDE performance and test isolation.
---
## Fragile Areas
**Round Engine: COMPLETED State Has No Guards on Re-Entry:**
- Files: `src/server/services/round-engine.ts` lines 5764
- Why fragile: `COMPLETED` is defined as a terminal state (empty transitions array). However, there is no server-side guard preventing a direct Prisma update to a `COMPLETED` project state outside of `transitionProjectState()`. If a bug or data migration bypasses the state machine, projects can end up in unexpected states.
- Safe modification: Always transition through `transitionProjectState()`. Any admin data repair scripts should call this function rather than using `prisma.projectRoundState.update` directly.
- Test coverage: No unit tests for project state transitions. Only `tests/unit/assignment-policy.test.ts` exists, covering a different subsystem.
**`Prisma.$transaction` Typed as `any`:**
- Files: `src/server/services/round-engine.ts` line 129, `src/server/services/result-lock.ts` lines 87, 169, `src/server/services/mentor-workspace.ts` lines 39, 254, and 50+ other locations.
- Why fragile: `tx: any` disables all type-checking inside transaction callbacks. A mistakenly called method on `tx` (e.g., `tx.round.delete` instead of `tx.round.update`) will compile successfully but may cause silent data corruption.
- Safe modification: Type the callback as `(tx: Parameters<Parameters<typeof prisma.$transaction>[0]>[0]) => ...` or define a `TransactionalPrisma` type alias. The `PrismaClient | any` union also defeats the purpose of typing.
**`email.ts` is 2,175 Lines:**
- Files: `src/lib/email.ts`
- Why fragile: All email templates, SMTP transport logic, and dynamic config loading are in one file. Adding a new email type requires navigating 2,000+ lines, and any change to transport setup affects all templates.
- Safe modification: Extract individual email functions into `src/lib/email/` subdirectory with one file per template type. Keep shared transport logic in `src/lib/email/transport.ts`.
- Test coverage: No tests for email sending. Email errors are caught and logged but not surfaced to callers consistently.
**`admin/rounds/[roundId]/page.tsx` is 2,398 Lines:**
- Files: `src/app/(admin)/admin/rounds/[roundId]/page.tsx`
- Why fragile: The entire round management UI (config, assignments, filtering, deliberation controls) lives in a single client component. State from one section can accidentally affect another, and the component re-renders on any state change.
- Safe modification: Extract tab sections into separate `'use client'` components with scoped state. Consider converting to a tab-based layout with lazy loading.
**SSE Live Voting Stream Relies on Polling:**
- Files: `src/app/api/live-voting/stream/route.ts` lines 184194
- Why fragile: The SSE endpoint polls the database every 2 seconds per connected client. Under live ceremony conditions with many simultaneous audience connections, this can produce significant database load.
- Safe modification: Introduce a Redis pub/sub channel that the vote submission path writes to, and have the SSE stream subscribe to the channel rather than polling. Alternatively, implement a debounce on the poll and share the result across all open SSE connections via a singleton broadcaster.
---
## Scaling Limits
**In-Memory State (Rate Limiter, Login Attempts):**
- Current capacity: Works correctly for a single Node.js process.
- Limit: Breaks under horizontal scaling or after a process restart (all rate limit windows reset).
- Scaling path: Replace `src/lib/rate-limit.ts` with a Redis-backed solution. Replace the `failedAttempts` Map in `src/lib/auth.ts` with Redis counters or database fields on the `User` model.
**SSE Connection Count vs. Database Poll Rate:**
- Current capacity: Each SSE client issues 1 database query per 2 seconds.
- Limit: At 100 concurrent audience connections, this is 50 queries/second to `liveVotingSession` and related tables during a ceremony.
- Scaling path: Shared broadcaster pattern (one database poll, fan-out to all SSE streams) or Redis pub/sub as described above.
---
## Dependencies at Risk
**`next-auth` v5 (Auth.js) — Beta API:**
- Risk: Auth.js v5 was in release candidate status at time of integration. The API surface (`authConfig` + `handlers` + `auth`) differs significantly from v4. Upgrading to a stable v5 release may require changes to `src/lib/auth.ts` and `src/lib/auth.config.ts`.
- Impact: Session type definitions, adapter interfaces, and middleware patterns may change.
- Migration plan: Monitor the Auth.js v5 stable release. Changes are likely limited to `src/lib/auth.ts` and `src/types/next-auth.d.ts`.
---
## Missing Critical Features
**No Database Backup Configuration:**
- Problem: `docker-compose.yml` has no scheduled backup service or volume snapshot configuration for the PostgreSQL container.
- Blocks: Point-in-time recovery after data loss or accidental deletion.
- Recommendation: Add a sidecar backup service (e.g., `prodrigestivill/postgres-backup-local`) or configure WAL archiving to MinIO.
**No Error Monitoring / Observability:**
- Problem: There is no Sentry, Datadog, or equivalent error monitoring integration. Application errors are only logged to stdout via `console.error`. In production, these are only visible if the Docker logs are actively monitored.
- Files: No integration found in `src/instrumentation.ts` or anywhere else.
- Blocks: Proactive detection of runtime errors, AI service failures, and payment/submission edge cases.
- Recommendation: Add Sentry (`@sentry/nextjs`) in `src/instrumentation.ts` — Next.js has native support for this. Filter out expected errors (e.g., `TRPCError` with `NOT_FOUND`) to reduce noise.
**No Automated Tests for Core Business Logic (Round Engine, Evaluation, Filtering):**
- Problem: Only one test file exists: `tests/unit/assignment-policy.test.ts`. The round engine state machine (`src/server/services/round-engine.ts`), evaluation submission flow (`src/server/routers/evaluation.ts`), and AI filtering pipeline (`src/server/services/ai-filtering.ts`) have no automated tests.
- Blocks: Confident refactoring of the state machine, advance-criterion logic, and scoring.
- Recommendation: Add unit tests for `activateRound`, `closeRound`, `transitionProjectState` (happy path + guard failures), and `submitEvaluation` (COI check, advance criterion logic, score validation).
---
## Test Coverage Gaps
**Round Engine State Machine:**
- What's not tested: All `activateRound`, `closeRound`, `archiveRound`, `transitionProjectState` transitions, including guard conditions (e.g., activating an ARCHIVED round, transitioning a COMPLETED project).
- Files: `src/server/services/round-engine.ts`
- Risk: A regression in state transition guards could allow data corruption (e.g., re-activating a closed round, double-passing a project).
- Priority: High
**Evaluation Submission (advance criterion, COI, scoring):**
- What's not tested: `submitEvaluation` mutation — specifically the advance criterion auto-transition logic (lines 16371646 of `src/server/routers/evaluation.ts`), COI auto-reassignment on `declareCOI`, and `upsertForm` criterion validation.
- Files: `src/server/routers/evaluation.ts`
- Risk: Regression in advance criterion will silently skip project advancement. COI declaration failures are caught and logged but untested.
- Priority: High
**AI Anonymization:**
- What's not tested: `sanitizeText`, `anonymizeProject`, `validateNoPersonalInfo` in `src/server/services/anonymization.ts`. These are GDPR-critical functions.
- Files: `src/server/services/anonymization.ts`
- Risk: A PII leak in AI calls would violate GDPR without detection.
- Priority: High
**Assignment Policy Execution:**
- What's not tested: End-to-end `executeAssignment` flow in `src/server/services/round-assignment.ts` — specifically the COI filtering, geo-diversity penalty, familiarity bonus, and under-coverage gap-fill.
- Files: `src/server/services/round-assignment.ts`, `src/server/services/smart-assignment.ts`
- Risk: Silent over- or under-assignment when constraints interact.
- Priority: Medium
---
*Concerns audit: 2026-02-26*