# Codebase Concerns **Analysis Date:** 2026-02-26 --- ## Tech Debt **Award Router is a Stub:** - Issue: Entire award router is commented out and non-functional. All award procedures were deleted when the Pipeline/Track models were removed and have not been reimplemented. - Files: `src/server/routers/award.ts` - Impact: Any UI that references award management procedures will fail at runtime. The `SpecialAward` model still exists in the schema but has no tRPC exposure via this router. - Fix approach: Reimplement the router procedures against the current `SpecialAward` → `Competition` FK relationship. See the TODO comment at line 9 for the list of procedures to reimplement. **Deliberation Page Has Incomplete Implementation:** - Issue: The jury-facing deliberation page has two hardcoded stub values that break actual vote submission. - `juryMemberId: ''` — submitted vote will have an empty juror ID. - `const hasVoted = false` — the "already voted" guard never fires, allowing duplicate vote submissions. - Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` lines 34 and 66 - Impact: Jury members can submit blank/duplicate votes in deliberation sessions. The submitted vote will be associated with an empty string `juryMemberId`, which will likely fail at the Prisma level or silently create bad data. - Fix approach: Derive `juryMemberId` from `session.participants` by matching `ctx.user.id`. Derive `hasVoted` by checking if a `DeliberationVote` with the current user's jury member ID already exists in `session.votes`. **Audit Middleware is a No-Op:** - Issue: The `withAuditLog` middleware in `src/server/trpc.ts` (lines 99–114) identifies mutation calls by path pattern but does nothing with them — the body contains only a comment: `// We'll implement this in the audit service`. - Files: `src/server/trpc.ts` - Impact: Automatic centralised audit logging for all admin mutations does not occur through this middleware. Manual `logAudit()` / `DecisionAuditLog.create()` calls are present in many routers but coverage is inconsistent. - Fix approach: Implement the middleware body to call `logAudit()` with `ctx.user.id`, `path`, and serialized input/output. This provides a safety net for any procedure that doesn't call `logAudit` manually. **In-Memory Rate Limiter Not Suitable for Multi-Instance Deployment:** - Issue: `src/lib/rate-limit.ts` uses a module-level `Map` for rate limit state. This works in a single process but does not share state across multiple Node.js instances or after a process restart. - Files: `src/lib/rate-limit.ts` - Impact: Rate limits can be trivially bypassed by hitting different server instances. Auth brute-force protection (5-attempt lockout) also uses this same in-memory store (`src/lib/auth.ts` line 12). - Fix approach: Replace with Redis-based rate limiting (e.g., `@upstash/ratelimit` or `ioredis`). The comment at line 5 already acknowledges this: "For production with multiple instances, replace with Redis-based solution." **`configJson` Widely Cast Without Validation:** - Issue: `configJson` (a Prisma `Json` field) is cast directly to `Record` in 65 locations across server routers and services without running it through the Zod validators. The validators (`safeValidateRoundConfig`, `EvaluationConfigSchema.safeParse`) are only called in 4 locations. - Files: `src/server/routers/assignment.ts`, `src/server/routers/evaluation.ts`, `src/server/routers/filtering.ts`, `src/server/services/round-engine.ts`, and many others. - Impact: Stale or malformed config JSON stored in the database can cause silent runtime failures deep in business logic (e.g., missing criteria, wrong field names) without a clear validation error. - Fix approach: Extract a typed `parseRoundConfig(roundType, configJson)` utility that returns a typed config or throws a `TRPCError`. Replace bare `as Record` casts with this utility at query boundaries. --- ## Known Bugs **Tag Rename Performs N+1 Database Writes:** - Symptoms: Renaming a tag iterates over every user and every project that has the tag, issuing one `UPDATE` per record. - Files: `src/server/routers/tag.ts` lines 361–389 and 421–438 - Trigger: Admin renames any tag that is widely used. - Workaround: None. Will time out for large datasets. - Fix approach: Use a raw SQL `UPDATE ... SET tags = array_replace(tags, $old, $new)` or a Prisma `$executeRaw` to perform the rename in a single query. **Jury Deliberation Vote: `juryMemberId` Is Hardcoded Empty String:** - Symptoms: Votes submitted via the jury deliberation page will have `juryMemberId: ''`. - Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` line 34 - Trigger: Any jury member visits a deliberation session and submits a vote. - Workaround: None — the vote will silently pass or fail depending on Prisma validation. --- ## Security Considerations **IP Header Spoofing on Rate Limiter:** - Risk: All rate limiters extract the client IP from `x-forwarded-for` without validating that the header originates from a trusted proxy. A client can set this header to any value, bypassing per-IP rate limits. - Files: `src/app/api/trpc/[trpc]/route.ts` lines 13–18, `src/app/api/auth/[...nextauth]/route.ts` lines 7–10, `src/app/api/email/change-password/route.ts` line 36, `src/app/api/email/verify-credentials/route.ts` line 20, `src/server/context.ts` line 13. - Current mitigation: Nginx passes `X-Forwarded-For` from upstream; in single-proxy deployment this reduces (but does not eliminate) risk. - Recommendations: Pin IP extraction to `req.headers.get('x-real-ip')` set by Nginx only, or validate the forwarded-for chain against a trusted proxy list. **Cron Secret Compared with `!==` (Non–Timing-Safe):** - Risk: String equality check `cronSecret !== process.env.CRON_SECRET` is vulnerable to timing side-channel attacks on the secret value. - Files: `src/app/api/cron/audit-cleanup/route.ts`, `src/app/api/cron/digest/route.ts`, `src/app/api/cron/draft-cleanup/route.ts`, `src/app/api/cron/reminders/route.ts` — all at line 8. - Current mitigation: Cron endpoints are not user-facing and rate limited at Nginx level. - Recommendations: Replace with `timingSafeEqual(Buffer.from(cronSecret), Buffer.from(process.env.CRON_SECRET))` — the same approach already used in `src/lib/storage/local-provider.ts` line 75. **No Content-Security-Policy Header:** - Risk: No CSP is set in `next.config.ts` or via middleware headers. If an XSS vector exists, there is no second line of defence to limit script execution. - Files: `next.config.ts` (missing `headers()` function), `docker/nginx/mopc-platform.conf` (missing CSP directive). - Current mitigation: Nginx sets `X-Frame-Options`, `X-Content-Type-Options`, and `X-XSS-Protection`, but these are legacy headers. No HSTS header is configured in Nginx either (only set post-certbot). - Recommendations: Add `Content-Security-Policy` and `Strict-Transport-Security` via the Next.js `headers()` config function. **MinIO Fallback Credentials Hardcoded:** - Risk: When `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` are not set in non-production environments, the client defaults to `minioadmin`/`minioadmin`. - Files: `src/lib/minio.ts` lines 28–29 - Current mitigation: Production throws an error if credentials are missing (line 20–22). The fallback only applies in development. - Recommendations: Remove the hardcoded fallback entirely; require credentials in all environments to prevent accidental exposure of a non-production MinIO instance. **`NEXT_PUBLIC_MINIO_ENDPOINT` Undefined in Production:** - Risk: Two admin learning pages read `process.env.NEXT_PUBLIC_MINIO_ENDPOINT` at runtime. This variable is not defined in `docker-compose.yml` and has no `NEXT_PUBLIC_` entry. Next.js requires public env vars to be present at build time; at runtime this will always resolve to the fallback `http://localhost:9000`, making file previews broken in production. - Files: `src/app/(admin)/admin/learning/new/page.tsx` line 112, `src/app/(admin)/admin/learning/[id]/page.tsx` line 165. - Fix approach: Add `NEXT_PUBLIC_MINIO_ENDPOINT` to `docker-compose.yml` env section, or use the server-side `MINIO_PUBLIC_ENDPOINT` via a tRPC query rather than client-side env. --- ## Performance Bottlenecks **Unbounded `findMany` Queries in Analytics Router:** - Problem: `src/server/routers/analytics.ts` contains approximately 25 `findMany` calls with no `take` limit. With large competitions (hundreds of projects, thousands of evaluations) these will perform full table scans filtered only by `roundId` or `competitionId`. - Files: `src/server/routers/analytics.ts` — queries at lines 38, 80, 265, 421, 539, 582, 649, 749, 794, 1207, 1227, 1346, 1406, 1481, 1498, 1654, 1677, 1700. - Cause: Analytics queries are built for correctness, not scale. They load entire result sets into Node.js memory before aggregation. - Improvement path: Move aggregation into the database using Prisma `groupBy` and `_count`/`_avg` aggregations, or write `$queryRaw` SQL for complex analytics. Add pagination or date-range limits to the procedure inputs. **Tag Rename N+1 Pattern:** - Problem: Renaming a tag issues one DB write per entity (user or project) that carries the tag rather than a single bulk update. - Files: `src/server/routers/tag.ts` lines 355–390 - Cause: Prisma does not support `array_replace` natively; the current implementation works around this with a loop. - Improvement path: Use `prisma.$executeRaw` with PostgreSQL's `array_replace` function. **`assignment.ts` Router is 3,337 Lines:** - Problem: The assignment router is the single largest file and handles jury assignments, AI assignment, manual overrides, transfer, COI, and coverage checks in one module. - Files: `src/server/routers/assignment.ts` - Cause: Organic growth without module splitting. - Improvement path: Extract into separate files: `src/server/routers/assignment/manual.ts`, `assignment/ai.ts`, `assignment/coverage.ts`. This will also improve IDE performance and test isolation. --- ## Fragile Areas **Round Engine: COMPLETED State Has No Guards on Re-Entry:** - Files: `src/server/services/round-engine.ts` lines 57–64 - Why fragile: `COMPLETED` is defined as a terminal state (empty transitions array). However, there is no server-side guard preventing a direct Prisma update to a `COMPLETED` project state outside of `transitionProjectState()`. If a bug or data migration bypasses the state machine, projects can end up in unexpected states. - Safe modification: Always transition through `transitionProjectState()`. Any admin data repair scripts should call this function rather than using `prisma.projectRoundState.update` directly. - Test coverage: No unit tests for project state transitions. Only `tests/unit/assignment-policy.test.ts` exists, covering a different subsystem. **`Prisma.$transaction` Typed as `any`:** - Files: `src/server/services/round-engine.ts` line 129, `src/server/services/result-lock.ts` lines 87, 169, `src/server/services/mentor-workspace.ts` lines 39, 254, and 50+ other locations. - Why fragile: `tx: any` disables all type-checking inside transaction callbacks. A mistakenly called method on `tx` (e.g., `tx.round.delete` instead of `tx.round.update`) will compile successfully but may cause silent data corruption. - Safe modification: Type the callback as `(tx: Parameters[0]>[0]) => ...` or define a `TransactionalPrisma` type alias. The `PrismaClient | any` union also defeats the purpose of typing. **`email.ts` is 2,175 Lines:** - Files: `src/lib/email.ts` - Why fragile: All email templates, SMTP transport logic, and dynamic config loading are in one file. Adding a new email type requires navigating 2,000+ lines, and any change to transport setup affects all templates. - Safe modification: Extract individual email functions into `src/lib/email/` subdirectory with one file per template type. Keep shared transport logic in `src/lib/email/transport.ts`. - Test coverage: No tests for email sending. Email errors are caught and logged but not surfaced to callers consistently. **`admin/rounds/[roundId]/page.tsx` is 2,398 Lines:** - Files: `src/app/(admin)/admin/rounds/[roundId]/page.tsx` - Why fragile: The entire round management UI (config, assignments, filtering, deliberation controls) lives in a single client component. State from one section can accidentally affect another, and the component re-renders on any state change. - Safe modification: Extract tab sections into separate `'use client'` components with scoped state. Consider converting to a tab-based layout with lazy loading. **SSE Live Voting Stream Relies on Polling:** - Files: `src/app/api/live-voting/stream/route.ts` lines 184–194 - Why fragile: The SSE endpoint polls the database every 2 seconds per connected client. Under live ceremony conditions with many simultaneous audience connections, this can produce significant database load. - Safe modification: Introduce a Redis pub/sub channel that the vote submission path writes to, and have the SSE stream subscribe to the channel rather than polling. Alternatively, implement a debounce on the poll and share the result across all open SSE connections via a singleton broadcaster. --- ## Scaling Limits **In-Memory State (Rate Limiter, Login Attempts):** - Current capacity: Works correctly for a single Node.js process. - Limit: Breaks under horizontal scaling or after a process restart (all rate limit windows reset). - Scaling path: Replace `src/lib/rate-limit.ts` with a Redis-backed solution. Replace the `failedAttempts` Map in `src/lib/auth.ts` with Redis counters or database fields on the `User` model. **SSE Connection Count vs. Database Poll Rate:** - Current capacity: Each SSE client issues 1 database query per 2 seconds. - Limit: At 100 concurrent audience connections, this is 50 queries/second to `liveVotingSession` and related tables during a ceremony. - Scaling path: Shared broadcaster pattern (one database poll, fan-out to all SSE streams) or Redis pub/sub as described above. --- ## Dependencies at Risk **`next-auth` v5 (Auth.js) — Beta API:** - Risk: Auth.js v5 was in release candidate status at time of integration. The API surface (`authConfig` + `handlers` + `auth`) differs significantly from v4. Upgrading to a stable v5 release may require changes to `src/lib/auth.ts` and `src/lib/auth.config.ts`. - Impact: Session type definitions, adapter interfaces, and middleware patterns may change. - Migration plan: Monitor the Auth.js v5 stable release. Changes are likely limited to `src/lib/auth.ts` and `src/types/next-auth.d.ts`. --- ## Missing Critical Features **No Database Backup Configuration:** - Problem: `docker-compose.yml` has no scheduled backup service or volume snapshot configuration for the PostgreSQL container. - Blocks: Point-in-time recovery after data loss or accidental deletion. - Recommendation: Add a sidecar backup service (e.g., `prodrigestivill/postgres-backup-local`) or configure WAL archiving to MinIO. **No Error Monitoring / Observability:** - Problem: There is no Sentry, Datadog, or equivalent error monitoring integration. Application errors are only logged to stdout via `console.error`. In production, these are only visible if the Docker logs are actively monitored. - Files: No integration found in `src/instrumentation.ts` or anywhere else. - Blocks: Proactive detection of runtime errors, AI service failures, and payment/submission edge cases. - Recommendation: Add Sentry (`@sentry/nextjs`) in `src/instrumentation.ts` — Next.js has native support for this. Filter out expected errors (e.g., `TRPCError` with `NOT_FOUND`) to reduce noise. **No Automated Tests for Core Business Logic (Round Engine, Evaluation, Filtering):** - Problem: Only one test file exists: `tests/unit/assignment-policy.test.ts`. The round engine state machine (`src/server/services/round-engine.ts`), evaluation submission flow (`src/server/routers/evaluation.ts`), and AI filtering pipeline (`src/server/services/ai-filtering.ts`) have no automated tests. - Blocks: Confident refactoring of the state machine, advance-criterion logic, and scoring. - Recommendation: Add unit tests for `activateRound`, `closeRound`, `transitionProjectState` (happy path + guard failures), and `submitEvaluation` (COI check, advance criterion logic, score validation). --- ## Test Coverage Gaps **Round Engine State Machine:** - What's not tested: All `activateRound`, `closeRound`, `archiveRound`, `transitionProjectState` transitions, including guard conditions (e.g., activating an ARCHIVED round, transitioning a COMPLETED project). - Files: `src/server/services/round-engine.ts` - Risk: A regression in state transition guards could allow data corruption (e.g., re-activating a closed round, double-passing a project). - Priority: High **Evaluation Submission (advance criterion, COI, scoring):** - What's not tested: `submitEvaluation` mutation — specifically the advance criterion auto-transition logic (lines 1637–1646 of `src/server/routers/evaluation.ts`), COI auto-reassignment on `declareCOI`, and `upsertForm` criterion validation. - Files: `src/server/routers/evaluation.ts` - Risk: Regression in advance criterion will silently skip project advancement. COI declaration failures are caught and logged but untested. - Priority: High **AI Anonymization:** - What's not tested: `sanitizeText`, `anonymizeProject`, `validateNoPersonalInfo` in `src/server/services/anonymization.ts`. These are GDPR-critical functions. - Files: `src/server/services/anonymization.ts` - Risk: A PII leak in AI calls would violate GDPR without detection. - Priority: High **Assignment Policy Execution:** - What's not tested: End-to-end `executeAssignment` flow in `src/server/services/round-assignment.ts` — specifically the COI filtering, geo-diversity penalty, familiarity bonus, and under-coverage gap-fill. - Files: `src/server/services/round-assignment.ts`, `src/server/services/smart-assignment.ts` - Risk: Silent over- or under-assignment when constraints interact. - Priority: Medium --- *Concerns audit: 2026-02-26*