Files

Matt 8cc86bae20 docs: map existing codebase

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-02-26 23:14:08 +01:00

18 KiB

Raw Blame History

Codebase Concerns

Analysis Date: 2026-02-26

Tech Debt

Award Router is a Stub:

Issue: Entire award router is commented out and non-functional. All award procedures were deleted when the Pipeline/Track models were removed and have not been reimplemented.
Files: src/server/routers/award.ts
Impact: Any UI that references award management procedures will fail at runtime. The SpecialAward model still exists in the schema but has no tRPC exposure via this router.
Fix approach: Reimplement the router procedures against the current SpecialAward → Competition FK relationship. See the TODO comment at line 9 for the list of procedures to reimplement.

Deliberation Page Has Incomplete Implementation:

Issue: The jury-facing deliberation page has two hardcoded stub values that break actual vote submission.
- juryMemberId: '' — submitted vote will have an empty juror ID.
- const hasVoted = false — the "already voted" guard never fires, allowing duplicate vote submissions.
Files: src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx lines 34 and 66
Impact: Jury members can submit blank/duplicate votes in deliberation sessions. The submitted vote will be associated with an empty string juryMemberId, which will likely fail at the Prisma level or silently create bad data.
Fix approach: Derive juryMemberId from session.participants by matching ctx.user.id. Derive hasVoted by checking if a DeliberationVote with the current user's jury member ID already exists in session.votes.

Audit Middleware is a No-Op:

Issue: The withAuditLog middleware in src/server/trpc.ts (lines 99–114) identifies mutation calls by path pattern but does nothing with them — the body contains only a comment: // We'll implement this in the audit service.
Files: src/server/trpc.ts
Impact: Automatic centralised audit logging for all admin mutations does not occur through this middleware. Manual logAudit() / DecisionAuditLog.create() calls are present in many routers but coverage is inconsistent.
Fix approach: Implement the middleware body to call logAudit() with ctx.user.id, path, and serialized input/output. This provides a safety net for any procedure that doesn't call logAudit manually.

In-Memory Rate Limiter Not Suitable for Multi-Instance Deployment:

Issue: src/lib/rate-limit.ts uses a module-level Map for rate limit state. This works in a single process but does not share state across multiple Node.js instances or after a process restart.
Files: src/lib/rate-limit.ts
Impact: Rate limits can be trivially bypassed by hitting different server instances. Auth brute-force protection (5-attempt lockout) also uses this same in-memory store (src/lib/auth.ts line 12).
Fix approach: Replace with Redis-based rate limiting (e.g., @upstash/ratelimit or ioredis). The comment at line 5 already acknowledges this: "For production with multiple instances, replace with Redis-based solution."

configJson Widely Cast Without Validation:

Issue: configJson (a Prisma Json field) is cast directly to Record<string, unknown> in 65 locations across server routers and services without running it through the Zod validators. The validators (safeValidateRoundConfig, EvaluationConfigSchema.safeParse) are only called in 4 locations.
Files: src/server/routers/assignment.ts, src/server/routers/evaluation.ts, src/server/routers/filtering.ts, src/server/services/round-engine.ts, and many others.
Impact: Stale or malformed config JSON stored in the database can cause silent runtime failures deep in business logic (e.g., missing criteria, wrong field names) without a clear validation error.
Fix approach: Extract a typed parseRoundConfig(roundType, configJson) utility that returns a typed config or throws a TRPCError. Replace bare as Record<string, unknown> casts with this utility at query boundaries.

Known Bugs

Tag Rename Performs N+1 Database Writes:

Symptoms: Renaming a tag iterates over every user and every project that has the tag, issuing one UPDATE per record.
Files: src/server/routers/tag.ts lines 361–389 and 421–438
Trigger: Admin renames any tag that is widely used.
Workaround: None. Will time out for large datasets.
Fix approach: Use a raw SQL UPDATE ... SET tags = array_replace(tags, $old, $new) or a Prisma $executeRaw to perform the rename in a single query.

Jury Deliberation Vote: juryMemberId Is Hardcoded Empty String:

Symptoms: Votes submitted via the jury deliberation page will have juryMemberId: ''.
Files: src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx line 34
Trigger: Any jury member visits a deliberation session and submits a vote.
Workaround: None — the vote will silently pass or fail depending on Prisma validation.

Security Considerations

IP Header Spoofing on Rate Limiter:

Risk: All rate limiters extract the client IP from x-forwarded-for without validating that the header originates from a trusted proxy. A client can set this header to any value, bypassing per-IP rate limits.
Files: src/app/api/trpc/[trpc]/route.ts lines 13–18, src/app/api/auth/[...nextauth]/route.ts lines 7–10, src/app/api/email/change-password/route.ts line 36, src/app/api/email/verify-credentials/route.ts line 20, src/server/context.ts line 13.
Current mitigation: Nginx passes X-Forwarded-For from upstream; in single-proxy deployment this reduces (but does not eliminate) risk.
Recommendations: Pin IP extraction to req.headers.get('x-real-ip') set by Nginx only, or validate the forwarded-for chain against a trusted proxy list.

Cron Secret Compared with !== (Non–Timing-Safe):

Risk: String equality check cronSecret !== process.env.CRON_SECRET is vulnerable to timing side-channel attacks on the secret value.
Files: src/app/api/cron/audit-cleanup/route.ts, src/app/api/cron/digest/route.ts, src/app/api/cron/draft-cleanup/route.ts, src/app/api/cron/reminders/route.ts — all at line 8.
Current mitigation: Cron endpoints are not user-facing and rate limited at Nginx level.
Recommendations: Replace with timingSafeEqual(Buffer.from(cronSecret), Buffer.from(process.env.CRON_SECRET)) — the same approach already used in src/lib/storage/local-provider.ts line 75.

No Content-Security-Policy Header:

Risk: No CSP is set in next.config.ts or via middleware headers. If an XSS vector exists, there is no second line of defence to limit script execution.
Files: next.config.ts (missing headers() function), docker/nginx/mopc-platform.conf (missing CSP directive).
Current mitigation: Nginx sets X-Frame-Options, X-Content-Type-Options, and X-XSS-Protection, but these are legacy headers. No HSTS header is configured in Nginx either (only set post-certbot).
Recommendations: Add Content-Security-Policy and Strict-Transport-Security via the Next.js headers() config function.

MinIO Fallback Credentials Hardcoded:

Risk: When MINIO_ACCESS_KEY / MINIO_SECRET_KEY are not set in non-production environments, the client defaults to minioadmin/minioadmin.
Files: src/lib/minio.ts lines 28–29
Current mitigation: Production throws an error if credentials are missing (line 20–22). The fallback only applies in development.
Recommendations: Remove the hardcoded fallback entirely; require credentials in all environments to prevent accidental exposure of a non-production MinIO instance.

NEXT_PUBLIC_MINIO_ENDPOINT Undefined in Production:

Risk: Two admin learning pages read process.env.NEXT_PUBLIC_MINIO_ENDPOINT at runtime. This variable is not defined in docker-compose.yml and has no NEXT_PUBLIC_ entry. Next.js requires public env vars to be present at build time; at runtime this will always resolve to the fallback http://localhost:9000, making file previews broken in production.
Files: src/app/(admin)/admin/learning/new/page.tsx line 112, src/app/(admin)/admin/learning/[id]/page.tsx line 165.
Fix approach: Add NEXT_PUBLIC_MINIO_ENDPOINT to docker-compose.yml env section, or use the server-side MINIO_PUBLIC_ENDPOINT via a tRPC query rather than client-side env.

Performance Bottlenecks

Unbounded findMany Queries in Analytics Router:

Problem: src/server/routers/analytics.ts contains approximately 25 findMany calls with no take limit. With large competitions (hundreds of projects, thousands of evaluations) these will perform full table scans filtered only by roundId or competitionId.
Files: src/server/routers/analytics.ts — queries at lines 38, 80, 265, 421, 539, 582, 649, 749, 794, 1207, 1227, 1346, 1406, 1481, 1498, 1654, 1677, 1700.
Cause: Analytics queries are built for correctness, not scale. They load entire result sets into Node.js memory before aggregation.
Improvement path: Move aggregation into the database using Prisma groupBy and _count/_avg aggregations, or write $queryRaw SQL for complex analytics. Add pagination or date-range limits to the procedure inputs.

Tag Rename N+1 Pattern:

Problem: Renaming a tag issues one DB write per entity (user or project) that carries the tag rather than a single bulk update.
Files: src/server/routers/tag.ts lines 355–390
Cause: Prisma does not support array_replace natively; the current implementation works around this with a loop.
Improvement path: Use prisma.$executeRaw with PostgreSQL's array_replace function.

assignment.ts Router is 3,337 Lines:

Problem: The assignment router is the single largest file and handles jury assignments, AI assignment, manual overrides, transfer, COI, and coverage checks in one module.
Files: src/server/routers/assignment.ts
Cause: Organic growth without module splitting.
Improvement path: Extract into separate files: src/server/routers/assignment/manual.ts, assignment/ai.ts, assignment/coverage.ts. This will also improve IDE performance and test isolation.

Fragile Areas

Round Engine: COMPLETED State Has No Guards on Re-Entry:

Files: src/server/services/round-engine.ts lines 57–64
Why fragile: COMPLETED is defined as a terminal state (empty transitions array). However, there is no server-side guard preventing a direct Prisma update to a COMPLETED project state outside of transitionProjectState(). If a bug or data migration bypasses the state machine, projects can end up in unexpected states.
Safe modification: Always transition through transitionProjectState(). Any admin data repair scripts should call this function rather than using prisma.projectRoundState.update directly.
Test coverage: No unit tests for project state transitions. Only tests/unit/assignment-policy.test.ts exists, covering a different subsystem.

Prisma.$transaction Typed as any:

Files: src/server/services/round-engine.ts line 129, src/server/services/result-lock.ts lines 87, 169, src/server/services/mentor-workspace.ts lines 39, 254, and 50+ other locations.
Why fragile: tx: any disables all type-checking inside transaction callbacks. A mistakenly called method on tx (e.g., tx.round.delete instead of tx.round.update) will compile successfully but may cause silent data corruption.
Safe modification: Type the callback as (tx: Parameters<Parameters<typeof prisma.$transaction>[0]>[0]) => ... or define a TransactionalPrisma type alias. The PrismaClient | any union also defeats the purpose of typing.

email.ts is 2,175 Lines:

Files: src/lib/email.ts
Why fragile: All email templates, SMTP transport logic, and dynamic config loading are in one file. Adding a new email type requires navigating 2,000+ lines, and any change to transport setup affects all templates.
Safe modification: Extract individual email functions into src/lib/email/ subdirectory with one file per template type. Keep shared transport logic in src/lib/email/transport.ts.
Test coverage: No tests for email sending. Email errors are caught and logged but not surfaced to callers consistently.

admin/rounds/[roundId]/page.tsx is 2,398 Lines:

Files: src/app/(admin)/admin/rounds/[roundId]/page.tsx
Why fragile: The entire round management UI (config, assignments, filtering, deliberation controls) lives in a single client component. State from one section can accidentally affect another, and the component re-renders on any state change.
Safe modification: Extract tab sections into separate 'use client' components with scoped state. Consider converting to a tab-based layout with lazy loading.

SSE Live Voting Stream Relies on Polling:

Files: src/app/api/live-voting/stream/route.ts lines 184–194
Why fragile: The SSE endpoint polls the database every 2 seconds per connected client. Under live ceremony conditions with many simultaneous audience connections, this can produce significant database load.
Safe modification: Introduce a Redis pub/sub channel that the vote submission path writes to, and have the SSE stream subscribe to the channel rather than polling. Alternatively, implement a debounce on the poll and share the result across all open SSE connections via a singleton broadcaster.

Scaling Limits

In-Memory State (Rate Limiter, Login Attempts):

Current capacity: Works correctly for a single Node.js process.
Limit: Breaks under horizontal scaling or after a process restart (all rate limit windows reset).
Scaling path: Replace src/lib/rate-limit.ts with a Redis-backed solution. Replace the failedAttempts Map in src/lib/auth.ts with Redis counters or database fields on the User model.

SSE Connection Count vs. Database Poll Rate:

Current capacity: Each SSE client issues 1 database query per 2 seconds.
Limit: At 100 concurrent audience connections, this is 50 queries/second to liveVotingSession and related tables during a ceremony.
Scaling path: Shared broadcaster pattern (one database poll, fan-out to all SSE streams) or Redis pub/sub as described above.

Dependencies at Risk

next-auth v5 (Auth.js) — Beta API:

Risk: Auth.js v5 was in release candidate status at time of integration. The API surface (authConfig + handlers + auth) differs significantly from v4. Upgrading to a stable v5 release may require changes to src/lib/auth.ts and src/lib/auth.config.ts.
Impact: Session type definitions, adapter interfaces, and middleware patterns may change.
Migration plan: Monitor the Auth.js v5 stable release. Changes are likely limited to src/lib/auth.ts and src/types/next-auth.d.ts.

Missing Critical Features

No Database Backup Configuration:

Problem: docker-compose.yml has no scheduled backup service or volume snapshot configuration for the PostgreSQL container.
Blocks: Point-in-time recovery after data loss or accidental deletion.
Recommendation: Add a sidecar backup service (e.g., prodrigestivill/postgres-backup-local) or configure WAL archiving to MinIO.

No Error Monitoring / Observability:

Problem: There is no Sentry, Datadog, or equivalent error monitoring integration. Application errors are only logged to stdout via console.error. In production, these are only visible if the Docker logs are actively monitored.
Files: No integration found in src/instrumentation.ts or anywhere else.
Blocks: Proactive detection of runtime errors, AI service failures, and payment/submission edge cases.
Recommendation: Add Sentry (@sentry/nextjs) in src/instrumentation.ts — Next.js has native support for this. Filter out expected errors (e.g., TRPCError with NOT_FOUND) to reduce noise.

No Automated Tests for Core Business Logic (Round Engine, Evaluation, Filtering):

Problem: Only one test file exists: tests/unit/assignment-policy.test.ts. The round engine state machine (src/server/services/round-engine.ts), evaluation submission flow (src/server/routers/evaluation.ts), and AI filtering pipeline (src/server/services/ai-filtering.ts) have no automated tests.
Blocks: Confident refactoring of the state machine, advance-criterion logic, and scoring.
Recommendation: Add unit tests for activateRound, closeRound, transitionProjectState (happy path + guard failures), and submitEvaluation (COI check, advance criterion logic, score validation).

Test Coverage Gaps

Round Engine State Machine:

What's not tested: All activateRound, closeRound, archiveRound, transitionProjectState transitions, including guard conditions (e.g., activating an ARCHIVED round, transitioning a COMPLETED project).
Files: src/server/services/round-engine.ts
Risk: A regression in state transition guards could allow data corruption (e.g., re-activating a closed round, double-passing a project).
Priority: High

Evaluation Submission (advance criterion, COI, scoring):

What's not tested: submitEvaluation mutation — specifically the advance criterion auto-transition logic (lines 1637–1646 of src/server/routers/evaluation.ts), COI auto-reassignment on declareCOI, and upsertForm criterion validation.
Files: src/server/routers/evaluation.ts
Risk: Regression in advance criterion will silently skip project advancement. COI declaration failures are caught and logged but untested.
Priority: High

AI Anonymization:

What's not tested: sanitizeText, anonymizeProject, validateNoPersonalInfo in src/server/services/anonymization.ts. These are GDPR-critical functions.
Files: src/server/services/anonymization.ts
Risk: A PII leak in AI calls would violate GDPR without detection.
Priority: High

Assignment Policy Execution:

What's not tested: End-to-end executeAssignment flow in src/server/services/round-assignment.ts — specifically the COI filtering, geo-diversity penalty, familiarity bonus, and under-coverage gap-fill.
Files: src/server/services/round-assignment.ts, src/server/services/smart-assignment.ts
Risk: Silent over- or under-assignment when constraints interact.
Priority: Medium

Concerns audit: 2026-02-26

18 KiB Raw Blame History Unescape Escape