218 lines
18 KiB
Markdown
218 lines
18 KiB
Markdown
|
|
# Codebase Concerns
|
|||
|
|
|
|||
|
|
**Analysis Date:** 2026-02-26
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Tech Debt
|
|||
|
|
|
|||
|
|
**Award Router is a Stub:**
|
|||
|
|
- Issue: Entire award router is commented out and non-functional. All award procedures were deleted when the Pipeline/Track models were removed and have not been reimplemented.
|
|||
|
|
- Files: `src/server/routers/award.ts`
|
|||
|
|
- Impact: Any UI that references award management procedures will fail at runtime. The `SpecialAward` model still exists in the schema but has no tRPC exposure via this router.
|
|||
|
|
- Fix approach: Reimplement the router procedures against the current `SpecialAward` → `Competition` FK relationship. See the TODO comment at line 9 for the list of procedures to reimplement.
|
|||
|
|
|
|||
|
|
**Deliberation Page Has Incomplete Implementation:**
|
|||
|
|
- Issue: The jury-facing deliberation page has two hardcoded stub values that break actual vote submission.
|
|||
|
|
- `juryMemberId: ''` — submitted vote will have an empty juror ID.
|
|||
|
|
- `const hasVoted = false` — the "already voted" guard never fires, allowing duplicate vote submissions.
|
|||
|
|
- Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` lines 34 and 66
|
|||
|
|
- Impact: Jury members can submit blank/duplicate votes in deliberation sessions. The submitted vote will be associated with an empty string `juryMemberId`, which will likely fail at the Prisma level or silently create bad data.
|
|||
|
|
- Fix approach: Derive `juryMemberId` from `session.participants` by matching `ctx.user.id`. Derive `hasVoted` by checking if a `DeliberationVote` with the current user's jury member ID already exists in `session.votes`.
|
|||
|
|
|
|||
|
|
**Audit Middleware is a No-Op:**
|
|||
|
|
- Issue: The `withAuditLog` middleware in `src/server/trpc.ts` (lines 99–114) identifies mutation calls by path pattern but does nothing with them — the body contains only a comment: `// We'll implement this in the audit service`.
|
|||
|
|
- Files: `src/server/trpc.ts`
|
|||
|
|
- Impact: Automatic centralised audit logging for all admin mutations does not occur through this middleware. Manual `logAudit()` / `DecisionAuditLog.create()` calls are present in many routers but coverage is inconsistent.
|
|||
|
|
- Fix approach: Implement the middleware body to call `logAudit()` with `ctx.user.id`, `path`, and serialized input/output. This provides a safety net for any procedure that doesn't call `logAudit` manually.
|
|||
|
|
|
|||
|
|
**In-Memory Rate Limiter Not Suitable for Multi-Instance Deployment:**
|
|||
|
|
- Issue: `src/lib/rate-limit.ts` uses a module-level `Map` for rate limit state. This works in a single process but does not share state across multiple Node.js instances or after a process restart.
|
|||
|
|
- Files: `src/lib/rate-limit.ts`
|
|||
|
|
- Impact: Rate limits can be trivially bypassed by hitting different server instances. Auth brute-force protection (5-attempt lockout) also uses this same in-memory store (`src/lib/auth.ts` line 12).
|
|||
|
|
- Fix approach: Replace with Redis-based rate limiting (e.g., `@upstash/ratelimit` or `ioredis`). The comment at line 5 already acknowledges this: "For production with multiple instances, replace with Redis-based solution."
|
|||
|
|
|
|||
|
|
**`configJson` Widely Cast Without Validation:**
|
|||
|
|
- Issue: `configJson` (a Prisma `Json` field) is cast directly to `Record<string, unknown>` in 65 locations across server routers and services without running it through the Zod validators. The validators (`safeValidateRoundConfig`, `EvaluationConfigSchema.safeParse`) are only called in 4 locations.
|
|||
|
|
- Files: `src/server/routers/assignment.ts`, `src/server/routers/evaluation.ts`, `src/server/routers/filtering.ts`, `src/server/services/round-engine.ts`, and many others.
|
|||
|
|
- Impact: Stale or malformed config JSON stored in the database can cause silent runtime failures deep in business logic (e.g., missing criteria, wrong field names) without a clear validation error.
|
|||
|
|
- Fix approach: Extract a typed `parseRoundConfig(roundType, configJson)` utility that returns a typed config or throws a `TRPCError`. Replace bare `as Record<string, unknown>` casts with this utility at query boundaries.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Known Bugs
|
|||
|
|
|
|||
|
|
**Tag Rename Performs N+1 Database Writes:**
|
|||
|
|
- Symptoms: Renaming a tag iterates over every user and every project that has the tag, issuing one `UPDATE` per record.
|
|||
|
|
- Files: `src/server/routers/tag.ts` lines 361–389 and 421–438
|
|||
|
|
- Trigger: Admin renames any tag that is widely used.
|
|||
|
|
- Workaround: None. Will time out for large datasets.
|
|||
|
|
- Fix approach: Use a raw SQL `UPDATE ... SET tags = array_replace(tags, $old, $new)` or a Prisma `$executeRaw` to perform the rename in a single query.
|
|||
|
|
|
|||
|
|
**Jury Deliberation Vote: `juryMemberId` Is Hardcoded Empty String:**
|
|||
|
|
- Symptoms: Votes submitted via the jury deliberation page will have `juryMemberId: ''`.
|
|||
|
|
- Files: `src/app/(jury)/jury/competitions/deliberation/[sessionId]/page.tsx` line 34
|
|||
|
|
- Trigger: Any jury member visits a deliberation session and submits a vote.
|
|||
|
|
- Workaround: None — the vote will silently pass or fail depending on Prisma validation.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Security Considerations
|
|||
|
|
|
|||
|
|
**IP Header Spoofing on Rate Limiter:**
|
|||
|
|
- Risk: All rate limiters extract the client IP from `x-forwarded-for` without validating that the header originates from a trusted proxy. A client can set this header to any value, bypassing per-IP rate limits.
|
|||
|
|
- Files: `src/app/api/trpc/[trpc]/route.ts` lines 13–18, `src/app/api/auth/[...nextauth]/route.ts` lines 7–10, `src/app/api/email/change-password/route.ts` line 36, `src/app/api/email/verify-credentials/route.ts` line 20, `src/server/context.ts` line 13.
|
|||
|
|
- Current mitigation: Nginx passes `X-Forwarded-For` from upstream; in single-proxy deployment this reduces (but does not eliminate) risk.
|
|||
|
|
- Recommendations: Pin IP extraction to `req.headers.get('x-real-ip')` set by Nginx only, or validate the forwarded-for chain against a trusted proxy list.
|
|||
|
|
|
|||
|
|
**Cron Secret Compared with `!==` (Non–Timing-Safe):**
|
|||
|
|
- Risk: String equality check `cronSecret !== process.env.CRON_SECRET` is vulnerable to timing side-channel attacks on the secret value.
|
|||
|
|
- Files: `src/app/api/cron/audit-cleanup/route.ts`, `src/app/api/cron/digest/route.ts`, `src/app/api/cron/draft-cleanup/route.ts`, `src/app/api/cron/reminders/route.ts` — all at line 8.
|
|||
|
|
- Current mitigation: Cron endpoints are not user-facing and rate limited at Nginx level.
|
|||
|
|
- Recommendations: Replace with `timingSafeEqual(Buffer.from(cronSecret), Buffer.from(process.env.CRON_SECRET))` — the same approach already used in `src/lib/storage/local-provider.ts` line 75.
|
|||
|
|
|
|||
|
|
**No Content-Security-Policy Header:**
|
|||
|
|
- Risk: No CSP is set in `next.config.ts` or via middleware headers. If an XSS vector exists, there is no second line of defence to limit script execution.
|
|||
|
|
- Files: `next.config.ts` (missing `headers()` function), `docker/nginx/mopc-platform.conf` (missing CSP directive).
|
|||
|
|
- Current mitigation: Nginx sets `X-Frame-Options`, `X-Content-Type-Options`, and `X-XSS-Protection`, but these are legacy headers. No HSTS header is configured in Nginx either (only set post-certbot).
|
|||
|
|
- Recommendations: Add `Content-Security-Policy` and `Strict-Transport-Security` via the Next.js `headers()` config function.
|
|||
|
|
|
|||
|
|
**MinIO Fallback Credentials Hardcoded:**
|
|||
|
|
- Risk: When `MINIO_ACCESS_KEY` / `MINIO_SECRET_KEY` are not set in non-production environments, the client defaults to `minioadmin`/`minioadmin`.
|
|||
|
|
- Files: `src/lib/minio.ts` lines 28–29
|
|||
|
|
- Current mitigation: Production throws an error if credentials are missing (line 20–22). The fallback only applies in development.
|
|||
|
|
- Recommendations: Remove the hardcoded fallback entirely; require credentials in all environments to prevent accidental exposure of a non-production MinIO instance.
|
|||
|
|
|
|||
|
|
**`NEXT_PUBLIC_MINIO_ENDPOINT` Undefined in Production:**
|
|||
|
|
- Risk: Two admin learning pages read `process.env.NEXT_PUBLIC_MINIO_ENDPOINT` at runtime. This variable is not defined in `docker-compose.yml` and has no `NEXT_PUBLIC_` entry. Next.js requires public env vars to be present at build time; at runtime this will always resolve to the fallback `http://localhost:9000`, making file previews broken in production.
|
|||
|
|
- Files: `src/app/(admin)/admin/learning/new/page.tsx` line 112, `src/app/(admin)/admin/learning/[id]/page.tsx` line 165.
|
|||
|
|
- Fix approach: Add `NEXT_PUBLIC_MINIO_ENDPOINT` to `docker-compose.yml` env section, or use the server-side `MINIO_PUBLIC_ENDPOINT` via a tRPC query rather than client-side env.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Performance Bottlenecks
|
|||
|
|
|
|||
|
|
**Unbounded `findMany` Queries in Analytics Router:**
|
|||
|
|
- Problem: `src/server/routers/analytics.ts` contains approximately 25 `findMany` calls with no `take` limit. With large competitions (hundreds of projects, thousands of evaluations) these will perform full table scans filtered only by `roundId` or `competitionId`.
|
|||
|
|
- Files: `src/server/routers/analytics.ts` — queries at lines 38, 80, 265, 421, 539, 582, 649, 749, 794, 1207, 1227, 1346, 1406, 1481, 1498, 1654, 1677, 1700.
|
|||
|
|
- Cause: Analytics queries are built for correctness, not scale. They load entire result sets into Node.js memory before aggregation.
|
|||
|
|
- Improvement path: Move aggregation into the database using Prisma `groupBy` and `_count`/`_avg` aggregations, or write `$queryRaw` SQL for complex analytics. Add pagination or date-range limits to the procedure inputs.
|
|||
|
|
|
|||
|
|
**Tag Rename N+1 Pattern:**
|
|||
|
|
- Problem: Renaming a tag issues one DB write per entity (user or project) that carries the tag rather than a single bulk update.
|
|||
|
|
- Files: `src/server/routers/tag.ts` lines 355–390
|
|||
|
|
- Cause: Prisma does not support `array_replace` natively; the current implementation works around this with a loop.
|
|||
|
|
- Improvement path: Use `prisma.$executeRaw` with PostgreSQL's `array_replace` function.
|
|||
|
|
|
|||
|
|
**`assignment.ts` Router is 3,337 Lines:**
|
|||
|
|
- Problem: The assignment router is the single largest file and handles jury assignments, AI assignment, manual overrides, transfer, COI, and coverage checks in one module.
|
|||
|
|
- Files: `src/server/routers/assignment.ts`
|
|||
|
|
- Cause: Organic growth without module splitting.
|
|||
|
|
- Improvement path: Extract into separate files: `src/server/routers/assignment/manual.ts`, `assignment/ai.ts`, `assignment/coverage.ts`. This will also improve IDE performance and test isolation.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Fragile Areas
|
|||
|
|
|
|||
|
|
**Round Engine: COMPLETED State Has No Guards on Re-Entry:**
|
|||
|
|
- Files: `src/server/services/round-engine.ts` lines 57–64
|
|||
|
|
- Why fragile: `COMPLETED` is defined as a terminal state (empty transitions array). However, there is no server-side guard preventing a direct Prisma update to a `COMPLETED` project state outside of `transitionProjectState()`. If a bug or data migration bypasses the state machine, projects can end up in unexpected states.
|
|||
|
|
- Safe modification: Always transition through `transitionProjectState()`. Any admin data repair scripts should call this function rather than using `prisma.projectRoundState.update` directly.
|
|||
|
|
- Test coverage: No unit tests for project state transitions. Only `tests/unit/assignment-policy.test.ts` exists, covering a different subsystem.
|
|||
|
|
|
|||
|
|
**`Prisma.$transaction` Typed as `any`:**
|
|||
|
|
- Files: `src/server/services/round-engine.ts` line 129, `src/server/services/result-lock.ts` lines 87, 169, `src/server/services/mentor-workspace.ts` lines 39, 254, and 50+ other locations.
|
|||
|
|
- Why fragile: `tx: any` disables all type-checking inside transaction callbacks. A mistakenly called method on `tx` (e.g., `tx.round.delete` instead of `tx.round.update`) will compile successfully but may cause silent data corruption.
|
|||
|
|
- Safe modification: Type the callback as `(tx: Parameters<Parameters<typeof prisma.$transaction>[0]>[0]) => ...` or define a `TransactionalPrisma` type alias. The `PrismaClient | any` union also defeats the purpose of typing.
|
|||
|
|
|
|||
|
|
**`email.ts` is 2,175 Lines:**
|
|||
|
|
- Files: `src/lib/email.ts`
|
|||
|
|
- Why fragile: All email templates, SMTP transport logic, and dynamic config loading are in one file. Adding a new email type requires navigating 2,000+ lines, and any change to transport setup affects all templates.
|
|||
|
|
- Safe modification: Extract individual email functions into `src/lib/email/` subdirectory with one file per template type. Keep shared transport logic in `src/lib/email/transport.ts`.
|
|||
|
|
- Test coverage: No tests for email sending. Email errors are caught and logged but not surfaced to callers consistently.
|
|||
|
|
|
|||
|
|
**`admin/rounds/[roundId]/page.tsx` is 2,398 Lines:**
|
|||
|
|
- Files: `src/app/(admin)/admin/rounds/[roundId]/page.tsx`
|
|||
|
|
- Why fragile: The entire round management UI (config, assignments, filtering, deliberation controls) lives in a single client component. State from one section can accidentally affect another, and the component re-renders on any state change.
|
|||
|
|
- Safe modification: Extract tab sections into separate `'use client'` components with scoped state. Consider converting to a tab-based layout with lazy loading.
|
|||
|
|
|
|||
|
|
**SSE Live Voting Stream Relies on Polling:**
|
|||
|
|
- Files: `src/app/api/live-voting/stream/route.ts` lines 184–194
|
|||
|
|
- Why fragile: The SSE endpoint polls the database every 2 seconds per connected client. Under live ceremony conditions with many simultaneous audience connections, this can produce significant database load.
|
|||
|
|
- Safe modification: Introduce a Redis pub/sub channel that the vote submission path writes to, and have the SSE stream subscribe to the channel rather than polling. Alternatively, implement a debounce on the poll and share the result across all open SSE connections via a singleton broadcaster.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Scaling Limits
|
|||
|
|
|
|||
|
|
**In-Memory State (Rate Limiter, Login Attempts):**
|
|||
|
|
- Current capacity: Works correctly for a single Node.js process.
|
|||
|
|
- Limit: Breaks under horizontal scaling or after a process restart (all rate limit windows reset).
|
|||
|
|
- Scaling path: Replace `src/lib/rate-limit.ts` with a Redis-backed solution. Replace the `failedAttempts` Map in `src/lib/auth.ts` with Redis counters or database fields on the `User` model.
|
|||
|
|
|
|||
|
|
**SSE Connection Count vs. Database Poll Rate:**
|
|||
|
|
- Current capacity: Each SSE client issues 1 database query per 2 seconds.
|
|||
|
|
- Limit: At 100 concurrent audience connections, this is 50 queries/second to `liveVotingSession` and related tables during a ceremony.
|
|||
|
|
- Scaling path: Shared broadcaster pattern (one database poll, fan-out to all SSE streams) or Redis pub/sub as described above.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Dependencies at Risk
|
|||
|
|
|
|||
|
|
**`next-auth` v5 (Auth.js) — Beta API:**
|
|||
|
|
- Risk: Auth.js v5 was in release candidate status at time of integration. The API surface (`authConfig` + `handlers` + `auth`) differs significantly from v4. Upgrading to a stable v5 release may require changes to `src/lib/auth.ts` and `src/lib/auth.config.ts`.
|
|||
|
|
- Impact: Session type definitions, adapter interfaces, and middleware patterns may change.
|
|||
|
|
- Migration plan: Monitor the Auth.js v5 stable release. Changes are likely limited to `src/lib/auth.ts` and `src/types/next-auth.d.ts`.
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Missing Critical Features
|
|||
|
|
|
|||
|
|
**No Database Backup Configuration:**
|
|||
|
|
- Problem: `docker-compose.yml` has no scheduled backup service or volume snapshot configuration for the PostgreSQL container.
|
|||
|
|
- Blocks: Point-in-time recovery after data loss or accidental deletion.
|
|||
|
|
- Recommendation: Add a sidecar backup service (e.g., `prodrigestivill/postgres-backup-local`) or configure WAL archiving to MinIO.
|
|||
|
|
|
|||
|
|
**No Error Monitoring / Observability:**
|
|||
|
|
- Problem: There is no Sentry, Datadog, or equivalent error monitoring integration. Application errors are only logged to stdout via `console.error`. In production, these are only visible if the Docker logs are actively monitored.
|
|||
|
|
- Files: No integration found in `src/instrumentation.ts` or anywhere else.
|
|||
|
|
- Blocks: Proactive detection of runtime errors, AI service failures, and payment/submission edge cases.
|
|||
|
|
- Recommendation: Add Sentry (`@sentry/nextjs`) in `src/instrumentation.ts` — Next.js has native support for this. Filter out expected errors (e.g., `TRPCError` with `NOT_FOUND`) to reduce noise.
|
|||
|
|
|
|||
|
|
**No Automated Tests for Core Business Logic (Round Engine, Evaluation, Filtering):**
|
|||
|
|
- Problem: Only one test file exists: `tests/unit/assignment-policy.test.ts`. The round engine state machine (`src/server/services/round-engine.ts`), evaluation submission flow (`src/server/routers/evaluation.ts`), and AI filtering pipeline (`src/server/services/ai-filtering.ts`) have no automated tests.
|
|||
|
|
- Blocks: Confident refactoring of the state machine, advance-criterion logic, and scoring.
|
|||
|
|
- Recommendation: Add unit tests for `activateRound`, `closeRound`, `transitionProjectState` (happy path + guard failures), and `submitEvaluation` (COI check, advance criterion logic, score validation).
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
## Test Coverage Gaps
|
|||
|
|
|
|||
|
|
**Round Engine State Machine:**
|
|||
|
|
- What's not tested: All `activateRound`, `closeRound`, `archiveRound`, `transitionProjectState` transitions, including guard conditions (e.g., activating an ARCHIVED round, transitioning a COMPLETED project).
|
|||
|
|
- Files: `src/server/services/round-engine.ts`
|
|||
|
|
- Risk: A regression in state transition guards could allow data corruption (e.g., re-activating a closed round, double-passing a project).
|
|||
|
|
- Priority: High
|
|||
|
|
|
|||
|
|
**Evaluation Submission (advance criterion, COI, scoring):**
|
|||
|
|
- What's not tested: `submitEvaluation` mutation — specifically the advance criterion auto-transition logic (lines 1637–1646 of `src/server/routers/evaluation.ts`), COI auto-reassignment on `declareCOI`, and `upsertForm` criterion validation.
|
|||
|
|
- Files: `src/server/routers/evaluation.ts`
|
|||
|
|
- Risk: Regression in advance criterion will silently skip project advancement. COI declaration failures are caught and logged but untested.
|
|||
|
|
- Priority: High
|
|||
|
|
|
|||
|
|
**AI Anonymization:**
|
|||
|
|
- What's not tested: `sanitizeText`, `anonymizeProject`, `validateNoPersonalInfo` in `src/server/services/anonymization.ts`. These are GDPR-critical functions.
|
|||
|
|
- Files: `src/server/services/anonymization.ts`
|
|||
|
|
- Risk: A PII leak in AI calls would violate GDPR without detection.
|
|||
|
|
- Priority: High
|
|||
|
|
|
|||
|
|
**Assignment Policy Execution:**
|
|||
|
|
- What's not tested: End-to-end `executeAssignment` flow in `src/server/services/round-assignment.ts` — specifically the COI filtering, geo-diversity penalty, familiarity bonus, and under-coverage gap-fill.
|
|||
|
|
- Files: `src/server/services/round-assignment.ts`, `src/server/services/smart-assignment.ts`
|
|||
|
|
- Risk: Silent over- or under-assignment when constraints interact.
|
|||
|
|
- Priority: Medium
|
|||
|
|
|
|||
|
|
---
|
|||
|
|
|
|||
|
|
*Concerns audit: 2026-02-26*
|