feat: render project averages to two decimals

Project-level averages (Raw + Balanced in the side panel, the observer project detail score card, and the observer preview dialog's Avg Score) now show two decimals (e.g. 8.33 instead of 8.0/8.3), so admins can see the actual computed value. Per-juror individual scores keep one decimal: they're submissions, not aggregates.

ScorePill gains an optional precision prop so call sites can opt into 2-decimal display where the value is an aggregate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
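The precision prop comes down to choosing the `toFixed` argument per call site. Below is a minimal sketch, assuming a hypothetical `formatScore` helper, because ScorePill's actual props are not shown in this diff. It defaults to one decimal and returns `null` for a missing score so the caller can render its own placeholder:

```typescript
// Hypothetical helper behind ScorePill's optional `precision` prop.
// Defaulting to 1 keeps every existing call site on one decimal.
function formatScore(score: number | null | undefined, precision: 1 | 2 = 1): string | null {
  if (score == null) return null // caller renders its own placeholder
  return score.toFixed(precision)
}

// A per-juror submission keeps one decimal.
console.log(formatScore(9)) // 9.0
// A project aggregate of 9, 8 and 8 opts into two decimals.
console.log(formatScore((9 + 8 + 8) / 3, 2)) // 8.33
```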
feat: mount score explainer dialog in admin and observer surfaces
2026-04-27 13:32:52 +02:00 · 2026-04-27 13:31:29 +02:00 · 2026-04-27 13:25:20 +02:00 · 2026-04-27 13:24:34 +02:00 · 2026-04-27 13:23:29 +02:00 · 2026-04-27 13:22:11 +02:00
17 changed files with 2317 additions and 94 deletions
--- a/docs/superpowers/plans/2026-04-27-juror-balance-toggle-and-round-scoping.md
+++ b/docs/superpowers/plans/2026-04-27-juror-balance-toggle-and-round-scoping.md
--- a/docs/superpowers/specs/2026-04-27-juror-balance-toggle-and-round-scoping-design.md
+++ b/docs/superpowers/specs/2026-04-27-juror-balance-toggle-and-round-scoping-design.md
@@ -0,0 +1,218 @@
+# Juror-Balanced Scoring Toggle + Round-Scoping Fixes
+
+**Status:** design
+**Date:** 2026-04-27
+**Author:** Matt + Claude
+
+## Goal
+
+Two related changes to the ranking system:
+
+1. **Add a per-round toggle** that controls whether the ranking dashboard ranks projects by the juror-balanced (z-normalized) score or by the raw average. The toggle persists in `Round.configJson` and is shared across all viewers. Admins flip it from the side panel of the admin ranking dashboard; observers see the effect (which score is "active") but don't get the toggle UI themselves, matching today's role gates on the dashboard.
+2. **Fix cross-round contamination** in two analytics procedures (`getProjectDetail`, `getProjectRankings`) and several UI surfaces that consume them. Per-juror balance contexts must be computed within a single round; aggregate stats (avg score, evaluator count, pass rate) must be scoped to the round being viewed.
+
+A side panel "deeper display" replaces the small `⇢ X.X` annotation on the list view: the list view stays clean, and clicking into a project surfaces the raw + balanced numbers, the toggle, an explainer, and per-juror balance contributions.
+
+## Background
+
+Juror-balanced scoring (`src/server/services/juror-balance.ts`) corrects for per-juror grading harshness using z-normalization. Each juror's scores are normalized against their own mean + stddev across the round, then rescaled onto the round's overall mean + stddev so balanced numbers are comparable to raw averages.
+
+The math is correct, but two scoping problems exist:
+
+**Problem 1 — `getProjectDetail` is round-blind.** The query at `src/server/routers/analytics.ts:1417-1422` pulls every SUBMITTED evaluation for a project across every round it ever participated in, then computes Avg Score / Evaluators / Pass Rate from that pool. Meanwhile the per-juror list rendered in the admin sheet at `src/components/admin/round/ranking-dashboard.tsx:1034-1036` filters to the current round. Result: stats card disagrees with the visible per-juror list.
+
+**Problem 2 — `getProjectRankings` (programId/edition mode) pools z-context across rounds.** At `src/server/routers/analytics.ts:212-218`, when invoked with `programId` (instead of `roundId`), evaluations from every round in the edition are fed into a single `computeBalanceContext`. A juror's mean/stddev is then computed across mixed contexts (e.g. quick intake screening + deep evaluation), producing meaningless personal calibration.
+
+Other call sites (`ranking.ts`, `ai-juror-calibration.ts`) already filter by round and are unaffected.
+
+## Surfaces affected
+
+| # | Surface | Procedure | Issue |
+|---|---|---|---|
+| 1 | Admin ranking dashboard side sheet | `analytics.getProjectDetail` | Stats card pulls cross-round evals |
+| 2 | Observer full project detail page | `analytics.getProjectDetail` | Same; observer-side |
+| 3 | Observer reports preview dialog | `analytics.getProjectDetail` | Same; observer-side |
+| 4 | Admin reports overview tab rankings | `analytics.getProjectRankings` | Edition mode uses cross-round z-context |
+| 5 | Admin reports detail tab rankings | `analytics.getProjectRankings` | Same |
+| 6 | Admin reports overview "Balanced Avg" tile | derives from #4 | Inherits the bad numbers |
+| 7 | Result lock controls | `analytics.getProjectRankings` (roundId only) | OK — already round-scoped |
+| 8 | Admin ranking dashboard list | `ranking.getRoundRanking` | OK — already filters by roundId |
+| 9 | AI juror calibration service | self-contained | OK — already filters by roundId |
+
+## Design
+
+### 1. Round-scoping fixes
+
+#### `analytics.getProjectDetail`
+
+- Add an optional `roundId` to the input schema.
+- When `roundId` is provided, filter `submittedEvaluations` (the query at line 1417) by `assignment: { roundId }`. The stats block computed from those evaluations becomes round-scoped automatically.
+- When `roundId` is not provided, return `stats: null` and a new field `statsByRound: Array<{ roundId, roundName, stats }>` so callers can render per-round breakdowns instead of one misleading aggregate. (The current dialogs always know which round they want — they just weren't passing it.)
+- Pass `roundId` from the three callers (#1, #2, #3 above).
+
+#### `analytics.getProjectRankings`
+
+When called in edition mode (`programId` only), z-normalization must run **per round**, not across the pool:
+
+1. Group `points: ScorePoint[]` by `roundId`. Each point needs a `roundId` for this, but the evaluations selected via `evalWhere` are currently flat, so add `assignment.round.id` to the select.
+2. For each round, call `computeBalanceContext(pointsForRound)` and `computeBalancedProjectScores(pointsForRound, ctx)`.
+3. Aggregate per-project: a project's edition-level `balancedScore` is the unweighted mean of its per-round balanced averages. Its `averageScore` (raw) is the unweighted mean of its per-round raw averages.
+4. `evaluationCount` becomes the total across rounds (unchanged in spirit).
+
+In `roundId` mode, behavior is unchanged.
+
+#### Default round resolution (observer full project page, #2)
+
+The observer page at `/observer/projects/[projectId]` doesn't know which round to focus on. Resolution logic:
+
+```
+Among rounds where ProjectRoundState exists for this project:
+  1. If exactly one round.status = ROUND_ACTIVE, use it.
+  2. Else use the most recent round with status = ROUND_CLOSED
+     (ordered by sortOrder desc, or exitedAt desc as tiebreak).
+  3. Else if only ROUND_DRAFT rounds exist, fall back to none (stats: null).
+```
+
+A small round selector chip near the stats card lets the user switch contexts; the URL updates with `?round=<id>`.
+
+### 2. Per-round balanced-scoring toggle
+
+#### Storage
+
+Add `useBalancedRanking: boolean` to `Round.configJson` (default `true` — preserve current behavior). No schema migration needed since `configJson` is already a flexible JSON column.
+
+#### tRPC procedure
+
+Extend `ranking.updateConfig` (or add `setUseBalancedRanking`). The open question is the procedure level: the page is admin-only today, so letting observers toggle would be a deliberate widening. **Decision: keep it `adminProcedure`** (PROGRAM_ADMIN + SUPER_ADMIN). The user asked that "anyone who can view should be able to toggle", and since only admins can view the page, `adminProcedure` already satisfies that.
+
+#### UI integration
+
+- Toggle lives at the top of the side sheet (not the list view) — labeled "Use balanced scoring for ranking" with a help icon that opens the explainer.
+- When toggled, the dashboard re-sorts immediately (the list-view sort at `ranking-dashboard.tsx:417,879` reads from `evalScores.balanced[id]?.balancedAverage`; we'll wrap that in `useBalancedRanking ? balanced : raw`).
+- The list row's compact `⇢ X.X` annotation is **removed**. Visual delta lives in the side panel only.
+
+### 3. Side panel deeper display
+
+The existing side sheet (`ranking-dashboard.tsx:970-1090`) gains:
+
+#### Stats area (replaces the current 3-card grid)
+
+```
+┌──────────────────────────────────────────────────────────────┐
+│ Avg Score                                                    │
+│   Raw: 8.3      Balanced: 8.0  ← used for ranking            │
+│                                                              │
+│ Evaluators: 3        Pass Rate: 67%                          │
+│                                                              │
+│ ⓘ How scores are calculated  (opens dialog)                  │
+└──────────────────────────────────────────────────────────────┘
+```
+
+- "Raw" and "Balanced" sit side-by-side. The active one (per the round's toggle) gets a subtle "← used for ranking" tag and bolder weight.
+- Both numbers always show one decimal (`.toFixed(1)`).
+- Below the numbers, a clickable affordance: **"How scores are calculated"** (small button or link with an info icon). Clicking opens an explainer dialog (see "Score explainer dialog" below).
+
+#### Per-juror rows (extends current `Juror Evaluations` block)
+
+Each row currently shows `Name · Yes/No badge · Score: 9.0`. New layout when balanced is on:
+
+```
+Rachid Benchaouir          Yes   Score: 9.0   (typical 7.2 → contributes 8.5)
+```
+
+The trailing chip is muted text. When balanced is off, the chip is hidden. Tooltip on the chip explains the calculation.
+
+#### Per-round toggle row at top
+
+```
+[Use balanced scoring for ranking]  [toggle]   ⓘ
+```
+
+Single horizontal row, just below the project header. Persists on flip. The ⓘ icon opens the same "How scores are calculated" dialog.
+
+#### Score explainer dialog ("How scores are calculated")
+
+A reusable dialog component (`<ScoreExplainerDialog />`) opens from the affordance in the side panel and from a matching affordance on the observer surfaces (#2, #3) so both audiences see the same explanation. Content is plain-language, not academic, and walks through one concrete worked example.
+
+Structure:
+
+1. **What it does (1 paragraph)** — "Different jurors have different grading styles. Some grade harshly, some leniently. Balanced scoring corrects for that so a project isn't punished for drawing harsh jurors or rewarded for drawing lenient ones."
+
+2. **How it works, step by step** — five short numbered points:
+   1. For each juror, calculate their personal average and spread across all the projects they scored in this round.
+   2. Convert each individual score into "how many standard deviations above or below this juror's typical". For two jurors with the same spread, a 6 from a juror who averages 5 reads the same as a 9 from a juror who averages 8.
+   3. Average those normalized values across the project's jurors.
+   4. Rescale back onto the same 1–10 scale using the round's overall average and spread.
+   5. The result is directly comparable to the raw average — same scale, but corrected for grading style.
+
+3. **Worked example** — a concrete table using fabricated jurors, e.g.:
+
+   | Juror | Their typical avg | Their score for "Project X" | What that means |
+   |---|---|---|---|
+   | Juror A (lenient) | 8.2 | 9.0 | Just slightly above their typical (+0.4σ) |
+   | Juror B (harsh) | 5.8 | 7.5 | Well above their typical (+1.5σ) |
+   | Juror C (typical) | 7.0 | 8.0 | Slightly above their typical (+0.7σ) |
+
+   "Raw average: (9.0 + 7.5 + 8.0) / 3 = **8.2**
+   Balanced average takes the mean of the three z-scores ((0.4 + 1.5 + 0.7) / 3 ≈ +0.87σ), rescales it onto the round's overall average and spread, and lands at **8.4**. Juror B's strong endorsement (well above their harsh baseline) carries more weight than the raw 7.5 suggests."
+
+4. **When it kicks in / when it doesn't** — short paragraph:
+   - Needs ≥ 2 evaluations from the round (not all identical) to compute a juror's spread; otherwise that juror falls back to the round-wide average and spread.
+   - Needs at least one juror with non-zero spread for the round; if everyone gave identical scores, balanced equals raw.
+   - Computed within a single round only — a juror's grading style in an intake screening round doesn't affect their balance in a deeper evaluation round.
+
+5. **Why "Raw" is still shown** — "We always show both numbers so admins can sanity-check. The toggle at the top of the panel decides which one is used for ranking."
+
+The dialog is a `shadcn/ui` `Dialog`, max-width ~`md`, scrollable. No live data — content is static text + the static example table. Lives in `src/components/shared/score-explainer-dialog.tsx` so it can be imported by admin and observer surfaces alike.
+
+### 4. Decimal display audit
+
+Standardize on **one decimal** for all balanced/raw score surfaces:
+
+- `admin/reports/page.tsx:368` currently shows `toFixed(2)` — change to `toFixed(1)`.
+- All other sites already use `.toFixed(1)` or compute integers.
+
+## Data flow summary
+
+```
+Round.configJson.useBalancedRanking ──→ ranking-dashboard reads on mount
+                                    ──→ list sort uses raw or balanced based on flag
+                                    ──→ side panel shows both, marks the active one
+
+getProjectDetail({ id, roundId })  ──→ filtered submittedEvaluations
+                                   ──→ round-scoped stats
+                                   ──→ optionally: per-round balance context computed
+                                       inline for the side panel deeper display
+
+getProjectRankings({ programId })  ──→ group by roundId
+                                   ──→ per-round balance context
+                                   ──→ aggregate per-project means across rounds
+```
+
+## Out of scope
+
+- Migrating historical `ResultLock` snapshots. Past locks were round-scoped, so the rankings they captured are already correct; only the read-time edition rollup was broken.
+- Exposing the toggle to OBSERVER role. Today it's admin-only, matching page access.
+- AI calibration service changes — already round-scoped.
+- Changing the underlying juror-balance math. The algorithm is correct; only the inputs needed scoping.
+
+## Risks
+
+- **Edition rollup semantic change.** Anyone currently looking at "all rounds" balanced rankings will see different numbers after the fix. That is the correct outcome, since today's numbers are not trustworthy, but the change should be communicated to the team.
+- **Toggle default.** Defaulting `useBalancedRanking = true` preserves today's behavior. Existing rounds without the field set use the default.
+- **Side-panel re-renders.** The toggle live-updates the list sort; ensure `useQuery` invalidations are wired so a flip in the panel triggers a re-fetch / re-sort without a full page reload.
+
+## Open items
+
+None blocking. Implementation plan can proceed.
+
+## Acceptance criteria
+
+1. With 3 round-scoped evaluations of 9, 8, 8, the side panel stats card shows **Avg 8.3** (not 8.0) and **Evaluators 3** (not 5).
+2. Flipping the per-round toggle re-sorts the list view; the choice persists across page reloads and is shared across users.
+3. The list view shows no per-row balanced delta annotation.
+4. The side panel always shows both Raw and Balanced; the active one is marked.
+5. Edition-level rankings (`programId` mode) compute one balance context per round and aggregate, never pooling across rounds.
+6. Observer project detail page defaults to the currently-active or most-recently-closed round the project participated in.
+7. All score displays use one decimal.
+8. A "How scores are calculated" affordance is present in the admin side panel, the observer full project page, and the observer reports preview dialog. Clicking it opens an explainer dialog with the algorithm summary, a step-by-step plain-language walkthrough, and a worked example.
--- a/src/app/(observer)/observer/projects/[projectId]/page.tsx
+++ b/src/app/(observer)/observer/projects/[projectId]/page.tsx
@@ -6,10 +6,13 @@ export const dynamic = 'force-dynamic'

 export default async function ObserverProjectDetailPage({
  params,
+  searchParams,
 }: {
  params: Promise<{ projectId: string }>
+  searchParams: Promise<{ round?: string }>
 }) {
  const { projectId } = await params
+  const sp = await searchParams

-  return <ObserverProjectDetail projectId={projectId} />
+  return <ObserverProjectDetail projectId={projectId} initialRoundId={sp.round} />
 }
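The page now forwards `?round=<id>` as `initialRoundId`. When that parameter is absent, the spec's default-round resolution applies. Below is a minimal sketch of that rule as a pure function. The `ProjectRound` shape and the `resolveDefaultRoundId` name are hypothetical, and honoring a valid `requested` id first is an assumption about how `initialRoundId` is consumed. The spec's pseudocode does not cover several active rounds with no closed round; this sketch returns `null` in that case.

```typescript
type RoundStatus = 'ROUND_DRAFT' | 'ROUND_ACTIVE' | 'ROUND_CLOSED'

// Hypothetical view of a round in which the project has a ProjectRoundState.
interface ProjectRound {
  id: string
  status: RoundStatus
  sortOrder: number
  exitedAt: Date | null
}

function resolveDefaultRoundId(rounds: ProjectRound[], requested?: string): string | null {
  // Assumed: an explicit ?round=<id> wins when the project took part in that round.
  if (requested && rounds.some((r) => r.id === requested)) return requested

  // 1. Exactly one active round: use it.
  const active = rounds.filter((r) => r.status === 'ROUND_ACTIVE')
  if (active.length === 1) return active[0].id

  // 2. Otherwise the most recent closed round: sortOrder desc, exitedAt desc as tiebreak.
  const closed = rounds
    .filter((r) => r.status === 'ROUND_CLOSED')
    .sort((a, b) =>
      b.sortOrder - a.sortOrder
      || (b.exitedAt?.getTime() ?? 0) - (a.exitedAt?.getTime() ?? 0))
  if (closed.length > 0) return closed[0].id

  // 3. Only drafts (or nothing usable): no default, so the page renders stats: null.
  return null
}
```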
--- a/src/components/admin/round/ranking-dashboard.tsx
+++ b/src/components/admin/round/ranking-dashboard.tsx
@@ -37,6 +37,8 @@ import { Input } from '@/components/ui/input'
 import { Label } from '@/components/ui/label'
 import { Textarea } from '@/components/ui/textarea'
 import { Slider } from '@/components/ui/slider'
+import { Switch } from '@/components/ui/switch'
+import { ScoreExplainerDialog } from '@/components/shared/score-explainer-dialog'
 import {
  Collapsible,
  CollapsibleContent,
@@ -82,7 +84,6 @@ type SortableProjectRowProps = {
  entry: (RankedProjectEntry & { originalIndex?: number }) | undefined
  projectInfo: ProjectInfo | undefined
  jurorScores: JurorScore[] | undefined
-  balancedScore: number | null
  onSelect: () => void
  isSelected: boolean
  originalRank: number | undefined // from snapshotOrder — always in sync with localOrder
@@ -96,7 +97,6 @@ function SortableProjectRow({
  entry,
  projectInfo,
  jurorScores,
-  balancedScore,
  onSelect,
  isSelected,
  originalRank,
@@ -197,31 +197,10 @@ function SortableProjectRow({
          </div>
        ) : entry?.avgGlobalScore !== null && entry?.avgGlobalScore !== undefined ? (
          <span className="text-xs text-muted-foreground">
-            Avg {entry.avgGlobalScore.toFixed(1)}
+            Avg {entry.avgGlobalScore.toFixed(2)}
          </span>
        ) : null}

-        {/* Raw + balanced averages shown side by side */}
-        {entry?.avgGlobalScore !== null && entry?.avgGlobalScore !== undefined && jurorScores && jurorScores.length > 1 && (
-          <div className="flex items-center gap-1.5 text-xs" title="Raw juror average vs. juror-balanced average (z-score normalized per juror, rescaled to 1-10)">
-            <span className="font-medium text-muted-foreground">
-              {entry.avgGlobalScore.toFixed(1)}
-            </span>
-            {balancedScore != null && Math.abs(balancedScore - entry.avgGlobalScore) >= 0.05 && (
-              <span
-                className={cn(
-                  'font-semibold tabular-nums rounded px-1.5 py-0.5 border',
-                  balancedScore > entry.avgGlobalScore
-                    ? 'bg-emerald-50 text-emerald-700 border-emerald-200'
-                    : 'bg-amber-50 text-amber-700 border-amber-200',
-                )}
-              >
-                ⇢ {balancedScore.toFixed(1)}
-              </span>
-            )}
-          </div>
-        )}
-
        {/* Advance decision indicator */}
        <div className={cn(
          'inline-flex items-center gap-1 rounded-full px-2 py-0.5 text-xs font-medium',
@@ -271,6 +250,7 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
  const [localCriteriaText, setLocalCriteriaText] = useState<string>('')
  const [localScoreWeight, setLocalScoreWeight] = useState(5)
  const [localPassRateWeight, setLocalPassRateWeight] = useState(5)
+  const [useBalanced, setUseBalanced] = useState(true)
  const weightsInitialized = useRef(false)

  // ─── Sensors ──────────────────────────────────────────────────────────────
@@ -304,7 +284,7 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
  )

  const { data: projectDetail, isLoading: detailLoading } = trpc.project.getFullDetail.useQuery(
-    { id: selectedProjectId! },
+    { id: selectedProjectId!, roundId },
    { enabled: !!selectedProjectId },
  )

@@ -409,12 +389,15 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
      const dedupedStartup = dedup(startup)
      const dedupedConcept = dedup(concept)

-      // Sort by balanced (juror-corrected) score descending, falling back to raw
-      // avgGlobalScore when no balanced score is available, then compositeScore as
-      // a final tiebreaker. The threshold cutoff line uses the same metric so the
-      // cutoff lands in the correct spot regardless of which score type is used.
-      const scoreFor = (projectId: string, raw: number | null | undefined) =>
-        evalScores.balanced[projectId]?.balancedAverage ?? raw ?? 0
+      // Sort by balanced (juror-corrected) score descending when the toggle is
+      // on, otherwise by raw. compositeScore is the final tiebreaker. The
+      // threshold cutoff line uses the same metric so the cutoff lands in the
+      // right spot regardless of which score type is used.
+      const scoreFor = (projectId: string, raw: number | null | undefined) => {
+        const balanced = evalScores.balanced[projectId]?.balancedAverage
+        if (useBalanced && balanced != null) return balanced
+        return raw ?? 0
+      }
      dedupedStartup.sort((a, b) =>
        scoreFor(b.projectId, b.avgGlobalScore) - scoreFor(a.projectId, a.avgGlobalScore)
        || b.compositeScore - a.compositeScore)
@@ -463,6 +446,49 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
    }
  }, [snapshot, evalScores])

+  // ─── Re-sort on toggle flip (after init) ─────────────────────────────────
+  // Only resorts when no server-side manual reorder is pinned for the snapshot;
+  // persisted manual reorders always win regardless of the score being used.
+  useEffect(() => {
+    if (!initialized.current || !snapshot || !evalScores) return
+    const reorders = (snapshot.reordersJson as Array<{
+      category: 'STARTUP' | 'BUSINESS_CONCEPT'
+      orderedProjectIds: string[]
+    }> | null) ?? []
+    const hasManualReorder =
+      reorders.some((r) => r.category === 'STARTUP') ||
+      reorders.some((r) => r.category === 'BUSINESS_CONCEPT')
+    if (hasManualReorder) return
+    const startup = (snapshot.startupRankingJson ?? []) as unknown as RankedProjectEntry[]
+    const concept = (snapshot.conceptRankingJson ?? []) as unknown as RankedProjectEntry[]
+    const dedup = (arr: RankedProjectEntry[]): RankedProjectEntry[] => {
+      const seen = new Set<string>()
+      return arr.filter((r) => {
+        if (seen.has(r.projectId)) return false
+        seen.add(r.projectId)
+        return true
+      })
+    }
+    const scoreFor = (projectId: string, raw: number | null | undefined) => {
+      const balanced = evalScores.balanced[projectId]?.balancedAverage
+      if (useBalanced && balanced != null) return balanced
+      return raw ?? 0
+    }
+    const sortedStartup = dedup(startup).sort((a, b) =>
+      scoreFor(b.projectId, b.avgGlobalScore) - scoreFor(a.projectId, a.avgGlobalScore)
+      || b.compositeScore - a.compositeScore)
+    const sortedConcept = dedup(concept).sort((a, b) =>
+      scoreFor(b.projectId, b.avgGlobalScore) - scoreFor(a.projectId, a.avgGlobalScore)
+      || b.compositeScore - a.compositeScore)
+    setLocalOrder({
+      STARTUP: sortedStartup.map((r) => r.projectId),
+      BUSINESS_CONCEPT: sortedConcept.map((r) => r.projectId),
+    })
+    // Eslint disable: snapshot/evalScores are read but the resort should only
+    // run on toggle flip, not on every snapshot/scores refetch.
+    // eslint-disable-next-line react-hooks/exhaustive-deps
+  }, [useBalanced])
+
  // ─── numericCriteria from eval form ─────────────────────────────────────
  const numericCriteria = useMemo(() => {
    if (!evalForm?.criteriaJson) return []
@@ -471,18 +497,32 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
  }, [evalForm])

  // ─── Init local weights + criteriaText from round config ──────────────────
+  // useBalanced is hydrated on every roundData refetch (it has its own toggle
+  // that persists immediately), so it sits outside the once-only guard.
  useEffect(() => {
-    if (!weightsInitialized.current && roundData?.configJson) {
-      const cfg = roundData.configJson as Record<string, unknown>
-      const saved = (cfg.criteriaWeights ?? {}) as Record<string, number>
-      setLocalWeights(saved)
-      setLocalCriteriaText((cfg.rankingCriteria as string) ?? '')
-      setLocalScoreWeight((cfg.scoreWeight as number) ?? 5)
-      setLocalPassRateWeight((cfg.passRateWeight as number) ?? 5)
-      weightsInitialized.current = true
-    }
+    if (!roundData?.configJson) return
+    const cfg = roundData.configJson as Record<string, unknown>
+    setUseBalanced((cfg.useBalancedRanking as boolean | undefined) ?? true)
+    if (weightsInitialized.current) return
+    const saved = (cfg.criteriaWeights ?? {}) as Record<string, number>
+    setLocalWeights(saved)
+    setLocalCriteriaText((cfg.rankingCriteria as string) ?? '')
+    setLocalScoreWeight((cfg.scoreWeight as number) ?? 5)
+    setLocalPassRateWeight((cfg.passRateWeight as number) ?? 5)
+    weightsInitialized.current = true
  }, [roundData])

+  // ─── Persist the balanced-ranking toggle immediately ─────────────────────
+  const persistUseBalanced = (next: boolean) => {
+    setUseBalanced(next)
+    if (!roundData?.configJson) return
+    const cfg = roundData.configJson as Record<string, unknown>
+    updateRoundMutation.mutate({
+      id: roundId,
+      configJson: { ...cfg, useBalancedRanking: next },
+    })
+  }
+
  // ─── Save weights + criteria text to round config ─────────────────────────
  const saveRankingConfig = () => {
    if (!roundData?.configJson) return
@@ -870,13 +910,15 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
                  : (evalConfig?.conceptAdvanceCount ?? 0))
                const threshold = evalConfig?.advanceScoreThreshold ?? 0

-                // Effective ranking score = balanced (juror-corrected) average,
-                // falling back to raw avgGlobalScore. Both the sort and the
-                // threshold check use this same value so the cutoff lands in
-                // the right spot.
+                // Effective ranking score respects the per-round
+                // useBalancedRanking toggle. Both the sort and the threshold
+                // check read from the same helper so the cutoff lands in the
+                // right spot.
                const effectiveScore = (id: string) => {
                  const e = rankingMap.get(id)
-                  return evalScores?.balanced[id]?.balancedAverage ?? e?.avgGlobalScore ?? 0
+                  const balanced = evalScores?.balanced[id]?.balancedAverage
+                  if (useBalanced && balanced != null) return balanced
+                  return e?.avgGlobalScore ?? 0
                }

                let cutoffIndex = -1
@@ -936,7 +978,6 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
                                  entry={rankingMap.get(projectId)}
                                  projectInfo={projectInfoMap.get(projectId)}
                                  jurorScores={evalScores?.byProject[projectId]}
-                                  balancedScore={evalScores?.balanced[projectId]?.balancedAverage ?? null}
                                  onSelect={() => setSelectedProjectId(projectId)}
                                  isSelected={selectedProjectId === projectId}
                                  originalRank={hasReorders ? snapshotOrder[projectId] : undefined}
@@ -1001,31 +1042,63 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
            </div>
          ) : projectDetail ? (
            <div className="mt-6 space-y-6">
-              {/* Stats summary */}
-              {projectDetail.stats && (
-                <div className="grid grid-cols-3 gap-3">
-                  <div className="rounded-lg border p-3 text-center">
-                    <p className="text-xs text-muted-foreground">Avg Score</p>
-                    <p className="mt-1 text-lg font-semibold">
-                      {projectDetail.stats.averageGlobalScore?.toFixed(1) ?? '—'}
-                    </p>
-                  </div>
-                  <div className="rounded-lg border p-3 text-center">
-                    <p className="text-xs text-muted-foreground">Pass Rate</p>
-                    <p className="mt-1 text-lg font-semibold">
-                      {projectDetail.stats.totalEvaluations > 0
-                        ? `${Math.round((projectDetail.stats.yesVotes / projectDetail.stats.totalEvaluations) * 100)}%`
-                        : '—'}
-                    </p>
-                  </div>
-                  <div className="rounded-lg border p-3 text-center">
-                    <p className="text-xs text-muted-foreground">Evaluators</p>
-                    <p className="mt-1 text-lg font-semibold">
-                      {projectDetail.stats.totalEvaluations}
-                    </p>
-                  </div>
+              {/* Balanced-ranking toggle (per-round; persists across viewers) */}
+              <div className="flex items-center justify-between rounded-lg border p-3">
+                <div className="flex flex-col">
+                  <span className="text-sm font-medium">Use balanced scoring for ranking</span>
+                  <span className="text-xs text-muted-foreground">
+                    Corrects for per-juror grading style. Off uses raw averages.
+                  </span>
                </div>
-              )}
+                <Switch checked={useBalanced} onCheckedChange={persistUseBalanced} />
+              </div>
+              {/* Stats summary: combined Avg card with Raw + Balanced side-by-side */}
+              {projectDetail.stats && (() => {
+                const raw = selectedProjectId
+                  ? evalScores?.balanced[selectedProjectId]?.rawAverage ?? null
+                  : null
+                const balanced = selectedProjectId
+                  ? evalScores?.balanced[selectedProjectId]?.balancedAverage ?? null
+                  : null
+                return (
+                  <div className="space-y-3">
+                    <div className="rounded-lg border p-3">
+                      <p className="text-xs text-muted-foreground mb-2">Avg Score</p>
+                      <div className="flex items-baseline gap-4 flex-wrap">
+                        <div className={`flex items-baseline gap-1 ${useBalanced ? 'text-muted-foreground' : 'font-semibold'}`}>
+                          <span className="text-xs">Raw</span>
+                          <span className="text-lg tabular-nums">{raw != null ? raw.toFixed(2) : '—'}</span>
+                          {!useBalanced && <span className="ml-1 text-[10px] text-muted-foreground">← used for ranking</span>}
+                        </div>
+                        <div className={`flex items-baseline gap-1 ${useBalanced ? 'font-semibold' : 'text-muted-foreground'}`}>
+                          <span className="text-xs">Balanced</span>
+                          <span className="text-lg tabular-nums">{balanced != null ? balanced.toFixed(2) : '—'}</span>
+                          {useBalanced && <span className="ml-1 text-[10px] text-muted-foreground">← used for ranking</span>}
+                        </div>
+                      </div>
+                      <div className="mt-2 flex justify-end">
+                        <ScoreExplainerDialog />
+                      </div>
+                    </div>
+                    <div className="grid grid-cols-2 gap-3">
+                      <div className="rounded-lg border p-3 text-center">
+                        <p className="text-xs text-muted-foreground">Pass Rate</p>
+                        <p className="mt-1 text-lg font-semibold">
+                          {projectDetail.stats.totalEvaluations > 0
+                            ? `${Math.round((projectDetail.stats.yesVotes / projectDetail.stats.totalEvaluations) * 100)}%`
+                            : '—'}
+                        </p>
+                      </div>
+                      <div className="rounded-lg border p-3 text-center">
+                        <p className="text-xs text-muted-foreground">Evaluators</p>
+                        <p className="mt-1 text-lg font-semibold">
+                          {projectDetail.stats.totalEvaluations}
+                        </p>
+                      </div>
+                    </div>
+                  </div>
+                )
+              })()}

              {/* Per-juror evaluations */}
              <div>
@@ -1067,6 +1140,28 @@ export function RankingDashboard({ competitionId: _competitionId, roundId }: Ran
                                  </Badge>
                                )}
                                <Badge variant="outline">Score: {a.evaluation?.globalScore?.toFixed(1) ?? '—'}</Badge>
+                                {useBalanced && (() => {
+                                  const userId = a.user?.id
+                                  const score = a.evaluation?.globalScore
+                                  if (!userId || score == null) return null
+                                  const stats = evalScores?.jurorStats?.[userId]
+                                  const overallMean = evalScores?.overallMean
+                                  const overallStddev = evalScores?.overallStddev
+                                  if (!stats || overallMean == null || overallStddev == null || overallStddev === 0) return null
+                                  const z = stats.stddev > 0
+                                    ? (score - stats.mean) / stats.stddev
+                                    : (score - overallMean) / overallStddev
+                                  const contributesAs = overallMean + z * overallStddev
+                                  return (
+                                    <span
+                                      className="text-xs text-muted-foreground"
+                                      title={`Their typical score in this round; rescaled contribution after juror balancing`}
+                                      onClick={(e) => e.stopPropagation()}
+                                    >
+                                      typical {stats.mean.toFixed(2)} → contributes {contributesAs.toFixed(2)}
+                                    </span>
+                                  )
+                                })()}
                              </div>
                            </div>
                            {isExpanded && a.evaluation?.feedbackText && (
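Review note on the side-panel "typical → contributes" badge: the rescaling is easier to unit-test as a pure function. Below is a minimal standalone sketch of the same arithmetic. `contributesAs` and `JurorStats` are hypothetical names, and the only assumption is the `{ mean, stddev, count }` entry shape that the ranking router now returns in `jurorStats`.

```typescript
// Assumed shape of one entry in the ranking router's `jurorStats` record.
interface JurorStats {
  mean: number
  stddev: number
  count: number
}

// Hypothetical helper: same arithmetic as the side-panel snippet.
// Returns null when the round has no spread to rescale onto.
function contributesAs(
  score: number,
  juror: JurorStats,
  overallMean: number,
  overallStddev: number,
): number | null {
  if (overallStddev === 0) return null
  // z-score against the juror's own baseline, or against the round when the
  // juror has no measurable spread of their own.
  const z = juror.stddev > 0
    ? (score - juror.mean) / juror.stddev
    : (score - overallMean) / overallStddev
  // Map the z-score back onto the round's scale.
  return overallMean + z * overallStddev
}

// A harsh juror (typical 5.0, spread 1.0) scoring 6 in a round with mean 7.0, spread 1.5.
console.log(contributesAs(6, { mean: 5, stddev: 1, count: 4 }, 7, 1.5)) // 8.5
```

One property falls out of the fallback branch: a juror with zero spread contributes exactly their raw score, because `overallMean + ((s - overallMean) / overallStddev) * overallStddev` simplifies to `s`.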
--- a/src/components/observer/observer-project-detail.tsx
+++ b/src/components/observer/observer-project-detail.tsx
@@ -1,5 +1,6 @@
 'use client'

+import { useEffect, useState } from 'react'
 import Link from 'next/link'
 import type { Route } from 'next'
 import { useRouter } from 'next/navigation'
@@ -43,16 +44,43 @@ import {
  ArrowLeft,
 } from 'lucide-react'
 import { cn, formatDate, formatDateOnly } from '@/lib/utils'
+import { ScoreExplainerDialog } from '@/components/shared/score-explainer-dialog'

-export function ObserverProjectDetail({ projectId }: { projectId: string }) {
+export function ObserverProjectDetail({
+  projectId,
+  initialRoundId,
+}: {
+  projectId: string
+  initialRoundId?: string
+}) {
  const router = useRouter()
+  const [activeRoundId, setActiveRoundId] = useState<string | undefined>(initialRoundId)
+
+  // Resolve a default round when none is set: prefer the ROUND_ACTIVE round the
+  // project participates in, else the ROUND_CLOSED one with the highest sortOrder.
+  const { data: roundCandidates } = trpc.analytics.getProjectRoundsForObserver.useQuery(
+    { projectId },
+  )
+  useEffect(() => {
+    if (activeRoundId || !roundCandidates) return
+    const active = roundCandidates.find((r) => r.status === 'ROUND_ACTIVE')
+    if (active) {
+      setActiveRoundId(active.id)
+      return
+    }
+    const closed = [...roundCandidates]
+      .filter((r) => r.status === 'ROUND_CLOSED')
+      .sort((a, b) => b.sortOrder - a.sortOrder)[0]
+    if (closed) setActiveRoundId(closed.id)
+  }, [roundCandidates, activeRoundId])
+
  const { data, isLoading } = trpc.analytics.getProjectDetail.useQuery(
-    { id: projectId },
+    { id: projectId, roundId: activeRoundId },
    { refetchInterval: 30_000 },
  )
  const { data: flags } = trpc.settings.getFeatureFlags.useQuery()

-  const roundId = data?.assignments?.[0]?.roundId as string | undefined
+  const roundId = activeRoundId ?? (data?.assignments?.[0]?.roundId as string | undefined)
  const { data: activeForm } = trpc.evaluation.getStageForm.useQuery(
    { roundId: roundId ?? '', category: data?.project?.competitionCategory },
    { enabled: !!roundId },
@@ -207,7 +235,7 @@ export function ObserverProjectDetail({ projectId }: { projectId: string }) {
                </div>
                <p className="mt-2 text-4xl font-bold tabular-nums">
                  {stats.averageGlobalScore != null
-                    ? stats.averageGlobalScore.toFixed(1)
+                    ? stats.averageGlobalScore.toFixed(2)
                    : '-'}
                </p>
                <p className="text-xs text-muted-foreground">
@@ -223,6 +251,22 @@ export function ObserverProjectDetail({ projectId }: { projectId: string }) {
                    {stats.yesPercentage.toFixed(0)}% recommended
                  </p>
                )}
+                {roundCandidates && roundCandidates.length > 1 && (
+                  <div className="mt-3 w-full">
+                    <select
+                      className="w-full rounded border bg-background px-2 py-1 text-xs"
+                      value={activeRoundId ?? ''}
+                      onChange={(e) => setActiveRoundId(e.target.value)}
+                    >
+                      {roundCandidates.map((r) => (
+                        <option key={r.id} value={r.id}>{r.name}</option>
+                      ))}
+                    </select>
+                  </div>
+                )}
+                <div className="mt-2 w-full">
+                  <ScoreExplainerDialog />
+                </div>
              </div>
            </CardContent>
          </Card>
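Review note on the default-round `useEffect`: the selection rule can be lifted into a pure function and tested without rendering the page. A sketch, assuming only the `{ id, name, status, sortOrder }` shape returned by `getProjectRoundsForObserver`; `pickDefaultRoundId` and `RoundCandidate` are hypothetical names.

```typescript
// Assumed shape of each entry returned by analytics.getProjectRoundsForObserver.
interface RoundCandidate {
  id: string
  name: string
  status: string
  sortOrder: number
}

// Hypothetical pure version of the default-round useEffect.
function pickDefaultRoundId(candidates: RoundCandidate[]): string | undefined {
  // An active round always wins.
  const active = candidates.find((r) => r.status === 'ROUND_ACTIVE')
  if (active) return active.id
  // Otherwise take the closed round with the highest sortOrder.
  // filter() returns a copy, so sort() does not mutate the caller's array.
  const latestClosed = candidates
    .filter((r) => r.status === 'ROUND_CLOSED')
    .sort((a, b) => b.sortOrder - a.sortOrder)[0]
  // undefined means no round filter: getProjectDetail then aggregates all rounds.
  return latestClosed?.id
}

const rounds: RoundCandidate[] = [
  { id: 'r1', name: 'Intake', status: 'ROUND_CLOSED', sortOrder: 0 },
  { id: 'r2', name: 'Semi-final', status: 'ROUND_CLOSED', sortOrder: 1 },
]
console.log(pickDefaultRoundId(rounds)) // r2
```

Inside the existing `if (activeRoundId || !roundCandidates) return` guard, the effect body would reduce to a single `setActiveRoundId(pickDefaultRoundId(roundCandidates))`.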
--- a/src/components/observer/reports/evaluation-report-tabs.tsx
+++ b/src/components/observer/reports/evaluation-report-tabs.tsx
@@ -546,7 +546,7 @@ function JurorsSubTab({ roundId, selectedValue }: { roundId: string; selectedVal
      {isLoading ? (
        <Skeleton className="h-[400px]" />
      ) : jurors.length > 0 ? (
-        <ExpandableJurorTable jurors={jurors} />
+        <ExpandableJurorTable jurors={jurors} roundId={roundId} />
      ) : hasSelection ? (
        <Card>
          <CardContent className="flex items-center justify-center py-12">
--- a/src/components/observer/reports/expandable-juror-table.tsx
+++ b/src/components/observer/reports/expandable-juror-table.tsx
@@ -30,6 +30,7 @@ interface JurorRow {

 interface ExpandableJurorTableProps {
  jurors: JurorRow[]
+  roundId?: string
 }

 function evalStatusBadge(status: string) {
@@ -56,7 +57,7 @@ function ScorePill({ score }: { score: number }) {
  )
 }

-export function ExpandableJurorTable({ jurors }: ExpandableJurorTableProps) {
+export function ExpandableJurorTable({ jurors, roundId }: ExpandableJurorTableProps) {
  const [expanded, setExpanded] = useState<string | null>(null)
  const [previewProjectId, setPreviewProjectId] = useState<string | null>(null)

@@ -260,6 +261,7 @@ export function ExpandableJurorTable({ jurors }: ExpandableJurorTableProps) {
      {/* Project Preview Dialog */}
      <ProjectPreviewDialog
        projectId={previewProjectId}
+        roundId={roundId}
        open={!!previewProjectId}
        onOpenChange={(open) => { if (!open) setPreviewProjectId(null) }}
      />
--- a/src/components/observer/reports/filtering-report-tabs.tsx
+++ b/src/components/observer/reports/filtering-report-tabs.tsx
@@ -334,6 +334,7 @@ export function FilteringReportTabs({ roundId }: FilteringReportTabsProps) {

      <ProjectPreviewDialog
        projectId={previewProjectId}
+        roundId={roundId}
        open={!!previewProjectId}
        onOpenChange={(open) => { if (!open) setPreviewProjectId(null) }}
      />
--- a/src/components/observer/reports/project-preview-dialog.tsx
+++ b/src/components/observer/reports/project-preview-dialog.tsx
@@ -18,14 +18,16 @@ import { ExternalLink, MapPin, Waves, Users } from 'lucide-react'
 import Link from 'next/link'
 import type { Route } from 'next'
 import { scoreGradient } from '@/components/charts/chart-theme'
+import { ScoreExplainerDialog } from '@/components/shared/score-explainer-dialog'

 interface ProjectPreviewDialogProps {
  projectId: string | null
+  roundId?: string
  open: boolean
  onOpenChange: (open: boolean) => void
 }

-function ScorePill({ score }: { score: number }) {
+function ScorePill({ score, precision = 1 }: { score: number; precision?: 1 | 2 }) {
  const bg = scoreGradient(score)
  const text = score >= 6 ? '#ffffff' : '#1a1a1a'
  return (
@@ -33,14 +35,14 @@ function ScorePill({ score }: { score: number }) {
      className="inline-flex items-center justify-center rounded-md px-2.5 py-1 text-sm font-bold tabular-nums"
      style={{ backgroundColor: bg, color: text }}
    >
-      {score.toFixed(1)}
+      {score.toFixed(precision)}
    </span>
  )
 }

-export function ProjectPreviewDialog({ projectId, open, onOpenChange }: ProjectPreviewDialogProps) {
+export function ProjectPreviewDialog({ projectId, roundId, open, onOpenChange }: ProjectPreviewDialogProps) {
  const { data, isLoading } = trpc.analytics.getProjectDetail.useQuery(
-    { id: projectId! },
+    { id: projectId!, roundId },
    { enabled: !!projectId && open },
  )

@@ -107,12 +109,15 @@ export function ProjectPreviewDialog({ projectId, open, onOpenChange }: ProjectP
              {/* Evaluation summary */}
              {data.stats && (
                <div>
-                  <h3 className="text-sm font-semibold mb-2">Evaluation Summary</h3>
+                  <div className="mb-2 flex items-center justify-between">
+                    <h3 className="text-sm font-semibold">Evaluation Summary</h3>
+                    <ScoreExplainerDialog />
+                  </div>
                  <div className="grid grid-cols-2 sm:grid-cols-4 gap-3">
                    <div className="rounded-md border p-3 text-center">
                      <p className="text-lg font-bold tabular-nums">
                        {data.stats.averageGlobalScore != null ? (
-                          <ScorePill score={data.stats.averageGlobalScore} />
+                          <ScorePill score={data.stats.averageGlobalScore} precision={2} />
                        ) : '—'}
                      </p>
                      <p className="text-xs text-muted-foreground mt-1">Avg Score</p>
--- a/src/components/shared/score-explainer-dialog.tsx
+++ b/src/components/shared/score-explainer-dialog.tsx
@@ -0,0 +1,109 @@
+'use client'
+
+import {
+  Dialog,
+  DialogContent,
+  DialogHeader,
+  DialogTitle,
+  DialogTrigger,
+} from '@/components/ui/dialog'
+import { Button } from '@/components/ui/button'
+import { Info } from 'lucide-react'
+import type { ReactNode } from 'react'
+
+export function ScoreExplainerDialog({ trigger }: { trigger?: ReactNode }) {
+  return (
+    <Dialog>
+      <DialogTrigger asChild>
+        {trigger ?? (
+          <Button variant="ghost" size="sm" className="h-7 gap-1 px-2 text-xs">
+            <Info className="h-3.5 w-3.5" />
+            How scores are calculated
+          </Button>
+        )}
+      </DialogTrigger>
+      <DialogContent className="max-w-xl max-h-[85vh] overflow-y-auto">
+        <DialogHeader>
+          <DialogTitle>How scores are calculated</DialogTitle>
+        </DialogHeader>
+
+        <div className="space-y-4 text-sm">
+          <p>
+            Different jurors have different grading styles. Some grade harshly, some
+            leniently. Balanced scoring corrects for that so a project isn&apos;t
+            punished for drawing harsh jurors or rewarded for drawing lenient ones.
+          </p>
+
+          <div>
+            <h3 className="font-semibold mb-1">How it works</h3>
+            <ol className="list-decimal pl-5 space-y-1">
+              <li>For each juror, calculate their personal average and spread across all the projects they scored in this round.</li>
+              <li>Convert each individual score into &quot;how many standard deviations above or below this juror&apos;s typical&quot;. For two jurors with the same spread, a 6 from one who averages 5 reads the same as a 9 from one who averages 8.</li>
+              <li>Average those normalized values across the project&apos;s jurors.</li>
+              <li>Rescale back onto the same 1–10 scale using the round&apos;s overall average and spread.</li>
+              <li>The result is directly comparable to the raw average — same scale, but corrected for grading style.</li>
+            </ol>
+          </div>
+
+          <div>
+            <h3 className="font-semibold mb-1">Worked example</h3>
+            <table className="w-full text-xs border-collapse">
+              <thead>
+                <tr className="border-b">
+                  <th className="py-1 text-left">Juror</th>
+                  <th className="py-1 text-left">Their typical avg</th>
+                  <th className="py-1 text-left">Score for &quot;Project X&quot;</th>
+                  <th className="py-1 text-left">What that means</th>
+                </tr>
+              </thead>
+              <tbody>
+                <tr className="border-b">
+                  <td className="py-1">Juror A (lenient)</td>
+                  <td>8.20</td>
+                  <td>9.00</td>
+                  <td>Just above their typical (+0.4σ)</td>
+                </tr>
+                <tr className="border-b">
+                  <td className="py-1">Juror B (harsh)</td>
+                  <td>5.80</td>
+                  <td>7.50</td>
+                  <td>Well above their typical (+1.5σ)</td>
+                </tr>
+                <tr>
+                  <td className="py-1">Juror C (typical)</td>
+                  <td>7.00</td>
+                  <td>8.00</td>
+                  <td>Slightly above their typical (+0.7σ)</td>
+                </tr>
+              </tbody>
+            </table>
+            <p className="mt-2 text-xs text-muted-foreground">
+              Raw average: (9.00 + 7.50 + 8.00) / 3 = <strong>8.17</strong>.
+              Balanced average rescales each juror&apos;s enthusiasm onto the round&apos;s
+              overall average and spread, so the exact figure depends on the round; here it
+              lands at roughly <strong>8.40</strong>. Juror B&apos;s strong endorsement (well
+              above their harsh baseline) carries more weight than the raw 7.50 suggests.
+            </p>
+          </div>
+
+          <div>
+            <h3 className="font-semibold mb-1">When it kicks in</h3>
+            <ul className="list-disc pl-5 space-y-1">
+              <li>A juror needs at least 2 evaluations in the round, not all identical, to have a measurable spread; otherwise their scores are normalized against the round-wide average and spread.</li>
+              <li>Needs at least one juror with non-zero spread; if every juror gave the same score to all of their projects, balanced equals raw.</li>
+              <li>Computed within a single round only — a juror&apos;s grading style in an intake screening doesn&apos;t affect their balance in a deep evaluation.</li>
+            </ul>
+          </div>
+
+          <div>
+            <h3 className="font-semibold mb-1">Why we still show &quot;Raw&quot;</h3>
+            <p>
+              Both numbers are always shown so you can sanity-check the correction. A per-round
+              toggle at the top of the admin ranking dashboard&apos;s side panel decides which one is used for ranking.
+            </p>
+          </div>
+        </div>
+      </DialogContent>
+    </Dialog>
+  )
+}
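Review note on the worked example: the dialog's "roughly 8.40" depends on the round's overall average and spread, which the dialog does not state. A quick check of the arithmetic, using the table's z-scores and a hypothetical round mean of 7.00 and spread of 1.6 (both assumed, not taken from the source):

```typescript
// Per-juror z-scores and raw scores from the worked-example table.
const zScores = [0.4, 1.5, 0.7]
const rawScores = [9.0, 7.5, 8.0]

// Hypothetical round-wide stats; the dialog does not state them.
const overallMean = 7.0
const overallStddev = 1.6

const avg = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length

// Raw: plain mean of the three scores.
const rawAverage = avg(rawScores)
// Balanced: mean z-score, rescaled onto the round's scale.
const balancedAverage = overallMean + avg(zScores) * overallStddev

console.log(rawAverage.toFixed(2), balancedAverage.toFixed(2)) // 8.17 8.39
```

Under those assumed round stats the balanced figure is 8.39, consistent with the dialog's "roughly 8.40"; a different round mean or spread would move it.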
--- a/src/server/routers/analytics.ts
+++ b/src/server/routers/analytics.ts
@@ -6,7 +6,13 @@ import { getProjectLogoUrl } from '../utils/project-logo-url'
 import { aggregateVotes } from '../services/deliberation'
 import { validateRoundConfig } from '@/types/competition-configs'
 import type { LiveFinalConfig } from '@/types/competition-configs'
-import { computeBalanceContext, computeBalancedProjectScores, type ScorePoint } from '../services/juror-balance'
+import {
+  computeBalanceContext,
+  computeBalancedProjectScores,
+  computePerRoundBalanced,
+  type ScorePoint,
+  type RoundScopedScorePoint,
+} from '../services/juror-balance'
 import { generateJurorCalibration } from '../services/ai-juror-calibration'

 const editionOrRoundInput = z.object({
@@ -213,24 +219,39 @@ export const analyticsRouter = router({
          where: evalWhere(input, { status: 'SUBMITTED' }),
          select: {
            criterionScoresJson: true,
-            assignment: { select: { userId: true, projectId: true } },
+            assignment: { select: { userId: true, projectId: true, roundId: true } },
          },
        }),
      ])

      // Extract a single eval-level score (mean of numeric criterion scores) per evaluation.
-      const points: ScorePoint[] = []
+      const rawPoints: RoundScopedScorePoint[] = []
      for (const e of evaluations) {
        const scores = e.criterionScoresJson as Record<string, unknown> | null
        if (!scores) continue
        const vals = Object.values(scores).filter((s): s is number => typeof s === 'number')
        if (vals.length === 0) continue
        const rawScore = vals.reduce((a, b) => a + b, 0) / vals.length
-        points.push({ projectId: e.assignment.projectId, userId: e.assignment.userId, rawScore })
+        rawPoints.push({
+          projectId: e.assignment.projectId,
+          userId: e.assignment.userId,
+          roundId: e.assignment.roundId,
+          rawScore,
+        })
      }

-      const balanceCtx = computeBalanceContext(points)
-      const balancedByProject = computeBalancedProjectScores(points, balanceCtx)
+      // roundId mode: single-round z-context (existing behavior).
+      // programId mode: per-round z-contexts aggregated as the mean of per-round
+      // balanced averages — never pool z-contexts across rounds because a juror's
+      // grading profile differs by round type.
+      const balancedByProject: Map<string, { rawAverage: number | null; balancedAverage: number | null; count: number }> = (() => {
+        if (input.roundId) {
+          const flat: ScorePoint[] = rawPoints.map(({ projectId, userId, rawScore }) => ({ projectId, userId, rawScore }))
+          const ctx = computeBalanceContext(flat)
+          return computeBalancedProjectScores(flat, ctx)
+        }
+        return computePerRoundBalanced(rawPoints)
+      })()

      const rankings = projects
        .map((project) => {
@@ -1368,7 +1389,7 @@ export const analyticsRouter = router({
   * Read-only combined endpoint to avoid multiple round-trips.
   */
  getProjectDetail: observerProcedure
-    .input(z.object({ id: z.string() }))
+    .input(z.object({ id: z.string(), roundId: z.string().optional() }))
    .query(async ({ ctx, input }) => {
      const [projectRaw, projectTags, assignments, submittedEvaluations] = await Promise.all([
        ctx.prisma.project.findUniqueOrThrow({
@@ -1417,7 +1438,10 @@ export const analyticsRouter = router({
        ctx.prisma.evaluation.findMany({
          where: {
            status: 'SUBMITTED',
-            assignment: { projectId: input.id },
+            assignment: {
+              projectId: input.id,
+              ...(input.roundId ? { roundId: input.roundId } : {}),
+            },
          },
        }),
      ])
@@ -2163,6 +2187,26 @@ export const analyticsRouter = router({
      }
    }),

+  /**
+   * Returns the project's rounds that are active (ROUND_ACTIVE) or closed
+   * (ROUND_CLOSED), in ascending sortOrder. Used by the observer full project page
+   * to resolve a default round when none is specified in the URL.
+   */
+  getProjectRoundsForObserver: observerProcedure
+    .input(z.object({ projectId: z.string() }))
+    .query(async ({ ctx, input }) => {
+      const states = await ctx.prisma.projectRoundState.findMany({
+        where: { projectId: input.projectId },
+        select: {
+          round: { select: { id: true, name: true, status: true, sortOrder: true } },
+        },
+      })
+      return states
+        .map((s) => s.round)
+        .filter((r) => r.status === 'ROUND_ACTIVE' || r.status === 'ROUND_CLOSED')
+        .sort((a, b) => a.sortOrder - b.sortOrder)
+    }),
+
  getRecentFiles: observerProcedure
    .input(z.object({ roundId: z.string(), limit: z.number().min(1).max(50).default(10) }))
    .query(async ({ ctx, input }) => {
--- a/src/server/routers/project.ts
+++ b/src/server/routers/project.ts
@@ -1249,7 +1249,7 @@ export const projectRouter = router({
   * Reduces client-side waterfall by combining project.get + assignment.listByProject + evaluation.getProjectStats.
   */
  getFullDetail: adminProcedure
-    .input(z.object({ id: z.string() }))
+    .input(z.object({ id: z.string(), roundId: z.string().optional() }))
    .query(async ({ ctx, input }) => {
      const [projectRaw, projectTags, assignments, submittedEvaluations] = await Promise.all([
        ctx.prisma.project.findUniqueOrThrow({
@@ -1297,7 +1297,10 @@ export const projectRouter = router({
        ctx.prisma.evaluation.findMany({
          where: {
            status: 'SUBMITTED',
-            assignment: { projectId: input.id },
+            assignment: {
+              projectId: input.id,
+              ...(input.roundId ? { roundId: input.roundId } : {}),
+            },
          },
        }),
      ])
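Review note: `analytics.getProjectDetail` and `project.getFullDetail` now build the same round-scoped `where` inline. A sketch of a shared builder, where `submittedEvalWhere` and `SubmittedEvalWhere` are hypothetical names and the object shape is copied from the two hunks above:

```typescript
// Hypothetical helper; both routers currently inline this `where` shape.
interface SubmittedEvalWhere {
  status: 'SUBMITTED'
  assignment: { projectId: string; roundId?: string }
}

function submittedEvalWhere(projectId: string, roundId?: string): SubmittedEvalWhere {
  return {
    status: 'SUBMITTED',
    assignment: {
      projectId,
      // Only add the key when a round is given, so "no roundId" keeps the
      // all-rounds behavior and the key is absent rather than undefined.
      ...(roundId ? { roundId } : {}),
    },
  }
}

console.log(JSON.stringify(submittedEvalWhere('p1')))
// {"status":"SUBMITTED","assignment":{"projectId":"p1"}}
console.log(JSON.stringify(submittedEvalWhere('p1', 'rB')))
// {"status":"SUBMITTED","assignment":{"projectId":"p1","roundId":"rB"}}
```

Like the inline version, the truthiness check treats an empty-string `roundId` as "no round", which keeps the two routers behaving identically.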
--- a/src/server/routers/ranking.ts
+++ b/src/server/routers/ranking.ts
@@ -537,6 +537,19 @@ export const rankingRouter = router({
        }
      }

-      return { byProject, balanced }
+      // Per-juror grading stats so the side panel can render each juror's
+      // personal baseline and rescaled contribution.
+      const jurorStats: Record<string, { mean: number; stddev: number; count: number }> = {}
+      for (const [userId, s] of balanceCtx.jurorStats.entries()) {
+        jurorStats[userId] = { mean: s.mean, stddev: s.stddev, count: s.count }
+      }
+
+      return {
+        byProject,
+        balanced,
+        jurorStats,
+        overallMean: balanceCtx.overallMean,
+        overallStddev: balanceCtx.overallStddev,
+      }
    }),
 })
--- a/src/server/services/juror-balance.ts
+++ b/src/server/services/juror-balance.ts
@@ -118,3 +118,71 @@ export function computeBalancedProjectScores(

  return results
 }
+
+export type RoundScopedScorePoint = ScorePoint & { roundId: string }
+
+export type EditionRollupResult = {
+  projectId: string
+  rawAverage: number | null
+  balancedAverage: number | null
+  count: number
+  roundCount: number
+}
+
+/**
+ * Per-round balanced rollup: groups points by roundId, computes a balance
+ * context per round, then takes the mean of each project's per-round raw and
+ * balanced averages. Use for edition-level rankings; never pool z-contexts
+ * across rounds, because a juror's grading profile differs by round type.
+ */
+export function computePerRoundBalanced(
+  points: RoundScopedScorePoint[],
+): Map<string, EditionRollupResult> {
+  const byRound = new Map<string, ScorePoint[]>()
+  for (const p of points) {
+    const arr = byRound.get(p.roundId) ?? []
+    arr.push({ projectId: p.projectId, userId: p.userId, rawScore: p.rawScore })
+    byRound.set(p.roundId, arr)
+  }
+
+  const perRoundResults: Array<Map<string, BalancedProjectResult>> = []
+  for (const roundPoints of byRound.values()) {
+    const ctx = computeBalanceContext(roundPoints)
+    perRoundResults.push(computeBalancedProjectScores(roundPoints, ctx))
+  }
+
+  const accumulator = new Map<
+    string,
+    { rawSum: number; rawCount: number; balancedSum: number; balancedCount: number; count: number; roundCount: number }
+  >()
+  for (const roundMap of perRoundResults) {
+    for (const [projectId, result] of roundMap.entries()) {
+      const acc = accumulator.get(projectId) ?? {
+        rawSum: 0, rawCount: 0, balancedSum: 0, balancedCount: 0, count: 0, roundCount: 0,
+      }
+      if (result.rawAverage != null) {
+        acc.rawSum += result.rawAverage
+        acc.rawCount += 1
+      }
+      if (result.balancedAverage != null) {
+        acc.balancedSum += result.balancedAverage
+        acc.balancedCount += 1
+      }
+      acc.count += result.count
+      acc.roundCount += 1
+      accumulator.set(projectId, acc)
+    }
+  }
+
+  const out = new Map<string, EditionRollupResult>()
+  for (const [projectId, acc] of accumulator.entries()) {
+    out.set(projectId, {
+      projectId,
+      rawAverage: acc.rawCount > 0 ? acc.rawSum / acc.rawCount : null,
+      balancedAverage: acc.balancedCount > 0 ? acc.balancedSum / acc.balancedCount : null,
+      count: acc.count,
+      roundCount: acc.roundCount,
+    })
+  }
+  return out
+}
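Review note: to sanity-check the hand-computed numbers in the edition-mode test, here is a standalone sketch of the balancing math. It is not the code in `juror-balance.ts` (most of that file is outside this diff); it assumes population standard deviation and the zero-spread fallback used by the side-panel snippet, both inferred from the test comments. `balancedForRound`, `perRoundBalanced`, `pstdev`, and `groupBy` are hypothetical names, and only the balanced path is modeled.

```typescript
interface ScorePoint { projectId: string; userId: string; rawScore: number }
type RoundScopedScorePoint = ScorePoint & { roundId: string }

const mean = (xs: number[]): number => xs.reduce((a, b) => a + b, 0) / xs.length

// Population standard deviation (divide by n); matches the test comments,
// e.g. Round A overall stddev = sqrt(4.5).
const pstdev = (xs: number[]): number => {
  const m = mean(xs)
  return Math.sqrt(mean(xs.map((x) => (x - m) ** 2)))
}

function groupBy<T>(items: T[], key: (item: T) => string): Map<string, T[]> {
  const out = new Map<string, T[]>()
  for (const item of items) {
    const k = key(item)
    const bucket = out.get(k) ?? []
    bucket.push(item)
    out.set(k, bucket)
  }
  return out
}

// Balanced average per project, using a z-context built from ONE round only.
function balancedForRound(points: ScorePoint[]): Map<string, number> {
  const all = points.map((p) => p.rawScore)
  const overallMean = mean(all)
  const overallStddev = pstdev(all)

  const jurorStats = new Map<string, { mean: number; stddev: number }>()
  for (const [userId, ps] of groupBy(points, (p) => p.userId)) {
    const scores = ps.map((p) => p.rawScore)
    jurorStats.set(userId, { mean: mean(scores), stddev: pstdev(scores) })
  }

  const out = new Map<string, number>()
  for (const [projectId, ps] of groupBy(points, (p) => p.projectId)) {
    if (overallStddev === 0) {
      // No spread anywhere in the round: nothing to rescale, keep the raw mean.
      out.set(projectId, mean(ps.map((p) => p.rawScore)))
      continue
    }
    const zs = ps.map((p) => {
      const s = jurorStats.get(p.userId)!
      return s.stddev > 0
        ? (p.rawScore - s.mean) / s.stddev
        : (p.rawScore - overallMean) / overallStddev // zero-spread juror: use round stats
    })
    out.set(projectId, overallMean + mean(zs) * overallStddev)
  }
  return out
}

// Edition rollup: a separate z-context per round, then the mean of each
// project's per-round balanced averages.
function perRoundBalanced(points: RoundScopedScorePoint[]): Map<string, number> {
  const acc = new Map<string, { sum: number; rounds: number }>()
  for (const roundPoints of groupBy(points, (p) => p.roundId).values()) {
    for (const [projectId, balanced] of balancedForRound(roundPoints)) {
      const a = acc.get(projectId) ?? { sum: 0, rounds: 0 }
      a.sum += balanced
      a.rounds += 1
      acc.set(projectId, a)
    }
  }
  const out = new Map<string, number>()
  for (const [projectId, a] of acc) out.set(projectId, a.sum / a.rounds)
  return out
}

// Same data as the edition-mode test: [roundId, userId, projectId, rawScore].
const rows: Array<[string, string, string, number]> = [
  ['A', 'lenient', 'X', 9], ['A', 'lenient', 'Y', 9], ['A', 'harsh', 'X', 6], ['A', 'harsh', 'Y', 4],
  ['B', 'lenient', 'X', 8], ['B', 'lenient', 'Y', 8], ['B', 'harsh', 'X', 7], ['B', 'harsh', 'Y', 5],
]
const pts: RoundScopedScorePoint[] = rows.map(([roundId, userId, projectId, rawScore]) => ({
  roundId, userId, projectId, rawScore,
}))
const result = perRoundBalanced(pts)
console.log(result.get('X')!.toFixed(2), result.get('Y')!.toFixed(2)) // 8.59 6.91
```

For contrast, running `balancedForRound` over both rounds at once (a single pooled z-context) gives about 7.77 for X instead of 8.59, which is the cross-round drift the per-round rollup is meant to avoid.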
--- a/src/types/competition-configs.ts
+++ b/src/types/competition-configs.ts
@@ -142,6 +142,11 @@ export const EvaluationConfigSchema = z.object({
    })
    .optional(),

+  // Whether the ranking dashboard ranks projects by the juror-balanced (z-normalized)
+  // average rather than the raw average. Defaults to true to preserve existing
+  // behavior. Admins toggle it per round from the dashboard side panel.
+  useBalancedRanking: z.boolean().default(true),
+
  // Ranking (Phase 1)
  rankingEnabled: z.boolean().default(false),
  rankingCriteria: z.string().optional(),
--- a/tests/unit/juror-balance-round-scoping.test.ts
+++ b/tests/unit/juror-balance-round-scoping.test.ts
@@ -0,0 +1,167 @@
+import { afterAll, beforeAll, describe, expect, it } from 'vitest'
+import { prisma, createCaller } from '../setup'
+import {
+  createTestUser, createTestProgram, createTestCompetition, createTestRound,
+  createTestProject, createTestProjectRoundState, createTestAssignment,
+  createTestEvaluation, createTestEvaluationForm, cleanupTestData, uid,
+} from '../helpers'
+import { analyticsRouter } from '../../src/server/routers/analytics'
+import { projectRouter } from '../../src/server/routers/project'
+
+describe('analytics.getProjectDetail round scoping', () => {
+  let programId: string
+  let admin: { id: string; email: string; role: 'SUPER_ADMIN' }
+  let projectId: string
+  let roundAId: string
+  let roundBId: string
+  const userIds: string[] = []
+
+  beforeAll(async () => {
+    const program = await createTestProgram({ name: `bal-scope-${uid()}` })
+    programId = program.id
+    const competition = await createTestCompetition(programId)
+    const roundA = await createTestRound(competition.id, { name: 'Round A', sortOrder: 0, status: 'ROUND_CLOSED' })
+    const roundB = await createTestRound(competition.id, { name: 'Round B', sortOrder: 1, status: 'ROUND_ACTIVE' })
+    roundAId = roundA.id
+    roundBId = roundB.id
+
+    const formA = await createTestEvaluationForm(roundA.id)
+    const formB = await createTestEvaluationForm(roundB.id)
+
+    const project = await createTestProject(programId)
+    projectId = project.id
+    await createTestProjectRoundState(projectId, roundA.id, { state: 'PASSED' })
+    await createTestProjectRoundState(projectId, roundB.id, { state: 'IN_PROGRESS' })
+
+    // 2 evaluations on Round A: 7.0, 8.0  (mean 7.5)
+    for (const score of [7, 8]) {
+      const juror = await createTestUser('JURY_MEMBER')
+      userIds.push(juror.id)
+      const a = await createTestAssignment(juror.id, projectId, roundA.id)
+      await createTestEvaluation(a.id, formA.id, { status: 'SUBMITTED', globalScore: score, submittedAt: new Date() })
+    }
+    // 3 evaluations on Round B: 9.0, 8.0, 8.0  (mean 8.333…)
+    for (const score of [9, 8, 8]) {
+      const juror = await createTestUser('JURY_MEMBER')
+      userIds.push(juror.id)
+      const a = await createTestAssignment(juror.id, projectId, roundB.id)
+      await createTestEvaluation(a.id, formB.id, { status: 'SUBMITTED', globalScore: score, submittedAt: new Date() })
+    }
+
+    const adminUser = await createTestUser('SUPER_ADMIN')
+    userIds.push(adminUser.id)
+    admin = { id: adminUser.id, email: adminUser.email, role: 'SUPER_ADMIN' }
+  })
+
+  afterAll(async () => {
+    await cleanupTestData(programId, userIds)
+  })
+
+  it('returns only round-B stats when roundId=roundB is passed', async () => {
+    const caller = createCaller(analyticsRouter, admin)
+    const result = await caller.getProjectDetail({ id: projectId, roundId: roundBId })
+    expect(result.stats).not.toBeNull()
+    expect(result.stats!.totalEvaluations).toBe(3)
+    expect(result.stats!.averageGlobalScore).toBeCloseTo(8.333, 2)
+  })
+
+  it('returns aggregated stats across all rounds when roundId is omitted', async () => {
+    const caller = createCaller(analyticsRouter, admin)
+    const result = await caller.getProjectDetail({ id: projectId })
+    expect(result.stats!.totalEvaluations).toBe(5)
+  })
+
+  it('project.getFullDetail also scopes stats to roundId when provided', async () => {
+    const caller = createCaller(projectRouter, admin)
+    const scoped = await caller.getFullDetail({ id: projectId, roundId: roundBId })
+    expect(scoped.stats!.totalEvaluations).toBe(3)
+    expect(scoped.stats!.averageGlobalScore).toBeCloseTo(8.333, 2)
+    const aggregate = await caller.getFullDetail({ id: projectId })
+    expect(aggregate.stats!.totalEvaluations).toBe(5)
+  })
+})
+
+describe('analytics.getProjectRankings per-round z-context (edition mode)', () => {
+  let programId: string
+  let admin: { id: string; email: string; role: 'SUPER_ADMIN' }
+  let projectXId: string
+  let projectYId: string
+  const userIds: string[] = []
+
+  beforeAll(async () => {
+    const program = await createTestProgram({ name: `rank-edition-${uid()}` })
+    programId = program.id
+    const competition = await createTestCompetition(programId)
+    const roundA = await createTestRound(competition.id, { name: 'A', sortOrder: 0 })
+    const roundB = await createTestRound(competition.id, { name: 'B', sortOrder: 1 })
+    const formA = await createTestEvaluationForm(roundA.id, [
+      { id: 'c1', label: 'X', scale: '1-10', weight: 1 },
+    ])
+    const formB = await createTestEvaluationForm(roundB.id, [
+      { id: 'c1', label: 'X', scale: '1-10', weight: 1 },
+    ])
+
+    const projX = await createTestProject(programId, { title: 'X' })
+    const projY = await createTestProject(programId, { title: 'Y' })
+    projectXId = projX.id
+    projectYId = projY.id
+    await createTestProjectRoundState(projX.id, roundA.id)
+    await createTestProjectRoundState(projY.id, roundA.id)
+    await createTestProjectRoundState(projX.id, roundB.id)
+    await createTestProjectRoundState(projY.id, roundB.id)
+
+    const lenient = await createTestUser('JURY_MEMBER')
+    const harsh = await createTestUser('JURY_MEMBER')
+    userIds.push(lenient.id, harsh.id)
+
+    const writeEval = async (jurorId: string, projId: string, roundId: string, formId: string, c1: number) => {
+      const a = await createTestAssignment(jurorId, projId, roundId)
+      await prisma.evaluation.create({
+        data: {
+          assignmentId: a.id,
+          formId,
+          status: 'SUBMITTED',
+          submittedAt: new Date(),
+          criterionScoresJson: { c1 },
+        },
+      })
+    }
+
+    // Round A
+    await writeEval(lenient.id, projX.id, roundA.id, formA.id, 9)
+    await writeEval(lenient.id, projY.id, roundA.id, formA.id, 9)
+    await writeEval(harsh.id, projX.id, roundA.id, formA.id, 6)
+    await writeEval(harsh.id, projY.id, roundA.id, formA.id, 4)
+    // Round B (different scoring profile)
+    await writeEval(lenient.id, projX.id, roundB.id, formB.id, 8)
+    await writeEval(lenient.id, projY.id, roundB.id, formB.id, 8)
+    await writeEval(harsh.id, projX.id, roundB.id, formB.id, 7)
+    await writeEval(harsh.id, projY.id, roundB.id, formB.id, 5)
+
+    const adminUser = await createTestUser('SUPER_ADMIN')
+    userIds.push(adminUser.id)
+    admin = { id: adminUser.id, email: adminUser.email, role: 'SUPER_ADMIN' }
+  })
+
+  afterAll(async () => {
+    await cleanupTestData(programId, userIds)
+  })
+
+  it('aggregates per-project balanced score as the mean of per-round balanced averages', async () => {
+    const caller = createCaller(analyticsRouter, admin)
+    const result = await caller.getProjectRankings({ programId })
+    const x = result.find((p: { id: string }) => p.id === projectXId)!
+    const y = result.find((p: { id: string }) => p.id === projectYId)!
+    // Per-round balanced (computed by hand using the algorithm in juror-balance.ts):
+    //   Round A overall mean=7, stddev=√4.5; lenient stddev=0 (fallback: raw score), harsh mean=5, stddev=1
+    //     X balanced ≈ 9.06, Y balanced ≈ 6.94
+    //   Round B overall mean=7, stddev=√1.5; lenient stddev=0 (fallback: raw score), harsh mean=6, stddev=1
+    //     X balanced ≈ 8.11, Y balanced ≈ 6.89
+    // Edition rollup = mean of per-round balanced averages:
+    //   X ≈ 8.59, Y ≈ 6.91
+    expect(x.balancedScore!).toBeCloseTo(8.59, 1)
+    expect(y.balancedScore!).toBeCloseTo(6.91, 1)
+    // Crucially, X must rank above Y after the per-round correction.
+    expect(x.balancedScore!).toBeGreaterThan(y.balancedScore!)
+  })
+})
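The hand-computed values in the test comments can be re-derived with a standalone sketch of per-round balancing followed by the edition rollup that commit 97d1f2a3af describes. This is a reconstruction under stated assumptions, not the code in `juror-balance.ts`. It uses the population standard deviation, which matches the √4.5 and √1.5 in the comments. It rescales each juror z-score onto the round's overall mean and stddev. A juror whose stddev is 0 contributes the raw score unchanged, which is the fallback that reproduces the ≈9.06 and ≈6.94 values. `EvalPoint`, `balanceRound`, and `editionBalancedScores` are hypothetical names.

```typescript
// Sketch only: names and shapes are hypothetical, not the juror-balance.ts API.
interface EvalPoint {
  roundId: string
  jurorId: string
  projectId: string
  score: number
}

// Population mean and standard deviation (divide by n), consistent with √4.5 / √1.5.
function meanStd(values: number[]): { mean: number; std: number } {
  const mean = values.reduce((s, v) => s + v, 0) / values.length
  const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / values.length
  return { mean, std: Math.sqrt(variance) }
}

// Append a value to the array stored under `key`, creating the array on first use.
function push<K, V>(map: Map<K, V[]>, key: K, value: V): void {
  const list = map.get(key)
  if (list) list.push(value)
  else map.set(key, [value])
}

const average = (xs: number[]): number => xs.reduce((s, v) => s + v, 0) / xs.length

// Balance ONE round: z-score each evaluation against its juror's own mean/std,
// then rescale onto the round's overall mean/std. Assumed fallback: a juror with
// std 0 contributes the raw score. Returns projectId -> balanced average.
function balanceRound(points: EvalPoint[]): Map<string, number> {
  const overall = meanStd(points.map((p) => p.score))
  const scoresByJuror = new Map<string, number[]>()
  for (const p of points) push(scoresByJuror, p.jurorId, p.score)
  const jurorStats = new Map<string, { mean: number; std: number }>()
  for (const [jurorId, scores] of scoresByJuror) jurorStats.set(jurorId, meanStd(scores))

  const balancedByProject = new Map<string, number[]>()
  for (const p of points) {
    const j = jurorStats.get(p.jurorId)!
    const balanced =
      j.std === 0 ? p.score : overall.mean + ((p.score - j.mean) / j.std) * overall.std
    push(balancedByProject, p.projectId, balanced)
  }
  const result = new Map<string, number>()
  for (const [projectId, values] of balancedByProject) result.set(projectId, average(values))
  return result
}

// Edition rollup: one balance context per round, then the unweighted mean of
// each project's per-round balanced averages.
function editionBalancedScores(points: EvalPoint[]): Map<string, number> {
  const pointsByRound = new Map<string, EvalPoint[]>()
  for (const p of points) push(pointsByRound, p.roundId, p)
  const perRound = new Map<string, number[]>()
  for (const roundPoints of pointsByRound.values()) {
    for (const [projectId, avg] of balanceRound(roundPoints)) push(perRound, projectId, avg)
  }
  const result = new Map<string, number>()
  for (const [projectId, avgs] of perRound) result.set(projectId, average(avgs))
  return result
}

// The test fixture: [roundId, jurorId, projectId, c1 score].
const rows: Array<[string, string, string, number]> = [
  ['A', 'lenient', 'X', 9], ['A', 'lenient', 'Y', 9], ['A', 'harsh', 'X', 6], ['A', 'harsh', 'Y', 4],
  ['B', 'lenient', 'X', 8], ['B', 'lenient', 'Y', 8], ['B', 'harsh', 'X', 7], ['B', 'harsh', 'Y', 5],
]
const fixture: EvalPoint[] = rows.map(([roundId, jurorId, projectId, score]) => ({
  roundId, jurorId, projectId, score,
}))
const scores = editionBalancedScores(fixture)
console.log(scores.get('X')?.toFixed(2), scores.get('Y')?.toFixed(2)) // 8.59 6.91
```

Because each round gets its own context, the lenient juror's flat 9s in round A are never calibrated against the round B profile; pooling both rounds into a single context is the behavior the commit removed.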
--- a/tests/unit/round-config-balance-toggle.test.ts
+++ b/tests/unit/round-config-balance-toggle.test.ts
@@ -0,0 +1,37 @@
+import { afterAll, beforeAll, describe, expect, it } from 'vitest'
+import { prisma, createCaller } from '../setup'
+import {
+  createTestUser, createTestProgram, createTestCompetition, createTestRound,
+  cleanupTestData, uid,
+} from '../helpers'
+import { roundRouter } from '../../src/server/routers/round'
+
+describe('Round.configJson.useBalancedRanking', () => {
+  let programId: string
+  let admin: { id: string; email: string; role: 'SUPER_ADMIN' }
+  const userIds: string[] = []
+
+  beforeAll(async () => {
+    const program = await createTestProgram({ name: `bal-toggle-${uid()}` })
+    programId = program.id
+    const adminUser = await createTestUser('SUPER_ADMIN')
+    userIds.push(adminUser.id)
+    admin = { id: adminUser.id, email: adminUser.email, role: 'SUPER_ADMIN' }
+  })
+
+  afterAll(async () => {
+    await cleanupTestData(programId, userIds)
+  })
+
+  it('persists useBalancedRanking via round.update', async () => {
+    const competition = await createTestCompetition(programId)
+    const round = await createTestRound(competition.id)
+    const caller = createCaller(roundRouter, admin)
+    await caller.update({
+      id: round.id,
+      configJson: { useBalancedRanking: false },
+    })
+    const reloaded = await prisma.round.findUniqueOrThrow({ where: { id: round.id } })
+    expect((reloaded.configJson as Record<string, unknown>).useBalancedRanking).toBe(false)
+  })
+})
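The schema commit (0680a5d601) says `useBalancedRanking` defaults to true so existing rounds preserve their current behavior, and e12f26092a says list sorting reads the toggle. Below is a minimal sketch of both reads. It assumes `configJson` arrives as untyped JSON and that projects without a score sort last. `isBalancedRankingEnabled`, `sortByActiveScore`, and the `RankedProject` shape are hypothetical. It deliberately ignores the pinned manual-reorder case, which the commit says always wins over the toggle.

```typescript
// Hypothetical row shape for the ranking list; only useBalancedRanking comes from the PR.
interface RankedProject {
  id: string
  rawAverage: number | null
  balancedScore: number | null
}

// Reads the flag from untyped round config. Anything other than an explicit
// boolean (missing key, null config, wrong type) falls back to the default: true.
function isBalancedRankingEnabled(configJson: unknown): boolean {
  if (configJson !== null && typeof configJson === 'object' && !Array.isArray(configJson)) {
    const value = (configJson as Record<string, unknown>).useBalancedRanking
    if (typeof value === 'boolean') return value
  }
  return true
}

// Returns a new array sorted descending by the active score; null scores sort last.
function sortByActiveScore(projects: RankedProject[], configJson: unknown): RankedProject[] {
  const balanced = isBalancedRankingEnabled(configJson)
  const key = (p: RankedProject): number | null => (balanced ? p.balancedScore : p.rawAverage)
  return [...projects].sort((a, b) => {
    const ka = key(a)
    const kb = key(b)
    if (ka === null || kb === null) return ka === kb ? 0 : ka === null ? 1 : -1
    return kb - ka
  })
}
```

Treating a non-boolean value as "use the default" rather than as false keeps a malformed config from silently switching a round to raw ranking.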
Author	SHA1	Message	Date
Matt	9db8312b96	feat: render project averages to two decimals Project-level averages (Raw + Balanced in the side panel, observer project detail score card, observer preview dialog Avg Score) now show two decimals (e.g. 8.33 instead of 8.0/8.3) so admins can see the actual computed value. Per-juror individual scores keep one decimal — they're submissions, not aggregates. ScorePill gains an optional precision prop so call sites can opt into 2-decimal display where the value is an aggregate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:32:52 +02:00
Matt	3b12078e04	feat: mount score explainer dialog in admin and observer surfaces Adds the 'How scores are calculated' affordance to: - the admin ranking dashboard side panel (next to the Avg Score card) - the observer full project detail page (in the score card) - the observer reports preview dialog (next to Evaluation Summary) so all three audiences can open the same explainer dialog. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:31:29 +02:00
Matt	b4f5189a8e	feat: shared 'How scores are calculated' explainer dialog Reusable component used by admin and observer surfaces. Covers the algorithm, a five-step plain-language walkthrough, a worked example with three jurors of different grading styles, edge cases, and why both Raw and Balanced are always shown. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:25:20 +02:00
Matt	ee68f8af41	feat: side panel shows per-juror baseline and balanced contribution Extends the ranking router's roundEvaluationScores response with per-juror grading stats (mean, stddev, count) plus the round's overall mean/stddev. The side-sheet juror rows render 'typical X.XX → contributes Y.YY' next to each Score badge whenever balanced is on, making the z-rescaling visible per individual rather than only as a project-level number. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:24:34 +02:00
Matt	664a682585	feat: side panel shows raw + balanced averages, list drops delta Removes the per-row '⇢ X.X' annotation from the ranking list — the list view stays clean. The side panel's stats area gains a combined Avg Score card that shows Raw and Balanced side-by-side, with the active one (per the round's toggle) bolded and tagged 'used for ranking'. Pass Rate and Evaluators move below into a 2-col grid. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:23:29 +02:00
Matt	e12f26092a	feat: list sort respects useBalancedRanking toggle The two existing sort sites (initial init + threshold cutoff) now read from the local toggle. A second effect re-sorts the list when the toggle flips, but only when no manual reorder is pinned to the snapshot — persisted manual reorders always win, matching prior behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:22:11 +02:00
Matt	387f84c338	feat: per-round balanced-scoring toggle in side sheet A Switch at the top of the project side panel writes useBalancedRanking onto Round.configJson via the existing round.update mutation. The flip is shared across all viewers because the value lives in the round's persisted config; hydration runs on every roundData refetch so the UI converges quickly when another admin flips it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:20:21 +02:00
Matt	0680a5d601	feat: add useBalancedRanking flag to round config schema Defaults to true so existing rounds preserve current behavior; toggled per-round from the ranking dashboard side panel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:19:15 +02:00
Matt	6f3e8885e0	feat: resolve observer project page round default and add selector The observer full project page used to call getProjectDetail without a round, getting cross-round contaminated stats. It now resolves a default — the currently OPEN round the project is in, falling back to the most recently CLOSED one — and renders a selector chip in the score card whenever the project participated in more than one candidate round. Initial selection respects the ?round= query param. A new observer procedure (getProjectRoundsForObserver) returns the project's open or closed rounds for the picker. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:18:36 +02:00
Matt	cfd9dc6afe	fix: scope observer reports preview dialog to selected round Threads the active roundId through ProjectPreviewDialog and its two callers (filtering tabs, expandable juror table). When a round is in scope, the preview's stats card now matches the per-juror list and the page-level round selector. The roundId prop is optional so the component still works in any future caller that lacks round context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:17:18 +02:00
Matt	9a2c10a6f8	fix: scope admin ranking dashboard side-sheet stats to current round The admin dashboard fetches its side-sheet detail from project.getFullDetail (not analytics.getProjectDetail as the audit assumed), and that procedure had the same cross-round contamination bug. Add an optional roundId to its input, filter the SUBMITTED-evaluations query when provided, and pass roundId from the dashboard's useQuery so the Avg Score / Pass Rate / Evaluators card now matches the per-juror list rendered below it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:15:47 +02:00
Matt	97d1f2a3af	fix: compute z-context per-round in edition-mode rankings rollup Previously the edition-level branch of analytics.getProjectRankings (programId mode) pooled every juror's evaluations across every round into a single z-normalization context. A juror's mean and stddev are not stable across round types — quick intake screening produces a very different grading profile than a deep evaluation round, and mixing them yields a meaningless personal calibration. The rollup now groups points by roundId, computes one balance context per round, and aggregates per-project as the unweighted mean of the per-round balanced averages. roundId mode is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:14:30 +02:00
Matt	7147115918	fix: scope analytics.getProjectDetail by optional roundId The procedure pulled every SUBMITTED evaluation for a project across every round it ever participated in, then computed Avg Score / Pass Rate / Evaluators from that pool. Meanwhile the per-juror list rendered in the admin sheet filters to the current round, producing a card that disagreed with the visible list. With roundId in the input, callers opt into round-scoped stats; omitting it preserves the old aggregate behavior for any caller that hasn't been updated yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 13:12:27 +02:00
Matt	260baf3a41	docs: add implementation plan for juror-balance toggle and scoping fixes 15 TDD-style tasks covering the round-scoping bug fixes for getProjectDetail and getProjectRankings, the per-round toggle, the side-panel deeper display, the shared score explainer dialog, and the decimal display audit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:59:36 +02:00
Matt	64e7be2418	docs: add design spec for juror-balance toggle and round-scoping fixes Captures the per-round toggle, side-panel deeper display, "How scores are calculated" explainer dialog, and the cross-round contamination fixes for getProjectDetail and getProjectRankings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:50:32 +02:00
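Commit 6f3e8885e0 describes how the observer project page picks a default round. It uses the OPEN round the project is in, falls back to the most recently CLOSED one, lets `?round=` take precedence, and shows a selector when there is more than one candidate. A sketch of that resolution follows. The `CandidateRound` shape, the `closedAt` field used to decide "most recently", and the rule that an unknown `?round=` value is ignored are assumptions, not taken from `getProjectRoundsForObserver`.

```typescript
// Hypothetical candidate shape; closedAt is an assumed field for "most recently closed".
interface CandidateRound {
  id: string
  status: 'OPEN' | 'CLOSED'
  closedAt: Date | null
}

function resolveDefaultRoundId(
  rounds: CandidateRound[],
  queryRoundId?: string | null,
): string | null {
  // An explicit ?round= wins, but only if it names one of the candidates (assumption).
  if (queryRoundId && rounds.some((r) => r.id === queryRoundId)) return queryRoundId
  // Otherwise prefer the round the project is currently OPEN in.
  const open = rounds.find((r) => r.status === 'OPEN')
  if (open) return open.id
  // Fall back to the CLOSED round with the latest closedAt; copy before sorting.
  const closed = rounds
    .filter((r) => r.status === 'CLOSED')
    .sort((a, b) => (b.closedAt?.getTime() ?? 0) - (a.closedAt?.getTime() ?? 0))
  return closed[0]?.id ?? null
}

// The selector chip is only worth rendering when there is a real choice.
function shouldShowRoundSelector(rounds: CandidateRound[]): boolean {
  return rounds.length > 1
}
```

Returning `null` when the project has no open or closed round lets the caller fall back to the old unscoped `getProjectDetail` call, which the 7147115918 commit keeps available when `roundId` is omitted.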