feat: shared 'How scores are calculated' explainer dialog

Reusable component used by admin and observer surfaces. Covers the
algorithm, a five-step plain-language walkthrough, a worked example
with three jurors of different grading styles, edge cases, and why
both Raw and Balanced are always shown.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Matt
2026-04-27 13:25:20 +02:00
parent ee68f8af41
commit b4f5189a8e

@@ -0,0 +1,109 @@
'use client'
import {
Dialog,
DialogContent,
DialogHeader,
DialogTitle,
DialogTrigger,
} from '@/components/ui/dialog'
import { Button } from '@/components/ui/button'
import { Info } from 'lucide-react'
import type { ReactNode } from 'react'
export function ScoreExplainerDialog({ trigger }: { trigger?: ReactNode }) {
return (
<Dialog>
<DialogTrigger asChild>
{trigger ?? (
<Button variant="ghost" size="sm" className="h-7 gap-1 px-2 text-xs">
<Info className="h-3.5 w-3.5" />
How scores are calculated
</Button>
)}
</DialogTrigger>
<DialogContent className="max-w-xl max-h-[85vh] overflow-y-auto">
<DialogHeader>
<DialogTitle>How scores are calculated</DialogTitle>
</DialogHeader>
<div className="space-y-4 text-sm">
<p>
Different jurors have different grading styles. Some grade harshly, some
leniently. Balanced scoring corrects for that so a project isn&apos;t
punished for drawing harsh jurors or rewarded for drawing lenient ones.
</p>
<div>
<h3 className="font-semibold mb-1">How it works</h3>
<ol className="list-decimal pl-5 space-y-1">
<li>For each juror, calculate their personal average and spread across all the projects they scored in this round.</li>
<li>Convert each individual score into &quot;how many standard deviations above or below this juror&apos;s typical score&quot;: a 6 from a juror who averages 5 reads the same as a 9 from a juror who averages 8, provided both have a similar spread.</li>
<li>Average those normalized values across the project&apos;s jurors.</li>
<li>Rescale back onto the same 1 to 10 scale using the round&apos;s overall average and spread.</li>
<li>The result is directly comparable to the raw average: same scale, but corrected for grading style.</li>
</ol>
</div>
<div>
<h3 className="font-semibold mb-1">Worked example</h3>
<table className="w-full text-xs border-collapse">
<thead>
<tr className="border-b">
<th className="py-1 text-left">Juror</th>
<th className="py-1 text-left">Their typical avg</th>
<th className="py-1 text-left">Score for &quot;Project X&quot;</th>
<th className="py-1 text-left">What that means</th>
</tr>
</thead>
<tbody>
<tr className="border-b">
<td className="py-1">Juror A (lenient)</td>
<td>8.20</td>
<td>9.00</td>
<td>Just above their typical (+0.4σ)</td>
</tr>
<tr className="border-b">
<td className="py-1">Juror B (harsh)</td>
<td>5.80</td>
<td>7.50</td>
<td>Well above their typical (+1.5σ)</td>
</tr>
<tr>
<td className="py-1">Juror C (typical)</td>
<td>7.00</td>
<td>8.00</td>
<td>Slightly above their typical (+0.7σ)</td>
</tr>
</tbody>
</table>
<p className="mt-2 text-xs text-muted-foreground">
Raw average: (9.00 + 7.50 + 8.00) / 3 = <strong>8.17</strong>.
Balanced average rescales each juror&apos;s enthusiasm to the round&apos;s
overall scale and lands at roughly <strong>8.40</strong>: Juror B&apos;s
strong endorsement (well above their harsh baseline) carries more weight
than the raw 7.50 suggests.
</p>
</div>
<div>
<h3 className="font-semibold mb-1">When it kicks in</h3>
<ul className="list-disc pl-5 space-y-1">
<li>A juror needs at least 2 evaluations in the round for their spread to be computed; otherwise that juror falls back to the round-wide average.</li>
<li>Needs at least one juror with non-zero spread; if every juror gave identical scores, balanced equals raw.</li>
<li>Computed within a single round only: a juror&apos;s grading style in an intake screening doesn&apos;t affect their balance in a deep evaluation.</li>
</ul>
</div>
<div>
<h3 className="font-semibold mb-1">Why we still show &quot;Raw&quot;</h3>
<p>
Both numbers are always shown so you can sanity-check the correction. The
toggle at the top of the side panel decides which one is used for ranking.
</p>
</div>
</div>
</DialogContent>
</Dialog>
)
}
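
The dialog copy describes the balancing steps only in prose. A minimal sketch of those steps follows, assuming a flat list of evaluations for a single round. The `Evaluation` shape, the function names, the use of the population standard deviation, and the exact fallback for zero-spread jurors are all assumptions made for illustration; none of them come from this commit, and the real scoring code may differ.

```typescript
// Hypothetical shape of one evaluation; the field names are assumptions.
interface Evaluation {
  jurorId: string
  projectId: string
  score: number
}

function mean(xs: number[]): number {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length
}

// Population standard deviation. The dialog does not say whether the
// population or the sample formula is used; this choice is an assumption.
function stdDev(xs: number[]): number {
  const m = mean(xs)
  return Math.sqrt(mean(xs.map((x) => (x - m) * (x - m))))
}

// Returns a balanced score per project for one round (hypothetical helper).
function balancedScores(evals: Evaluation[]): Map<string, number> {
  const result = new Map<string, number>()
  if (evals.length === 0) return result

  // Round-wide average and spread, used for rescaling and as the fallback.
  const allScores = evals.map((e) => e.score)
  const roundMean = mean(allScores)
  const roundSd = stdDev(allScores)

  // Step 1: collect each juror's scores within this round.
  const byJuror = new Map<string, number[]>()
  for (const e of evals) {
    const scores = byJuror.get(e.jurorId) ?? []
    scores.push(e.score)
    byJuror.set(e.jurorId, scores)
  }

  // Step 2: express each score in standard deviations from its juror's mean.
  const zByProject = new Map<string, number[]>()
  for (const e of evals) {
    const scores = byJuror.get(e.jurorId) ?? []
    let m = mean(scores)
    let sd = stdDev(scores)
    // Assumed fallback: a juror with fewer than 2 evaluations, or with zero
    // spread, is normalized against the round-wide average and spread.
    if (scores.length < 2 || sd === 0) {
      m = roundMean
      sd = roundSd
    }
    const z = sd === 0 ? 0 : (e.score - m) / sd
    const zs = zByProject.get(e.projectId) ?? []
    zs.push(z)
    zByProject.set(e.projectId, zs)
  }

  // Steps 3 and 4: average the normalized values per project, then rescale
  // onto the round's own scale so the result is comparable to the raw average.
  zByProject.forEach((zs, projectId) => {
    result.set(projectId, roundMean + mean(zs) * roundSd)
  })
  return result
}
```

Under these assumptions, a round in which every juror gives the same score to all of their projects produces balanced scores equal to the raw averages, which matches the second edge case listed in the dialog.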