Advisory Benchmark Pilot

Compare a classical baseline with a future hybrid-style case-priority ranking.

This internal pilot stays downstream of the governed reviewer layer. It only ranks de-identified retrospective cases and never writes back into reviewer ownership, GP sign-off, or final case export truth.

Cases analysed

0

Completion-gated retrospective cases included in this pilot.

Bundle window

500

Recent bundles requested for retrospective evaluation.

Highest priority target

0

Cases where divergence or a signed-off override should surface first.

Elevated target

0

Cases with testing-led caution or missing comparable benchmark snapshots.

Median completion time

Not yet calculable

Used here only as a retrospective process-friction signal.

Boundary

Completion-gated accepted-pathway cases are compiled into an advisory benchmark set used to rank likely divergence and review-priority signals across a configurable retrospective bundle window. The priority target is expressed through explicit reviewer-defined benchmark labels derived from saved pathway direction, testing prompts, signed-off overrides, partial alignment, and missing comparable snapshots. It is a benchmark label, not a clinical truth label.

Current window: 0 adjudicated cases and 0 derived-label fallback cases.

If you want the pilot to prefer explicit reviewer labels over derived workflow semantics, use the benchmark adjudication view for the same retrospective window first.
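The preference for explicit reviewer labels over derived workflow semantics amounts to a simple fallback rule per case. A minimal sketch, assuming hypothetical field names (`adjudicated_label`, `derived_label`) rather than the pilot's actual schema:

```python
def resolve_benchmark_label(case: dict) -> tuple:
    """Prefer an explicit reviewer-adjudicated label; otherwise fall back
    to the label derived from workflow semantics.

    Returns (label, provenance) so the ranking layer can report how many
    cases in the window are adjudicated vs. derived-label fallbacks.
    """
    if case.get("adjudicated_label") is not None:
        return case["adjudicated_label"], "adjudicated"
    return case.get("derived_label", "unlabelled"), "derived"
```

Tracking the provenance alongside the label is what lets the window summary report adjudicated and derived-label fallback counts separately.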

Retrospective only

The pilot only reads completion-gated retrospective cases and never touches the live reviewer queue.

Advisory only

The output is a ranking and score explanation, not a diagnosis, treatment decision, or workflow action.

Governance preserving

Nothing in this benchmark writes back into reviewer status, GP sign-off, handoff state, or export truth.

De-identified benchmark

The experiment works from the de-identified retrospective/benchmark layer rather than raw intake text or identifiable patient views.

Reviewer-defined benchmark labels

The pilot now uses explicit benchmark labels derived from reviewer and clinician workflow semantics, making the target layer easier to inspect and less ambiguous than a single broad priority bucket.

Classical baseline

Ranking view

A deterministic weighted score over explicit governance, uncertainty, and duration signals already captured in the benchmark layer.

This is the baseline comparator and remains fully classical, transparent, and auditable.
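The classical baseline described above can be sketched as a transparent weighted sum. Signal names and weights below are illustrative assumptions, not the pilot's actual configuration:

```python
# Hypothetical signal names grouped by the three families the baseline
# scores over: governance, uncertainty, and duration. Weights are
# illustrative only.
WEIGHTS = {
    "signed_off_override": 3.0,        # governance
    "pathway_divergence": 2.5,         # governance
    "testing_prompted_caution": 1.5,   # uncertainty
    "missing_benchmark_snapshot": 1.0, # uncertainty
    "completion_time_zscore": 0.5,     # duration / process friction
}

def classical_priority_score(signals: dict) -> float:
    """Deterministic weighted sum over explicit benchmark-layer signals.
    Missing signals contribute zero, so the score is fully auditable
    from the case record alone."""
    return sum(WEIGHTS[name] * float(signals.get(name, 0.0)) for name in WEIGHTS)

def rank_cases(cases: list) -> list:
    # Python's sort is stable, so equal scores preserve input order,
    # which keeps the ranking reproducible and auditable.
    return sorted(cases, key=lambda c: classical_priority_score(c["signals"]), reverse=True)
```

Because the score is a fixed linear function of recorded signals, any rank position can be explained by listing each signal's weighted contribution.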

Precision@5

0%

nDCG@10

0%

Highest-priority recall@10

0%

Precision@20

0%

nDCG@20

0%

Highest-priority recall@20

0%

Coverage

0%

No completion-gated benchmark cases are available yet.
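The panel metrics above (Precision@k, nDCG@k, highest-priority recall@k) are standard ranking measures. A minimal sketch of how they would be computed over graded benchmark labels, assuming integer grades where 2 = highest priority, 1 = elevated, 0 = neither:

```python
import math

def precision_at_k(ranked_labels: list, k: int) -> float:
    """Fraction of the top-k ranked cases with any priority label (> 0)."""
    return sum(1 for g in ranked_labels[:k] if g > 0) / k

def ndcg_at_k(ranked_labels: list, k: int) -> float:
    """Normalised discounted cumulative gain over graded labels:
    DCG of the produced ranking divided by DCG of the ideal ranking."""
    def dcg(labels):
        return sum(g / math.log2(i + 2) for i, g in enumerate(labels[:k]))
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

def highest_priority_recall_at_k(ranked_labels: list, k: int, highest: int = 2) -> float:
    """Share of all highest-priority cases that surface in the top-k."""
    total = sum(1 for g in ranked_labels if g == highest)
    found = sum(1 for g in ranked_labels[:k] if g == highest)
    return found / total if total else 0.0
```

Coverage, by contrast, is simply the share of window cases that had enough benchmark-layer data to be scored at all.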

Future hybrid-style comparator

Ranking view

A deterministic orchestration-style score that emphasizes governance tension, benchmark uncertainty, process friction, and limited pathway balancing, modelled on a future hybrid-service pattern.

This models a future hybrid orchestration boundary but still runs offline and never writes into live reviewer workflow.
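One way the comparator could differ from a flat weighted sum while staying deterministic is to let governance tension and uncertainty interact, then apply a greedy pathway-balancing pass over the ranked list. Everything below is an illustrative assumption about the orchestration pattern, not the pilot's actual scoring:

```python
def hybrid_style_score(signals: dict) -> float:
    """Hypothetical orchestration-style score: the same benchmark-layer
    signals as the baseline, but governance tension and uncertainty
    reinforce each other multiplicatively instead of adding independently."""
    governance = signals.get("signed_off_override", 0.0) + signals.get("pathway_divergence", 0.0)
    uncertainty = signals.get("testing_prompted_caution", 0.0) + signals.get("missing_benchmark_snapshot", 0.0)
    friction = signals.get("completion_time_zscore", 0.0)
    return governance + uncertainty + 0.5 * friction + governance * uncertainty

def rank_with_pathway_balancing(cases: list, max_per_pathway: int = 3) -> list:
    """Greedy re-rank: cap how many cases from any one pathway occupy the
    head of the list, deferring the overflow to the tail. Still offline,
    still deterministic, no writes to live reviewer workflow."""
    ranked = sorted(cases, key=lambda c: hybrid_style_score(c["signals"]), reverse=True)
    head, counts, deferred = [], {}, []
    for case in ranked:
        pathway = case.get("pathway", "unknown")
        if counts.get(pathway, 0) < max_per_pathway:
            head.append(case)
            counts[pathway] = counts.get(pathway, 0) + 1
        else:
            deferred.append(case)
    return head + deferred
```

The balancing pass is what "limited pathway-balancing" could mean in practice: a hard cap per pathway near the top of the ranking, so one pathway cannot crowd out every review slot.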

Precision@5

0%

nDCG@10

0%

Highest-priority recall@10

0%

Precision@20

0%

nDCG@20

0%

Highest-priority recall@20

0%

Coverage

0%

No completion-gated benchmark cases are available yet.