Benchmark fixture

Best AI for Research

A benchmark fixture page for testing AI research tools on source collection, claim extraction, synthesis, and uncertainty.

Status: Fixture ready; no public ranking yet. No winner is published until source packets are reviewed.

Last tested: Not yet tested. Rankings stay blocked until the run log includes raw outputs or notes, recorded failures, reviewer notes, and a retest date.
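The ranking gate described above can be expressed as a simple completeness check on the run log. The following is a minimal sketch, not the page's actual tooling: the field names (`raw_outputs_or_notes`, `failures`, `reviewer_notes`, `retest_date`) and both function names are hypothetical, and the run log is assumed to be a plain dictionary.

```python
# Hypothetical field names mirroring the four items the run log must include.
REQUIRED_FIELDS = (
    "raw_outputs_or_notes",
    "failures",
    "reviewer_notes",
    "retest_date",
)


def missing_run_log_fields(run_log: dict) -> list:
    """Return the required fields that are absent (or None) in the run log.

    An empty list of failures still counts as recorded: the assumption here
    is that the reviewer must state failures explicitly, even if there are none.
    """
    return [field for field in REQUIRED_FIELDS if run_log.get(field) is None]


def ranking_allowed(run_log: dict) -> bool:
    """A ranking stays blocked while any required field is missing."""
    return not missing_run_log_fields(run_log)
```

With this sketch, an empty run log blocks the ranking and reports all four fields as missing, which matches the page's current "Not tested" status.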

Frozen benchmark fixtures
Fixture	Task	Expected evidence
RES-001	Summarize a source packet with dates and numbers.	Claims are source-backed and caveats remain intact.
RES-002	Compare contradictory sources.	Uncertainty and source disagreement are explicit.
RES-003	Answer a no-source question.	Refuses or asks for more evidence.

Scoring weights
Criterion	Weight
Source quality	30
Claim faithfulness	35
Uncertainty handling	20
Review effort	15
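The four numbers above sum to 100, so they can be read as weights in a combined score. The sketch below shows one way to do that arithmetic. It assumes each criterion is rated on a 0.0 to 1.0 scale by a reviewer; that scale, the dictionary keys, and the function name `weighted_score` are illustrative assumptions, not something this page specifies.

```python
# Weights as listed on this page; they sum to 100.
WEIGHTS = {
    "source_quality": 30,
    "claim_faithfulness": 35,
    "uncertainty_handling": 20,
    "review_effort": 15,
}


def weighted_score(ratings: dict) -> float:
    """Combine per-criterion ratings (assumed 0.0-1.0) into a 0-100 score."""
    missing = sorted(WEIGHTS.keys() - ratings.keys())
    if missing:
        raise ValueError(f"missing ratings: {missing}")
    for name in WEIGHTS:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name} must be between 0.0 and 1.0")
    # Each criterion contributes its weight scaled by its rating.
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
```

Because claim faithfulness carries the largest weight, a tool that writes fluently but misstates its sources loses more points than one that only demands extra review effort.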

Research tools should be tested on source discipline, not only on fluent synthesis.
