Benchmark fixture

Best AI for Research

A benchmark fixture page for testing AI research tools on source collection, claim extraction, synthesis, and uncertainty.

Status: Fixture ready; no public ranking yet. No winner is published until source packets are reviewed.

Last tested: Not tested. Rankings stay blocked until the run log includes raw outputs or notes, failures, reviewer notes, and a retest date.
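
The blocking rule above can be sketched as a simple completeness check on a run log. This is a minimal illustration, not the page's actual tooling: the `RunLog` class, its field names, and both helper functions are hypothetical. It also assumes that "raw outputs or notes" is a single requirement satisfied by either item, and that an empty failures list counts as "failures recorded, none found".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RunLog:
    """Hypothetical run-log record; field names are illustrative only."""
    raw_outputs: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)
    # None means failures were never recorded; [] means recorded, none found.
    failures: Optional[List[str]] = None
    reviewer_notes: List[str] = field(default_factory=list)
    retest_date: Optional[str] = None  # assumed to be a date string


def missing_for_ranking(log: RunLog) -> List[str]:
    """Return the required items that the run log still lacks."""
    missing = []
    # Assumption: either raw outputs or notes satisfies the first requirement.
    if not (log.raw_outputs or log.notes):
        missing.append("raw outputs or notes")
    if log.failures is None:
        missing.append("failures")
    if not log.reviewer_notes:
        missing.append("reviewer notes")
    if not log.retest_date:
        missing.append("retest date")
    return missing


def ranking_unblocked(log: RunLog) -> bool:
    """A ranking may be published only when nothing is missing."""
    return not missing_for_ranking(log)
```

Calling `missing_for_ranking(RunLog())` on an empty log lists all four requirements, which matches the current "Not tested" status: nothing has been logged, so the ranking stays blocked.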

Frozen benchmark fixtures
Fixture | Task | Expected evidence
RES-001 | Summarize a source packet with dates and numbers. | Claims are source-backed and caveats remain intact.
RES-002 | Compare contradictory sources. | Uncertainty and source disagreement are explicit.
RES-003 | Answer a no-source question. | Refuses or asks for more evidence.

Scoring weights (total: 100)
Weight | Criterion
30 | Source quality
35 | Claim faithfulness
20 | Uncertainty handling
15 | Review effort
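
The four numbers listed above (30, 35, 20, 15) sum to 100, which suggests they act as criterion weights. A minimal sketch of how such weights could combine into a single score follows. The page does not say how each criterion is rated, so the 0 to 1 rating scale, the `WEIGHTS` dictionary keys, and the `weighted_score` function are assumptions made for illustration. In particular, "Review effort" is assumed to be rated so that less reviewer effort earns a higher rating.

```python
from typing import Dict

# Weights as listed on the page; the snake_case keys are illustrative names.
WEIGHTS: Dict[str, int] = {
    "source_quality": 30,
    "claim_faithfulness": 35,
    "uncertainty_handling": 20,
    "review_effort": 15,
}


def weighted_score(ratings: Dict[str, float]) -> float:
    """Combine per-criterion ratings into a score between 0 and 100.

    Assumption: each rating is a fraction in [0, 1], where 1 is best.
    """
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the weighted criteria")
    for name, rating in ratings.items():
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {name} must be within [0, 1]")
    # Each criterion contributes its weight scaled by its rating.
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)
```

Under these assumptions, a tool that is fluent but weak on sourcing cannot score well: source quality and claim faithfulness together account for 65 of the 100 points.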

Research tools should be tested on source discipline, not only on fluent synthesis.