AI vs. Human Forecasters: Who Actually Wins on Polymarket?

Superforecasters beat the average person by roughly 60% on Brier scores. AI models have access to all published human knowledge. Which produces better prediction market probabilities? The answer is more nuanced than most people expect.

April 5, 2026 · 9 min read

1. What the Research Actually Shows

Studies published since 2023 show GPT-4 and comparable frontier models performing at roughly the median superforecaster level on established forecasting tournaments, measured by Brier score. On questions requiring real-time data — active sports markets, breaking news events — models with web search access outperform those without by a substantial margin.
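For readers unfamiliar with the metric: the Brier score is simply the mean squared difference between a forecast probability and the binary outcome, so lower is better. A minimal sketch (the example probabilities are illustrative, not from any cited study):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better. An always-50% forecaster scores 0.25; a perfect
    forecaster scores 0.0.
    """
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Three binary questions: forecast probabilities vs. what actually happened.
print(brier_score([0.8, 0.3, 0.9], [1, 0, 1]))  # ≈ 0.047
```

Because the score is averaged over many questions, it rewards calibration over lucky single calls, which is why it is the standard yardstick in forecasting tournaments.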

The comparison is not entirely fair to AI. Human superforecasters benefit from a deliberative process: updating beliefs as new information arrives, discussing with other forecasters, and applying judgment about source reliability. Comparing a single LLM output to a trained human's considered final forecast conflates output quality with process quality.

2. Where Humans Maintain the Edge

Human superforecasters outperform AI on questions that require insider knowledge, source evaluation, and understanding of institutional dynamics. Questions like who is likely to be appointed to a specific regulatory role, how a particular government tends to respond to financial pressure, or whether a CEO's public statements align with their likely private position — these require judgment that language models have no direct mechanism to apply.

Pattern matching from historical text is not the same as reading the room. Superforecasters with deep domain expertise in specific categories — former intelligence analysts on geopolitical questions, structural biologists on pandemic-related markets — outperform general-purpose AI models on questions within their specialization. The advantage disappears outside it.

3. Where AI Has the Edge

AI models process and weight information from hundreds of sources simultaneously. A human forecaster reading one analyst report per market is working with a fraction of the relevant evidence. An AI agent can incorporate historical base rates, market microstructure, related prediction market prices, analyst forecasts, and real-time data before producing a probability — all in the time it takes a human to open a browser tab.

Consistency is the other clear advantage. A human making 50 predictions in a week exhibits fatigue, anchoring to recent high-profile events, and availability bias. An AI agent applies the same reasoning process to every market regardless of sequencing, recency of related events, or time of day. On large samples of predictions, this consistency produces measurably better calibration than human forecasters working without structured feedback.
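Calibration here has a concrete meaning: among all markets where a forecaster said "70%", roughly 70% should resolve yes. One simple way to check this is to bucket forecasts by probability and compare the mean forecast in each bucket to the realized frequency. A minimal sketch, with made-up data:

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, bins=10):
    """Bucket forecasts into probability bands and compare, per bucket,
    the mean predicted probability against the realized frequency.

    A well-calibrated forecaster has the two numbers close in every bucket.
    """
    buckets = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        buckets[min(int(p * bins), bins - 1)].append((p, o))
    return {
        b: (
            sum(p for p, _ in rows) / len(rows),  # mean forecast in bucket
            sum(o for _, o in rows) / len(rows),  # realized frequency
        )
        for b, rows in sorted(buckets.items())
    }

# Two confident "no" calls that resolved no, two confident "yes" calls that resolved yes.
print(calibration_table([0.10, 0.15, 0.85, 0.90], [0, 0, 1, 1]))
```

On small samples the buckets are too sparse to mean much; the consistency advantage described above only shows up once each bucket holds many resolved markets.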

4. The Real Answer: Neither Beats a Well-Structured Consensus

Meta-analysis of the Good Judgment Project — Philip Tetlock's landmark forecasting research — shows that aggregating diverse forecasters outperforms any individual, including the best individual superforecaster in a cohort. Diversity of view, not quality of any single view, drives the aggregate's accuracy.

The same principle holds for AI agents. Claude, Gemini, and Grok are trained on different data by different teams with different research priorities. Their disagreements on a given question are not noise — they are signals about genuine uncertainty. When all three independently converge on a probability that diverges from the market price, the convergence is far more meaningful than any single agent's output.
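The convergence-vs-price logic can be sketched in a few lines. This is an illustrative toy, not the product's actual method: the agreement band and minimum-edge thresholds are assumptions chosen for the example.

```python
def consensus_signal(agent_probs, market_price, agreement_band=0.05, min_edge=0.10):
    """Flag a market when independent agents converge with each other
    but diverge from the live price.

    agreement_band: max spread between agents to count as convergence.
    min_edge: min gap between consensus and market price to flag.
    Both thresholds are illustrative, not calibrated values.
    """
    consensus = sum(agent_probs) / len(agent_probs)
    spread = max(agent_probs) - min(agent_probs)
    edge = consensus - market_price
    return {
        "consensus": consensus,
        "edge": edge,
        "signal": spread <= agreement_band and abs(edge) >= min_edge,
    }

# Three agents clustered near 0.70 against a market priced at 0.55.
print(consensus_signal([0.68, 0.70, 0.72], 0.55))
```

Note that the inverse case is just as informative: when the agents disagree widely with each other, the toy emits no signal at all, treating the disagreement as a marker of genuine uncertainty rather than averaging it away.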

5. What This Means for Prediction Market Traders

Human edge in prediction markets tends to be narrow and thematic. A serious political analyst may have genuine edge on US electoral markets and almost none on cryptocurrency resolution questions. AI consensus distributes edge more evenly across categories because it applies the same information processing to each one, without the specialization bias that concentrates human accuracy in certain domains.

For traders without specialized expertise in a specific market category, AI consensus provides a calibration baseline they could not otherwise access. For those with genuine domain expertise, the question is whether the AI baseline confirms or contradicts their read. The answer to that question is itself informative — and when your read and the AI consensus diverge significantly, one of you is incorporating information the other missed.

Put it into practice

See where the market has it wrong — right now.

Three AI agents scan 30+ active Polymarket markets and surface where consensus diverges from the live price. Your first 3 scans are free. No API keys, no capital at risk.