Inference Index

Arena

WildBench

Evaluates models on real-world user queries collected from chatbot conversations, with model responses graded pairwise.

Metric

Elo

Max score

Elo (no fixed maximum)

Maintainer

Allen Institute for AI

Models scored

13

Why it matters

A cheaper, faster arena-style signal than the LMSYS Chatbot Arena. It uses real prompts from production chatbot traffic, with responses judged by strong LLM referees rather than by human voters.
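
To illustrate how pairwise judgments can turn into an Elo-style ranking, here is a minimal sketch of the standard sequential Elo update. It is not necessarily how this leaderboard computes its ratings: the K-factor of 32, the starting rating of 1000, the function names, and the model names are all assumptions chosen for the example.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_from_pairwise(judgments, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Compute ratings from pairwise judgments, processed in order.

    judgments: iterable of (model_a, model_b, outcome), where outcome is
    1.0 if A's response was preferred, 0.0 if B's was, and 0.5 for a tie.
    k and initial are illustrative defaults, not values from the benchmark.
    """
    ratings = defaultdict(lambda: initial)
    for a, b, outcome in judgments:
        e_a = expected_score(ratings[a], ratings[b])
        delta = k * (outcome - e_a)  # positive when A did better than expected
        ratings[a] += delta
        ratings[b] -= delta          # zero-sum: B loses what A gains
    return dict(ratings)


# Hypothetical judge verdicts on three placeholder models.
judgments = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]
ratings = elo_from_pairwise(judgments)
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update is zero-sum, the total rating mass stays constant, so scores are only meaningful relative to the other models in the same pool. Sequential Elo also depends on the order of comparisons, which is one reason some arena-style leaderboards fit all judgments jointly instead.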
