Arena · WildBench
Evaluates models on real-world user queries collected from chatbot conversations; model responses are then graded pairwise.
Metric
Elo
Max score
None (Elo ratings have no fixed ceiling)
Maintainer
Allen Institute for AI
Models scored
13
Why it matters
A cheaper, faster arena-style signal than the LMSYS Chatbot Arena. It uses real prompts from production chatbot traffic, with responses judged by strong referee models.
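To illustrate how pairwise grading can turn into an Elo-style rating, here is a minimal sketch of the classic online Elo update. The K-factor of 32, the starting rating of 1000, the 400-point scale, and the model names are conventional or hypothetical choices made for this example; they are not taken from the benchmark, which may fit its ratings differently.

```python
# Minimal sketch of online Elo updates from pairwise judgments.
# All constants and model names below are illustrative assumptions.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, model_a: str, model_b: str, score_a: float,
               k: float = 32.0, initial: float = 1000.0) -> dict:
    """Apply one pairwise result in place.

    score_a is 1.0 if A's response was preferred, 0.0 if B's was,
    and 0.5 for a tie.
    """
    ra = ratings.get(model_a, initial)
    rb = ratings.get(model_b, initial)
    ea = expected_score(ra, rb)
    # A gains what B loses, so the total rating mass is conserved.
    ratings[model_a] = ra + k * (score_a - ea)
    ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - ea))
    return ratings


# Hypothetical judgments: (model_a, model_b, score for model_a).
judgments = [
    ("model-x", "model-y", 1.0),  # referee preferred model-x
    ("model-y", "model-z", 0.5),  # tie
    ("model-x", "model-z", 1.0),  # referee preferred model-x
]

ratings = {}
for a, b, s in judgments:
    update_elo(ratings, a, b, s)

# Print the models from highest to lowest rating.
for name, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

Because each update depends on the ratings at that moment, online Elo is sensitive to the order of the judgments; shuffling and averaging over several passes is one common way to reduce that effect.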