Inference Index
Arena

WildBench

Evaluates models on real-world user queries collected from chatbot conversations. Model responses are then graded pairwise.

Metric: Elo
Max score: Elo
Maintainer: Allen Institute for AI
Models scored: 13