LMSYS Chatbot Arena

Name: LMSYS Chatbot Arena Elo
Creator: LMSYS / UC Berkeley

Ranks models through blind, head-to-head comparisons voted on by real users. It is the closest thing to a "real-world" ranking: models are judged on actual conversations.

Metric: Elo
Max score: none (Elo ratings are relative and have no fixed ceiling)
Maintainer: LMSYS / UC Berkeley

Models scored

Why it matters

Arguably the least gameable benchmark. Real users vote on pairwise matchups without seeing which model wrote which answer, so if they prefer model A over model B, that’s hard to fake.

Measures preference, not capability. Charm, verbosity, and confidence can inflate scores; deep reasoning doesn’t always help.
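A minimal sketch of how pairwise votes can be turned into Elo ratings, using the textbook online Elo update. The K-factor of 32, the starting rating of 1000, and the model names are illustrative assumptions, not the leaderboard's actual settings. The live leaderboard's exact procedure may differ; LMSYS has described fitting a Bradley-Terry model over the full vote set and reporting it on an Elo-like scale.

```python
from collections.abc import Iterable


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the standard Elo curve (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def update(ratings: dict[str, float], a: str, b: str,
           outcome: float, k: float = 32.0) -> None:
    """Apply one vote in place. outcome: 1.0 = A won, 0.0 = B won, 0.5 = tie."""
    r_a, r_b = ratings[a], ratings[b]
    e_a = expected_score(r_a, r_b)
    # Each side moves by K times (actual result - expected result);
    # the two adjustments cancel out, so total rating is conserved.
    ratings[a] = r_a + k * (outcome - e_a)
    ratings[b] = r_b + k * ((1.0 - outcome) - (1.0 - e_a))


def rate(votes: Iterable[tuple[str, str, float]],
         initial: float = 1000.0, k: float = 32.0) -> dict[str, float]:
    """Replay (model_a, model_b, outcome) votes in order and return final ratings."""
    ratings: dict[str, float] = {}
    for a, b, outcome in votes:
        ratings.setdefault(a, initial)  # unseen models start at the assumed baseline
        ratings.setdefault(b, initial)
        update(ratings, a, b, outcome, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical vote log; model names are placeholders.
    votes = [("model-a", "model-b", 1.0),
             ("model-b", "model-c", 0.5),
             ("model-a", "model-c", 1.0)]
    print(rate(votes))
```

Each update depends on the ratings at that moment, so replaying the same votes in a different order can produce slightly different final numbers. That order sensitivity is one reason to prefer a batch fit over all votes for a public leaderboard.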