Inference Index
Research

LMSYS Arena drops live leaderboard for three months of review

Citing ongoing research into gameability, LMSYS is pausing live Elo updates through July pending a methodology refresh.

LMSYS has announced a three-month pause on live Elo updates for Chatbot Arena while the team reviews style-bias and prompt-quality concerns raised by outside researchers.

The decision is significant: for two years, Arena Elo has been the single most-watched signal in AI model evaluation. A pause, even one made for methodological reasons, gives competitors such as WildBench, LiveBench, and Artificial Analysis room to establish stronger market positions.

We’ll keep showing the last published Elo on the leaderboard with a clear freeze marker, and we’ve begun tracking WildBench as a secondary signal.

By Inference Index