Inference Index
Knowledge

MMLU-Pro

Tests expert-level knowledge and reasoning across 14 academic and professional domains. The "Pro" version uses harder questions and up to ten answer choices per question, where the original MMLU uses four, to better differentiate top models.

Metric: Percentage
Max score: 100
Maintainer: TIGER-Lab
Models scored: 22
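
A percentage score on a multiple-choice benchmark is usually plain accuracy: the share of questions answered correctly, scaled to 100. The sketch below illustrates that arithmetic and the random-guess baseline. The function names and the sample answers are hypothetical, and it assumes one gold letter per question. It is not the maintainer's evaluation script.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Share of exact-match answers, scaled to 0-100 (hypothetical helper)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_percent(num_options: int) -> float:
    """Expected score from uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 100.0 / num_options


# Made-up answers for illustration only: three of four match.
preds = ["A", "C", "J", "B"]
gold = ["A", "C", "D", "B"]
score = accuracy_percent(preds, gold)

# Guessing baseline for four options versus ten options.
baseline_four = chance_percent(4)
baseline_ten = chance_percent(10)
```

With ten options the guessing baseline drops from 25 to 10 points, which is one reason a wider answer set spreads strong models further apart on the same 0 to 100 scale.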

Why it matters

If you need an AI that knows things, from medieval history to molecular biology, MMLU-Pro is one of the most widely used tests of breadth.

Known limitations

Multiple-choice tests can reward pattern matching and answer elimination rather than understanding. A high MMLU-Pro score doesn’t guarantee good reasoning when a question is framed in a novel way.
