MMLU-Pro

Name: Massive Multitask Language Understanding — Professional
Creator: TIGER-Lab

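Tests expert-level knowledge across 14 academic and professional domains, including math, law, engineering, and biology. Compared with the original MMLU (57 subjects, four answer choices per question), the "Pro" version uses harder, more reasoning-focused questions with up to ten answer choices to better differentiate top models.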

Metric

Percentage

Max score

100
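
This page does not say how the percentage is computed. A minimal sketch, assuming it is plain accuracy (correct answers divided by total questions, scaled to 100), is shown below. The function names and the sample predictions are hypothetical, made up for illustration and not taken from TIGER-Lab's evaluation code. The second helper shows why more answer choices matter: the score expected from random guessing falls as the number of options grows.

```python
def accuracy_percent(predictions, answers):
    """Return the share of exactly matching answer letters on a 0-100 scale."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_percent(num_options):
    """Expected score from uniform random guessing among num_options choices."""
    return 100.0 / num_options


# Hypothetical model outputs and gold answers for four questions.
score = accuracy_percent(["A", "C", "J", "B"], ["A", "C", "D", "B"])
print(score)               # 75.0
print(chance_percent(4))   # 25.0, four options as in the original MMLU
print(chance_percent(10))  # 10.0, ten options
```

Under this assumption, a score near 10 on a ten-option test is about what guessing would produce, so scores are best read relative to that floor and not to zero.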

Maintainer

TIGER-Lab

Why it matters

If you need an AI with broad knowledge, from medieval history to molecular biology, MMLU-Pro is one of the most widely used tests of breadth.

Known limitations

Multiple-choice tests reward pattern matching. A high MMLU-Pro score doesn’t guarantee good reasoning when a problem is framed in an unfamiliar way.
