Inference Index
Multimodal

MMMU

College-level questions that require visual reasoning across diagrams, charts, and scientific figures.

Metric: Percentage
Max score: 100
Maintainer: IN.AI Research
Models scored: 8

Why it matters

MMMU is the canonical test for vision-language models. If your use case involves reading diagrams or charts, this is the score to watch.

Known limitations

Questions are in English and framed academically. Real-world image tasks (receipts, UIs, handwriting) aren’t covered.
