Multimodal
MMMU
College-level questions that require visual reasoning across diagrams, charts, and scientific figures.
Metric
Percentage
Max score
100
Maintainer
IN.AI Research
Models scored
8
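Scores on this page are percentages with a maximum of 100. Here is a minimal sketch, assuming the score is plain accuracy: exact-match correct answers divided by total questions, times 100. The function name `mmmu_style_score` and the exact-match rule are illustrative assumptions, not the official MMMU evaluation code. Real evaluation harnesses typically also parse free-form model outputs into answers before comparing them.

```python
def mmmu_style_score(predictions: list[str], answers: list[str]) -> float:
    """Hypothetical scorer: accuracy as a percentage in [0, 100].

    Assumes each prediction has already been reduced to a final answer
    (e.g. a choice letter) and is compared to the key by exact match.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("no questions to score")
    # Count questions where the predicted answer matches the key exactly.
    correct = sum(p == a for p, a in zip(predictions, answers))
    # Scale the fraction correct to the page's 0-100 range.
    return 100.0 * correct / len(answers)


# Three of four answers match, so the score is 75.0.
print(mmmu_style_score(["A", "C", "B", "D"], ["A", "B", "B", "D"]))
```

Under this assumption, a score of 100 means every question was answered correctly.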
Why it matters
A widely used benchmark for vision-language models. If your use case involves reading diagrams or charts, this is the score most worth checking.
Known limitations
Questions are in English and use an academic framing. Real-world image tasks such as receipts, UIs, and handwriting aren't covered.