Jun 3, 2024 · This paper introduces MMLU-Pro, an enhanced dataset designed to extend the mostly knowledge-driven MMLU benchmark by integrating more challenging, reasoning- ...
We introduce MMLU-Pro, an enhanced benchmark designed to evaluate language understanding models across broader and more challenging tasks.
The MMLU-Pro dataset is an enhanced version of the Massive Multitask Language Understanding (MMLU) benchmark. It's designed to be more robust and challenging.
Jun 3, 2024 · This paper introduces MMMU-Pro, a robust version of the Massive Multi-discipline Multimodal Understanding and Reasoning (MMMU) benchmark.
Oct 7, 2024 · The paper presents a new benchmark called MMLU-Pro, which is designed to more thoroughly test the language understanding capabilities of AI models.
Jun 4, 2024 · Today's paper introduces MMLU-Pro, an enhanced and more challenging benchmark for evaluating the multi-task language understanding capabilities of large language models.
Jul 3, 2024 · The original MMLU was more of a knowledge & reasoning test. Sure, it had math components, but it was formulated to be mostly answerable without ...
Jun 4, 2024 · Our MMLU-Pro paper is out. It's a more difficult, robust, and reasoning-driven benchmark for measuring expert-level intelligence.