As multimodal large language models (MLLMs) continue to demonstrate increasingly competitive performance across a broad spectrum of tasks, more intricate and comprehensive benchmarks have been developed to assess these cutting-edge models.
Oct 9, 2024
Sep 26, 2024 · This paper proposed ING-VP, a new benchmark that can be used to test the zero-shot performance of MLLMs on visual interactive games. They ...
Oct 10, 2024 · We present ING-VP, the first INteractive Game-based Vision Planning benchmark, specifically designed to evaluate the spatial imagination and multi-step ...
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet. from twitter.com
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet: https://t.co/RaVPoGD3S7
Oct 9, 2024 · The researchers have developed a new benchmark, called ING-VP, to specifically assess the spatial imagination and multi-step reasoning abilities ...
Nov 2, 2024 · Highlights of ING-VP: ♻️ Multimodal interactive environment ♟️ Six classic games: Sokoban, Maze, 8-queens, Sudoku, Tower of Hanoi, 15-puzzle ...
ING-VP: MLLMs Cannot Play Easy Vision-based Games Yet. ICLR 2025 Conference ... Legendre-KAN: High Accuracy KA Network Based on Legendre Polynomials.
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet · 1 code implementation ... To bridge this gap, we present ING-VP, the first INteractive Game-based ...
Nov 25, 2024 · We introduce BALROG, a novel benchmark designed to assess the agentic capabilities of LLMs and VLMs through a diverse set of challenging games.