Oct 4, 2023 · We propose a dual-system for multi-step multimodal reasoning, which consists of a "System-1" step for visual information extraction and a "System-2" step for ...
We propose a dual-system for multi-step visual language reasoning called DOMINO which outperforms existing models on challenging chart question answering ...
Visual language reasoning requires a system to extract text or numbers from information-dense images like charts or plots and perform logical or arithmetic reasoning.
DOMINO: A Dual-System for Multi-step Visual Language Reasoning ... A dual-system for answering a complex question over a chart step-by-step with an LLM and a vision ...
DOMINO: A Dual-System for Multi-step Visual Language Reasoning. P Wang, O Golovneva, A Aghajanyan, X Ren, M Chen, A Celikyilmaz, ... arXiv preprint arXiv: ...
DOMINO: A Dual-System for Multi-step Visual Language Reasoning Peifeng Wang, Olga Golovneva, Armen Aghajanyan, Xiang Ren, Muhao Chen, Asli Celikyilmaz ...
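To address this task, existing work relies on either (1) an end-to-end vision-language model trained on a large amount of data, or (2) a pipeline in which a captioning model converts the image into text that another large language model then reads to infer ...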
Abstract—Current work on using visual analytics to determine causal relations among variables has mostly been based on the concept of counterfactuals.