NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens

Wang, Cunxiang; Ning, Ruoxi; Pan, Boqi; Wu, Tonghui; Guo, Qipeng; Deng, Cheng; Bao, Guangsheng; Hu, Xiangkun; Zhang, Zheng; Wang, Qian; Zhang, Yue

arXiv:2403.12766 (cs)

[Submitted on 18 Mar 2024 (v1), last revised 17 Jun 2024 (this version, v2)]

Abstract: The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, evaluating these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to test the capabilities of LLMs on extended texts. Constructed from English novels, NovelQA offers a unique blend of complexity, length, and narrative coherence, making it an ideal tool for assessing deep textual understanding in LLMs. This paper presents the design and construction of NovelQA, highlighting its manual annotation and diverse question types. Our evaluation of long-context LLMs on NovelQA reveals significant insights into the models' performance, particularly emphasizing the challenges they face with multi-hop reasoning, detail-oriented questions, and extremely long inputs averaging more than 200,000 tokens. The results underscore the necessity for further advancements in LLMs to improve their long-context comprehension.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2403.12766 [cs.CL]
	(or arXiv:2403.12766v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2403.12766

Submission history

From: Cunxiang Wang
[v1] Mon, 18 Mar 2024 17:32:32 UTC (12,471 KB)
[v2] Mon, 17 Jun 2024 13:53:15 UTC (9,026 KB)
