LongIns: A Challenging Long-context Instruction-based Exam for LLMs

Gavin, Shawn; Zheng, Tuney; Liu, Jiaheng; Que, Quehry; Wang, Noah; Yang, Jian; Zhang, Chenchen; Huang, Wenhao; Chen, Wenhu; Zhang, Ge

arXiv:2406.17588 (cs)

[Submitted on 25 Jun 2024 (v1), last revised 26 Jun 2024 (this version, v2)]

Authors: Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

Abstract: The long-context capabilities of large language models (LLMs) have been a hot topic in recent years, and various benchmarks have emerged to evaluate LLM performance in different scenarios. However, most of these benchmarks focus on identifying key information to answer questions, which mainly tests the retrieval ability of LLMs, so they only partially represent how well LLMs reason over large amounts of information. Meanwhile, although LLMs are often claimed to have context windows of 32k, 128k, 200k, or even longer, these benchmarks fail to reveal the length these LLMs actually support. To address these issues, we propose LongIns, a challenging long-context instruction-based exam for LLMs, built on existing instruction datasets. Specifically, LongIns introduces three evaluation settings: Global Instruction & Single Task (GIST), Local Instruction & Single Task (LIST), and Local Instruction & Multiple Tasks (LIMT). Based on LongIns, we perform comprehensive evaluations of existing LLMs and report the following important findings: (1) The top-performing GPT-4 with 128k context length performs poorly on the evaluation context window of 16k in LongIns. (2) The multi-hop reasoning ability of many existing LLMs still needs significant improvement, even under short context windows (less than 4k).
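The three settings can be illustrated with a minimal prompt-assembly sketch. The abstract gives only the setting names, so everything here is an assumption read off those names: that GIST places one shared instruction ahead of all questions, that LIST repeats the instruction next to each question, and that LIMT mixes questions from several tasks, each with its own instruction. The `Example` class, the `build_prompt` function, and the prompt layout are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass


@dataclass
class Example:
    task: str         # hypothetical task name
    instruction: str  # instruction text for that task
    question: str     # one sample drawn from an instruction dataset


def build_prompt(examples, setting):
    """Concatenate examples into one long prompt under a given setting.

    Assumed layouts (inferred from the setting names only):
      GIST: one global instruction, then questions from a single task
      LIST: each question preceded by its instruction, single task
      LIMT: each question preceded by its instruction, multiple tasks
    """
    tasks = {ex.task for ex in examples}
    if setting in ("GIST", "LIST") and len(tasks) != 1:
        raise ValueError(f"{setting} expects examples from a single task")

    if setting == "GIST":
        # One global instruction up front, then the bare questions.
        header = examples[0].instruction
        body = [f"{i}. {ex.question}" for i, ex in enumerate(examples, 1)]
        return "\n".join([header, *body])

    if setting in ("LIST", "LIMT"):
        # Every question carries its own local instruction.
        body = [
            f"{i}. {ex.instruction}\n{ex.question}"
            for i, ex in enumerate(examples, 1)
        ]
        return "\n".join(body)

    raise ValueError(f"unknown setting: {setting}")
```

Under these assumptions, the only difference between GIST and LIST is where the instruction sits relative to the questions, while LIMT lifts the single-task restriction, so a prompt of the same length can be made progressively harder to follow.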

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.17588 [cs.CL]
	(or arXiv:2406.17588v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.17588

Submission history

From: Shawn Gavin
[v1] Tue, 25 Jun 2024 14:31:26 UTC (1,553 KB)
[v2] Wed, 26 Jun 2024 13:28:04 UTC (1,553 KB)
