Retrieval-Augmented Code Generation for Universal Information Extraction

Guo, Yucan; Li, Zixuan; Jin, Xiaolong; Liu, Yantao; Zeng, Yutao; Liu, Wenxuan; Li, Xiang; Yang, Pan; Bai, Long; Guo, Jiafeng; Cheng, Xueqi

arXiv:2311.02962 (cs)

[Submitted on 6 Nov 2023]

Abstract: Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts. Task-specific schemas and complex text expressions make this challenging for existing methods. Code, as a typical formalized language, can describe structural knowledge under various schemas in a universal way. Meanwhile, Large Language Models (LLMs) trained on both code and text have demonstrated strong capabilities in transforming text into code, which offers a feasible solution to IE tasks. In this paper, we therefore propose Code4UIE, a universal retrieval-augmented code generation framework based on LLMs for IE tasks. Specifically, Code4UIE uses Python classes to define task-specific schemas of various structural knowledge in a universal way. Extracting knowledge under these schemas is thereby transformed into generating code that instantiates the predefined Python classes with the information in the text. To generate this code more precisely, Code4UIE adopts in-context learning to instruct LLMs with examples. To obtain appropriate examples for different tasks, Code4UIE explores several example retrieval strategies, which retrieve examples semantically similar to the given text. Extensive experiments on five representative IE tasks across nine datasets demonstrate the effectiveness of the Code4UIE framework.
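The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class names (`Entity`, `Person`, `Organization`, `Relation`, `WorkFor`), the prompt layout, and the helper functions are all hypothetical, and token-level Jaccard overlap stands in for whatever semantic similarity measure the retrieval strategies in the paper actually use. The sketch shows a schema written as Python classes, a retrieval step that picks the most similar labeled example, and a prompt that an LLM would be asked to complete with class instantiations.

```python
from dataclasses import dataclass
from typing import List, Tuple


# --- Hypothetical schema: task-specific types expressed as Python classes ---
@dataclass
class Entity:
    name: str  # surface text of the entity mention


@dataclass
class Person(Entity):
    pass


@dataclass
class Organization(Entity):
    pass


@dataclass
class Relation:
    head: Entity
    tail: Entity


@dataclass
class WorkFor(Relation):
    pass


# Schema shown to the model as text (assumed prompt content, kept short here).
SCHEMA_CODE = (
    "class Person(Entity): ...\n"
    "class Organization(Entity): ...\n"
    "class WorkFor(Relation): ..."
)


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercased whitespace tokens.

    A simple stand-in (assumption) for a semantic similarity score.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve_examples(
    query: str, pool: List[Tuple[str, str]], k: int = 1
) -> List[Tuple[str, str]]:
    """Return the k (sentence, code) pairs whose sentence is most similar to query."""
    ranked = sorted(pool, key=lambda ex: token_overlap(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(schema: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble schema, retrieved demonstrations, and the query into one prompt."""
    parts = [schema, ""]
    for sentence, code in examples:
        parts += [f"# Sentence: {sentence}", code, ""]
    parts.append(f"# Sentence: {query}")
    return "\n".join(parts)


# Hypothetical labeled pool and query, for illustration only.
POOL = [
    ("Steve Jobs worked for Apple .",
     'WorkFor(Person("Steve Jobs"), Organization("Apple"))'),
    ("Paris is the capital of France .", "# no WorkFor relation"),
]
QUERY = "Tim Cook worked for Apple ."

demos = retrieve_examples(QUERY, POOL, k=1)
prompt = build_prompt(SCHEMA_CODE, demos, QUERY)

# What a well-formed model completion would instantiate for QUERY.
expected = WorkFor(Person("Tim Cook"), Organization("Apple"))
```

In this sketch the prompt ends with the query sentence as a comment, so the model's continuation is expected to be code that instantiates the schema classes, which can then be parsed back into structured output.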

Subjects:	Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:2311.02962 [cs.AI]
	(or arXiv:2311.02962v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2311.02962

Submission history

From: Zixuan Li
[v1] Mon, 6 Nov 2023 09:03:21 UTC (8,289 KB)
