Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

Han, Sungwon; Yoon, Jinsung; Arik, Sercan O; Pfister, Tomas

arXiv:2404.09491 (cs)

[Submitted on 15 Apr 2024 (v1), last revised 6 May 2024 (this version, v2)]

Abstract: Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, which is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. The generated features are used to infer class likelihood with a simple downstream machine learning model, such as linear regression, which yields high-performance few-shot learning. The proposed FeatLLM framework uses only this simple predictive model with the discovered features at inference time. Compared to existing LLM-based approaches, FeatLLM eliminates the need to send queries to the LLM for each sample at inference time. Moreover, it requires only API-level access to LLMs and overcomes prompt size limitations. As demonstrated across numerous tabular datasets from a wide range of domains, FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
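
The workflow described in the abstract (rules become features, features feed a simple model, and no LLM call is needed at inference) can be illustrated with a minimal sketch. This is not the authors' implementation. The rules in `RULES` are hand-written stand-ins for what an LLM might propose, the column names and the few-shot rows are invented for illustration, and a logistic-style linear model fitted by gradient descent is swapped in as one possible "simple downstream model".

```python
import math

# Hypothetical rules standing in for LLM-proposed ones (assumption, not from the paper).
# Each rule maps a raw row (a dict) to True/False.
RULES = [
    lambda r: r["age"] >= 40,
    lambda r: r["hours_per_week"] > 45,
    lambda r: r["education"] in {"Masters", "Doctorate"},
]

def featurize(row):
    """Turn one raw row into a binary feature vector, one entry per rule."""
    return [1.0 if rule(row) else 0.0 for rule in RULES]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_linear(X, y, lr=0.5, epochs=500):
    """Fit a logistic-style linear model with plain batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(row, w, b):
    """Score a new row using only the rules and the fitted weights (no LLM call)."""
    x = featurize(row)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented few-shot training set: (row, label) pairs.
FEW_SHOT = [
    ({"age": 52, "hours_per_week": 50, "education": "Masters"}, 1),
    ({"age": 45, "hours_per_week": 60, "education": "Bachelors"}, 1),
    ({"age": 23, "hours_per_week": 30, "education": "HS-grad"}, 0),
    ({"age": 31, "hours_per_week": 40, "education": "HS-grad"}, 0),
]

X = [featurize(row) for row, _ in FEW_SHOT]
y = [label for _, label in FEW_SHOT]
w, b = fit_linear(X, y)
```

In this sketch the expensive step (proposing rules) would happen once, up front; `predict_proba` then needs only the stored rules and weights, which mirrors the abstract's point about avoiding per-sample LLM queries at inference time.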

Comments:	Accepted to ICML, 2024
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2404.09491 [cs.LG]
	(or arXiv:2404.09491v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2404.09491

Submission history

From: Sungwon Han
[v1] Mon, 15 Apr 2024 06:26:08 UTC (647 KB)
[v2] Mon, 6 May 2024 08:00:00 UTC (647 KB)
