BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Koncel-Kedziorski, Rik; Krumdick, Michael; Lai, Viet; Reddy, Varshini; Lovering, Charles; Tanner, Chris

arXiv:2311.06602 (cs)

[Submitted on 11 Nov 2023 (v1), last revised 12 Mar 2024 (this version, v2)]

Abstract: Answering questions within business and finance requires reasoning, precision, and a wide breadth of technical knowledge. Together, these requirements make this domain difficult for large language models (LLMs). We introduce BizBench, a benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises eight quantitative reasoning tasks, focusing on question-answering (QA) over financial data via program synthesis. We include three financially themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate the reasoning capabilities required for financial QA: reading comprehension of financial text and tables for extracting intermediate values, and understanding financial concepts and formulas needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to parse financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, comparing and contrasting the behavior of code-focused and language-focused models. We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.
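
To make "QA via program synthesis" concrete, the following is a minimal sketch of how such a setup can be scored: the model emits a short program instead of a bare number, the program is executed, and its result is compared with a reference answer. Everything here is an assumption for illustration: the question, the figures, the convention that the program stores its result in a variable named `answer`, and the helpers `run_program` and `is_correct` are hypothetical and are not taken from BizBench or its evaluation code.

```python
import math

# Hypothetical example: the question, numbers, and program below are invented
# for illustration and are not drawn from the BizBench data.
QUESTION = "Revenue rose from 120.0 to 150.0 million. What was the percent growth?"
GOLD_ANSWER = 25.0

# Stand-in for model output: a program that binds its result to `answer`
# (an assumed convention, not necessarily the benchmark's).
GENERATED_PROGRAM = """
revenue_prior = 120.0
revenue_current = 150.0
answer = (revenue_current - revenue_prior) / revenue_prior * 100
"""


def run_program(source: str) -> float:
    """Execute a generated program and return the value it binds to `answer`."""
    namespace: dict = {}
    # Builtins are emptied to limit what the program can call; this is not a
    # real sandbox and should not be used on untrusted code.
    exec(source, {"__builtins__": {}}, namespace)
    return float(namespace["answer"])


def is_correct(predicted: float, gold: float, rel_tol: float = 1e-4) -> bool:
    """Compare a predicted number with the reference within a relative tolerance."""
    return math.isclose(predicted, gold, rel_tol=rel_tol)


predicted = run_program(GENERATED_PROGRAM)
print(predicted, is_correct(predicted, GOLD_ANSWER))
```

Scoring the executed result rather than the generated text means a model is judged on whether it extracted the right values and applied the right formula, not on arithmetic it performs token by token. The tolerance value is likewise an illustrative choice.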

Comments:	Work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.06602 [cs.CL]
	(or arXiv:2311.06602v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.06602

Submission history

From: Viet Lai
[v1] Sat, 11 Nov 2023 16:16:11 UTC (8,764 KB)
[v2] Tue, 12 Mar 2024 16:54:57 UTC (8,825 KB)
