Jul 26, 2024 · We introduce OfficeBench, one of the first office automation benchmarks for evaluating current LLM agents' capability to address office tasks in realistic ...
May 17, 2024 · We assess the ability of language agents to perform complex office workflows across multiple applications using customized evaluation ...
OfficeBench is one of the first office automation benchmarks for language agents. We assess the ability of language agents to perform complex office workflows ...
The OfficeBench benchmark operates within a Docker environment pre-installed with office applications such as Word, Excel, calendar, and email clients to ...
Jul 29, 2024 · The OfficeBench paper presents a comprehensive and well-designed benchmark for evaluating language agents in office automation scenarios.
OFFICEBENCH: Benchmarking Language Agents across Multiple Applications for Office Automation New paper alert! Check our latest LLM agent benchmark on the office ...
OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation. Zilong Wang, Yuedong Cui, Li Zhong, Zimin Zhang, Da Yin, Bill ...