PixT3: Pixel-based Table-To-Text Generation

Iñigo Alonso; Eneko Agirre; Mirella Lapata

doi:10.18653/v1/2024.acl-long.364

Abstract

Table-to-text generation involves generating appropriate textual descriptions given structured tabular data. It has attracted increasing attention in recent years thanks to the popularity of neural network models and the availability of large-scale datasets. A common feature across existing methods is their treatment of the input as a string, i.e., by employing linearization techniques that do not always preserve information in the table, are verbose, and lack space efficiency. We propose to rethink data-to-text generation as a visual recognition task, removing the need for rendering the input in a string format. We present PixT3, a multimodal table-to-text model that overcomes the challenges of linearization and input size limitations encountered by existing models. PixT3 is trained with a new self-supervised learning objective to reinforce table structure awareness and is applicable to open-ended and controlled generation settings. Experiments on the ToTTo and Logic2Text benchmarks show that PixT3 is competitive and, in some settings, superior to generators that operate solely on text.

Anthology ID: 2024.acl-long.364
Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 6721–6736
URL: https://aclanthology.org/2024.acl-long.364
DOI: 10.18653/v1/2024.acl-long.364
Cite (ACL): Iñigo Alonso, Eneko Agirre, and Mirella Lapata. 2024. PixT3: Pixel-based Table-To-Text Generation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6721–6736, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): PixT3: Pixel-based Table-To-Text Generation (Alonso et al., ACL 2024)
PDF: https://aclanthology.org/2024.acl-long.364.pdf