QCET: An Interactive Taxonomy of Quality Criteria for Comparable and Repeatable Evaluation of NLP Systems

Anya Belz, Simon Mille, Craig Thomson, Rudali Huidrom


Abstract
Four years on from two papers (Belz et al., 2020; Howcroft et al., 2020) that first called out the lack of standardisation and comparability in the quality criteria assessed in NLP system evaluations, researchers still use widely differing quality criterion names and definitions, so it remains unclear when two evaluations are assessing the same aspect of quality. While normalised quality criteria were proposed at the time, the list was unwieldy, and using it came with a steep learning curve. In this demo paper, our aim is to address these issues with an interactive taxonomy tool that enables quick perusal and selection of quality criteria, and provides decision support and examples of use at each node.
Anthology ID:
2024.inlg-demos.4
Volume:
Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations
Month:
September
Year:
2024
Address:
Tokyo, Japan
Editors:
Saad Mahamood, Nguyen Le Minh, Daphne Ippolito
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
9–12
URL:
https://aclanthology.org/2024.inlg-demos.4/
Cite (ACL):
Anya Belz, Simon Mille, Craig Thomson, and Rudali Huidrom. 2024. QCET: An Interactive Taxonomy of Quality Criteria for Comparable and Repeatable Evaluation of NLP Systems. In Proceedings of the 17th International Natural Language Generation Conference: System Demonstrations, pages 9–12, Tokyo, Japan. Association for Computational Linguistics.
Cite (Informal):
QCET: An Interactive Taxonomy of Quality Criteria for Comparable and Repeatable Evaluation of NLP Systems (Belz et al., INLG 2024)
PDF:
https://aclanthology.org/2024.inlg-demos.4.pdf