Combining Feature and Instance Attribution to Detect Artifacts

Pouya Pezeshkpour, Sarthak Jain, Sameer Singh, Byron Wallace


Abstract
Training the deep neural networks that dominate NLP requires large datasets. These are often collected automatically or via crowdsourcing, and may exhibit systematic biases or annotation artifacts. By the latter we mean spurious correlations between inputs and outputs that do not represent a generally held causal relationship between features and classes; models that exploit such correlations may appear to perform a given task well, but fail on out-of-sample data. In this paper, we evaluate the use of different attribution methods to aid the identification of training-data artifacts. We propose new hybrid approaches that combine saliency maps (which highlight important input features) with instance attribution methods (which retrieve training samples influential to a given prediction). We show that this proposed training-feature attribution can be used to efficiently uncover artifacts in training data when a challenging validation set is available. We also carry out a small user study to evaluate whether these methods are useful to NLP researchers in practice, with promising results. We make the code for all methods and experiments in this paper available.
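The abstract describes the hybrid idea only at a high level. As a rough illustration (not the authors' implementation), the toy sketch below first ranks training examples by how much they support a test prediction, then splits the top example's score over its tokens to expose which feature drives that support. Everything here is a hypothetical assumption: the 2-d embeddings, the fixed logistic-regression weights `W`, the spurious token `spielberg`, the helper names `instance_scores` and `token_attribution`, and the choice of a TracIn-style gradient dot product (swapped in for illustration) as the instance attribution method.

```python
import math

# Hypothetical toy model: 2-d token embeddings and a fixed ("already trained")
# logistic-regression weight vector applied to the mean token embedding.
EMB = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "movie": [0.0, 0.1],
    "spielberg": [0.0, 2.0],  # hypothetical spurious token tied to label 1
}
W = [1.0, 1.0]


def encode(tokens):
    """Mean of the token embeddings (the model's input representation h)."""
    n = len(tokens)
    return [sum(EMB[t][k] for t in tokens) / n for k in range(len(W))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prob(h):
    """P(y = 1 | h) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-dot(W, h)))


def grad_coef(tokens, label):
    """Log-loss gradient w.r.t. W is (p - y) * h; return the scalar (p - y)."""
    return prob(encode(tokens)) - label


def predicted_label(tokens):
    return 1 if prob(encode(tokens)) >= 0.5 else 0


def instance_scores(train, test_tokens):
    """TracIn-style score: dot product of train and test loss gradients w.r.t. W.

    The test gradient uses the model's predicted label, so a positive score
    means a gradient step on that training example pushes toward the prediction.
    """
    h_t = encode(test_tokens)
    g_t = grad_coef(test_tokens, predicted_label(test_tokens))
    return [grad_coef(x, y) * g_t * dot(encode(x), h_t) for x, y in train]


def token_attribution(train_tokens, train_label, test_tokens):
    """Split one training example's score additively over its tokens.

    Since h is a mean of embeddings, h_i . h_t = sum_j (e_j . h_t) / n, so with
    the scalar loss-gradient factors held fixed the per-token terms sum exactly
    to the instance score. Returns (token, contribution) pairs.
    """
    h_t = encode(test_tokens)
    coef = grad_coef(train_tokens, train_label) * grad_coef(
        test_tokens, predicted_label(test_tokens)
    )
    n = len(train_tokens)
    return [(t, coef * dot(EMB[t], h_t) / n) for t in train_tokens]


train = [
    (["good", "movie"], 1),
    (["bad", "movie"], 0),
    (["spielberg", "movie"], 1),
    (["good", "spielberg"], 1),
]
test = ["bad", "spielberg"]  # imagine gold label 0 in a challenging validation set

scores = instance_scores(train, test)
ranking = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
top = ranking[0]
attr = token_attribution(*train[top], test)
print(train[top][0], max(attr, key=lambda p: p[1])[0])  # → ['spielberg', 'movie'] spielberg
```

In this toy case the model predicts the positive class for a negative-looking test input, the most supportive training example is `["spielberg", "movie"]`, and almost all of its score comes from the spurious token, which is the kind of signal a researcher inspecting a challenging validation set would look for. A real model would replace the closed-form decomposition with gradient-based saliency over the training example's input.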
Anthology ID:
2022.findings-acl.153
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1934–1946
URL:
https://aclanthology.org/2022.findings-acl.153
DOI:
10.18653/v1/2022.findings-acl.153
Cite (ACL):
Pouya Pezeshkpour, Sarthak Jain, Sameer Singh, and Byron Wallace. 2022. Combining Feature and Instance Attribution to Detect Artifacts. In Findings of the Association for Computational Linguistics: ACL 2022, pages 1934–1946, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Combining Feature and Instance Attribution to Detect Artifacts (Pezeshkpour et al., Findings 2022)
PDF:
https://aclanthology.org/2022.findings-acl.153.pdf
Data
BoolQ, IMDb Movie Reviews, MultiNLI, SuperGLUE