DOI: 10.1145/3531146.3533127
Research Article

Human Interpretation of Saliency-based Explanation Over Text

Published: 20 June 2022

Abstract

While a lot of research in explainable AI focuses on producing effective explanations, less work is devoted to the question of how people understand and interpret the explanation. In this work, we focus on this question through a study of saliency-based explanations over textual data. Feature-attribution explanations of text models aim to communicate which parts of the input text were more influential than others towards the model decision. Many current explanation methods, such as gradient-based or Shapley value-based methods, provide measures of importance which are well-understood mathematically. But how does a person receiving the explanation (the explainee) comprehend it? And does their understanding match what the explanation attempted to communicate? We empirically investigate the effect of various factors of the input, the feature-attribution explanation, and visualization procedure, on laypeople’s interpretation of the explanation. We query crowdworkers for their interpretation on tasks in English and German, and fit a GAMM model to their responses considering the factors of interest. We find that people often mis-interpret the explanations: superficial and unrelated factors, such as word length, influence the explainees’ importance assignment despite the explanation communicating importance directly. We then show that some of this distortion can be attenuated: we propose a method to adjust saliencies based on model estimates of over- and under-perception, and explore bar charts as an alternative to heatmap saliency visualization. We find that both approaches can attenuate the distorting effect of specific factors, leading to better-calibrated understanding of the explanation.
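
As a rough illustration of the heatmap-style saliency visualization studied in the paper (this is not the authors' implementation; the tokens, attribution scores, and color mapping below are invented for the example), a minimal Python sketch can map each token's attribution magnitude to background-color opacity in HTML:

    import html

    def saliency_heatmap(tokens, scores):
        """Render tokens as HTML spans whose background opacity is
        proportional to the magnitude of the (normalized) saliency score."""
        max_abs = max(abs(s) for s in scores) or 1.0          # avoid division by zero
        spans = []
        for tok, s in zip(tokens, scores):
            alpha = abs(s) / max_abs                          # normalize to [0, 1]
            r, g, b = (255, 0, 0) if s >= 0 else (0, 0, 255)  # red: positive, blue: negative
            spans.append(
                f'<span style="background-color: rgba({r},{g},{b},{alpha:.2f})">'
                f'{html.escape(tok)}</span>'
            )
        return " ".join(spans)

    # Hypothetical example: tokens and attribution scores are made up.
    tokens = ["The", "movie", "was", "surprisingly", "good", "."]
    scores = [0.02, 0.10, 0.01, 0.35, 0.80, 0.00]
    print(saliency_heatmap(tokens, scores))

A bar-chart rendering, the alternative the paper explores, would instead plot the same scores as bar heights alongside the tokens, which may help decouple perceived importance from superficial properties such as word length or highlighted area.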

      Published In

      FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022, 2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. cognitive bias
      2. explainability
      3. feature attribution
      4. generalized additive mixed model
      5. human
      6. interpretability
      7. perception
      8. saliency
      9. text

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

• European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme
      • Carl Zeiss Foundation

      Conference

      FAccT '22
