DOI: 10.1145/3580305.3599337

ExplainableFold: Understanding AlphaFold Prediction with Explainable AI

Published: 04 August 2023

Abstract

This paper presents ExplainableFold (xFold), an explainable AI framework for protein structure prediction. Despite the success of AI-based methods such as AlphaFold (αFold) in this field, the underlying reasons for their predictions remain unclear because of the black-box nature of deep learning models. To address this, we propose a counterfactual learning framework, inspired by biological principles, that generates counterfactual explanations for protein structure prediction and enables a dry-lab experimentation approach. Our experimental results demonstrate that ExplainableFold generates high-quality explanations for AlphaFold's predictions, providing near-experimental understanding of the effects of amino acids on 3D protein structure. This framework has the potential to facilitate a deeper understanding of protein structures. Source code and data of the ExplainableFold project are available at https://github.com/rutgerswiselab/ExplainableFold.

Supplementary Material

MP4 File (rtfp1134-2min-promo.mp4)
This short video introduces the fundamental principles and overall concept of the ExplainableFold paper at KDD 2023.

Published In

KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
August 2023
5996 pages
ISBN: 9798400701030
DOI: 10.1145/3580305
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. alphafold
  2. counterfactual reasoning
  3. explainable ai
  4. protein structure prediction

Qualifiers

  • Research-article

Conference

KDD '23

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 255
  • Downloads (last 6 weeks): 45
Reflects downloads up to 09 Oct 2024