
Documentation Matters: Human-Centered AI System to Assist Data Science Code Documentation in Computational Notebooks

Published: 16 January 2022

Abstract

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often focus only on the code and neglect to create or update documentation during quick iterations. Inspired by the documentation practices observed in 80 highly voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system that explores how human-centered AI systems can support data scientists in documenting machine learning code. Themisto facilitates documentation in three ways: a deep-learning-based approach that generates documentation for source code, a query-based approach that retrieves online API documentation for source code, and a user-prompt approach that nudges users to write documentation themselves. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners and found that the automated documentation generation techniques reduced the time spent writing documentation, reminded participants to document code they would otherwise have ignored, and improved participants’ satisfaction with their computational notebooks.
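The query-based retrieval approach is the most mechanical of the three, and a small sketch helps convey its shape. The following Python snippet is not Themisto’s implementation; it is a hypothetical example in which the docstrings of the functions called in a code cell stand in for the online API documentation that the real system retrieves. The function name summarize_cell and the use of local docstrings are assumptions made purely for illustration.

import ast
import inspect

def summarize_cell(cell_source: str, namespace: dict) -> str:
    """Build a markdown stub from the docstrings of functions called in a cell."""
    tree = ast.parse(cell_source)
    called = set()
    for node in ast.walk(tree):
        # Collect the names of plain calls (mean(x)) and method calls (df.head()).
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                called.add(node.func.id)
            elif isinstance(node.func, ast.Attribute):
                called.add(node.func.attr)
    lines = []
    for name in sorted(called):
        obj = namespace.get(name)
        doc = inspect.getdoc(obj) if obj is not None else None
        if doc:
            # Keep only the first line of each docstring as a short summary.
            lines.append(f"- `{name}`: {doc.splitlines()[0]}")
    return "\n".join(lines) if lines else "_No documentation found for this cell._"

if __name__ == "__main__":
    import statistics
    cell = "avg = mean(scores)\nspread = stdev(scores)"
    print(summarize_cell(cell, {"mean": statistics.mean, "stdev": statistics.stdev}))

Run on the example cell, the sketch prints one bullet per called function; a system like Themisto would instead pair such retrieved text with model-generated prose and a prompt nudging the user to refine it.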




Published In

ACM Transactions on Computer-Human Interaction, Volume 29, Issue 2
April 2022
347 pages
ISSN: 1073-0516
EISSN: 1557-7325
DOI: 10.1145/3505202

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 January 2022
Accepted: 01 September 2021
Revised: 01 September 2021
Received: 01 June 2021
Published in TOCHI Volume 29, Issue 2


Author Tags

  1. Code summarization
  2. Computational notebooks
  3. Code documentation

Qualifiers

  • Research-article
  • Refereed

