Research article · Open access
DOI: 10.1145/3657604.3662039

Prompting for Comprehension: Exploring the Intersection of Explain in Plain English Questions and Prompt Writing

Published: 15 July 2024

Abstract

Learning to program requires the development of a variety of skills, including the ability to read, comprehend, and communicate the purpose of code. In the age of large language models (LLMs), where code can be generated automatically, developing these skills is more important than ever for novice programmers. The ability to write precise natural language descriptions of desired behavior is essential for eliciting code from an LLM, and the code that is generated must be understood in order to evaluate its correctness and suitability. In introductory computer science courses, a common question type used to develop and assess code comprehension skills is the 'Explain in Plain English' (EiPE) question. In these questions, students are shown a segment of code and asked to provide a natural language description of that code's purpose. The adoption of EiPE questions at scale has been hindered by: 1) the difficulty of automatically grading short-answer responses, and 2) the difficulty of providing effective and transparent feedback to students. To address these shortcomings, we explore and evaluate a grading approach in which a student's EiPE response is used to generate code via an LLM, and that code is evaluated against test cases to determine whether the description of the code was accurate. This provides a scalable approach to creating code comprehension questions and enables feedback both through the code generated from a student's description and through the results of test cases run on that code. We evaluate students' success in completing these tasks, their use of the feedback provided by the system, and their perceptions of the activity.
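The grading loop the abstract describes (student description → LLM-generated code → test-case evaluation) is straightforward to sketch. The Python below is a minimal, hypothetical illustration, not the authors' implementation: the LLM call is stubbed out as a placeholder function, and the test-case format is an assumption for the sake of a runnable example.

```python
# Minimal sketch of the code-generation-based grading loop described in the
# abstract. Every name here is a hypothetical illustration; the real system's
# prompts, model, and sandboxing are not specified on this page.

from typing import Callable, List, Tuple


def generate_code_from_description(description: str) -> str:
    """Stand-in for an LLM call that turns a student's plain-English
    description into a Python function named `solution`."""
    # A real implementation would prompt a code-generation model with the
    # student's description; a canned answer keeps this sketch runnable.
    return "def solution(xs):\n    return max(xs)\n"


def grade_description(description: str,
                      tests: List[Tuple[tuple, object]]) -> bool:
    """Generate code from the description, then run it against test cases.

    The description is judged accurate only if the generated code passes
    every test; this mirrors the approach evaluated in the paper.
    """
    namespace: dict = {}
    # Note: a real deployment would sandbox untrusted generated code.
    exec(generate_code_from_description(description), namespace)
    solution: Callable = namespace["solution"]
    return all(solution(*args) == expected for args, expected in tests)


if __name__ == "__main__":
    # Instructor-authored test cases: (arguments, expected result).
    tests = [(([3, 1, 4],), 4), (([-2, -7],), -2)]
    ok = grade_description("Return the largest value in the list.", tests)
    print("correct" if ok else "incorrect")
```

In a deployed system, the stub would call a code-generation model, and both the generated code and any failing tests could be surfaced to the student, providing the transparent feedback the abstract describes.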



Published In

L@S '24: Proceedings of the Eleventh ACM Conference on Learning @ Scale
July 2024, 582 pages
ISBN: 9798400706332
DOI: 10.1145/3657604

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. CS1
2. EiPE
3. LLMs
4. code comprehension
5. explain in plain English
6. introductory programming
7. large language models
8. prompting


Conference

L@S '24

Acceptance Rates

Overall Acceptance Rate: 117 of 440 submissions, 27%
