Research article · Open access
DOI: 10.1145/3657604.3662039

Prompting for Comprehension: Exploring the Intersection of Explain in Plain English Questions and Prompt Writing

Published: 15 July 2024

Abstract

Learning to program requires the development of a variety of skills, including the ability to read, comprehend, and communicate the purpose of code. In the age of large language models (LLMs), where code can be generated automatically, developing these skills is more important than ever for novice programmers. The ability to write precise natural language descriptions of desired behavior is essential for eliciting code from an LLM, and the code that is generated must be understood in order to evaluate its correctness and suitability. In introductory computer science courses, a common question type used to develop and assess code comprehension skills is the 'Explain in Plain English' (EiPE) question. In these questions, students are shown a segment of code and asked to provide a natural language description of that code's purpose. The adoption of EiPE questions at scale has been hindered by: 1) the difficulty of automatically grading short-answer responses, and 2) the difficulty of providing effective and transparent feedback to students. To address these shortcomings, we explore and evaluate a grading approach in which a student's EiPE response is used to generate code via an LLM, and that code is evaluated against test cases to determine whether the description of the code was accurate. This provides a scalable approach to creating code comprehension questions and enables feedback both through the code generated from a student's description and through the results of test cases run on that code. We evaluate students' success in completing these tasks, their use of the feedback provided by the system, and their perceptions of the activity.
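The grading loop the abstract describes (student description → LLM-generated code → test-case evaluation) is straightforward to sketch. The Python below is a minimal, hypothetical illustration, not the authors' implementation: the LLM call is stubbed out as a placeholder function, and the test-case format is an assumption for the sake of a runnable example.

```python
# Minimal sketch of the code-generation-based grading loop described in the
# abstract. Every name here is a hypothetical illustration; the real system's
# prompts, model, and sandboxing are not specified on this page.

from typing import Callable, List, Tuple


def generate_code_from_description(description: str) -> str:
    """Stand-in for an LLM call that turns a student's plain-English
    description into a Python function named `solution`."""
    # A real implementation would prompt a code-generation model with the
    # student's description; a canned answer keeps this sketch runnable.
    return "def solution(xs):\n    return max(xs)\n"


def grade_description(description: str,
                      tests: List[Tuple[tuple, object]]) -> bool:
    """Generate code from the description, then run it against test cases.

    The description is judged accurate only if the generated code passes
    every test; this mirrors the approach evaluated in the paper.
    """
    namespace: dict = {}
    # Note: a real deployment would sandbox untrusted generated code.
    exec(generate_code_from_description(description), namespace)
    solution: Callable = namespace["solution"]
    return all(solution(*args) == expected for args, expected in tests)


if __name__ == "__main__":
    # Instructor-authored test cases: (arguments, expected result).
    tests = [(([3, 1, 4],), 4), (([-2, -7],), -2)]
    ok = grade_description("Return the largest value in the list.", tests)
    print("correct" if ok else "incorrect")
```

In a deployed system, the stub would call a code-generation model, and both the generated code and any failing tests could be surfaced to the student, providing the transparent feedback the abstract describes.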



Published In

L@S '24: Proceedings of the Eleventh ACM Conference on Learning @ Scale
July 2024, 582 pages
ISBN: 9798400706332
DOI: 10.1145/3657604

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. CS1
2. EiPE
3. LLMs
4. code comprehension
5. explain in plain English
6. introductory programming
7. large language models
8. prompting


Conference

L@S '24

Acceptance Rates

Overall Acceptance Rate: 117 of 440 submissions, 27%
