DOI: 10.1145/3576882.3617916

A Bug's New Life: Creating Refute Questions from Filtered CS1 Student Code Snapshots

Published: 05 December 2023

Abstract

In an introductory programming (CS1) context, a Refute question asks students for a counter-example which proves that a given code fragment is an incorrect solution for a given task. Such a question can be used as an assessment item to (formatively) develop or (summatively) demonstrate a student's abilities to comprehend the task and the code well enough to recognize a mismatch. These abilities assume greater significance with the emergence of generative AI technologies capable of writing code that is plausible (at least to novice programmers) but not always correct.
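
To make this concrete, consider a hypothetical Refute question in the style the abstract describes (the task, code, and counter-example below are illustrative assumptions, not items from the paper's repository). Task: return the largest of three integers. The C solution below is plausible but incorrect, and a single well-chosen input refutes it:

```c
#include <stdio.h>

/* Task (hypothetical): return the largest of three integers.
 * This solution looks plausible but uses strict comparisons
 * throughout, so it mishandles ties. */
int max3(int a, int b, int c) {
    if (a > b && a > c) return a;
    if (b > a && b > c) return b;
    return c;
}

int main(void) {
    /* Counter-example: when the maximum is tied (a == b == 5), both
     * strict comparisons fail and the function falls through to c. */
    printf("max3(5, 5, 1) = %d, expected 5\n", max3(5, 5, 1));
    return 0;
}
```

A student answering such a Refute question would submit the input (5, 5, 1), demonstrating comprehension of both the task and the code sufficient to recognize the mismatch.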
Instructors must address three concerns while designing an effective Refute question, each influenced by their specific teaching-learning context: (1) Is the task comprehensible? (2) Is the incorrect code a plausible solution for the task? (3) Is the complexity of finding a counter-example acceptable? While the first concern can often be addressed by reusing tasks from previous code writing questions, addressing the latter concerns may require substantial instructor effort. We therefore investigate whether concerns (2) and (3) can be addressed by buggy student solutions for the corresponding code writing question from a previous course offering. For 6 code writing questions (from a Fall 2015 C programming course), our automated evaluation system logged 13,847 snapshots of executable student code, of which 10,574 were buggy (i.e., they failed at least one instructor-supplied test case). Code selected randomly from this pool rarely addresses these concerns, and manual selection is infeasible. Our paper makes three contributions. First, we propose an automated mechanism to filter this pool to a more manageable number of snapshots from which appropriate code can be selected manually. Second, we evaluate our semi-automated mechanism with respect to concerns (2) and (3) by surveying a diverse set of 56 experienced participants (instructors, tutors, and teaching assistants). Third, we use this mechanism to seed a public repository of Refute questions and provide a template to create additional questions using a public resource (CodeCheck).
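
The abstract does not spell out the filtering criteria, so the following is only a minimal sketch of one plausible test-outcome-based filter, under two assumptions of ours: a hypothetical run_test hook into the autograder, and a pass-count threshold as a proxy for concern (2), since a snapshot that passes most instructor tests reads as a plausible solution while failing at least one test guarantees that a counter-example exists.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_TESTS  10  /* size of the instructor-supplied test suite */
#define MIN_PASSED  8  /* assumed plausibility threshold, not from the paper */

/* Placeholder for the autograder hook that runs one stored snapshot
 * against one instructor test. Stubbed with a dummy outcome so the
 * sketch compiles; a real system would execute the student code. */
static bool run_test(int snapshot_id, int test_id) {
    return (snapshot_id + test_id) % NUM_TESTS != 0;
}

/* Keep a snapshot for manual review if it passes most tests (a proxy
 * for "plausible solution") yet fails at least one (so a refuting
 * counter-example is guaranteed to exist). */
static bool is_candidate(int snapshot_id) {
    int passed = 0;
    for (int t = 0; t < NUM_TESTS; t++)
        if (run_test(snapshot_id, t))
            passed++;
    return passed >= MIN_PASSED && passed < NUM_TESTS;
}

int main(void) {
    for (int s = 0; s < 5; s++)
        printf("snapshot %d: %s\n", s, is_candidate(s) ? "review" : "skip");
    return 0;
}
```

Such a filter would reduce the pool before manual selection; addressing concern (3), the acceptable complexity of finding a counter-example, would need an additional (here unspecified) measure, such as preferring snapshots whose failing tests involve small inputs.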



Published In

CompEd 2023: Proceedings of the ACM Conference on Global Computing Education Vol 1
December 2023
180 pages
ISBN: 9798400700484
DOI: 10.1145/3576882

Publisher

Association for Computing Machinery

New York, NY, United States


Badges

  • Best Paper

Author Tags

  1. CS1
  2. assessment
  3. refute questions

Qualifiers

  • Research-article

Conference

CompEd 2023

Acceptance Rates

Overall acceptance rate: 33 of 100 submissions (33%)
