DOI: 10.1145/3491140.3528282

LEGenT: Localizing Errors and Generating Testcases for CS1

Published: 01 June 2022

Abstract

In a CS1 course, testcases are the most common way of providing feedback. However, manually designed testcases, even if carefully crafted, may miss crucial corner cases. These testcases are generated only once for the whole class and do not take into account the errors made by specific students. This paper presents LEGenT, an automated tool that generates personalized testcases for each student submission. LEGenT first localizes a statement in the program that causes deviation from the expected behaviour. Then it generates testcases that expose the deviation to the student. Our premise is that such a targeted test helps students identify one of the early causes of the deviation.
LEGenT works by separating buggy programs from correct ones using an off-the-shelf formal equivalence checker. It uses another off-the-shelf tool to cluster buggy as well as correct programs. It then aligns each incorrect program cluster with a nearby correct program cluster to generate relevant testcases for the incorrect programs. We have evaluated our technique on 26082 (2764 correct + 23318 buggy) real student submissions across 11 different programming problems. LEGenT localized an erroneous statement and generated testcases for 7696 buggy submissions. LEGenT fails to generate a testcase when the buggy submission is nowhere close to any correct submission. For the cases where LEGenT successfully generated testcases, we use a novel validation method to demonstrate its accuracy, i.e., that LEGenT correctly identifies a cause of deviation from the correct behaviour.
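The cluster-align-and-test pipeline sketched in the abstract can be caricatured with a toy example. This is not the authors' implementation (LEGenT operates on C programs, uses a formal equivalence checker, and derives testcases from an aligned correct-cluster representative); the submissions and the random-search generator below are purely hypothetical illustrations of the core idea: find an input on which a buggy submission deviates from a structurally similar correct one.

```python
import random

# Toy illustration only: the real tool works on C programs with a formal
# equivalence checker and off-the-shelf clustering. All names are hypothetical.

def correct_abs(x):
    # A correct student submission for "absolute value".
    return x if x >= 0 else -x

def buggy_abs(x):
    # A buggy submission: the student forgot to negate the negative branch.
    return x if x >= 0 else x

def generate_failing_test(buggy, correct, trials=1000, seed=0):
    """Search for an input on which the buggy submission deviates from a
    structurally similar correct submission. LEGenT instead derives such
    inputs analytically from the aligned correct program."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-100, 100)
        if buggy(x) != correct(x):
            return x, correct(x), buggy(x)  # (input, expected, observed)
    return None  # no deviation found: the programs may be equivalent

print(generate_failing_test(buggy_abs, correct_abs))
```

In the paper's setting, the "correct partner" is not hand-picked as above: the buggy program's cluster is aligned with a nearby cluster of correct programs, and the exposing testcase is generated from that alignment.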

Supplementary Material

MP4 File (lsfp068.mp4)
Due to large class sizes, providing frequent feedback to CS1 students on programming assignments is a demanding task. We present LEGenT, an automated tool that provides personalized feedback in the form of testcases for student submissions. LEGenT uses a few off-the-shelf tools to divide the submissions into structurally similar and semantically equivalent clusters. LEGenT then generates a testcase for the incorrect submissions in a cluster using a structurally similar correct submission, such that the generated testcase exposes the deviation from the correct behaviour. We have evaluated our technique on 26082 real student submissions (2764 correct + 23318 incorrect) across 11 different programming problems. LEGenT localized erroneous statements and generated testcases for 7696 incorrect submissions. We also present a novel validation method to prove the accuracy of our results (identification of the cause of deviation). We are able to validate 77.46% of LEGenT's results using this validation process.


Cited By

  • (2024) Automating Autograding: Large Language Models as Test Suite Generators for Introductory Programming. Journal of Computer Assisted Learning, Vol. 41, 1. https://doi.org/10.1111/jcal.13100. Online publication date: 25-Dec-2024.
  • (2023) Proving and Disproving Equivalence of Functional Programming Assignments. Proceedings of the ACM on Programming Languages, Vol. 7, PLDI, 928--951. https://doi.org/10.1145/3591258. Online publication date: 6-Jun-2023.
  • (2023) Investigating the Potential of GPT-3 in Providing Feedback for Programming Assessments. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1, 292--298. https://doi.org/10.1145/3587102.3588852. Online publication date: 29-Jun-2023.

Published In

L@S '22: Proceedings of the Ninth ACM Conference on Learning @ Scale
June 2022
491 pages
ISBN:9781450391580
DOI:10.1145/3491140
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. CS1
  2. automated feedback
  3. clustering
  4. error localization
  5. targeted testcases
  6. testcase generation

Qualifiers

  • Research-article

Conference

L@S '22
L@S '22: Ninth (2022) ACM Conference on Learning @ Scale
June 1 - 3, 2022
New York City, NY, USA

Acceptance Rates

Overall Acceptance Rate 117 of 440 submissions, 27%


Article Metrics

  • Downloads (Last 12 months): 86
  • Downloads (Last 6 weeks): 5
Reflects downloads up to 01 Jan 2025
