DOI: 10.1145/3544548.3580919
research-article

Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming

Published: 19 April 2023

Abstract

AI code generators such as OpenAI Codex have the potential to assist novice programmers by generating code from natural language descriptions; however, over-reliance on them might negatively impact learning and retention. To explore the implications of AI code generators for introductory programming, we conducted a controlled experiment with 69 novices (ages 10–17). Learners worked on 45 Python code-authoring tasks, each followed by a code-modification task; half of the learners had access to Codex for the code-authoring tasks. Our results show that using Codex significantly increased code-authoring performance (a 1.15x higher completion rate and 1.8x higher scores) without decreasing performance on the manual code-modification tasks. Additionally, learners who had access to Codex during the training phase performed slightly better on the evaluation post-tests conducted one week later, although this difference did not reach statistical significance. Notably, learners with higher Scratch pre-test scores performed significantly better on the retention post-tests if they had prior access to Codex.

Supplementary Material

Video preview (MP4 file: 3544548.3580919-video-preview.mp4)

      Published In

      CHI '23: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
      April 2023
      14911 pages
      ISBN:9781450394215
      DOI:10.1145/3544548
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. AI Coding Assistants
      2. AI-Assisted Pair-Programming
      3. ChatGPT
      4. Copilot
      5. GPT-3
      6. Introductory Programming
      7. K-12 Computer Science Education
      8. Large Language Models
      9. OpenAI Codex

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      CHI '23

      Acceptance Rates

      Overall Acceptance Rate 6,199 of 26,314 submissions, 24%

      Article Metrics

      • Downloads (last 12 months): 3,191
      • Downloads (last 6 weeks): 394
      Reflects downloads up to 06 Jan 2025
