DOI: 10.1145/3613905.3650896
Work in Progress

Evaluating Human-AI Partnership for LLM-based Code Migration

Published: 11 May 2024

Abstract

The potential of Generative AI, especially Large Language Models (LLMs), to transform software development is remarkable. In this paper, we focus on one area of software development: code migration. We define code migration as the process of transitioning a code repository from one language version to another by converting both the source code and its dependencies. When performing code migrations, carefully designing an effective human-AI partnership is essential for boosting developer productivity and accelerating migrations. Although human-AI partnerships have been explored broadly in the literature, their application to code migration remains largely unexamined. In this work, we leverage an LLM-based code migration tool, Amazon Q Code Transformation, to conduct semi-structured interviews with 11 participants undertaking code migrations. We discuss the human's roles in the human-AI partnership (the human as a director and as a reviewer) and define a trust framework, based on various model outcomes, for earning trust in LLMs. The guidelines presented in this paper offer a vital starting point for designing human-AI partnerships that effectively augment and complement human capabilities in software development with Generative AI.
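To make the definition above concrete, the sketch below is a hypothetical illustration (not taken from the paper) of the kind of source-level change a Java version migration might involve: the same behavior expressed in an older idiom and in a newer one, with the migration required to preserve semantics. The class and method names are invented for illustration only; a real migration would also update build dependencies alongside the source.

```java
// Hypothetical sketch of a source change a Java 8 -> Java 17 migration
// might apply: pattern matching for instanceof (JEP 394) replaces the
// older instanceof-and-cast idiom. Behavior must be preserved exactly.
public class MigrationExample {

    // Java 8 style: explicit instanceof check followed by a cast.
    static String describeLegacy(Object o) {
        if (o instanceof String) {
            String s = (String) o;
            return "string of length " + s.length();
        }
        return "other";
    }

    // Java 17 style: the pattern variable binds in the condition itself.
    static String describeMigrated(Object o) {
        if (o instanceof String s) {
            return "string of length " + s.length();
        }
        return "other";
    }

    public static void main(String[] args) {
        // A migration is only correct if old and new versions agree.
        System.out.println(describeLegacy("abc"));
        System.out.println(describeMigrated("abc"));
        System.out.println(describeMigrated(42));
    }
}
```

The second half of the paper's definition, updating dependencies, would show up not in source files like this but in build configuration (e.g., bumping library versions in a Maven or Gradle file), which is why the authors frame migration as converting "both the source code and its dependencies."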

Supplemental Material

MP4 File: Video Preview
MP4 File: Talk Video


Published In

CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems
May 2024
4761 pages
ISBN:9798400703317
DOI:10.1145/3613905
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Application Modernization
  2. Code Migration
  3. Human-AI Partnership
  4. Human-in-the-Loop Techniques
  5. Trust Framework

Qualifiers

  • Work in progress
  • Research
  • Refereed limited

Conference

CHI '24

Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

