DOI: 10.1145/3663529.3664458

Detecting Code Comment Inconsistencies using LLM and Program Analysis

Published: 10 July 2024

Abstract

Code comments are the most important medium for documenting program logic and design. Yet as modern software undergoes frequent updates and modifications, keeping comments accurate and relevant becomes a labor-intensive endeavor. Drawing inspiration from the remarkable performance of Large Language Models (LLMs) in comprehending software programs, this paper introduces an LLM-driven, program-analysis-assisted methodology for identifying inconsistencies between code and comments. Our approach capitalizes on LLMs' ability to interpret the natural-language descriptions in code comments, enabling the extraction of design constraints; we then employ program analysis techniques to determine precisely whether these constraints are implemented. We instantiate this methodology with GPT-4, focusing on three prevalent types of constraints. In experiments on 13 open-source projects, our approach identified 160 inconsistencies, 23 of which have been confirmed and fixed by the developers.
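To make the two-stage idea concrete, below is a minimal, illustrative sketch of the pipeline the abstract describes: a language model turns a natural-language comment into a structured constraint, and a program-analysis pass checks whether the code enforces it. This is not the paper's implementation; the `extract_constraint` stub stands in for a GPT-4 prompt, the AST walk stands in for the paper's analysis stage, and all names here are hypothetical.

```python
import ast
import textwrap

# Illustrative sketch only: the paper pairs GPT-4 with program analysis.
# Here a stubbed "LLM" stands in for constraint extraction, and a small
# AST traversal stands in for the analysis stage.

def extract_constraint(comment: str) -> dict:
    """Stand-in for an LLM call (e.g., a GPT-4 prompt) that turns a
    natural-language comment into a structured constraint. A real system
    would prompt the model; this stub pattern-matches one common case."""
    if comment and "must not be None" in comment:
        param = comment.split()[0].strip("`")
        return {"type": "null_check", "param": param}
    return {}

def implements_null_check(func: ast.FunctionDef, param: str) -> bool:
    """Analysis stage: does the function body ever compare `param`
    against None (e.g., `if param is None: ...`)?"""
    for node in ast.walk(func):
        if isinstance(node, ast.Compare) and isinstance(node.left, ast.Name):
            if node.left.id == param and any(
                isinstance(c, ast.Constant) and c.value is None
                for c in node.comparators
            ):
                return True
    return False

source = textwrap.dedent('''
    def resolve(path):
        """`path` must not be None before resolution."""
        return path.upper()   # the promised None check never happens
''')

func = ast.parse(source).body[0]
constraint = extract_constraint(ast.get_docstring(func))
if constraint.get("type") == "null_check":
    if not implements_null_check(func, constraint["param"]):
        print(f"Inconsistency: comment requires a None check on "
              f"'{constraint['param']}', but none was found.")
```

In the paper's actual setting, the extraction step prompts GPT-4 rather than pattern-matching, and the analysis stage covers the three constraint types the authors target; this sketch handles a single null-check constraint for brevity.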




Published In

FSE 2024: Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering
July 2024
715 pages
ISBN: 9798400706585
DOI: 10.1145/3663529
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. bug detection
  2. code comment inconsistency
  3. large language model
  4. program analysis

Qualifiers

  • Short-paper

Conference

FSE '24

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

