research-article

Code completion with statistical language models

Authors:

Veselin Raychev,

Eran Yahav

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation

Pages 419 - 428

https://doi.org/10.1145/2594291.2594321

Published: 09 June 2014

Abstract

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.

Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.
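
As an illustration only, and not the implementation described in this paper, the reduction can be pictured with a very small language model over method-call "sentences". The sketch below assumes a bigram model with add-one smoothing, a three-sentence toy corpus, and hypothetical helper names (train_bigram, log_prob, rank_completions); the abstract fixes none of these choices. A hole is filled by scoring every candidate call in context and sorting by sentence probability.

```python
import math
from collections import Counter

START, END = "<s>", "</s>"

# Hypothetical corpus: each sentence is a sequence of method calls that a
# static analysis might extract for one object (names are illustrative).
CORPUS = [
    ["setAudioSource", "setOutputFormat", "prepare", "start", "stop", "release"],
    ["setAudioSource", "setOutputFormat", "prepare", "start", "stop"],
    ["setAudioSource", "prepare", "start", "release"],
]


def train_bigram(sentences):
    """Count context words and bigrams over sentences padded with START/END."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = [START] + list(sentence) + [END]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams, vocab


def log_prob(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    tokens = [START] + list(sentence) + [END]
    size = len(vocab)
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def rank_completions(prefix, suffix, candidates, model):
    """Order candidate calls for a single hole between prefix and suffix."""
    return sorted(
        candidates,
        key=lambda call: -log_prob(prefix + [call] + suffix, model),
    )


model = train_bigram(CORPUS)
ranked = rank_completions(
    ["setAudioSource", "setOutputFormat"],  # calls before the hole
    ["start", "stop"],                      # calls after the hole
    ["release", "prepare", "stop"],         # candidate fillers
    model,
)
```

On this toy corpus, "prepare" is ranked first, because the bigrams (setOutputFormat, prepare) and (prepare, start) are frequent while the alternatives are unseen. A practical system would replace add-one smoothing with a stronger estimator and handle multi-call holes and arguments, which this sketch does not attempt.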

Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.
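
To make the "top 3" criterion concrete, the following minimal sketch computes a top-k hit rate over ranked completion lists. The function name top_k_accuracy and the sample data are hypothetical and are not taken from the paper's evaluation.

```python
def top_k_accuracy(ranked_lists, expected, k=3):
    """Fraction of queries whose expected completion is among the first k results."""
    if len(ranked_lists) != len(expected):
        raise ValueError("one ranked list is required per expected completion")
    if not expected:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for ranked, want in zip(ranked_lists, expected) if want in ranked[:k]
    )
    return hits / len(expected)


# Hypothetical results for three queries (best candidate first).
sample_ranked = [
    ["a", "b", "c", "d"],
    ["x", "y", "z"],
    ["p", "q", "r", "s"],
]
sample_expected = ["b", "z", "s"]
sample_score = top_k_accuracy(sample_ranked, sample_expected, k=3)
```

Here two of the three expected completions fall within the first three results, so the score is 2/3.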

Index Terms

Code completion with statistical language models
1. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program verification

Published In

PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation

June 2014

619 pages

ISBN:9781450327848

DOI:10.1145/2594291

General Chair: Michael O'Boyle, University of Edinburgh
Program Chair: Keshav Pingali, University of Texas, Austin

ACM SIGPLAN Notices Volume 49, Issue 6
PLDI '14
June 2014
598 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2666356
Editor: Andy Gill, University of Kansas, Lawrence, KS

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

PLDI '14

PLDI '14: ACM SIGPLAN Conference on Programming Language Design and Implementation

June 9 - 11, 2014

Edinburgh, United Kingdom

Acceptance Rates

PLDI '14 Paper Acceptance Rate 52 of 287 submissions, 18%;

Overall Acceptance Rate 406 of 2,067 submissions, 20%

Article Metrics

Total Citations: 486
Total Downloads: 2,832
Downloads (Last 12 months): 301
Downloads (Last 6 weeks): 19

Reflects downloads up to 14 Sep 2024

Cited By

Liu Y, Yin Y, Deng J, Li W, Peng Z (2024). A Combinatorial Strategy for API Completion: Deep Learning and Heuristics. Electronics 13:18 (3669). Online publication date: 15-Sep-2024.
https://doi.org/10.3390/electronics13183669

Bibi N, Maqbool A, Rana T, Afzal F, Khan A (2024). C2B: A Semantic Source Code Retrieval Model Using CodeT5 and Bi-LSTM. Applied Sciences 14:13 (5795). Online publication date: 2-Jul-2024.
https://doi.org/10.3390/app14135795

Weber T, Brandmaier M, Schmidt A, Mayer S (2024). Significant Productivity Gains through Programming with Large Language Models. Proceedings of the ACM on Human-Computer Interaction 8:EICS (1-29). Online publication date: 17-Jun-2024.
https://dl.acm.org/doi/10.1145/3661145

Wang C, Zhang J, Wu R, Zhang C (2024). DAInfer: Inferring API Aliasing Specifications from Library Documentation via Neurosymbolic Optimization. Proceedings of the ACM on Software Engineering 1:FSE (2469-2492). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660816

Chen S, Li Z, Yang W, Liu C (2024). DeciX: Explain Deep Learning Based Code Generation Applications. Proceedings of the ACM on Software Engineering 1:FSE (2424-2446). Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660814

Wang X, Yu H, Meng X, Cao H, Zhang H, Sun H, Liu X, Hu C (2024). MTL-TRANSFER: Leveraging Multi-task Learning and Transferred Knowledge for Improving Fault Localization and Program Repair. ACM Transactions on Software Engineering and Methodology 33:6 (1-31). Online publication date: 27-Jun-2024.
https://dl.acm.org/doi/10.1145/3654441

Velasco A, Palacio D, Rodriguez-Cardenas D, Poshyvanyk D, Roychoudhury A, Paiva A, Abreu R, Storey M, Hierons R, Madeira H (2024). Which Syntactic Capabilities Are Statistically Learned by Masked Language Models for Code? Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results (72-76). Online publication date: 14-Apr-2024.
https://dl.acm.org/doi/10.1145/3639476.3639768

Li Z, Li C, Tang Z, Huang W, Ge J, Luo B, Ng V, Wang T, Hu Y, Zhang X (2024). PTM-APIRec: Leveraging Pre-trained Models of Source Code in API Recommendation. ACM Transactions on Software Engineering and Methodology 33:3 (1-30). Online publication date: 15-Mar-2024.
https://dl.acm.org/doi/10.1145/3632745

Xiao Y, Song W, Ahmed S, Ge X, Viswanath B, Meng N, Yao D (2024). Measurement of Embedding Choices on Cryptographic API Completion Tasks. ACM Transactions on Software Engineering and Methodology 33:3 (1-30). Online publication date: 15-Mar-2024.
https://dl.acm.org/doi/10.1145/3625291

Choi K, Hwang S, Moon H, Sasano I, Hong J, Park J (2024). Ranked Syntax Completion With LR Parsing. Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (1242-1251). Online publication date: 8-Apr-2024.
https://dl.acm.org/doi/10.1145/3605098.3635944
