skip to main content
10.1145/2594291.2594321acmconferencesArticle/Chapter ViewAbstractPublication PagespldiConference Proceedingsconference-collections
research-article

Code completion with statistical language models

Published: 09 June 2014 Publication History

Abstract

We address the problem of synthesizing code completions for programs using APIs. Given a program with holes, we synthesize completions for holes with the most likely sequences of method calls.
Our main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences. We design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model. We then employ the language model to find the highest ranked sentences, and use them to synthesize a code completion. Our approach is able to synthesize sequences of calls across multiple objects together with their arguments.
Experiments show that our approach is fast and effective. Virtually all computed completions typecheck, and the desired completion appears in the top 3 results in 90% of the cases.

References

[1]
Android-er. http://android-er.blogspot.ch/2011/03/set-wallpaper-using-wallpapermanager.html.
[2]
Android how-to's. https://sites.google.com/site/androidhowto/how-to-1/display-a-web-page.
[3]
Stack overflow. http://www.stackoverflow.com/.
[4]
Tutorial for android. http://www.tutorialforandroid.com/2009/01/changing-screen-brightness.html.
[5]
Tutorial for android. http://www.tutorialforandroid.com/2009/10/turn-off-turn-on-wifi-in-android-using.html.
[6]
Vogella tutorials. http://www.vogella.com/articles/AndroidMedia/article.html.
[7]
Alnusair, A., Zhao, T., and Bodden, E. Effective API navigation and reuse. In IRI (aug. 2010), pp. 7--12.
[8]
Ammons, G., Bodík, R., and Larus, J. R. Mining specifications. In POPL '02 (2002).
[9]
Beckman, N., Kim, D., and Aldrich, J. An empirical study of object protocols in the wild. In ECOOP'11.
[10]
Bengio, Y., Ducharme, R., Vincent, P., and Janvin, C. A neural probabilistic language model. J. Mach. Learn. Res. 3 (Mar. 2003), 1137--1155.
[11]
Cook, J. E., and Wolf, A. L. Discovering models of software processes from event-based data. ACM Trans. Softw. Eng. Methodol. 7, 3 (1998), 215--249.
[12]
Dagenais, B., and Hendren, L. J. Enabling static analysis for partial Java programs. In OOPSLA'08, pp. 313--328.
[13]
Elman, J. L. Finding structure in time. Cognitive Science 14, 2 (1990), 179--211.
[14]
Gulwani, S. Dimensions in program synthesis. In symp. on Principles and practice of declarative programming (2010), PPDP '10.
[15]
Gvero, T., Kuncak, V., Kuraj, I., and Piskac, R. Complete completion using types and weights. In PLDI '13 (2013).
[16]
Gvero, T., Kuncak, V., and Piskac, R. Interactive synthesis of code snippets. In CAV'11, vol. 6806 of LNCS. 2011.
[17]
Hindle, A., Barr, E. T., Su, Z., Gabel, M., and Devanbu, P. On the naturalness of software. In ICSE 2012 (2012).
[18]
Holmes, R., and Murphy, G. C. Using structural context to recommend source code examples. In ICSE '05.
[19]
Holmes, R., Walker, R. J., and Murphy, G. C. Strathcona example recommendation tool. In FSE'05, pp. 237--240.
[20]
Katz, S. M. Estimation of probabilities from sparse data for the language model component of a speech recognizer. In IEEE Trans. on Acoustics, Speech and Singal processing (March 1987), vol. ASSP-35.
[21]
Kneser, R., and Ney, H. Improved backing-off for m-gram language modeling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (May 1995), vol. I.
[22]
Kombrink, S., Mikolov, T., Karafiát, M., and Burget, L. Recurrent neural network based language modeling in meeting recognition. In INTERSPEECH (2011), pp. 2877--2880.
[23]
Mandelin, D., Xu, L., Bodík, R., and Kimelman, D. Jungloid mining: Helping to navigate the api jungle. In PLDI '05 (2005).
[24]
Mikolov, T., Deoras, A., Povey, D., Burget, L., and Cernocky, J. Strategies for training large scale neural network language models. In ASRU 2011 (2011), IEEE Signal Processing Society.
[25]
Mishne, A., Shoham, S., and Yahav, E. Typestate-based semantic code search over partial programs. In OOPSLA '12 (2012).
[26]
Perelman, D., Gulwani, S., Ball, T., and Grossman, D. Type-directed completion of partial expressions. In PLDI (2012).
[27]
Reiss, S. P. Semantics-based code search. In ICSE'09.
[28]
Rosenfeld, R. Two decades of statistical language modeling: Where do we go from here. In Proceedings of the IEEE (2000), p. 2000.
[29]
Shoham, S., Yahav, E., Fink, S., and Pistoia, M. Static specification mining using automata-based abstractions. In ISSTA '07 (2007).
[30]
Solar-Lezama, A. The sketching approach to program synthesis. In APLAS '09 (2009).
[31]
Solar-Lezama, A., Tancau, L., Bodík, R., Seshia, S. A., and Saraswat, V. A. Combinatorial sketching for finite programs. In ASPLOS (2006), pp. 404--415.
[32]
Srivastava, S., Gulwani, S., and Foster, J. S. From program verification to program synthesis. In POPL '10 (2010).
[33]
Steensgaard, B. Points-to analysis in almost linear time. In Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages (1996), POPL '96, pp. 32--41.
[34]
Stolcke, A. SRILM-an Extensible Language Modeling Toolkit. International Conference on Spoken Language Processing (2002).
[35]
Thummalapenta, S., and Xie, T. Parseweb: a programmer assistant for reusing open source code on the web. In ASE '07 (2007).
[36]
Vallée-Rai, R., et al. Soot - a Java Optimization Framework. In Proceedings of CASCON 1999 (1999), pp. 125--135.
[37]
Vechev, M., and Yahav, E. Deriving linearizable fine-grained concurrent objects. In PLDI '08 (2008).
[38]
Wasylkowski, A., and Zeller, A. Mining temporal specifications from object usage. In Autom. Softw. Eng. (2011), vol. 18.
[39]
Weimer, W., and Necula, G. Mining temporal specifications for error detection. In TACAS'05, vol. 3440 of LNCS. 2005, pp. 461--476.
[40]
Witten, I. H., and Bell, T. C. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory 37, 4 (1991), 1085--1094.
[41]
Yang, J., Evans, D., Bhardwaj, D., Bhat, T., and Das, M. Perracotta: mining temporal API rules from imperfect traces. In ICSE '06, pp. 282--291.
[42]
Yessenov, K., Xu, Z., and Solar-Lezama, A. Data-driven synthesis for object-oriented frameworks. In OOPSLA '11 (2011).
[43]
Zhong, H., Xie, T., Zhang, L., Pei, J., and Mei, H. MAPO: Mining and recommending API usage patterns. In ECOOP'09.

Cited By

View all

Index Terms

  1. Code completion with statistical language models

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    PLDI '14: Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2014
    619 pages
    ISBN:9781450327848
    DOI:10.1145/2594291
    • cover image ACM SIGPLAN Notices
      ACM SIGPLAN Notices  Volume 49, Issue 6
      PLDI '14
      June 2014
      598 pages
      ISSN:0362-1340
      EISSN:1558-1160
      DOI:10.1145/2666356
      • Editor:
      • Andy Gill
      Issue’s Table of Contents
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 June 2014

    Permissions

    Request permissions for this article.

    Check for updates

    Qualifiers

    • Research-article

    Conference

    PLDI '14
    Sponsor:

    Acceptance Rates

    PLDI '14 Paper Acceptance Rate 52 of 287 submissions, 18%;
    Overall Acceptance Rate 406 of 2,067 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)301
    • Downloads (Last 6 weeks)19
    Reflects downloads up to 14 Sep 2024

    Other Metrics

    Citations

    Cited By

    View all

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media