DOI: 10.1145/2989238.2989240
research-article

Addressing scalability in API method call analytics

Published: 13 November 2016

Abstract

Intelligent code completion recommends relevant code to developers by comparing the editor content to code patterns extracted by analyzing large repositories. However, with the vast amount of data available in such repositories, scalability of the recommender system becomes an issue. We propose using Boolean Matrix Factorization (BMF) as a clustering technique for analyzing code in order to improve the scalability of the underlying models. We compare the model size, inference speed, and prediction quality of an intelligent method call completion engine built on top of canopy clustering with those of one built on top of BMF. Our results show that BMF reduces model size by up to 80% and increases inference speed by up to 78%, without a significant change in prediction quality.
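
As a rough illustration of the idea behind BMF (not the authors' implementation), the sketch below shows the Boolean matrix product that a factorization tries to match: a binary matrix A is approximated by two smaller binary matrices B and C, where an entry of the product is 1 if any shared factor links the row to the column. The encoding of rows as observed object usages and columns as method calls, and all matrix values, are assumptions made for this example only.

```python
def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is OR over k of (B[i][k] AND C[k][j])."""
    inner, cols = len(C), len(C[0])
    return [
        [int(any(row[k] and C[k][j] for k in range(inner))) for j in range(cols)]
        for row in B
    ]


def reconstruction_error(A, B, C):
    """Count the cells where the Boolean product of B and C differs from A."""
    P = boolean_product(B, C)
    return sum(a != p for row_a, row_p in zip(A, P) for a, p in zip(row_a, row_p))


# Hypothetical usage matrix: rows are observed usages, columns are method calls.
A = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]

# Hypothetical factors: each row of C is a pattern of co-occurring calls,
# and each row of B says which patterns a usage contains.
B = [
    [1, 0],
    [1, 0],
    [0, 1],
    [1, 1],
]
C = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

error = reconstruction_error(A, B, C)
print("cells that differ from A:", error)
```

In this toy case the two patterns in C reproduce A exactly. Real data would generally be matched only approximately, and finding good factors is the hard part that a BMF algorithm addresses.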



Published In

SWAN 2016: Proceedings of the 2nd International Workshop on Software Analytics
November 2016
53 pages
ISBN:9781450343954
DOI:10.1145/2989238

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Analytics of code repositories
  2. Boolean Matrix Factorization
  3. Intelligent Method Call Completion
  4. Scalability

Conference

FSE'16
