DOI: 10.1145/2889160.2889165
research-article
Public Access

SourcererCC and SourcererCC-I: tools to detect clones in batch mode and during software development

Published: 14 May 2016

Abstract

Given the availability of large source-code repositories, there have been many applications for large-scale clone detection. Unfortunately, despite a decade of active research, there is a marked lack of clone detectors that scale to big software systems or large repositories, specifically for detecting near-miss (Type 3) clones, where significant editing may take place in the cloned code.
This paper demonstrates: (i) SourcererCC, a token-based clone detector that targets the first three clone types and exploits an index to scale to large inter-project repositories on a standard workstation. It uses an optimized inverted index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, and the number of token comparisons needed to judge a potential clone; and (ii) SourcererCC-I, an Eclipse plug-in that uses SourcererCC's core engine to identify and navigate clones (both inter- and intra-project) in real time during software development.
In our experiments comparing SourcererCC with state-of-the-art tools, we found that it is the only clone detection tool to successfully scale to 250 MLOC on a standard workstation with 12 GB RAM and efficiently detect the first three types of clones (precision 86% and recall 86-100%). Link to the demo: https://youtu.be/17F_9Qp-ks4
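
The indexing and filtering idea in the abstract can be illustrated with a minimal sketch. This is not SourcererCC's implementation: the whitespace tokenizer, the overlap-based similarity with an 80% threshold, the rare-tokens-first ordering, and the names `detect_clones` and `ceil_pct` are all assumptions made for illustration. The sketch indexes only a short prefix of each block's ordered tokens, so that two blocks are compared only if their prefixes share a token.

```python
from collections import Counter


def ceil_pct(pct, n):
    """Integer ceiling of pct% of n (avoids floating-point rounding)."""
    return (pct * n + 99) // 100


def detect_clones(blocks, threshold_pct=80):
    """Return pairs of block ids whose token overlap meets the threshold.

    blocks: dict mapping a block id to its source text.
    Assumed similarity: multiset token overlap >= threshold_pct% of the
    larger block's token count.
    """
    # Bag of tokens per block; splitting on whitespace is a stand-in tokenizer.
    bags = {bid: Counter(src.split()) for bid, src in blocks.items()}

    # Global token frequencies define one ordering shared by all blocks.
    freq = Counter()
    for bag in bags.values():
        freq.update(bag)

    index = {}      # inverted index: token -> ids of blocks with it in their prefix
    clones = set()
    for bid in sorted(bags):
        bag = bags[bid]
        # Order tokens rarest first (ties broken by the token text).
        tokens = sorted(bag.elements(), key=lambda t: (freq[t], t))
        size = len(tokens)
        # Two blocks that meet the threshold must share a token in these prefixes.
        prefix = set(tokens[: size - ceil_pct(threshold_pct, size) + 1])

        # Query the index for candidates, then verify each one fully.
        candidates = set()
        for tok in prefix:
            candidates |= index.get(tok, set())
        for cid in candidates:
            other = bags[cid]
            needed = ceil_pct(threshold_pct, max(size, sum(other.values())))
            if sum((bag & other).values()) >= needed:
                clones.add((cid, bid))

        # Only the prefix is indexed, which keeps the index small.
        for tok in prefix:
            index.setdefault(tok, set()).add(bid)
    return clones


if __name__ == "__main__":
    demo = {
        "a": "int sum = 0 ; for ( int i = 0 ; i < n ; i ++ ) sum += x [ i ] ;",
        "b": "int sum = 0 ; for ( int j = 0 ; j < n ; j ++ ) sum += x [ j ] ;",
        "c": "return a + b ;",
    }
    print(detect_clones(demo))
```

In this toy input, blocks `a` and `b` differ only in the loop variable and are reported as a pair, while `c` shares no prefix token with either and is never compared against them.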


Published In

ICSE '16: Proceedings of the 38th International Conference on Software Engineering Companion
May 2016
946 pages
ISBN:9781450342056
DOI:10.1145/2889160
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
