DOI: 10.1109/MSR.2017.56
Research article

Candoia: a platform for building and sharing mining software repositories tools as apps

Published: 20 May 2017

Abstract

We propose Candoia, a novel platform and ecosystem for building and sharing Mining Software Repositories (MSR) tools. With Candoia, MSR tools are built as apps, and the Candoia ecosystem, acting as an app store, enables effective sharing. The Candoia platform provides data extraction tools for curating custom datasets from user projects, as well as data abstractions that give uniform access to MSR artifacts from disparate sources, making apps portable and adoptable across the diverse software project settings of MSR researchers and practitioners. The structured design of a Candoia app, and the languages selected for building its components, promote easy customization. To evaluate Candoia, we built over two dozen MSR apps for analyzing bugs, software evolution, project management aspects, and source code and programming practices, demonstrating the platform's applicability to a wide variety of MSR apps. To test portability across diverse project settings, we ran the apps on ten popular project repositories, such as Apache Tomcat, JUnit, and Node.js, and found that the apps required no changes. In a user study of customizability, five of eight Candoia users found it very easy to customize an existing app. Candoia is available for download.
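The portability result above rests on writing apps against a VCS-neutral data abstraction rather than against Git or SVN directly. The following Python sketch is purely illustrative and is not Candoia's actual API: `Commit`, `GitBackend`, `SvnBackend`, and `bug_fix_ratio` are invented names used only to show how one analysis "app" can run unchanged over different backends that expose the same interface.

```python
# Illustrative sketch only -- NOT Candoia's real API. It demonstrates the
# general idea of a uniform abstraction over disparate repository sources,
# so an analysis app needs no per-backend changes.
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Commit:
    """A VCS-neutral commit record (hypothetical schema)."""
    author: str
    message: str


class GitBackend:
    """Stand-in for a Git extractor; a real one would read the Git log."""
    def __init__(self, commits: List[Commit]) -> None:
        self._commits = commits

    def commits(self) -> Iterator[Commit]:
        return iter(self._commits)


class SvnBackend:
    """Stand-in for an SVN extractor exposing the same interface."""
    def __init__(self, commits: List[Commit]) -> None:
        self._commits = commits

    def commits(self) -> Iterator[Commit]:
        return iter(self._commits)


def bug_fix_ratio(repo) -> float:
    """A portable 'app': works on any backend exposing commits()."""
    commits = list(repo.commits())
    if not commits:
        return 0.0
    fixes = sum(1 for c in commits if "fix" in c.message.lower())
    return fixes / len(commits)


git_repo = GitBackend([Commit("a", "Fix NPE in parser"),
                       Commit("b", "Add docs")])
svn_repo = SvnBackend([Commit("c", "fix build"),
                       Commit("d", "refactor"),
                       Commit("e", "Fix tests")])
print(bug_fix_ratio(git_repo))  # 0.5
print(bug_fix_ratio(svn_repo))
```

Because the app touches only the shared interface, swapping the backend (here, Git for SVN) requires no change to the analysis itself, which is the sense in which the apps in the evaluation were portable.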



Published In

MSR '17: Proceedings of the 14th International Conference on Mining Software Repositories
May 2017
567 pages
ISBN:9781538615447

Publisher

IEEE Press


Conference

ICSE '17