DOI: 10.1145/3379177.3388904

Action-based Recommendation in Pull-request Development

Published: 16 September 2020

Abstract

Selecting pull requests (PRs) is a challenging task for integrators in pull-based development (PbD), as large open-source projects receive hundreds of PRs daily. Managing these PRs manually consumes integrators' time and resources and may delay the acceptance, response, or rejection of PRs that propose bug fixes or feature enhancements. On the one hand, well-known PbD platforms such as GitHub do not provide built-in recommendation mechanisms for managing PRs. On the other hand, prior research on PR recommendation has focused on the likelihood of a PR either being accepted or receiving a response from the integrator. In this paper, we consider both likelihoods to help integrators in the PR selection process by suggesting the appropriate action to take on each PR. To this end, we propose an approach called CARTESIAN (aCceptance And Response classificaTion-based requESt IdentificAtioN), which models PR recommendation in terms of PR actions. In particular, CARTESIAN can recommend three types of PR actions: accept, respond, and reject. We evaluated CARTESIAN on the PRs of 19 popular GitHub projects. The results show that our approach identifies PR actions with an average precision and recall of about 86%. Moreover, CARTESIAN outperforms two baseline approaches in the task of PR selection.
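
As an illustration of how precision and recall can be computed when each PR is assigned one of the three actions, the following minimal sketch scores a classifier per action and then averages the scores. It is not the CARTESIAN implementation: the function names, the toy labels, and the choice of macro averaging are assumptions made for this example, and the abstract does not state which averaging scheme was used.

```python
from collections import Counter

# The three PR actions named in the abstract.
ACTIONS = ("accept", "respond", "reject")


def per_action_precision_recall(true_actions, predicted_actions):
    """Return {action: (precision, recall)} for a three-way action classifier."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(true_actions, predicted_actions):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1  # the guessed action was wrong
            false_neg[truth] += 1  # the true action was missed
    scores = {}
    for action in ACTIONS:
        predicted = true_pos[action] + false_pos[action]
        actual = true_pos[action] + false_neg[action]
        precision = true_pos[action] / predicted if predicted else 0.0
        recall = true_pos[action] / actual if actual else 0.0
        scores[action] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of per-action precision and recall (an assumed scheme)."""
    count = len(scores)
    mean_precision = sum(p for p, _ in scores.values()) / count
    mean_recall = sum(r for _, r in scores.values()) / count
    return mean_precision, mean_recall


# Hypothetical toy labels, not data from the study.
truth = ["accept", "accept", "respond", "reject", "reject", "respond"]
predicted = ["accept", "respond", "respond", "reject", "accept", "respond"]

scores = per_action_precision_recall(truth, predicted)
avg_precision, avg_recall = macro_average(scores)
```

With these toy labels, the "reject" action has perfect precision but misses half of the truly rejected PRs, which shows why both measures are reported together.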

Published In

ICSSP '20: Proceedings of the International Conference on Software and System Processes
June 2020
208 pages
ISBN:9781450375122
DOI:10.1145/3379177

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Machine learning
  2. Pull Requests recommendation
  3. Software maintenance and evolution

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • National Key Research and Development Program of China
