DOI: 10.1145/3324884.3416619
research-article

Exploring the architectural impact of possible dependencies in Python software

Published: 27 January 2021

Abstract

Dependencies among software entities are the basis for much software analytics research and for many architecture analysis tools. Dynamically typed languages, such as Python, JavaScript, and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analysis research and tools have not taken possible dependencies into consideration. The fundamental question is: to what extent do these missing possible dependencies affect architecture analysis? To answer this question, we conducted an empirical study of 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial, and higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies and create dependency structures that differ from those formed by explicit dependencies alone, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52%~14.18%), recall (31.73%~39.12%), and F1 scores (22.13%~32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in only explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear to be better modularized than it actually is. This evidence strongly suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, and that architecture analysis and tools should assess and even emphasize the architectural impact of possible dependencies due to dynamic typing.
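To make the distinction concrete, the following is a minimal sketch of what a possible dependency looks like in Python and how a very naive analysis might surface it. It is not the type inference used in the study: it simply matches a method called on an untyped parameter against every class that defines a method of that name. The classes `FileStore`, `DbStore`, and `Cache`, the function `persist`, and the helper names are all hypothetical and exist only for illustration.

```python
import ast

# Hypothetical code under analysis. `persist` never names FileStore or
# DbStore, so a purely syntactic analysis records no dependency on them.
SOURCE = '''
class FileStore:
    def save(self, record):
        return "file:" + record

class DbStore:
    def save(self, record):
        return "db:" + record

class Cache:
    def evict(self, key):
        return key

def persist(store, record):
    # `store` has no declared type: any object with a `save` method works.
    return store.save(record)
'''


def classes_defining(tree):
    """Map each method name to the set of classes that define it."""
    providers = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    providers.setdefault(item.name, set()).add(node.name)
    return providers


def possible_dependencies(source):
    """Return {function name: sorted candidate classes} for top-level functions.

    Assumption: a call `param.method(...)` on an untyped parameter may target
    any class that defines `method`. Real type inference is far more precise.
    """
    tree = ast.parse(source)
    providers = classes_defining(tree)
    deps = {}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        params = {arg.arg for arg in func.args.args}
        candidates = set()
        for node in ast.walk(func):
            if (isinstance(node, ast.Call)
                    and isinstance(node.func, ast.Attribute)
                    and isinstance(node.func.value, ast.Name)
                    and node.func.value.id in params):
                candidates |= providers.get(node.func.attr, set())
        if candidates:
            deps[func.name] = sorted(candidates)
    return deps


if __name__ == "__main__":
    print(possible_dependencies(SOURCE))
```

In this sketch, `persist` is reported as possibly depending on both `FileStore` and `DbStore`, while `Cache` is excluded because it defines no `save` method. Name matching of this kind over-approximates, which is one reason more precise type inference is needed in practice.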

Published In

ASE '20: Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering
December 2020
1449 pages
ISBN: 9781450367684
DOI: 10.1145/3324884
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic typing
  2. empirical study
  3. possible dependency
  4. software architecture

Qualifiers

  • Research-article

Funding Sources

  • Ministry of Education Innovation Research Team
  • National Key R&D Program of China
  • National Natural Science Foundation of China

Conference

ASE '20
Acceptance Rates

Overall Acceptance Rate 82 of 337 submissions, 24%
