DOI: 10.1145/3540250.3560880
research-article
Open access

Language-agnostic dynamic analysis of multilingual code: promises, pitfalls, and prospects

Published: 09 November 2022

Abstract

Analyzing multilingual code holistically is key to the systematic quality assurance of real-world software, most of which is developed in multiple computer languages. Toward such analyses, state-of-the-art approaches propose an almost fully language-agnostic methodology and apply it to dynamic dependence analysis/slicing of multilingual code, showing great promise. We investigated this methodology through a technical analysis, followed by a replication study that applied it to 10 real-world multilingual projects spanning diverse language combinations. Our results revealed critical practicality challenges to the methodology, where practicality means having the efficiency/scalability, precision, and extensibility to various language combinations needed for practical use. Based on these results, we reflect on the underlying pitfalls of the language-agnostic design that lead to such challenges. Finally, looking ahead to the prospects of dynamic analysis for multilingual code, we identify a new research direction toward better practicality and precision without sacrificing much extensibility, as supported by preliminary results. The key takeaway is that pursuing fully language-agnostic analysis may be both impractical and unnecessary, and that striving for a better balance between language independence and practicality may be more fruitful.



Published In

ACM Conferences
ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2022
1822 pages
ISBN:9781450394130
DOI:10.1145/3540250
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dynamic analysis
  2. multi-language software
  3. multilingual code

Qualifiers

  • Research-article


Conference

ESEC/FSE '22

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%


Article Metrics

  • Downloads (last 12 months): 225
  • Downloads (last 6 weeks): 26
Reflects downloads up to 20 Jan 2025
