research-article

Open access

Language-agnostic dynamic analysis of multilingual code: promises, pitfalls, and prospects

Authors:

Haipeng Cai

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Pages 1621 - 1626

https://doi.org/10.1145/3540250.3560880

Published: 09 November 2022

Abstract

Analyzing multilingual code holistically is key to systematic quality assurance of real-world software, which is mostly developed in multiple computer languages. Toward such analyses, state-of-the-art approaches propose an almost fully language-agnostic methodology and apply it to dynamic dependence analysis/slicing of multilingual code, showing great promise. We investigated this methodology through a technical analysis followed by a replication study applying it to 10 real-world multilingual projects of diverse language combinations. Our results revealed critical challenges to the practicality of the methodology, that is, to its reaching the levels of efficiency/scalability, precision, and extensibility to various language combinations needed for practical use. Based on the results, we reflect on the underlying pitfalls of the language-agnostic design that lead to such challenges. Finally, looking forward to the prospects of dynamic analysis for multilingual code, we identify a new research direction toward better practicality and precision without sacrificing much extensibility, as supported by preliminary results. The key takeaway is that pursuing fully language-agnostic analysis may be both impractical and unnecessary, and that striving for a better balance between language independence and practicality may be more fruitful.

Published In

ESEC/FSE 2022: Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 2022

1822 pages

ISBN: 9781450394130

DOI: 10.1145/3540250

General Chair: Abhik Roychoudhury (National University of Singapore, Singapore)

Program Chairs: Cristian Cadar (Imperial College London, UK), Miryung Kim (University of California at Los Angeles, USA)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESEC/FSE '22

ESEC/FSE '22: 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

November 14 - 18, 2022

Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
