DOI: 10.1145/3597503.3639220
research-article

Precise Sparse Abstract Execution via Cross-Domain Interaction

Published: 12 April 2024

Abstract

Sparse static analysis offers a more scalable solution than its non-sparse counterpart. The basic idea is to first conduct a fast pointer analysis that over-approximates the value-flows, and then to propagate data-flow facts sparsely along only the pre-computed value-flows instead of through all control-flow points. Current sparse techniques focus on improving the scalability of the main analysis while maintaining its precision. However, their pointer analyses in both the offline and main phases are inherently imprecise because they rely solely on a single memory address domain, without considering values from other domains such as the interval domain. This leads to conservative alias results, such as array insensitivity, which leave substantial room for improving the precision of the main data-flow analysis.
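As a rough illustration of sparse propagation, the following minimal sketch pushes interval facts only along pre-computed value-flow edges rather than visiting every program point. The node names, the `transfer` table, and the `value_flows` map are hypothetical and stand in for the output of an offline pointer analysis; the sketch omits widening and is not the paper's implementation.

```python
from collections import deque

def add(x, y):
    """Interval addition; None stands for 'no fact yet' (bottom)."""
    if x is None or y is None:
        return None
    return (x[0] + y[0], x[1] + y[1])

def sparse_propagate(transfer, value_flows, sources):
    """Propagate facts only from a definition to the nodes that use it."""
    facts = {}
    worklist = deque(sources)
    visits = 0
    while worklist:
        node = worklist.popleft()
        visits += 1
        new_fact = transfer[node](facts)
        if facts.get(node) != new_fact:
            facts[node] = new_fact
            # Only the users of this definition are revisited.
            worklist.extend(value_flows.get(node, ()))
    return facts, visits

# Hypothetical program: a = 1; b in [2, 5]; c = a + b
transfer = {
    "a": lambda f: (1, 1),
    "b": lambda f: (2, 5),
    "c": lambda f: add(f.get("a"), f.get("b")),
}
value_flows = {"a": ["c"], "b": ["c"]}  # assumed pre-computed def-use edges

facts, visits = sparse_propagate(transfer, value_flows, ["a", "b"])
print(facts["c"], visits)
```

Statements that neither define nor use `a`, `b`, or `c` never enter the worklist, which is where the scalability benefit comes from. The precision of the result still depends on how accurate the pre-computed edges are.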
This paper presents CSA, a new Cross-domain Sparse Abstract execution that interweaves correlations between values across multiple abstract domains (e.g., memory address and interval domains). Unlike traditional sparse analysis without cross-domain interaction, CSA performs correlation tracking by establishing implications from values in one domain to values in another. This correlation tracking enables online bidirectional refinement: CSA refines spurious alias relations using interval domain information, and it also enhances the precision of the interval analysis with the refined alias results. This yields progressively better precision and scalability as the main analysis progresses. To improve the efficiency of correlation tracking, we propose an equivalent correlation tracking approach that groups (virtual) memory addresses with equivalent implication results to minimize redundant value joins and the associated storage.
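The bidirectional refinement can be pictured with a small sketch, assuming a toy model in which each array element has its own abstract address and a pointer `&buf[i]` is resolved using the interval of `i`. The functions `points_to` and `load` and the `memory` map are hypothetical names introduced for illustration; this shows the general idea only and is not CSA's algorithm.

```python
def join(a, b):
    """Least upper bound of two intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def points_to(array_len, index_interval=None):
    """Abstract addresses (element indices) that &buf[i] may refer to."""
    if index_interval is None:
        # Address domain alone: array-insensitive, every element may alias.
        return set(range(array_len))
    lo, hi = index_interval
    # Interval information prunes the spurious alias targets.
    return {k for k in range(array_len) if lo <= k <= hi}

def load(memory, targets):
    """Join the intervals stored at all possible targets (None if no target)."""
    result = None
    for k in sorted(targets):
        result = memory[k] if result is None else join(result, memory[k])
    return result

# Hypothetical abstract memory for buf[0..3]
memory = {0: (0, 0), 1: (10, 10), 2: (20, 20), 3: (100, 100)}
i = (1, 2)  # interval of the index variable

coarse = load(memory, points_to(4))      # without cross-domain interaction
refined = load(memory, points_to(4, i))  # alias set refined by the interval of i
print(coarse, refined)
```

The interval of `i` narrows the alias set, and the narrower alias set in turn yields a tighter interval for the loaded value: `(10, 20)` instead of `(0, 100)` in this toy case.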
We apply CSA to two common assertion-based checking clients: buffer overflow and null dereference detection. Experimental results show that CSA outperforms five open-source tools (Infer, Cppcheck, IKOS, Sparrow and KLEE) on ten large-scale projects. CSA finds 111 real bugs with 68.51% precision, detecting 46.05% more bugs than Infer and achieving a 12.11% higher precision rate than KLEE. CSA reports 96.63% fewer false positives on real-world projects than the version without cross-domain interaction. With equivalent correlation tracking, CSA also achieves an average speedup of 2.47× and an average memory reduction of 6.14×.

Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024
2942 pages
ISBN: 9798400702174
DOI: 10.1145/3597503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. abstract execution
  2. sparse analysis
  3. cross-domain interaction

Qualifiers

  • Research-article

Conference

ICSE '24
Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
