DOI: 10.1145/3192366.3192418
research-article

Pinpoint: fast and precise sparse value flow analysis for million lines of code

Published: 11 June 2018

Abstract

When dealing with millions of lines of code, we still cannot have our cake and eat it too: sparse value-flow analysis is powerful in checking source-sink problems, but existing work cannot escape from the “pointer trap” – a precise points-to analysis limits its scalability, and an imprecise one seriously undermines its precision. We present Pinpoint, a holistic approach that decomposes the cost of high-precision points-to analysis by precisely discovering local data dependence and delaying the expensive inter-procedural analysis through memorization. Such memorization enables the on-demand slicing of only the necessary inter-procedural data dependence and path feasibility queries, which are then solved by a costly SMT solver. Experiments show that Pinpoint can check programs such as MySQL (around 2 million lines of code) within 1.5 hours. The overall false positive rate is also very low (14.3% to 23.6%). Pinpoint has discovered over forty real bugs in mature and extensively checked open source systems. The implementation of Pinpoint and all experimental results are freely available.
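
To make the idea of a source-sink check with path feasibility concrete, the following is a minimal sketch and not Pinpoint's actual algorithm. It assumes a hypothetical toy value-flow graph in which each edge carries a guard, written as a list of (branch variable, required truth value) literals. The node names, the guard encoding, and the `feasible` helper are all illustrative assumptions; in particular, `feasible` only detects directly contradictory literals and stands in for the SMT solver mentioned in the abstract.

```python
# Hypothetical toy value-flow graph: node -> list of (successor, guard).
# A guard is a list of (branch_variable, required_truth_value) literals.
GRAPH = {
    "free(p)": [("q", [("c", True)]), ("r", [("c", False)])],
    "q": [("deref", [("c", False)])],  # needs c and not c: infeasible
    "r": [("deref", [("c", False)])],  # needs only not c: feasible
}


def feasible(literals):
    """Stand-in for an SMT query: a conjunction of boolean literals is
    satisfiable iff no variable is required to be both True and False."""
    required = {}
    for var, value in literals:
        if required.setdefault(var, value) != value:
            return False
    return True


def feasible_paths(graph, source, sink):
    """Return acyclic source-to-sink paths whose guards are jointly satisfiable."""
    results = []
    stack = [(source, [source], [])]
    while stack:
        node, path, guards = stack.pop()
        if node == sink:
            # Only report the flow if its accumulated path condition can hold.
            if feasible(guards):
                results.append(path)
            continue
        for succ, guard in graph.get(node, []):
            if succ not in path:  # avoid revisiting nodes on the current path
                stack.append((succ, path + [succ], guards + list(guard)))
    return results


if __name__ == "__main__":
    for path in feasible_paths(GRAPH, "free(p)", "deref"):
        print(" -> ".join(path))
```

In this toy graph, the flow through `q` is discarded because its guards contradict each other, while the flow through `r` is reported. A real checker would hand the accumulated condition to a solver rather than rely on a literal-conflict test.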

Supplementary Material

WEBM File (p693-shi.webm)

    Published In

    PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2018
    825 pages
    ISBN:9781450356985
    DOI:10.1145/3192366

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Sparse program analysis
    2. error detection
    3. path-sensitive analysis

    Qualifiers

    • Research-article

    Conference

    PLDI '18

    Acceptance Rates

    Overall Acceptance Rate 406 of 2,067 submissions, 20%
