Research article. DOI: 10.1145/3368089.3417059

Scaling static taint analysis to industrial SOA applications: a case study at Alibaba

Published: 08 November 2020

Abstract

At Alibaba, we have seen a growing demand for tracing data flow in scenarios such as data leak detection, change governance, and data consistency checking. Static taint analysis is a technique for such problems, and many approaches have been proposed to achieve high scalability and precision. This paper shares our experience in applying taint analysis at Alibaba. In particular, we find that the state-of-the-art taint analysis tool, FlowDroid, does not work well in our cases because our applications make heavy use of libraries, native methods, and enterprise-specific frameworks, which pose two major challenges for FlowDroid: scalability and implicit dependencies. This paper presents ANTaint to address these problems. ANTaint improves scalability by expanding the call graph and applying taint propagation on demand for libraries, which account for the majority of program execution even though only a small fraction of library code propagates taint. To improve accuracy, we build a sound call graph whose core part retains a certain level of precision, and we provide a more precise taint propagation model. Applying ANTaint to the company's workloads validates this approach. In an experiment on 60 production cases, ANTaint produces correct results for 95% of the cases (precision: 95%, recall: 98%), whereas FlowDroid does so for only 13%. ANTaint also takes 65% less time, and none of the cases runs out of memory under a 32 GB limit.
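The idea of expanding library code only on demand can be illustrated with a small sketch. This is not ANTaint's implementation: the statement model, the classes `OnDemandTaint`, `Stmt`, and `Method`, and the example methods (`StringUtils.trim`, `Log.format`, `Dao.save`) are all hypothetical. The sketch also assumes an acyclic call graph, local variables only (no fields or statics), and library methods that have no side effects and contain no sources or sinks. Under those assumptions, a library call that receives only untainted arguments cannot create a flow, so its body can be skipped instead of analyzed.

```java
import java.util.*;

/**
 * Hypothetical sketch of on-demand taint propagation for library code.
 * Application methods are always analyzed; a library method body is only
 * analyzed ("expanded") when at least one argument passed to it is tainted.
 * Assumptions: acyclic call graph, locals only (no fields or statics),
 * library methods have no side effects and contain no sources or sinks.
 */
public class OnDemandTaint {
    enum Kind { ASSIGN, CALL, SOURCE, SINK }

    /** A simplified statement; fields not used by a given kind are null. */
    static final class Stmt {
        final Kind kind; final String dst; final String src; final String callee; final List<String> args;
        private Stmt(Kind kind, String dst, String src, String callee, List<String> args) {
            this.kind = kind; this.dst = dst; this.src = src; this.callee = callee; this.args = args;
        }
        static Stmt assign(String dst, String src) { return new Stmt(Kind.ASSIGN, dst, src, null, null); }
        static Stmt call(String dst, String callee, String... args) {
            return new Stmt(Kind.CALL, dst, null, callee, Arrays.asList(args));
        }
        static Stmt source(String dst) { return new Stmt(Kind.SOURCE, dst, null, null, null); }
        static Stmt sink(String src) { return new Stmt(Kind.SINK, null, src, null, null); }
    }

    /** A method whose return value is modeled by the local variable "ret". */
    static final class Method {
        final String name; final boolean isLibrary; final List<String> params; final List<Stmt> body;
        Method(String name, boolean isLibrary, List<String> params, Stmt... body) {
            this.name = name; this.isLibrary = isLibrary; this.params = params; this.body = Arrays.asList(body);
        }
    }

    final Map<String, Method> program = new HashMap<>();
    final Set<String> expandedLibraryMethods = new TreeSet<>();
    final List<String> leaks = new ArrayList<>();

    void add(Method m) { program.put(m.name, m); }

    /** Analyzes one method given its tainted parameters; returns whether "ret" is tainted. */
    boolean analyze(String name, Set<String> taintedParams) {
        Method m = program.get(name);
        Set<String> tainted = new HashSet<>(taintedParams);
        for (Stmt s : m.body) {
            switch (s.kind) {
                case SOURCE:
                    tainted.add(s.dst);
                    break;
                case ASSIGN: // strong update: dst takes the taint of src
                    if (tainted.contains(s.src)) tainted.add(s.dst); else tainted.remove(s.dst);
                    break;
                case SINK:
                    if (tainted.contains(s.src)) leaks.add(m.name + ":" + s.src);
                    break;
                case CALL: {
                    Method callee = program.get(s.callee);
                    if (callee == null) break; // no body available: treated as taint-neutral here
                    Set<String> calleeTainted = new HashSet<>();
                    for (int i = 0; i < s.args.size(); i++) {
                        if (tainted.contains(s.args.get(i))) calleeTainted.add(callee.params.get(i));
                    }
                    boolean retTainted = false;
                    if (!callee.isLibrary || !calleeTainted.isEmpty()) {
                        if (callee.isLibrary) expandedLibraryMethods.add(callee.name);
                        retTainted = analyze(callee.name, calleeTainted);
                    } // else: untainted library call is skipped without expanding its body
                    if (s.dst != null) {
                        if (retTainted) tainted.add(s.dst); else tainted.remove(s.dst);
                    }
                    break;
                }
            }
        }
        return tainted.contains("ret");
    }

    /** Builds and analyzes a tiny example program rooted at "main". */
    static OnDemandTaint demo() {
        OnDemandTaint t = new OnDemandTaint();
        t.add(new Method("StringUtils.trim", true, List.of("s"), Stmt.assign("ret", "s")));
        t.add(new Method("Log.format", true, List.of("x"), Stmt.assign("ret", "x")));
        t.add(new Method("Dao.save", false, List.of("v"), Stmt.sink("v")));
        t.add(new Method("main", false, List.of(),
                Stmt.source("user"),                            // e.g. a user-supplied value
                Stmt.call("clean", "StringUtils.trim", "user"), // tainted arg: library body expanded
                Stmt.call("banner", "Log.format", "constant"),  // untainted arg: library body skipped
                Stmt.call(null, "Dao.save", "clean")));         // application method: always analyzed
        t.analyze("main", Set.of());
        return t;
    }

    public static void main(String[] args) {
        OnDemandTaint t = demo();
        System.out.println("leaks = " + t.leaks);
        System.out.println("expanded library methods = " + t.expandedLibraryMethods);
    }
}
```

Running `main` should report one leak (`Dao.save:v`) and show that only `StringUtils.trim` was expanded; `Log.format` is skipped because only untainted data reaches it. A production analysis would additionally have to handle heap objects, callbacks, and library side effects, which this sketch deliberately leaves out.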

Supplementary Material

Auxiliary Teaser Video (fse20ind-p80-p-teaser.mp4)
Auxiliary Presentation Video (fse20ind-p80-p-video.mp4)



Published In

ESEC/FSE 2020: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2020
1703 pages
ISBN: 9781450370431
DOI: 10.1145/3368089
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. accurate
2. hidden code
3. implicit dependencies
4. scalable
5. static taint analysis


Conference

ESEC/FSE '20

Acceptance Rates

Overall acceptance rate: 112 of 543 submissions, 21%


Bibliometrics

• Downloads (last 12 months): 80
• Downloads (last 6 weeks): 9
Reflects downloads up to 16 Oct 2024.

Cited By
• (2024) Automated End-to-End Dynamic Taint Analysis for WhatsApp. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 21-26. DOI: 10.1145/3663529.3663824. Online publication date: 10 Jul 2024.
• (2024) Data Lineage Analysis for Enterprise Applications by Manta: The Story of Java and C# Scanners. Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, 25-35. DOI: 10.1145/3639477.3639739. Online publication date: 14 Apr 2024.
• (2024) AutoWeb: Automatically Inferring Web Framework Semantics via Configuration Mutation. Engineering of Complex Computer Systems, 369-389. DOI: 10.1007/978-3-031-66456-4_20. Online publication date: 29 Sep 2024.
• (2023) Compositional Taint Analysis for Enforcing Security Policies at Scale. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 1985-1996. DOI: 10.1145/3611643.3613889. Online publication date: 30 Nov 2023.
• (2023) DeFiTainter: Detecting Price Manipulation Vulnerabilities in DeFi Protocols. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 1144-1156. DOI: 10.1145/3597926.3598124. Online publication date: 12 Jul 2023.
• (2023) Anchor: Fast and Precise Value-flow Analysis for Containers via Memory Orientation. ACM Transactions on Software Engineering and Methodology 32, 3, 1-39. DOI: 10.1145/3565800. Online publication date: 26 Apr 2023.
• (2023) Two Sparsification Strategies for Accelerating Demand-Driven Pointer Analysis. 2023 IEEE Conference on Software Testing, Verification and Validation (ICST), 305-316. DOI: 10.1109/ICST57152.2023.00036. Online publication date: Apr 2023.
• (2023) Scalable Compositional Static Taint Analysis for Sensitive Data Tracing on Industrial Micro-Services. 2023 IEEE/ACM 45th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 110-121. DOI: 10.1109/ICSE-SEIP58684.2023.00015. Online publication date: May 2023.
• (2022) Jasmine: A Static Analysis Framework for Spring Core Technologies. Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 1-13. DOI: 10.1145/3551349.3556910. Online publication date: 10 Oct 2022.
• (2022) TaintSQL: Dynamically Tracking Fine-Grained Implicit Flows for SQL Statements. 2022 IEEE 33rd International Symposium on Software Reliability Engineering (ISSRE), 1-12. DOI: 10.1109/ISSRE55969.2022.00012. Online publication date: Oct 2022.
