TAJ: Effective Taint Analysis of Web Applications PLDI 2009 Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, Omri Weisman.


Index: Authors, Introduction, Motivation, Core Taint Analysis, Techniques, Experimental Results, Illumination

Omer Tripp
Advisory Software Engineer, Researcher
Omer is a member of the static analysis group at IBM Rational's Security Products Department. He is engaged in research and development in the areas of Static Program Analysis for Security and Language-based Security, with emphasis on Web-application security.
Publications:
Omer Tripp and Dror Feitelson. Zipf's Law Revisited. Technical Report, School of Computer Science and Engineering, The Hebrew University of Jerusalem, August.
Omer Tripp. Exploration in the Dark: Reasoning about Planning Strategies. M.Sc. Thesis. School of Computer Science and Engineering, The Hebrew University of Jerusalem, January 2009.

Omer Tripp's Patents
Rob Calendino, Craig Conboy, Guy Podjarny, Ory Segal, Adi Sharabani, Omer Tripp, and Omri Weisman. Black-box Testing Optimization Using Information from White-box Testing. Filed in the United States Patent and Trademark Office, October.
Omer Tripp. Detecting Security Vulnerabilities Relating to Cryptographically-sensitive Information Carriers when Testing Computer Software. Filed in the United States Patent and Trademark Office, September.
Yinnon Haviv, Roee Hay, Marco Pistoia, Adi Sharabani, Takaaki Tateishi, Omer Tripp, and Omri Weisman. Identifying Security Vulnerabilities in Computer Software. Filed in the United States Patent and Trademark Office, June.
Adi Sharabani and Omer Tripp. Efficient Code Instrumentation. Filed in the United States Patent and Trademark Office, March.
Stephen Fink, Yinnon A. Haviv, Marco Pistoia, Omer Tripp, and Omri Weisman. Importance-based Call Graph Construction. Filed in the United States Patent and Trademark Office, March.
Marco Pistoia, Takaaki Tateishi, Omer Tripp, and Omri Weisman. A Client-Driven Refinement-Based Static Analysis Method for Identifying Chainable Accesses to a Logical Container. Filed as Docket IL in the United States Patent and Trademark Office, June 2008.

Marco Pistoia
Research Staff Member
Recent activities:
ACSAC 2009, Program Committee Member
PLDI 2009, Poster and Student Research Competition Chair
PLAS 2009, Program Committee Member
SSIRI 2009, Program Committee Member
NDSS 2009, Program Committee Member
Refereed Conference Papers and Journal Articles:
Avraham Shinnar, Marco Pistoia, and Anindya Banerjee. A Language for Information Flow: Dynamic Information Tracking in Multiple Interdependent Dimensions. Accepted for publication in Proceedings of the 4th ACM SIGPLAN Workshop on Programming Languages and Analysis for Security (PLAS 2009), co-located with the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, Ireland, June 2009.
Emmanuel Geay, Marco Pistoia, Takaaki Tateishi, Barbara Ryder, and Julian Dolby. Modular String-Sensitive Permission Analysis with Demand-Driven Precision. Accepted for publication in Proceedings of the 31st International Conference on Software Engineering (ICSE 2009), Vancouver, BC, Canada, May 2009.

Marco Pistoia
Marco Pistoia and Úlfar Erlingsson. Programming Languages and Program Analysis for Security: A Three-year Retrospective. ACM SIGPLAN Notices, Volume 43, Number 12, New York, NY, USA, December 2008.
Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. Static Specification Mining Using Automata-Based Abstractions. IEEE Transactions on Software Engineering (TSE) Journal, Volume 34, Number 5, Piscataway, NJ, USA, September 2008.
Paolina Centonze, Robert J. Flynn, and Marco Pistoia. Combining Static and Dynamic Analysis for Automatic Identification of Precise Access-Control Policies. In Proceedings of the Annual Computer Security Applications Conference (ACSAC 2007), Miami Beach, FL, December 2007.
Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. Static Specification Mining Using Automata-Based Abstractions. In Proceedings of the ACM SIGSOFT 2007 International Symposium on Software Testing and Analysis (ISSTA 2007), London, United Kingdom, July. ACM Press. Winner of the following recognitions:
- ACM SIGSOFT Distinguished Paper Award, London, United Kingdom, July
- IBM Research Pat Goldberg Memorial Best Paper Award (3 papers selected out of 130 submissions), IBM Thomas J. Watson Research Center, Hawthorne, NY, USA, July
- Invited for publication in the IEEE Transactions on Software Engineering (TSE) Journal, Volume 34, Issue 5, Piscataway, NJ, USA, September 2008.

Stephen J. Fink
Research Staff Member
The Complexity of Andersen's Analysis in Practice. Manu Sridharan and Stephen J. Fink. To appear in The 16th International Static Analysis Symposium (SAS 2009).
Snugglebug: A Powerful Approach To Weakest Preconditions. Satish Chandra, Stephen J. Fink, and Manu Sridharan. ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009).
Effective Taint Analysis for Java. Omer Tripp, Marco Pistoia, Stephen J. Fink, Manu Sridharan, and Omri Weisman. Accepted for publication in Proceedings of the ACM SIGPLAN 2009 Conference on Programming Language Design and Implementation (PLDI 2009), Dublin, Ireland, June 2009.
Static Specification Mining Using Automata-Based Abstractions. Sharon Shoham, Eran Yahav, Stephen J. Fink, and Marco Pistoia. IEEE Transactions on Software Engineering, Volume 34, Issue 5, September 2008.
Verifying Dereference Safety via Expanding-Scope Analysis. Alexey Loginov, Eran Yahav, Satish Chandra, Stephen Fink, Noam Rinetzky, and Mangala Nanda. ISSTA '08: Proceedings of the 2008 International Symposium on Software Testing and Analysis, July 2008.

Stephen J. Fink
Effective Typestate Verification in the Presence of Aliasing. Stephen J. Fink, Eran Yahav, Nurit Dor, G. Ramalingam, and Emmanuel Geay. Transactions on Software Engineering and Methodology (TOSEM), Volume 17, Issue 2, April 2008.
Static Specification Mining Using Automata-Based Abstractions. Sharon Shoham, Eran Yahav, Stephen Fink, and Marco Pistoia. ISSTA.
Thin Slicing. Manu Sridharan, Stephen Fink, and Ras Bodik. PLDI.
Declarative Object Identity Using Relation Types. Mandana Vaziri, Frank Tip, Stephen Fink, and Julian Dolby. ECOOP.
When Role Models Have Flaws: Static Validation of Enterprise Security Policies. Marco Pistoia, Stephen J. Fink, Robert J. Flynn, and Eran Yahav. Proceedings of the 29th International Conference on Software Engineering (ICSE 2007), Minneapolis, MN, May.
Effective Typestate Verification in the Presence of Aliasing. Stephen Fink, Eran Yahav, Nurit Dor, Ramalingam, and Emmanuel Geay. ISSTA 06, July.
Role-Based Access Control Consistency Validation. Paolina Centonze, Gleb Naumovich, Stephen Fink, and Marco Pistoia. ISSTA 06, July.
Scalable and Flexible Error Detection. Emmanuel Geay, Eran Yahav, and Stephen Fink. PEPM 06 tools track, January 2006.

Introduction
In this paper, the authors present Taint Analysis for Java (TAJ), a tool designed to be precise enough to produce a low false-positive rate, yet scalable enough to analyze large applications. TAJ incorporates a number of techniques to produce useful results on extremely large applications, even when constrained to a given time or memory budget. TAJ was designed and implemented to meet the requirements of industry-level applications.

Introduction
Contributions:
Hybrid thin slicing. A novel thin-slicing algorithm that combines flow-insensitive data-flow propagation through the heap with flow- and context-sensitive data-flow propagation through local variables.
An effective model for static analysis of Web applications.
A set of bounded analysis techniques. These make it possible to complete the analysis within a short time or to stay below a given memory consumption level.
Implementation and evaluation on industrial code.

Motivation

import java.io.IOException;
import java.io.PrintWriter;
import java.lang.reflect.Method;
import java.net.URLEncoder;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class Motivating {
    // Wrapper object that carries a (possibly tainted) string in a field.
    private static class Internal {
        private String s;
        public Internal(String s) {
            this.s = s;
        }
        public String toString() {
            return s;
        }
    }

    protected void doGet(HttpServletRequest req,
            HttpServletResponse resp) throws IOException {
        String t1 = req.getParameter("fName");   // source: tainted
        String t2 = req.getParameter("lName");   // source: tainted
        PrintWriter writer = resp.getWriter();
        Method idMethod = null;
        try {
            // Look up the id method reflectively.
            Class k = Class.forName("Motivating");
            Method methods[] = k.getMethods();
            for (int i = 0; i < methods.length; i++) {
                Method method = methods[i];
                if (method.getName().equals("id")) {
                    idMethod = method;
                    break;
                }
            }
            Map m = new HashMap();
            m.put("fName", t1);
            m.put("lName", t2);
            // The slide wrote Date.getDate(), which is not a static method;
            // new Date().toString() is substituted so the code compiles.
            m.put("date", new String(new Date().toString()));
            String s1 = (String) idMethod.invoke(this, new
                Object[] {m.get("fName")});
            // Cast added: the raw Map returns Object, encode expects String.
            String s2 = (String) idMethod.invoke(this, new
                Object[] {URLEncoder.encode((String) m.get("lName"))});
            String s3 = (String) idMethod.invoke(this, new
                Object[] {m.get("date")});
            Internal i1 = new Internal(s1);
            Internal i2 = new Internal(s2);
            Internal i3 = new Internal(s3);
            writer.println(i1); // BAD
            writer.println(i2); // OK
            writer.println(i3); // OK
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public String id(String string) {
        return string;
    }
}

Core Taint Analysis
TAJ takes a Web application and its supporting libraries, and checks it with respect to a set of "security rules". Each security rule is of the form (S1, S2, S3), where S1 is a set of "sources", S2 is a set of "sanitizers", and S3 is a set of "sinks". A source is a method whose return value is considered tainted, or untrusted. A sanitizer is a method that manipulates its input to produce taint-free output. A sink is a pair (m, P), where m is a method that performs security-sensitive computations and P contains those parameters of m that are vulnerable to attack via tainted data. TAJ statically checks that no value derived from a source is passed as an input to a sink unless it first undergoes appropriate sanitization.
The analysis has two stages:
1. Pointer Analysis and Call-graph Construction. The current implementation relies on a context-sensitive variant of Andersen's analysis with on-the-fly call-graph construction. The pointer analysis adds one level of call-string context to calls to library factory methods.
2. Hybrid Thin Slicing.
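To make the (S1, S2, S3) rule shape concrete, here is a minimal sketch. It is not TAJ's data model: the class SecurityRule, the method violates, and the idea of representing a flow as an ordered list of method names are all assumptions made for illustration, and the vulnerable-parameter set P of each sink is left out.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a security rule (S1, S2, S3); not TAJ's actual code.
final class SecurityRule {
    final Set<String> sources;     // S1: methods whose return value is tainted
    final Set<String> sanitizers;  // S2: methods that produce taint-free output
    final Set<String> sinks;       // S3: security-sensitive methods (parameter set P omitted)

    SecurityRule(Set<String> sources, Set<String> sanitizers, Set<String> sinks) {
        this.sources = sources;
        this.sanitizers = sanitizers;
        this.sinks = sinks;
    }

    // A flow is the ordered list of methods a single value passes through.
    // It violates the rule if the value reaches a sink while still tainted.
    boolean violates(List<String> flow) {
        boolean tainted = false;
        for (String method : flow) {
            if (sources.contains(method)) {
                tainted = true;
            } else if (sanitizers.contains(method)) {
                tainted = false;
            } else if (sinks.contains(method) && tainted) {
                return true;
            }
        }
        return false;
    }
}
```

In the Motivation example, the flow from getParameter straight to println would violate such a rule, while the flow that passes through URLEncoder.encode would not.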

Using the preliminary pointer analysis and call graph, the second phase of TAJ tracks data flow from tainted sources using hybrid thin slicing, a novel thin-slicing algorithm. Hybrid thin slicing combines flow-insensitive reasoning about flow through the heap with flow- and context-sensitive tracking of flow through local variables. Q: What is the difference between thin slicing and traditional slicing? What makes a slice "thin", and what is being combined in the "hybrid" variant?

Hybrid Thin Slicing
Program slicing systematically identifies parts of a program relevant to a seed statement. A thin slice consists only of producer statements for the seed, i.e., those statements that help compute and copy a value to the seed. Statements that explain why producers affect the seed are excluded. For example, for a seed that reads a value from a container object, a thin slice includes statements that store the value into the container, but excludes statements that manipulate pointers to the container itself. A thin slice typically captures the statements most relevant to a tainted flow. Hybrid thin slicing is a novel thin-slicing algorithm that combines aspects of the previously proposed context-sensitive (CS) and context-insensitive (CI) thin-slicing algorithms, achieving a better tradeoff between scalability and precision for taint analysis.

Hybrid Thin Slicing
Comparison of the three thin-slicing variants:
Hybrid: tracks flow through local variables with flow and context sensitivity; tracks heap data dependencies via direct edges from stores to loads; does not track heap data dependencies via additional method parameters and return values.
CS: tracks heap data dependencies via additional method parameters and return values.
CI: tracks heap data dependencies via direct edges from stores to loads.
The direct edges are added based on the preliminary pointer analysis. Hybrid thin slicing performs a demand-driven traversal over a special System Dependence Graph (SDG) called the Hybrid SDG (HSDG). Nodes in an HSDG correspond to load and store statements in the program, as well as call statements representing source and sink methods.

Figure 2 shows an example, which displays the slice computed on the no-heap SDG corresponding to a load-to-store summary edge in the HSDG. An HSDG has two types of edges representing data dependence: "direct edges" and "summary edges". A direct edge connects a store to a load and represents a data dependence computed by a preliminary pointer analysis. A summary edge can connect s to t if t is transitively data-dependent on s purely via flow through local variables; flow through the heap is excluded. Summary edges are obtained on demand by computing context-sensitive reachability over a no-heap SDG, that is, an SDG that elides all control- and data-dependence edges reflecting flow through heap locations.
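The two edge kinds can be pictured with a small graph sketch. Everything here is assumed for illustration: the class HsdgSketch, its method names, and the string node labels are hypothetical. Unlike TAJ, which computes summary edges on demand with context-sensitive reachability over the no-heap SDG, this sketch simply takes both edge kinds as given and does a plain forward traversal from a seed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an HSDG-like graph; not TAJ's implementation.
final class HsdgSketch {
    private final Map<String, List<String>> successors = new HashMap<>();

    // Direct edge: store -> load, justified by the preliminary pointer analysis.
    void addDirectEdge(String store, String load) {
        addEdge(store, load);
    }

    // Summary edge: s -> t when t depends on s purely through local variables.
    // Precomputed here; TAJ derives these on demand.
    void addSummaryEdge(String s, String t) {
        addEdge(s, t);
    }

    private void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    // Forward traversal from a seed (e.g. a source call), following both edge kinds.
    Set<String> forwardSlice(String seed) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        visited.add(seed);
        worklist.add(seed);
        while (!worklist.isEmpty()) {
            String node = worklist.remove();
            for (String next : successors.getOrDefault(node, new ArrayList<>())) {
                if (visited.add(next)) {
                    worklist.add(next);
                }
            }
        }
        return visited;
    }
}
```

A tainted flow is reported when a sink node appears in the forward slice of a source node.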

Techniques
Code-modeling Techniques
  1. Security-specific Modeling
     1. Taint Carriers
     2. Handling Exceptions
  2. General Models
     1. Code-reduction Techniques
     2. Approximating the Behavior of Web Frameworks
     3. Reflection APIs and Native Methods
Eliminating Redundant Reports
Bounded Analysis Techniques
  1. Priority-driven Call-graph Construction
  2. Useful Bounds on Analysis Dimensions
     1. Slice Size
     2. Flow Length
     3. Nested-taint Depth

Code-reduction Techniques
A simple, yet effective, code-reduction optimization is to exclude benign library classes, packages, and subpackages based on a hand-written whitelist. A second optimization simplifies data-flow propagation by substituting simpler models for library methods, where the simpler model encodes only the method's behavior with respect to the flow of taint. For example, taint analysis does not need to analyze the complex manipulations in the implementation of URLEncoder.encode; it suffices to observe that this method returns some string that is sanitized according to the relevant rules. Using this insight, TAJ gives special treatment to String operations, which arise frequently in tainted flows and have relatively simple semantics, but are often difficult to analyze precisely.
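One way to picture such models is a lookup table from a library method to its taint effect, consulted instead of analyzing the method body. This is only a sketch under assumptions: the class LibraryTaintModels, the Effect enum, the chosen entries, and the conservative default for unmodeled methods are hypothetical and are not taken from TAJ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical taint models for library methods; not TAJ's actual model set.
final class LibraryTaintModels {
    enum Effect { PROPAGATES, SANITIZES }

    private final Map<String, Effect> models = new HashMap<>();

    LibraryTaintModels() {
        // The body of encode is never analyzed; only its taint effect is recorded.
        models.put("java.net.URLEncoder.encode", Effect.SANITIZES);
        // String operations simply carry taint from argument to result.
        models.put("java.lang.String.concat", Effect.PROPAGATES);
        models.put("java.lang.String.trim", Effect.PROPAGATES);
    }

    // Returns whether the result is tainted, given whether the argument was.
    // Unmodeled methods are assumed (conservatively) to propagate taint.
    boolean resultTainted(String method, boolean argumentTainted) {
        Effect effect = models.getOrDefault(method, Effect.PROPAGATES);
        return effect == Effect.PROPAGATES && argumentTainted;
    }
}
```

The design choice illustrated is that a model answers only the taint question, which is much cheaper than propagating data flow through the real implementation.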

Eliminating Redundant Reports
Some tainted flows reported by the analysis may be redundant to a user. The authors describe an approach to address this potential redundancy. Treating the insertion of a sanitizer invocation into the path as a remediation action, they group flows according to the remediation actions they map to. TAJ reports one representative per group, rather than all the flows.

Eliminating Redundant Reports
A library call point (LCP) is the last statement along a flow from a source to a sink where data flows from application code (i.e., the project's source code) to library code (i.e., libraries referenced by the project). In the example shown on the slide, flows p1, p3, p4, and p5 are reported.
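The grouping step can be sketched as follows. This is a simplification under assumptions: the classes Flow and ReportGrouping are hypothetical, each flow is assumed to already carry its LCP as a string, and flows are grouped by LCP alone as a stand-in for "the remediation action they map to".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical flow record: source, library call point (LCP), and sink.
final class Flow {
    final String source;
    final String lcp;
    final String sink;

    Flow(String source, String lcp, String sink) {
        this.source = source;
        this.lcp = lcp;
        this.sink = sink;
    }
}

// Hypothetical grouping of flows by LCP; not TAJ's actual reporting code.
final class ReportGrouping {
    // Keeps the first flow seen for each LCP as that group's representative.
    static List<Flow> representatives(List<Flow> flows) {
        Map<String, Flow> byLcp = new LinkedHashMap<>();
        for (Flow flow : flows) {
            byLcp.putIfAbsent(flow.lcp, flow);
        }
        return new ArrayList<>(byLcp.values());
    }
}
```

Two flows that share an LCP can be fixed by sanitizing at the same statement, which is why reporting one of them is enough for the user.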

Bounded Analysis Techniques
Q: How does TAJ maintain quality of service (QoS) even when constrained to a given time or memory budget?
Priority-driven Call-graph Construction
Useful Bounds on Analysis Dimensions:
#1 Slice Size: constrain the size of a slice computed through hybrid thin slicing by limiting the number of heap transitions.
#2 Flow Length: the longer a flow is, the less likely it is to be a true positive.
#3 Nested-taint Depth
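As an illustration of the flow-length bound, the sketch below searches a dependence graph breadth-first and gives up on any flow longer than a fixed number of edges. The class BoundedFlowSearch, the method reachesWithin, and the adjacency-map representation are assumptions for illustration; the slides do not say how TAJ enforces the bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical flow-length bound over a dependence graph; not TAJ's code.
final class BoundedFlowSearch {
    // True if sink is reachable from source using at most maxLength edges.
    static boolean reachesWithin(Map<String, List<String>> edges,
                                 String source, String sink, int maxLength) {
        Set<String> seen = new HashSet<>();
        List<String> frontier = new ArrayList<>();
        frontier.add(source);
        seen.add(source);
        // frontier holds the nodes first reached after exactly `length` edges.
        for (int length = 0; length <= maxLength; length++) {
            if (frontier.contains(sink)) {
                return true;
            }
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String succ : edges.getOrDefault(node, Collections.emptyList())) {
                    if (seen.add(succ)) {
                        next.add(succ);
                    }
                }
            }
            frontier = next;
        }
        return false; // any remaining flow is longer than the bound
    }
}
```

Besides bounding the work done, cutting off long flows matches the slide's observation that longer flows are less likely to be true positives.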

Priority-driven Call-graph Construction
Under a fixed time and memory budget, TAJ may terminate pointer analysis and call-graph construction early. TAJ uses priority-driven call-graph construction to heuristically improve pointer-analysis quality within a fixed budget. The priority heuristic favors the analysis of methods that are more likely to generate and propagate taint. Priority-driven call-graph construction forces the pointer analysis to add constraints first from higher-priority methods, in this case those methods likely to be more relevant to taint analysis.
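The idea can be sketched as a worklist ordered by priority and cut off by a budget. This is an assumption-laden illustration: the class PriorityCallGraphSketch, the integer priority scores, and the budget expressed as a number of methods are all hypothetical, and the real analysis adds pointer-analysis constraints rather than just visiting method names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical priority-driven worklist; not TAJ's call-graph builder.
final class PriorityCallGraphSketch {
    // Returns the methods processed, in order, before the budget ran out.
    static List<String> build(Map<String, List<String>> callees,
                              Map<String, Integer> priority,
                              String entry, int budget) {
        // Higher score first; ties broken by name for determinism.
        Comparator<String> byPriority = (a, b) -> {
            int cmp = Integer.compare(priority.getOrDefault(b, 0),
                                      priority.getOrDefault(a, 0));
            return cmp != 0 ? cmp : a.compareTo(b);
        };
        PriorityQueue<String> worklist = new PriorityQueue<>(byPriority);
        Set<String> discovered = new HashSet<>();
        List<String> processed = new ArrayList<>();
        worklist.add(entry);
        discovered.add(entry);
        while (!worklist.isEmpty() && processed.size() < budget) {
            String method = worklist.poll();
            processed.add(method); // stands in for adding this method's constraints
            for (String callee : callees.getOrDefault(method, Collections.emptyList())) {
                if (discovered.add(callee)) {
                    worklist.add(callee);
                }
            }
        }
        return processed;
    }
}
```

With a budget smaller than the program, low-priority methods are the ones left unanalyzed, which is the intended effect of the heuristic.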

Experimental Results
The unbounded hybrid algorithm offers a compelling tradeoff between performance and accuracy when compared to the CI and CS configurations. The prioritized hybrid algorithm offers a better accuracy and performance tradeoff than the CI and unbounded hybrid configurations. The fully optimized version of the hybrid algorithm is more accurate than the prioritized variant and more efficient than the CI algorithm.

Illumination: Hybrid Thin Slicing, Priority-driven Call-graph Construction, Experimental Results

Thank you!