Language Based Security
TAJ: Effective Taint Analysis of Web Applications
PLDI 2009
Omer Tripp, IBM Software Group
Marco Pistoia, IBM T. J. Watson Research Center
Stephen Fink, IBM T. J. Watson Research Center
Manu Sridharan, IBM T. J. Watson Research Center
Omri Weisman, IBM Software Group
OWASP* Top Ten Security Vulnerabilities
1. Cross-site scripting (XSS)
2. Injection flaws
3. Malicious file execution
4. Insecure direct object reference
5. Cross-site request forgery (CSRF)
6. Information leakage and improper error handling
7. Broken authentication and improper session management
8. Insecure cryptographic storage
9. Insecure communications
10. Failure to restrict URL access
* Open Web Application Security Project (OWASP)
Existing Static-Analysis Solutions
- Type systems: complex, conservative, require code annotations
- Classic slicing: has not been shown to scale to large applications while maintaining sufficient accuracy
Contributions of TAJ
- Hybrid thin slicing
- Sound, effective modeling of Web applications
- Bounded-analysis techniques
- Implementation, productization*, and extensive evaluation
* IBM Rational AppScan
Motivating Example*
[Code listing; successive slides highlight the following annotations]
- Taint Flow #1
- Taint Flow #2 (annotation: Sanitizer)
- Taint Flow #3 (annotation: Non-tainted)
- Reflection
- Different Map Keys
- Object Fields
* Inspired by Refl1 in SecuriBench Micro
Outline of TAJ
- Algorithm consists of 2 stages:
  1. Global pointer analysis
  2. Slicing based on resulting call graph
- Rich set of models
- Effective reports
- Efficient behavior under restricted budget
Dimensions of Precision
- Pointer analysis is a variant of Andersen’s analysis
- Custom context-sensitivity policy:
  - Unlimited-depth object sensitivity for Java collections (up to recursion)
  - One level of call-string context for factory methods
  - One level of call-string context for taint APIs
  - One-level receiver-object context sensitivity as default
- Analysis is field sensitive
- Analysis is intraprocedurally flow sensitive and interprocedurally flow insensitive (accounting for multithreaded code)
Hybrid System Dependence Graph
[Figure: example hybrid SDG with one slice in the no-heap SDG outlined; legend follows]
Nodes:
- st_i: store statement
- l_i: load statement
- sk_i: sink-dispatch statement
- c_i: call statement
- r_i: return statement
- s_i: other statement
Edges:
- Store-to-load direct edge: computed based on preliminary pointer analysis
- Load-to-store or load-to-sink summary edge: computed using graph reachability over a no-heap SDG
- No-heap SDG edge
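
The two kinds of edges in the legend can be combined in a single forward traversal. The following is a minimal sketch, not TAJ's implementation: the class `HybridSliceSketch`, the `HeapAccess` type, and the string statement ids are hypothetical, and the no-heap SDG is reduced to a plain adjacency map with no context sensitivity. A store-to-load edge is assumed to exist whenever a store and a load access the same field and the points-to sets of their base pointers overlap.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HybridSliceSketch {
    /** Hypothetical heap access: statement id, field name, points-to set of the base pointer. */
    static final class HeapAccess {
        final String stmt;
        final String field;
        final Set<String> basePointsTo;

        HeapAccess(String stmt, String field, Set<String> basePointsTo) {
            this.stmt = stmt;
            this.field = field;
            this.basePointsTo = basePointsTo;
        }
    }

    /** Forward slice from a source over no-heap edges plus store-to-load edges. */
    static Set<String> forwardSlice(String source,
                                    Map<String, List<String>> noHeapEdges,
                                    List<HeapAccess> stores,
                                    List<HeapAccess> loads) {
        Set<String> visited = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>();
        worklist.add(source);
        while (!worklist.isEmpty()) {
            String s = worklist.poll();
            if (!visited.add(s)) {
                continue; // already processed
            }
            // Follow local and interprocedural edges that do not go through the heap.
            worklist.addAll(noHeapEdges.getOrDefault(s, List.of()));
            // If s is a store, jump to every load that may read what it wrote.
            for (HeapAccess st : stores) {
                if (!st.stmt.equals(s)) {
                    continue;
                }
                for (HeapAccess ld : loads) {
                    boolean mayAlias = !Collections.disjoint(st.basePointsTo, ld.basePointsTo);
                    if (st.field.equals(ld.field) && mayAlias) {
                        worklist.add(ld.stmt);
                    }
                }
            }
        }
        return visited;
    }
}
```

In this sketch the aliasing test stands in for the preliminary pointer analysis, and the adjacency map stands in for reachability over the no-heap SDG; a load reached through a store then continues along no-heap edges toward a sink.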
Modeling Web Applications
Modeled constructs: taint carriers, reflection, native methods, map keys, JSP, Struts MVC, exceptions, EJB
- Taint carriers: String, StringBuilder, StringBuffer
  [Diagram labels: Internal, i1, i1.s]
- Map keys:
    map.put("key1", taint);
    nontaint = map.get("key2");
- Struts MVC:
    ConcreteActionForm caf = (ConcreteActionForm) af;
    DynaActionForm daf = (DynaActionForm) af;
- EJB: ENTERPRISE BEAN DEPLOYMENT DESCRIPTOR
  [Descriptor entries: Bean1Bean, Bean1Home, Bean1, Bean1Bean, Stateless; Bean1, ejb/Bean2, Session, Bean2Home, Bean2, Bean2Bean]
  Methods shown: Bean1Bean.m1(), Bean2.m2(), Bean2Bean.m2()
- Reflection: Class.forName, Method.invoke
- Other APIs shown: Thread.start, AccessController.doPrivileged
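
The map-keys snippet on this slide suggests that taint stored under one constant key should not leak to a lookup under a different constant key. A minimal sketch of such a key-sensitive abstraction follows, assuming a hypothetical class `KeyedTaintModel` in which a `null` key stands for a key that is not a compile-time constant; how TAJ actually handles non-constant keys is not stated on the slide, so the conservative fallback here is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

final class KeyedTaintModel {
    // Constant keys under which a tainted value may have been stored.
    private final Set<String> taintedKeys = new HashSet<>();
    // Set when a tainted value was stored under a key the analysis cannot resolve.
    private boolean unknownKeyTainted = false;

    /** Abstract map.put: constantKey is null when the key is not a known constant. */
    void put(String constantKey, boolean valueTainted) {
        if (!valueTainted) {
            return; // weak update: an untainted write never clears earlier taint
        }
        if (constantKey == null) {
            unknownKeyTainted = true;
        } else {
            taintedKeys.add(constantKey);
        }
    }

    /** Abstract map.get: returns true if the result may be tainted. */
    boolean get(String constantKey) {
        if (unknownKeyTainted) {
            return true; // an unresolved tainted put may have hit any key
        }
        if (constantKey == null) {
            return !taintedKeys.isEmpty(); // an unresolved get may read any key
        }
        return taintedKeys.contains(constantKey);
    }
}
```

With this model, `put("key1", true)` followed by `get("key2")` reports no taint, matching the `nontaint = map.get("key2")` line on the slide, while any unresolved key falls back to the conservative answer.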
Eliminating Redundant Flows
- Flows are equivalent iff:
  - Their parts under application code coincide
  - Their sinks correspond to the same issue type
- Dramatically improves user experience (on JBoard, 25x fewer reports)
- Sound, minimal with respect to remediation
[Figure: flow graph over nodes n1 to n11 spanning application and library code, ending in sinks with the same issue type]
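
The equivalence rule above can be sketched as a grouping step over reported flows. This is an illustration under stated assumptions rather than the product's code: `FlowDedup`, `Flow`, and the string node names are hypothetical, and "the part under application code" is taken to mean the ordered subsequence of a flow's nodes that belong to application code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FlowDedup {
    /** Hypothetical flow: ordered node names from source to sink, plus the sink's issue type. */
    static final class Flow {
        final List<String> nodes;
        final String issueType;

        Flow(List<String> nodes, String issueType) {
            this.nodes = nodes;
            this.issueType = issueType;
        }
    }

    /** Keeps the first flow of each equivalence class, in input order. */
    static List<Flow> dedupe(List<Flow> flows, Set<String> applicationNodes) {
        Map<List<Object>, Flow> representatives = new LinkedHashMap<>();
        for (Flow f : flows) {
            // Project the flow onto its application-code nodes.
            List<String> appPart = new ArrayList<>();
            for (String n : f.nodes) {
                if (applicationNodes.contains(n)) {
                    appPart.add(n);
                }
            }
            // Two flows are equivalent iff both components of the key are equal.
            List<Object> key = List.of(appPart, f.issueType);
            representatives.putIfAbsent(key, f);
        }
        return new ArrayList<>(representatives.values());
    }
}
```

Flows that differ only in the library code they traverse collapse into one report, which is the intuition behind the claim that the result is minimal with respect to remediation: the fix is applied in application code either way.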
Priority-driven Call-graph Construction
- Priority queue used to govern call-graph growth
- Sources are assigned priority 0 (most important)
- Recursively, for each “neighbor” t of node n: priority(t) = min{priority(n) + 1, priority(t)}
- Propagate priorities to fixed point
- “Locality-of-taint” principle
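
The priority rule above can be read as a worklist computation. The sketch below is a simplified illustration, assuming a hypothetical class `TaintPriorities` and a graph given as a plain adjacency map over string node names; it shows only the propagation to a fixed point, not how the resulting priorities drive call-graph expansion.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class TaintPriorities {
    /** Returns the priority of every node reachable from a source; lower is more important. */
    static Map<String, Integer> propagate(Map<String, List<String>> graph, Set<String> sources) {
        Map<String, Integer> priority = new HashMap<>();
        Deque<String> worklist = new ArrayDeque<>();
        for (String s : sources) {
            priority.put(s, 0); // sources are most important
            worklist.add(s);
        }
        while (!worklist.isEmpty()) {
            String n = worklist.poll();
            int candidate = priority.get(n) + 1;
            for (String t : graph.getOrDefault(n, List.of())) {
                int current = priority.getOrDefault(t, Integer.MAX_VALUE);
                // priority(t) = min{priority(n) + 1, priority(t)}
                if (candidate < current) {
                    priority.put(t, candidate);
                    worklist.add(t); // t changed, so its neighbors must be revisited
                }
            }
        }
        return priority;
    }
}
```

Each update strictly lowers a node's priority and priorities are bounded below by 0, so the loop terminates at a fixed point. Nodes close to a taint source end up with small values, which is one way to read the "locality-of-taint" principle.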
Experimental Setup
Five variants assessed:
1. Context sensitive (CS)
2. Context insensitive (CI)
3. Unbounded hybrid (i.e., running to completion)
4. Prioritized hybrid (i.e., call graph bounded, priority-driven scheme)
5. Fully optimized hybrid (i.e., prioritized, “long” flows eliminated, taint depth restricted, slice size bounded)
All implemented on top of WALA*
* IBM T. J. Watson Libraries for Analysis
Experimental Results – Accuracy
Experimental Results – Performance
Conclusion
- Effective solution for taint analysis of Web applications based on pointer analysis and hybrid thin slicing
- Efficient strategies for analysis under limited budget
- General models for frameworks and other programming constructs
- Thorough evaluation and productization
Future Work
- Detailed comparison of demand-driven and priority-driven schemes
- String analysis
- More languages
- Coverage of more attack vectors
Thank You!