Automatically Hardening Web Applications Using Precise Tainting Anh Nguyen-Tuong Salvatore Guarnieri Doug Greene Jeff Shirley David Evans University of Virginia
2 phpBB Worm December 21, 2004 Over 40,000 sites defaced PHP injection Loads Perl scripts to spread itself Uses Google to search for other phpBB sites
3 phpBB Vulnerability
$words = explode(' ', trim(htmlspecialchars(urldecode($HTTP_GET_VARS['highlight']))));
...
$highlight_match[] = ... $words[$i] ...;
...
preg_replace(... $highlight_match ...)
Original user input: '_%2527_attack
User input after HTTP_GET_VARS call: \'_%27_attack
User input after explicit urldecode call: \'_'_attack
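The three input stages on this slide can be reproduced with a short Python sketch. It is only an illustration of the double-decoding problem, not phpBB's code: `add_slashes` is a hypothetical, simplified stand-in for PHP's automatic quote escaping, and `urllib.parse.unquote` plays the role of `urldecode`.

```python
from urllib.parse import unquote


def add_slashes(value: str) -> str:
    """Simplified stand-in for PHP quote escaping: backslash-escape quotes and backslashes."""
    out = []
    for ch in value:
        if ch in ("'", '"', "\\"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


raw = "'_%2527_attack"                      # value as sent in the URL
after_get_vars = add_slashes(unquote(raw))  # automatic decode, then quote escaping
after_urldecode = unquote(after_get_vars)   # the application's explicit second decode

print(after_get_vars)
print(after_urldecode)
```

The second decode turns `%27` into a quote after escaping has already run, so an unescaped quote reaches the code that builds the `preg_replace` pattern.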
4 Classes of Attacks Code injection –Cause user provided data to be executed while data is being processed PHP injection (phpBB worm) SQL injection Output generation –Cause user provided data to be displayed to visitors of the website: Cross Site Scripting
5 SQL Injection Attacker constructs data that injects database commands Example: $res = executeQuery ("SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'");
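A minimal Python sketch of the same string construction shows how attacker-chosen data changes the structure of the query. The function name `build_query` is hypothetical; the attack input is the one used later in these slides.

```python
def build_query(user: str, pwd: str) -> str:
    """Build the login query by plain string concatenation (the vulnerable pattern)."""
    return ("SELECT real_name FROM users WHERE user = '" + user
            + "' AND pwd = '" + pwd + "'")


benign = build_query("alice", "secret")
attack = build_query("' OR 1 = 1; -- ';", "")

print(benign)
print(attack)
```

In the second query the injected quote closes the string literal early, `OR 1 = 1` makes the condition always true, and `--` comments out the password check.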
6 Cross Site Scripting Inserts user provided data onto a webpage that may include JavaScript Executes with permissions of hosting website Simple example: Hello
8 Importance Over 12% of Secunia Advisories 4 of last 10 advisories from FrSIRT Cross Site Scripting and Code Injection are responsible for many attacks on the Internet It is very hard to write bug-free code
9 Previous Approaches Static techniques Dynamic techniques before deployment Dynamic techniques during deployment
10 Static Static analyzers [Shankar+ 01] Code inspections [Fagan76] SQL prepared statements [Fisk04, Php05] Pros –No runtime overhead –Can be done before website is released to the public Cons –Coding practices may need to change –Inspections are only as good as the inspector –Many false positives
11 Dynamic Before Deployment Automated Test Suites: [Huang+ 04], [Tenable05], [Kavado05], [Offutt+ 04], [Watchfire05], [SPI05] Human testing Pros –Coding practices do not need to change –Attempts to simulate real world attacking conditions Cons –Only tests known attacks, cannot show absence of vulnerability –Requires developer effort to fix security holes
12 Automated Dynamic: Firewalls Incoming [Scott, Sharp 02] Incoming and Outgoing [Watchfire04], [Kavado05], [Teros04] Pros –No need to modify web service Cons –Only prevent recognized attacks –Coarse policies without knowing application semantics
13 Automated: Magic Quotes Escape all quotes supplied by a user Implemented in PHP and other scripting languages Extremely successful –Do not require the programmer to do anything –Prevent many SQL injection attacks –But, prevent only a specific class of attacks
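The following Python sketch illustrates why quote escaping helps and where it stops. `magic_quotes` is a hypothetical, simplified model of the escaping step, and the numeric `id` query is an assumed example that does not appear in the slides.

```python
def magic_quotes(value: str) -> str:
    """Simplified model of magic quotes: backslash-escape quotes and backslashes."""
    out = []
    for ch in value:
        if ch in ("'", '"', "\\"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


# Quoted context: the injected quote is escaped, so it stays inside the literal.
quoted = ("SELECT real_name FROM users WHERE user = '"
          + magic_quotes("' OR 1 = 1; -- ") + "'")

# Numeric context (assumed example): the payload needs no quote, so nothing is escaped.
numeric = "SELECT real_name FROM users WHERE id = " + magic_quotes("0 OR 1 = 1")

print(quoted)
print(numeric)
```

The second query is altered by the attacker even though escaping ran, which is one way an attack can fall outside the class that magic quotes handles.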
14 Previous Work Limitations Being precise about what constitutes an attack is a lot of work Automated techniques suffer from not exploiting the application semantics We want a system that works as effortlessly as magic quotes, but prevents a wider class of attacks
15 Our Approach Fully automated Aware of application semantics Replace PHP interpreter with a modified interpreter that: –Keeps track of which information comes from untrusted sources (precise tainting) –Checks how untrusted input is used
16 [Architecture diagram. Labeled components: Client, Web Server, HTTP Server, PHP Interpreter, PHPrevent, file.php, File System, Database, System APIs]
17 Coarse Grain Tainting Provided by many scripting languages (Perl, Ruby) Untrusted input is tainted Everything touched by tainted data becomes tainted $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'"; Entire $query string is tainted
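Coarse-grained tainting can be modeled as one flag per string. The sketch below is a hypothetical Python model (the class name `CoarseStr` is an assumption), not how Perl or Ruby implement taint mode.

```python
class CoarseStr:
    """String carrying a single taint flag for the whole value."""

    def __init__(self, text, tainted=False):
        self.text = text
        self.tainted = tainted

    def __add__(self, other):
        if not isinstance(other, CoarseStr):
            other = CoarseStr(other)
        # Any tainted operand taints the entire result.
        return CoarseStr(self.text + other.text, self.tainted or other.tainted)

    def __radd__(self, other):
        return CoarseStr(other) + self


user = CoarseStr("alice", tainted=True)   # untrusted input, benign content
pwd = CoarseStr("secret", tainted=True)
query = ("SELECT real_name FROM users WHERE user = '" + user
         + "' AND pwd = '" + pwd + "'")

print(query.text)
print(query.tainted)
```

Because the flag covers the whole string, a check on `query` cannot tell the programmer's SQL keywords apart from the user's data, even for harmless input.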
18 Precise Tainting $query = "SELECT real_name FROM users WHERE user = '" . $user . "' AND pwd = '" . $pwd . "'"; With $user set to ' OR 1 = 1; -- '; and $pwd empty, the query becomes: SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = '' Untrusted input is tainted Taint markings are maintained at character level –Depends on semantics of program Only data that actually derives from untrusted input is marked tainted
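Character-level taint tracking through concatenation can be sketched as follows. This is a hypothetical Python model (`TaintedStr` and its methods are assumptions); the slides describe the real mechanism as living inside the modified PHP interpreter.

```python
class TaintedStr:
    """String with one taint flag per character."""

    def __init__(self, text, taint=None):
        self.text = text
        self.taint = list(taint) if taint is not None else [False] * len(text)

    @classmethod
    def untrusted(cls, text):
        return cls(text, [True] * len(text))

    def __add__(self, other):
        if not isinstance(other, TaintedStr):
            other = TaintedStr(other)
        # Each character keeps the marking it had in its source string.
        return TaintedStr(self.text + other.text, self.taint + other.taint)

    def __radd__(self, other):
        return TaintedStr(other) + self

    def tainted_spans(self):
        """Return the maximal runs of tainted characters."""
        spans, start = [], None
        for i, flag in enumerate(self.taint + [False]):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                spans.append(self.text[start:i])
                start = None
        return spans


user = TaintedStr.untrusted("' OR 1 = 1; -- ';")
pwd = TaintedStr.untrusted("")
query = ("SELECT real_name FROM users WHERE user = '" + user
         + "' AND pwd = '" + pwd + "'")

print(query.text)
print("".join("^" if flag else " " for flag in query.taint))
```

Only the characters that came from `$user` are marked, so a later check can see that the `OR` keyword and the statement delimiter originate from untrusted input.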
19 Precise Checking Wrappers around PHP functions that handle updating and checking precise taint information Conservative: no false negatives while minimizing false positives –Behavior only changes when an attack is likely
20 Preventing SQL Injection Parse the query using the Postgres SQL parser: identify interpreted text Disallow SQL keywords or delimiters in interpreted text that is tainted –Query is not sent to database –Error response is returned Example of a rejected query: "SELECT real_name FROM users WHERE user = '' OR 1 = 1; -- ';' AND pwd = ''";
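The checking rule can be approximated in a few lines of Python. This sketch swaps the Postgres parser for a tiny hand-written scanner, uses a deliberately short keyword list, and ignores escaped quotes; all names (`concat`, `is_attack`, `KEYWORDS`) are hypothetical.

```python
import re

# Assumed, deliberately small keyword list; a real check would rely on the
# database's own parser, as the slide describes.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR", "UNION", "INSERT",
            "UPDATE", "DELETE", "DROP"}
TOKEN = re.compile(r"[A-Za-z_]+|--|;")


def concat(*parts):
    """Join (text, is_untrusted) pieces into text plus per-character taint flags."""
    text, taint = "", []
    for piece, untrusted in parts:
        text += piece
        taint += [untrusted] * len(piece)
    return text, taint


def is_attack(text, taint):
    """Return True if a tainted keyword or delimiter appears in interpreted text."""
    in_literal = False
    i = 0
    while i < len(text):
        if text[i] == "'":
            if taint[i]:          # a tainted quote is a tainted delimiter
                return True
            in_literal = not in_literal
            i += 1
            continue
        match = None if in_literal else TOKEN.match(text, i)
        if match is None:         # literal contents, digits, spaces, operators
            i += 1
            continue
        token = match.group()
        is_special = token in ("--", ";") or token.upper() in KEYWORDS
        if is_special and any(taint[match.start():match.end()]):
            return True
        i = match.end()
    return False


PREFIX = "SELECT real_name FROM users WHERE user = '"
benign = concat((PREFIX, False), ("alice", True), ("' AND pwd = '", False),
                ("secret", True), ("'", False))
attack = concat((PREFIX, False), ("' OR 1 = 1; -- ';", True),
                ("' AND pwd = '", False), ("", True), ("'", False))

print(is_attack(*benign))
print(is_attack(*attack))
```

Tainted text that stays inside a string literal is treated as data and allowed; the query is rejected only when tainted characters act as SQL syntax.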
21 Preventing PHP Injection Disallow the use of tainted data in functions that treat input strings as PHP code or manipulate system state –We place wrappers around these functions to enforce this rule phpBB attack prevented by wrappers around preg_replace
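The wrapper idea can be sketched in Python, with the built-in `eval` standing in for a PHP function that interprets its argument as code. `TaintError` and `guard_code_sink` are hypothetical names, and taint is passed explicitly as a list of per-character flags.

```python
class TaintError(Exception):
    """Raised when tainted data reaches a function that interprets code."""


def guard_code_sink(sink):
    """Wrap a code-interpreting function so it refuses any tainted input."""
    def wrapper(code, taint):
        if any(taint):
            raise TaintError(f"tainted input passed to {sink.__name__}")
        return sink(code)
    return wrapper


guarded_eval = guard_code_sink(eval)

trusted = "1 + 2"
print(guarded_eval(trusted, [False] * len(trusted)))

untrusted = "__import__('os').getcwd()"
try:
    guarded_eval(untrusted, [True] * len(untrusted))
except TaintError as exc:
    print("blocked:", exc)
```

The wrapped function behaves normally for trusted strings and changes behavior only when tainted characters are present.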
22 Preventing Cross Site Scripting Wrappers around output functions –Buffer output and then parse the tainted output with HTML Tidy Check the parsed HTML against a white list to ensure there is no dangerous output –Dangerous content was determined by examining HTML grammar –Sanitize it by removing tags Hello Safe Hello Unsafe
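A whitelist check on tainted output might look like the Python sketch below. It uses the standard library's `html.parser` in place of HTML Tidy, and the allowed-tag set is an assumed example rather than the whitelist the authors derived from the HTML grammar.

```python
from html import escape
from html.parser import HTMLParser

# Assumed example whitelist; attributes are always dropped in this sketch.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}


class WhitelistSanitizer(HTMLParser):
    """Re-emit parsed HTML, keeping only whitelisted tags and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def sanitize(tainted_html: str) -> str:
    """Remove every tag that is not on the whitelist from tainted output."""
    parser = WhitelistSanitizer()
    parser.feed(tainted_html)
    parser.close()
    return "".join(parser.out)


print(sanitize("<b>Hello</b>"))
print(sanitize("<b>Hello</b><script>alert(1)</script>"))
```

Removing the `script` tags leaves their contents as inert text, which matches the slide's "sanitize it by removing tags" step; dropping the contents as well would be a stricter alternative.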
23 Current Status Modified PHP interpreter: PHPrevent –Prevents PHP injection, SQL injection and cross site scripting attacks –Overly conservative: we have not specified precise semantics for most PHP functions Performance –Initial measurements indicate performance overhead is acceptable
24 Future Work: Theory and Analysis End-to-end information flow security Replace ad-hoc taint marking with principled mechanism –Analyze data flow at interpreter level –Infer taint specifications for PHP functions using dynamic analysis Verify that taint marking in PHP specification is consistent with interpreter implementation
25 Future Work: Implementation Full implementation of precise tainting for PHP APIs Handle persistent state –Track tainting through database store Multiple tainting types with different checking rules Incorporate modifications into main PHP distribution
26 Summary Many websites are prone to attacks even after using current methods Our method: –Fully automated –Prevents large classes of attacks –Easy to deploy
27 Thank You