Sound and Precise Analysis of Web Applications for Injection Vulnerabilities
Gary Wassermann and Zhendong Su, UC Davis
Slides from … Made some additions/clarifications!
SQL Injection Vulnerabilities
– 2006: 14% of CVEs were SQLCIVs (2nd most)
– Percent of attacks likely much higher
  – Web applications are accessible
  – Databases hold valuable information
Diagram: Web browser → (user input) → Application → (SQL query) → Database
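Example
  <?php
  $sid = addslashes($_GET['sid']);
  $query = "SELECT * FROM carts WHERE sid = " . $sid;
  mysql_query($query);
  ?>
On malicious input (sid = 78 OR 1 = 1): SELECT * FROM carts WHERE sid = 78 OR 1 = 1
Result: returns information from all shopping carts. addslashes() does not help here, because the input contains no quotes and is not placed in a quoted context.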
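To make the failure concrete, here is a minimal sketch in Python rather than PHP. The `addslashes` function below is a rough emulation of PHP's function written for illustration, and `build_query` is a hypothetical helper that mirrors the concatenation on the slide; neither comes from the original deck.

```python
def addslashes(s: str) -> str:
    """Rough emulation of PHP's addslashes() (an assumption, for illustration)."""
    out = []
    for ch in s:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)   # escape quotes and backslashes
        elif ch == "\0":
            out.append("\\0")       # NUL byte becomes backslash-zero
        else:
            out.append(ch)
    return "".join(out)


def build_query(sid: str) -> str:
    # Mirrors the slide: the escaped value is concatenated WITHOUT quotes.
    return "SELECT * FROM carts WHERE sid = " + addslashes(sid)


benign = build_query("78")
attack = build_query("78 OR 1 = 1")   # contains nothing for addslashes to escape
quoted = addslashes("O'Brian")        # escaping only matters when quotes appear
```

The attack string passes through the escaping step unchanged, which is why escaping alone cannot be treated as a sanitizer for a numeric, unquoted position.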
Informal Characterization [POPL'06]
At runtime, the injected input changes the query's parse tree into a structure completely different from the one the programmer had in mind.
Past Approaches
Runtime checks
– Benefits: easy to be precise
– State of the art: lexical or syntactic confinement
– Drawback: we pay many times the overhead of a correctly placed check
Static analysis
– Benefits: early bug detection; analyze code fragments; no runtime overhead
– State of the art: static taint analysis
Static Checking for SQLCIVs: Dataflow Graph
Code:
  $sid = addslashes($_GET['sid']);
  $query = "SELECT…" . $sid;
  mysql_query($query);
Dataflow graph: $_GET['sid'] → addslashes() → $sid; "SELECT…" and $sid → $query
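Static Checking for SQLCIVs: Static Taint Analysis
Code:
  $sid = addslashes($_GET['sid']);
  $query = "SELECT…" . $sid;
  mysql_query($query);
Integrity labels: the source $_GET['sid'] is U (untrusted); addslashes() is treated as a sanitizer, so $sid, "SELECT…", and $query are all T (trusted) when they reach the sink mysql_query(): false negative!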
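The false negative can be reproduced with a toy label propagation. This is only a sketch under stated assumptions: the `GRAPH` encoding, the `SANITIZERS` set, and the `taint` function are hypothetical names chosen for illustration, and real taint analyses are considerably more involved.

```python
# Hypothetical dataflow graph for the slide's example: node -> (operation, inputs).
GRAPH = {
    "$_GET['sid']": ("source", []),
    "$sid": ("addslashes", ["$_GET['sid']"]),
    "'SELECT...'": ("const", []),
    "$query": ("concat", ["'SELECT...'", "$sid"]),
}

# Classic taint analysis trusts anything that passed through a "sanitizer".
SANITIZERS = {"addslashes"}


def taint(node: str, graph=GRAPH) -> str:
    """Return 'U' (untrusted) or 'T' (trusted) for a node, binary-label style."""
    op, inputs = graph[node]
    if op == "source":
        return "U"
    if op == "const" or op in SANITIZERS:
        return "T"
    # Any other operation is untrusted if at least one input is untrusted.
    return "U" if any(taint(i, graph) == "U" for i in inputs) else "T"


label_at_sink = taint("$query")
```

Because the label carries no information about the string itself, the analysis cannot notice that addslashes() leaves `78 OR 1 = 1` untouched, so the sink is reported as trusted.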
Static Checking for SQLCIVs: Static Taint Analysis vs. Our Goal
– Static taint analysis: source (U) → addslashes() as sanitizer → T at the sink: false negative! Tracks: Integrity.
– Our goal: source (U) → addslashes() as a transformation → U′ at the sink → check against policy. Tracks: (Integrity × String)* Set.
Static Checking for SQLCIVs: Our Goal
Source (U) → addslashes() as a transformation → U′ at the sink → check against policy. Tracks: (Integrity × String)* Set.
How can we:
– model the semantics of transformations?
– track integrity classes through transformations?
– check the value at the sink against our policy?
SQLCIV Analysis Framework
Static Taint Analysis → Compliance Check
String Analysis
– CFGs model string sets
– Construct extended CFG from dataflow graph [Min05]
Dataflow graph: $_GET['sid'] → addslashes() → $sid; "SELECT…" and $sid → $query
Extended CFG:
  GETsid → Σ*
  Sid → addslashes(GETsid)
  C → SELECT…
  Query → C Sid
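String Analysis (with integrity labels)
– CFGs model string sets
– Construct extended CFG from dataflow graph [Min05]
The dataflow graph nodes now carry integrity labels (U, U′, T); addslashes() is a transformation rather than a sanitizer.
Extended CFG:
  GETsid → Σ*
  Sid → addslashes(GETsid)
  C → SELECT…
  Query → C Sid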
Modeling String Transformations
– Finite state transducers (FSTs) model string functions
– Use FSTs to turn extended CFG into CFG
Extended CFG:
  GETsid → Σ*
  Sid → addslashes(GETsid)
  C → SELECT…
  Query → C Sid
FST figure: transitions are written input/output, with A ∈ Σ \ {'} and B ∈ Σ \ {\}
Example (stripslashes()): input O\'Brian → output O'Brian
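As an illustration of how a string function can be written as a transducer, the sketch below encodes stripslashes() and addslashes() as step functions of the form (state, input character) → (output string, next state). The names `run_fst`, `stripslashes_step`, and `addslashes_step` are hypothetical, and the transducers only approximate the PHP functions (for example, NUL handling is omitted).

```python
def run_fst(step, start, text: str) -> str:
    """Run a deterministic transducer over text and collect its output."""
    state, out = start, []
    for ch in text:
        emitted, state = step(state, ch)
        out.append(emitted)
    return "".join(out)


def stripslashes_step(state: int, ch: str):
    # State 0: normal. A backslash is consumed and emits nothing (input "\" / output "").
    if state == 0 and ch == "\\":
        return "", 1
    # State 1 (after a backslash) or any other character: copy it through (B / B).
    return ch, 0


def addslashes_step(state: int, ch: str):
    # Single-state transducer: quotes and backslashes get a backslash prepended.
    if ch in ("'", '"', "\\"):
        return "\\" + ch, 0
    return ch, 0


stripped = run_fst(stripslashes_step, 0, "O\\'Brian")
escaped = run_fst(addslashes_step, 0, "O'Brian")
```

Running the two in sequence returns the original string, which is a quick sanity check that the two machines are inverses on this input.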
Tracking Integrity Classes
– Find CFG-FSA intersection via CFL-reachability
– Propagate labels to corresponding nonterminals
– Use this algorithm to find CFG's image over FST
Example:
  Grammar: S → a, S → S X, X → [illegible]
  FSA: states 0 and 1, with transitions 0 –[a-z]→ 1 and 1 –[0-9]→ 1, accepting [a-z][0-9]*
  Intersection: S01 → a, X11 → [0-9], S01 → S01 X11
  Resulting language: a[0-9]*
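The intersection on this slide can be sketched as a naive fixpoint over triples (nonterminal, start state, end state), in the spirit of CFL-reachability. This is not the authors' implementation. The grammar encoding (`TERMINAL`, `BINARY`), the FSA encoding (`EDGES`), and the `intersect` function are hypothetical, and the production for X is an assumption ("any lowercase letter or digit"), since the slide's own right-hand side for X is not legible.

```python
import string

# Terminal productions: nonterminal -> set of single characters it can produce.
TERMINAL = {
    "S": {"a"},
    "X": set(string.ascii_lowercase + string.digits),  # ASSUMPTION: X is "any character"
}
# Binary productions: nonterminal -> list of (left, right) nonterminal pairs.
BINARY = {"S": [("S", "X")]}

# FSA for [a-z][0-9]*: (from_state, to_state) -> characters allowed on that edge.
EDGES = {
    (0, 1): set(string.ascii_lowercase),
    (1, 1): set(string.digits),
}


def intersect(terminal, binary, edges):
    """Return productions of the intersection grammar, keyed by (A, p, q)."""
    prods = {}
    # Base case: A -> c is kept for every FSA edge p -c-> q that accepts c.
    for a, chars in terminal.items():
        for (p, q), label in edges.items():
            common = chars & label
            if common:
                prods.setdefault((a, p, q), []).append(frozenset(common))
    # Closure: A -> B C yields A_pq -> B_pr C_rq once both parts are derivable.
    changed = True
    while changed:
        changed = False
        known = list(prods)
        for a, pairs in binary.items():
            for b, c in pairs:
                for (b2, p, r) in known:
                    if b2 != b:
                        continue
                    for (c2, r2, q) in known:
                        if c2 != c or r2 != r:
                            continue
                        rhs = ((b, p, r), (c, r, q))
                        bucket = prods.setdefault((a, p, q), [])
                        if rhs not in bucket:
                            bucket.append(rhs)
                            changed = True
    return prods


result = intersect(TERMINAL, BINARY, EDGES)
```

Under these assumptions the result contains S01 → a, S01 → S01 X11, and X11 → [0-9], matching the slide, plus an X01 production that is unreachable from S01 and could be pruned. Because each new nonterminal keeps the name of the original one, any integrity label attached to S or X carries over to S01 and X11.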
Policy Conformance
– Use SQL grammar as reference grammar
– Check "literals" case with regular languages
– Untrusted input: not in quoted context, not numeric, includes SQL code
  – DIRECT if immediately affected by user
  – INDIRECT if affected by previous query answer
Example grammar:
  GETsid′ → ((Σ \ {'}) ∪ {\'})*
  Sid → GETsid′
  C → SELECT * FROM users WHERE id =
  Query → C Sid
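The "literals" check can be illustrated on single concrete strings, although the analysis itself reasons about whole languages rather than individual values. The sketch below is a simplification under stated assumptions: `in_quoted_context` and `conforms` are hypothetical helpers, only single-quoted literals are considered, and backslash escaping is assumed (as in MySQL).

```python
import re

# Untrusted text is safe inside quotes if every quote in it is backslash-escaped.
NO_UNESCAPED_QUOTE = re.compile(r"(?:[^'\\]|\\.)*")
# Outside quotes, only a numeric literal is accepted.
NUMERIC = re.compile(r"[0-9]+")


def in_quoted_context(prefix: str) -> bool:
    """True if the trusted prefix ends inside a single-quoted SQL string literal."""
    inside = False
    i = 0
    while i < len(prefix):
        if prefix[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if prefix[i] == "'":
            inside = not inside
        i += 1
    return inside


def conforms(prefix: str, untrusted: str) -> bool:
    """Check one untrusted substring against the simplified literal policy."""
    if in_quoted_context(prefix):
        return NO_UNESCAPED_QUOTE.fullmatch(untrusted) is not None
    return NUMERIC.fullmatch(untrusted) is not None


unquoted_prefix = "SELECT * FROM users WHERE id = "
quoted_prefix = "SELECT * FROM users WHERE name = '"
```

For the grammar on this slide, the untrusted part GETsid′ may contain letters and spaces but sits in an unquoted position, so strings such as `78 OR 1 = 1` fail the numeric test and the query is flagged.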
Evaluation: Results
– Modified Minamide's PHP String Analyzer
– Evaluated on 6 real-world PHP web apps

                      Time (h:mm:ss)                      Errors
Subject    Lines      String-Taint  Policy Conformance    Direct (Real)  Direct (False)  Indirect
Claroline  169,479    3:04:11       0:02:…                …              …               …
e107       132,862    1:08:05       0:01:…                …              …               …
EVE        904        0:00:01       0:00:04               4              0               1
Tiger      14,350     3:14:07       3:27:50               0              3               2
Utopia     5,438      0:13:10       0:00:…                …              …               …
Warp       24,365     0:00:52       0:04:49               0              0               0
Example Vulnerability
  isset($_GET['userid']) ? $userid = $_GET['userid'] : $userid = '';
  if (!eregi('[0-9]+', $userid)) {   // should be '^[0-9]+$'
      unp_msg('invalid user ID.');
      exit;
  }
  $getuser = $DB->query("SELECT * FROM `unp_user` WHERE userid='$userid'");
Without the ^ and $ anchors, the pattern accepts any input that contains at least one digit.
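The effect of the missing anchors is easy to reproduce. The sketch below uses Python's `re` module instead of PHP's eregi(); `passes_original_check` and `passes_fixed_check` are hypothetical names, and `re.fullmatch` is used here as the Python counterpart of anchoring the pattern at both ends.

```python
import re


def passes_original_check(userid: str) -> bool:
    # Unanchored, like eregi('[0-9]+', ...): succeeds if ANY digit run exists.
    return re.search(r"[0-9]+", userid) is not None


def passes_fixed_check(userid: str) -> bool:
    # Anchored at both ends: the whole value must be digits.
    return re.fullmatch(r"[0-9]+", userid) is not None


payload = "1' OR '1'='1"
original_accepts_payload = passes_original_check(payload)
fixed_accepts_payload = passes_fixed_check(payload)
```

The payload slips past the unanchored check because it contains a digit, and it then closes the quoted literal in the query on the slide.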
False Positive CASTING PROBLEMS
Indirect Error Verified ? Returned from DB
Conclusions
– Achieved accurate checking for SQLCIVs by tracking string values and sources
– Successfully applied to real-world PHP programs and found subtle vulnerabilities
– Future work: improve error reports; apply to XSS