Published by Randolf Tyler. Modified over 9 years ago.
1
Finding Bugs in Dynamic Web Applications
Shay Artzi, Adam Kiezun, Julian Dolby, Frank Tip, Danny Dig, Amit Paradkar, Michael D. Ernst
Proceedings: ISSTA '08 (International Symposium on Software Testing and Analysis)
2
Presented by: Md. Monjurul Hasan
CSE 6329 Special Topics in Advanced Software Engineering
3
Dynamic Web Application
Generates pages (HTML content) on the fly
Content varies by user and by user-specified criteria
Produced by server-side programming
Essentially all big, well-known web applications are dynamic web applications
Source: Dynamic Web Application Development using PHP and MySQL, by Simon Stobart and David Parsons
4
Web Threats
Web script crashes and malformed dynamically generated Web pages impact the usability of Web applications
Current tools for Web-page validation cannot handle dynamically generated pages
5
Web Script Crash
Missing included file
Call to undefined method
Wrong database query
Uncaught exceptions
6
Malformed HTML
HTML that does not conform to the WDG (Web Design Group) or W3C (World Wide Web Consortium) standards
– Using tags not defined by the W3C (e.g...etc.)
– Not maintaining the document structure (e.g... )
– Missing or mismatched opening and closing tags
– etc.
Web scripting languages can generate HTML
7
The Problem
Bad scripts create syntactically malformed HTML
– Partially displayable or non-displayable HTML
– Browsers' attempts to correct the HTML may cause crashes
– Slower HTML rendering
– Important information may be discarded
– Search engines have trouble indexing the correct pages
Example
8
More Problems
Dynamic web page testing challenges
– HTML validation tools only test static pages
– They cannot fully capture behavior, since not all of the code's functionality shows up in the HTML result
– No automatic validator exists for scripting languages that dynamically generate HTML pages
– HTML Kit validates every generated page but requires manually creating the inputs that lead to each page
9
What this paper presents…
Presents an automated technique for finding faults that manifest as Web script crashes or malformed HTML
– Extends dynamic test generation to scripting languages
Identifies the minimal part of the input responsible for triggering a failure
Uses an oracle to determine whether the generated HTML is well-formed
Creates a tool, Apollo, that implements all of this in the context of PHP
10
Why PHP?
Widely used in Web development
– Network interactions
– Database
– HTTP processing
Object-oriented
Scripting
21 million domains [1] (75%) are powered by PHP, including large websites like Wikipedia, WordPress, Facebook, Digg, etc.
[1] Source: Netcraft, April 2007
11
Example: program SchoolMate.php
– Allows school administrators to manage classes and users, teachers to manage assignments and grades, and students to access their information
Typical URL: schoolmate.php?page=1&page2=100&login=1&username=user&password=password
12
 1 <?php
 2
 3 make_header(); // print HTML header
 4
 5 // Make the $page variable easy to use //
 6 if(!isset($_GET['page'])) $page = 0;
 7 else $page = $_GET['page'];
 8
 9 // Bring up the report cards and stop processing //
10 if($_GET['page2']==1337) {
11   require('printReportCards.php');
12   die(); // terminate the PHP program
13 }
14
15 // Validate and log the user into the system //
16 if($_GET["login"] == 1) validateLogin();
17
18 switch ($page)
19 {
20   case 0: require('login.php'); break;
21   case 1: require('TeacherMain.php'); break;
22   case 2: require('StudentMain.php'); break;
23   default: die("Incorrect page number. Please verify.");
24 }
25
26 make_footer(); // print HTML footer
   ...
27 function validateLogin() {
28   if(!isset($_GET['username'])) {
29     echo " username must be supplied. \n";
30     return;
31   }
32   $username = $_GET['username'];
33   $password = $_GET['password'];
34   if($username=="john" && $password=="theTeacher")
35     $page=1;
36   else if($username=="john" && $password=="theStudent")
37     $page=2;
38   else echo " Login error. Please try again \n";
39 }
40
41 function make_header() { // print HTML header
42   print("
43 <!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
44 \"http://www.w3.org/TR/html4/strict.dtd\">
45
46 Class Management
47 ");
48 }
49
50 function make_footer() { // close HTML elements opened by header()
51   print("
52
53 ");
54 }
55 ?>
13
Faults in the SchoolMate.php listing (slide 12)
– 'printReportCards.php' is missing
– make_footer() is not executed in certain situations, leaving an unclosed HTML tag
– Generates an illegal tag
14
Failures in PHP programs
Targets two types of failures
– Execution failures: Web script crashes
– HTML failures: malformed HTML
15
Failure-Finding in PHP Applications
Concolic testing
– Dynamic test generation technique
Execute the application
1. Initially on an empty input
2. Then on additional inputs, obtained by solving constraints derived from control-flow paths
Extensions
– Validate the correctness of program output by using an oracle
– Handle isset, empty, require, etc., which require generating constraints that are absent in other object-oriented languages
– Use a pre-specified set of values for database authentication
– Simulate each user input by transforming the source code
16
Transformation of Code
Interactive HTML pages with buttons and menus
For each page (h) that contains N buttons
– Add an additional input parameter p to the PHP program, with values ranging from 1 to N
– Insert a switch statement that includes the appropriate PHP source file, depending on p
17
An example
<?php
/* Simulated User Input */
switch ($_GET["_btn"]) {
  case 1: require_once("mainmenu.php"); break;
  case 2: require_once("newuser.php"); break;
}
?>
<?php echo "Webchess ".$Version." login"; ?>
Nick: Password:
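The transformation can be pictured as a small generator that emits the inserted switch statement. The following is only a sketch in Python, not Apollo's implementation: the function name `simulated_input_switch` and the default parameter name `_btn` are assumptions chosen to match the example slide.

```python
def simulated_input_switch(targets, param="_btn"):
    """Return PHP source for a switch that includes the file behind button p.

    `targets` lists the PHP files reachable from the page's N buttons;
    button p (1..N) maps to targets[p - 1]. Hypothetical helper.
    """
    lines = ["<?php /* Simulated User Input */",
             f'switch ($_GET["{param}"]) {{']
    for p, target in enumerate(targets, start=1):
        lines.append(f'  case {p}: require_once("{target}"); break;')
    lines += ["}", "?>"]
    return "\n".join(lines)


php = simulated_input_switch(["mainmenu.php", "newuser.php"])
print(php)
```

With this shape, the button choice becomes an ordinary input parameter, so the test generator can explore it like any other GET value.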
18
The Failure Detection Algorithm
parameters: Program P, oracle O
result: Bug reports B; B : setOf(⟨failure, set of path constraints, set of inputs⟩)
P′ ≔ simulateUserInput(P);
B ≔ empty;
pcQueue ≔ emptyQueue();
enqueue(pcQueue, emptyPathConstraint());
while not empty(pcQueue) and not timeExpired() do
  pathConstraint ≔ dequeue(pcQueue);
  input ≔ solve(pathConstraint);
  if input ≠ ⊥ then
    output ≔ executeConcrete(P′, input);
    failures ≔ getFailures(O, output);
    foreach f in failures do
      merge ⟨f, pathConstraint, input⟩ into B;
    c1 ∧ ... ∧ cn ≔ executeSymbolic(P′, input);
    foreach i = 1,...,n do
      newPC ≔ c1 ∧ ... ∧ ci−1 ∧ ¬ci;
      enqueue(pcQueue, newPC);
return B;
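A minimal Python sketch of this worklist loop may help make the control flow concrete. Everything here is an assumption made for illustration: `toy_program` is a hand-written model of the SchoolMate control flow, `solve` is a naive brute-force stand-in for a real constraint solver, `oracle` only checks one tag pair, an iteration `budget` replaces `timeExpired()`, and a `seen` set (not in the slide's pseudocode) avoids re-enqueuing the same path constraint.

```python
from collections import deque
from itertools import product

# A conjunct is (param, op, value) with op in {"set", "notset", "eq", "ne"}.
NEGATED = {"set": "notset", "notset": "set", "eq": "ne", "ne": "eq"}


def negate(c):
    return (c[0], NEGATED[c[1]], c[2])


def holds(c, inp):
    param, op, value = c
    if op == "set":
        return param in inp
    if op == "notset":
        return param not in inp
    return (inp.get(param) == value) == (op == "eq")


def solve(pc):
    """Naive brute-force solver (assumption); None plays the role of ⊥."""
    params = sorted({c[0] for c in pc})
    # Candidates per parameter: absent (None), 0, and constants in the constraints.
    choices = [[None, 0] + [c[2] for c in pc if c[0] == p and c[2] is not None]
               for p in params]
    for combo in product(*choices):
        inp = {p: v for p, v in zip(params, combo) if v is not None}
        if all(holds(c, inp) for c in pc):
            return inp
    return None


def toy_program(inp):
    """Toy model of the SchoolMate branches; returns (output, path constraint)."""
    path = []

    def branch(c):
        taken = holds(c, inp)
        path.append(c if taken else negate(c))  # record the conjunct that held
        return taken

    page = 0 if branch(("page", "notset", None)) else inp["page"]
    if branch(("page2", "eq", 1337)):
        return "<html>CRASH: missing include", tuple(path)
    body = "<b>Login error" if branch(("login", "eq", 1)) else ""
    return f"<html>{body}page {page}</html>", tuple(path)


def oracle(output):
    failures = set()
    if "CRASH" in output:
        failures.add("execution failure")
    if output.count("<b>") != output.count("</b>"):
        failures.add("malformed HTML")
    return failures


def find_failures(program, oracle, budget=100):
    bugs = {}              # failure -> list of (path constraint, input)
    queue = deque([()])    # start from the empty path constraint
    seen = {()}
    while queue and budget > 0:
        budget -= 1
        inp = solve(queue.popleft())
        if inp is None:
            continue
        output, path = program(inp)
        for f in oracle(output):
            bugs.setdefault(f, []).append((path, inp))
        for i in range(len(path)):
            new_pc = path[:i] + (negate(path[i]),)
            if new_pc not in seen:
                seen.add(new_pc)
                queue.append(new_pc)
    return bugs


bugs = find_failures(toy_program, oracle)
for failure, reports in sorted(bugs.items()):
    print(failure, [inp for _, inp in reports])
```

The toy runs concrete and symbolic execution in a single pass for brevity; the pseudocode keeps them as two separate steps.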
19
Example: Execution 1 (Expose Third Fault)
Program: the SchoolMate.php listing (slide 12), run by the failure detection algorithm (slide 18) on the empty input
Execution
– Line 6 condition is true – sets page = 0
– Conditions on lines 10 and 16 are false
– GoTo(20): control reaches case 0 of the switch
HTML validation tool determines output is legal
Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login ≠ 1
New path constraints added to the queue
– NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
– NotSet(page) ∧ page2 = 1337
– Set(page)
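The step that derives new path constraints keeps a prefix of the observed conjuncts and negates the next one. A small Python sketch of just that step, using the path constraint from this execution, is shown here; representing each conjunct as a (text, negated text) pair is an assumption made for readability.

```python
def negate(conjunct):
    text, negated_text = conjunct
    return (negated_text, text)


def next_path_constraints(path):
    """For each i, keep c1..c(i-1) unchanged and negate ci."""
    return [path[:i] + [negate(path[i])] for i in range(len(path))]


def show(path):
    return " ∧ ".join(text for text, _ in path)


execution_1 = [
    ("NotSet(page)", "Set(page)"),
    ("page2 ≠ 1337", "page2 = 1337"),
    ("login ≠ 1", "login = 1"),
]

new_pcs = [show(pc) for pc in next_path_constraints(execution_1)]
for pc in new_pcs:
    print(pc)
```

Each derived constraint describes an input that follows the same path up to one branch and then takes the other side of it.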
20
Example: Execution 2 (The Opposite Path)
Path constraint: NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
– Constraint solver may return page2 = 0; login = 1
Program: the SchoolMate.php listing (slide 12)
– Line 16 condition is now true, so validateLogin() is called
HTML validation tool discovers a failure and generates a bug report, which is added to the output set of bug reports
21
Minimization on Path Constraints
Finds a shorter path constraint for a given bug report
Eliminates irrelevant constraints
– Better assists the programmer in locating the fault
The solution for a shorter path constraint is often a smaller input
Does not guarantee that the returned path constraint is the shortest one that exposes the failure
22
Minimization Example
The HTML malformation from the previous example could have been reached via different execution paths
– NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1
– Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1
Intersection of the path constraints: page2 ≠ 1337 ∧ login = 1
Conjuncts tested one at a time: page2 ≠ 1337, login = 1
Minimized path constraint: login = 1 (input: login = 1)
23
Path Constraint Minimization Algorithm
parameters: Program P, oracle O, bug report b
result: Short path constraint that exposes b.failure
c1 ∧ ... ∧ cn ≔ intersect(b.pathConstraints);
pc ≔ true;
foreach i = 1,...,n do
  pci ≔ c1 ∧ ... ∧ ci−1 ∧ ci+1 ∧ ... ∧ cn;
  input ≔ solve(pci);
  if input ≠ ⊥ then
    output ≔ executeConcrete(P, input);
    failures ≔ getFailures(O, output);
    if b.failure ∉ failures then
      pc ≔ pc ∧ ci;
input_pc ≔ solve(pc);
if input_pc ≠ ⊥ then
  output_pc ≔ executeConcrete(P, input_pc);
  failures_pc ≔ getFailures(O, output_pc);
  if b.failure ∈ failures_pc then
    return pc;
return shortest(b.pathConstraints);
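The minimization can be sketched in Python on the two path constraints from the minimization example. This is an illustration under stated assumptions, not Apollo's code: `execute` is a toy model of the program, `oracle` checks a single tag pair, and `solve` is a naive brute-force solver that may miss solutions a real solver would find.

```python
from itertools import product


def holds(c, inp):
    param, op, value = c
    if op == "set":
        return param in inp
    if op == "notset":
        return param not in inp
    return (inp.get(param) == value) == (op == "eq")


def solve(pc):
    """Naive brute-force solver (assumption); None plays the role of ⊥."""
    params = sorted({c[0] for c in pc})
    choices = [[None, 0] + [c[2] for c in pc if c[0] == p and c[2] is not None]
               for p in params]
    for combo in product(*choices):
        inp = {p: v for p, v in zip(params, combo) if v is not None}
        if all(holds(c, inp) for c in pc):
            return inp
    return None


def execute(inp):
    """Toy program: login = 1 leaves a tag unclosed unless page2 = 1337 crashes first."""
    if inp.get("page2") == 1337:
        return "<html>CRASH"
    body = "<b>Login error" if inp.get("login") == 1 else ""
    return "<html>" + body + "</html>"


def oracle(output):
    unbalanced = output.count("<b>") != output.count("</b>")
    return {"malformed HTML"} if unbalanced else set()


def minimize(path_constraints, failure):
    first, *rest = path_constraints
    common = [c for c in first if all(c in pc for pc in rest)]  # intersect
    kept = []
    for i, c in enumerate(common):
        without_c = common[:i] + common[i + 1:]
        inp = solve(without_c)
        # Keep c only if dropping it makes the failure disappear.
        if inp is not None and failure not in oracle(execute(inp)):
            kept.append(c)
    inp = solve(kept)
    if inp is not None and failure in oracle(execute(inp)):
        return kept
    return min(path_constraints, key=len)  # fall back to the shortest original


pc_a = [("page", "notset", None), ("page2", "ne", 1337), ("login", "eq", 1)]
pc_b = [("page", "set", None), ("page", "eq", 0),
        ("page2", "ne", 1337), ("login", "eq", 1)]

minimized = minimize([pc_a, pc_b], "malformed HTML")
print(minimized)
```

As in the pseudocode, the result is checked once more against the program before it is returned, because conjuncts are tested one at a time and the combined result is not guaranteed to expose the failure.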
24
Apollo
User Input Simulator
Executor
Bug Finder
– Oracle
– Bug Report Repository
– Input Minimizer
Input Generator
– Symbolic Driver
– Constraint Solver
– Value Generator
25
Apollo
26
Executor: Shadow Interpreter
Shadow Interpreter
– Modified Zend PHP interpreter 5.2.2 that records path constraints and information associated with the output
– Performs symbolic execution along with concrete execution
– Records conditions for PHP-specific comparison operations such as isset and empty
27
Executor: Database Manager
Database Manager
– (Re)initializes the DB used by a PHP application and restores it before each execution
– Supplies additional information about username/password pairs
28
Bug Finder
Bug Report = Failure + Path constraint + Input inducing failure
Failure = Type of failure + Corresponding message + PHP statement generating bad HTML
Oracle
– HTML validation tool (WDG and W3C)
Input Minimizer
– Uses the path constraint minimization algorithm
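The composition of a bug report can be written down as a small data structure. The sketch below is one possible Python shape; all class names, field names, and the sample message and statement are hypothetical, chosen only to mirror the three parts listed on the slide and the "merge into B" step of the failure detection algorithm.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Failure:
    kind: str        # type of failure, e.g. "HTML error"
    message: str     # corresponding message from the interpreter or validator
    statement: str   # PHP statement that generated the bad HTML


@dataclass
class BugReport:
    failure: Failure
    path_constraints: list = field(default_factory=list)
    inputs: list = field(default_factory=list)


def merge(repository, failure, path_constraint, inp):
    """Add one observation; the same failure shares a single report."""
    report = repository.setdefault(failure, BugReport(failure))
    report.path_constraints.append(path_constraint)
    report.inputs.append(inp)
    return report


repository = {}
f = Failure("HTML error", "unclosed tag", "echo in validateLogin()")  # sample values
merge(repository, f, "NotSet(page) ∧ page2 ≠ 1337 ∧ login = 1", {"login": 1})
merge(repository, f, "Set(page) ∧ page = 0 ∧ page2 ≠ 1337 ∧ login = 1",
      {"page": 0, "login": 1})
print(len(repository), len(repository[f].path_constraints))
```

Keeping every path constraint that reaches the same failure is what later lets the input minimizer intersect them.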
29
Input Generator
Symbolic Driver
– Generates new path constraints and selects the next path constraint
Constraint Solver
– Computes an assignment of values to input parameters that satisfies a given path constraint
– Choco constraint solver
Value Generator
– Generates values for parameters
– Combines random value generation and constant values mined from source code
30
Experimentation
Program      #files   LOC      PHP LOC   #downloads
faqforge     19       1712     734       14164
webchess     24       4718     2226      32352
schoolmate   63       8181     4263      4466
phpsysinfo   73       16634    7745      492217
total        179      31245    14968     543199
faqforge = Tool for creating and managing documents
webchess = Online chess game
schoolmate = PHP/MySQL solution for administering schools
phpsysinfo = Displays system info
31
Generation Strategies
Compared to two other approaches
– Halfond and Orso (Randomized): assigns random values to the parameters; proposed for JavaScript
– Minamide’s static analysis: approximates the string output of the program with a context-free grammar; discovers malformed HTML faults
Apollo’s test input generation was discussed previously
32
Methodology
10-minute runs on each program
– Generation of hundreds of inputs
Ran both the Apollo and the Randomized test input generation strategies
WDG offline HTML validation tool
33
Results Classification
Execution crash: PHP interpreter terminates with an exception
Execution error: PHP interpreter emits a warning visible in the generated HTML
Execution warning: PHP interpreter emits a warning invisible in the HTML output
HTML error: program generates HTML for which the validation tool produces an error report
HTML warning: program generates HTML for which the validation tool produces a warning report
34
Results Analysis
Apollo
– Average line coverage: 58.0%
– Faults found on subject apps: 214
Randomized
– Average line coverage: 15.0%
– Faults found on subject apps: 59
Tries to load two missing files
Database related
Unset time zone
Resulted in malformed HTML
Line coverage = Number of executed lines / Total lines with executable PHP code in application
35
Results Analysis
Apollo vs. Randomized
– 58% line coverage vs. 15.2% line coverage
– 214 faults vs. 59 faults
Apollo vs. Minamide’s tool
– 2.7 times more HTML validation faults (120 vs. 45)
– 83 additional execution faults
– 104 faults (10 minutes) vs. 14 faults (126 minutes)
Apollo is more effective and efficient than both
36
Results Analysis: Path Constraint Minimization
                              Path constraints         Inputs
Program      Success rate %   Orig. size   Reduction   Orig. size   Reduction
faqforge     64               22.3         0.22        9.3          0.31
webchess     91               23.4         0.19        10.9         0.40
schoolmate   51               22.9         0.38        11.5         0.58
phpsysinfo   82               24.3         0.18        17.5         0.26
Reduces the size of inputs by up to a factor of 0.18 for more than 50% of faults
Success rate – Percentage of faults whose exposing input was minimized
Orig. size – Average size of original path constraints (# of conjuncts) and inputs (# of key-value pairs)
Reduction columns – Ratio of minimized to un-minimized size. The lower the ratio, the more successful the minimization
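Since each reduction column is the ratio of minimized to un-minimized size, multiplying it by the original size gives the approximate average size after minimization. The short Python sketch below does this for the table's numbers; the helper name `minimized_size` is made up, and the products are only approximations of averages that the slide does not report directly.

```python
# (program, success rate %, PC orig. size, PC ratio, input orig. size, input ratio)
TABLE = [
    ("faqforge", 64, 22.3, 0.22, 9.3, 0.31),
    ("webchess", 91, 23.4, 0.19, 10.9, 0.40),
    ("schoolmate", 51, 22.9, 0.38, 11.5, 0.58),
    ("phpsysinfo", 82, 24.3, 0.18, 17.5, 0.26),
]


def minimized_size(original, ratio):
    """Approximate size after minimization: original size times reduction ratio."""
    return original * ratio


sizes = {
    name: (round(minimized_size(pc, pc_ratio), 1),
           round(minimized_size(inputs, in_ratio), 1))
    for name, _, pc, pc_ratio, inputs, in_ratio in TABLE
}

for name, (conjuncts, pairs) in sizes.items():
    print(f"{name}: about {conjuncts} conjuncts, about {pairs} key-value pairs")
```

Read this way, a ratio of 0.22 on 22.3 conjuncts means a minimized path constraint of roughly five conjuncts.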
37
Limitations
User inputs are simulated statically
JavaScript code in the generated HTML is not tracked
Limited line coverage for native C methods
Limited sources of input parameters
– Only inputs from global arrays (_POST, _GET and _REQUEST)
38
Thank you