Helix Automatic Software Repair with Evolutionary Computation Stephanie Forrest Westley Weimer.


1 Helix Automatic Software Repair with Evolutionary Computation
Stephanie Forrest, Westley Weimer

2 Helix Introduction
- Automatic bug repair is an important unsolved problem in software engineering
- Automated repair is needed for self-healing systems
- “The problem of security is the problem of software”
- We combine state-of-the-art methods from programming languages with innovations in evolutionary computation to repair bugs in publicly released software

3 Helix Summary of Method
- Assume:
  - Access to C source code
  - Negative test case (input = 10593; output = infinite loop)
  - Positive test cases (encode required program functionality)
- Construct Abstract Syntax Tree using CIL
- Evolve repair that avoids negative test case and passes positive test cases
- Minimize repair using structural differencing and delta debugging

4 Helix What is evolutionary computation?
Evolution in a computer:
- Individuals (genotypes) stored in the computer’s memory
- Evaluation of individuals (artificial selection)
- Differential reproduction by copying and deleting
- Variation introduced by analogy with mutation and crossover
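The four ingredients on this slide can be sketched as a toy loop. This is not the repair system from the talk: the bit-string genotype, the count-the-ones fitness, the elitism step, and all names (`fitness`, `mutate`, `crossover`, `evolve`) are illustrative assumptions. The defaults of 40 individuals and 10 generations simply echo the figures given later in the deck.

```python
import random

def fitness(individual):
    # Evaluation: toy objective (an assumption), count the 1 bits.
    return sum(individual)

def mutate(individual, rng, rate=0.05):
    # Variation: flip each bit independently with a small probability.
    return [bit ^ 1 if rng.random() < rate else bit for bit in individual]

def crossover(a, b, rng):
    # Variation: one-point crossover of two parents.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def evolve(pop_size=40, length=20, generations=10, seed=0):
    """Return the best fitness seen at the start of each generation, plus the final one."""
    rng = random.Random(seed)
    # Individuals (genotypes) stored in memory.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Differential reproduction: the worse half is deleted,
        # the better half is copied (with variation) into the next generation.
        survivors = ranked[: pop_size // 2]
        children = [ranked[0]]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    history.append(max(fitness(ind) for ind in population))
    return history
```

Because the best individual is always carried over unchanged, the recorded best fitness can never decrease from one generation to the next.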

5 Helix Example: Microsoft Zune
- Dec. 31, 2008: Microsoft Zune players mysteriously freeze up.
- Bug: infinite loop when input is last day of a leap year.
- Negative test case: 10593, which corresponds to Dec. 31, 2008.
- Repair is not trivial. Microsoft’s recommendation was to let the Zune drain its battery and then reset.
- Downloaded from http://pastie.org/349916 (Jan. 2009).
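The failure can be reproduced with a short Python paraphrase of the widely reported date-conversion loop. This is a sketch, not a verbatim copy of the C code referenced above: the 1980 origin year, the function names, and the `max_steps` cap (added so the hang can be observed safely) are assumptions.

```python
ORIGIN_YEAR = 1980  # assumption: day 1 corresponds to Jan. 1, 1980

def is_leap_year(year):
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

def year_from_days(days, max_steps=1000):
    """Return (year, day_of_year), or None if the loop fails to terminate."""
    year = ORIGIN_YEAR
    steps = 0
    while days > 365:
        if is_leap_year(year):
            if days > 366:
                days -= 366
                year += 1
            # Bug: when days == 366 in a leap year, neither variable changes,
            # so the loop condition stays true forever.
        else:
            days -= 365
            year += 1
        steps += 1
        if steps > max_steps:
            return None  # stands in for the infinite loop
    return year, days
```

Under these assumptions, `year_from_days(10592)` returns `(2008, 365)` and `year_from_days(10594)` returns `(2009, 1)`, while the negative test case `year_from_days(10593)` hits the cap and returns `None`.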

6 Helix Evolutionary Computation Innovations
- Start with a working program
- Focus on execution path through AST
  - Restrict mutation and crossover to execution path
- Represent AST at level of statements
  - Leaves out expressions, variable declarations
- Genetic operators
  - Don’t invent any new code
  - Crossback, macromutation operators
- Minimize repair size using structural differencing

7 Helix AST Representation

8 Helix Weighted Path
- Nodes visited only by the negative test case have weight 1.0
- Nodes visited by both negative and positive test cases have weight 0.01
- All other nodes have weight 0.0
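The weighting rule can be written down directly. The sketch below assumes that each test run yields the set of AST node identifiers it visited; the function name `path_weights` and its parameters are hypothetical, and the slide's first rule is read as applying to nodes visited by the negative test case only.

```python
def path_weights(all_nodes, negative_path, positive_paths):
    """Map each node to its weight on the weighted path.

    all_nodes:       iterable of node ids in the program
    negative_path:   set of node ids visited by the negative test case
    positive_paths:  list of sets, one per positive test case
    """
    visited_by_positive = set().union(*positive_paths)
    weights = {}
    for node in all_nodes:
        if node in negative_path and node in visited_by_positive:
            weights[node] = 0.01   # on both paths: unlikely to be the fault
        elif node in negative_path:
            weights[node] = 1.0    # only on the failing path: prime suspect
        else:
            weights[node] = 0.0    # never touched by the failing run
    return weights
```

For example, with nodes 1 to 4, a negative path of `{1, 2}` and one positive path of `{2, 3}`, node 1 gets weight 1.0, node 2 gets 0.01, and nodes 3 and 4 get 0.0.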

9 Helix The Final Evolved Repair

10 Helix Summary of Repairs to Date
Twenty distinct defects in 7 classes:
- Segfault: 7
- Buffer overflows: 3
- Infinite loops: 4
- Incorrect output: 2
- Integer overflow: 2
- Non-overflow DOS: 1
- Format string attack: 1
Twenty distinct programs totaling 186,603 LOC (180k LOC):
- Scientific Computing: 1
- Scripting Languages: 3
- Games, Graphics, Sound: 4
- Servers (web, ftp, authentication): 4
- Operating system utilities: 8

11 Helix Benchmark programs

Program            Lines of Code (LOC)  Functionality               Fault             Time to Repair
zune               28                   media player                infinite loop     42s
gcd                22                   handcrafted example         infinite loop     153s
uniq               1146                 duplicate text processing   segfault          34s
look-u             1169                 dictionary lookup           segfault          45s
look-s             1363                 dictionary lookup           infinite loop     55s
units              1504                 metric conversion           segfault          109s
deroff             2236                 document processing         segfault          131s
nullhttp           5575                 webserver                   heap overflow     578s
indent             9906                 source code processing      infinite loop     546s
flex               18775                lexical analyzer generator  segfault          230s
atris              21553                graphical tetris game       stack overflow    80s
openldap io.c      6519                 directory protocol          nonoverflow DOS   665s
lighttp fastcgi.c  13984                webserver                   heap overflow     49s
php string.c       26044                scripting language          integer overflow  6s
wu-ftpd            35109                FTP server                  format string     2256s

GECCO 2009, ICSE 2009, ACSAC (submitted)

12 Helix Time to Discover Repair
- Time to repair: 3 - 10 minutes
- Time includes:
  - GP algorithm (selection, mutation, calculating fitness, etc.)
  - Running test cases
  - Pretty printing and memoizing ASTs
  - gcc (compiling ASTs into executable code)
- No special hardware

13 Helix Research Questions
- Does it really work? Why does it work? How can we break it?
- How does the representation affect size of search space?
  - Order-of-magnitude reductions
- What is the role of evolution?
  - Variable. Random search often performs as well
- How does the number of test cases affect results?
  - Can improve results and reduce variability, but increases search time
- How does the method scale with problem size?
  - Search time scales more than linearly but less than quadratically

14 Helix Search Time Scaling m = 1.26

15 Helix Why it Works
- Generic approach
- Powerful intermediate representation
- Weighted path greatly reduces search space
- Minimization eliminates unnecessary fixes
- Most bugs can be fixed with a few local modifications
- 667 average atomic genetic operations to discover a repair
- Repair discovered on average in 3.6 generations
- 2.9 genetic operations per fitness evaluation
- At least 1/2 the time, Random Search does as well as GP

16 Helix Quality of Repair
- Manual checks for repair correctness.
- Microsoft requires that security-critical changes be subjected to 100,000 fuzz inputs (randomly generated structured input strings).
- Used SPIKE black-box fuzzer (immunitysec.com) to generate 100,000 held-out fuzz requests for web server examples.
- In no case did GP repairs introduce errors that were detected by the fuzz tests, and in every case the GP repairs defeated variant attacks based on the same exploit.
- Thus, the GP repairs are not fragile memorizations of the input.
- GP repairs also correctly handled all subsequent requests from indicative workload.

17 Helix Papers and Awards
- W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. "Automatically finding patches using genetic programming." ICSE (2009). Best Paper Award.
- S. Forrest, W. Weimer, T. Nguyen, and C. Le Goues. "A Genetic Programming Approach to Automated Software Repair." GECCO (2009). Best Paper Award.
- C. Le Goues, T. Nguyen, W. Weimer, and S. Forrest. "Closed-Loop Repair of Security Vulnerabilities." ACSAC 25 (submitted June 2009).
- AWARD: Human-Competitive Results Produced by Genetic and Evolutionary Computation (Humie Award). $5000
- IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice. 1024 Euros
- 2nd International Workshop on Search-Based Software Testing. Best paper and best presentation.

18 Helix The Future
- Self-healing systems for security (next talk)
- Integrating anomaly detection to find negative test cases
- Runtime repair using software dynamic translation, e.g., Strata
- Repair templates, other search methods
- Repair quality carefully
- Consistency in distributed applications?
- N-version diversity?
- Systematic study of large software code bases
- Hypothesis: Most bugs are small
- A small step for GP, a large step for software?

19 Helix Evolutionary computation details
- Fitness: weighted sum of test cases that the program passes
  - F(programs that don’t compile) = 0
  - 5 positive test cases (weight = 1), 1 or 2 negative test cases (weight = 10)
- Mutation operations:
  - Delete a statement
  - Insert a statement
  - Swap a statement along the weighted path with a statement from another part of the program
- Crossover: crosses back to original parent
- Population size is 40. Standard run is 10 generations + 10 generations
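The fitness rule and the three mutation operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a program is modeled as a flat list of statements, `weights` is a parallel list of path weights with at least one positive entry, and the names `fitness` and `mutate` are hypothetical. Inserted statements are copied from elsewhere in the same program, in keeping with the "don't invent any new code" rule.

```python
import random

def fitness(compiles, positives_passed, negatives_passed, w_pos=1, w_neg=10):
    # Programs that do not compile score 0; otherwise a weighted sum of
    # passed test cases (weights taken from the slide).
    if not compiles:
        return 0
    return w_pos * positives_passed + w_neg * negatives_passed

def mutate(program, weights, rng):
    """Apply one delete, insert, or swap at a statement chosen by path weight.

    program: list of statements (assumed flat representation)
    weights: parallel list of path weights, at least one of them positive
    """
    variant = list(program)  # never modify the parent in place
    # Pick the mutation site along the weighted path.
    target = rng.choices(range(len(program)), weights=weights, k=1)[0]
    # Pick a donor statement from anywhere in the program.
    other = rng.randrange(len(program))
    op = rng.choice(["delete", "insert", "swap"])
    if op == "delete":
        del variant[target]
    elif op == "insert":
        variant.insert(target + 1, program[other])
    else:
        variant[target], variant[other] = variant[other], variant[target]
    return variant
```

With 5 positive tests and 1 negative test, a variant that compiles and passes everything scores 15, while one that still fails the negative test scores at most 5, so fixing the bug dominates the ranking.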

20 Helix Minimizing the final repair
- Use tree-structured differencing (Al-Ekram et al. 2005)
- View primary repair as a set of tree-structured operations
- Consider the one-minimal subset of repairs
  - Let C_p = {c_1, c_2, ..., c_n} be the set of changes in a primary repair
  - A one-minimal subset is a subset of C_p that passes all test cases and from which no single change can be removed without failing a test
- Delta debugging: search for a one-minimal subset using binary search
  - n^2 time in worst case, often linear
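The search for a one-minimal subset can be sketched with a simplified form of the delta debugging loop. This is an illustration, not the authors' code: `one_minimal` is a hypothetical name, `passes` is an assumed callback that applies a list of changes to the original program and reports whether all test cases pass, and the empty change list is assumed to fail (the unmodified program still has the bug).

```python
def one_minimal(changes, passes):
    """Shrink `changes` to a subset that still passes but from which
    no single change can be dropped."""
    assert passes(changes), "the primary repair must pass all tests"
    n = 2  # current number of chunks (granularity)
    while len(changes) >= 2:
        chunk = max(1, len(changes) // n)
        starts = range(0, len(changes), chunk)
        reduced = False
        # First try each chunk on its own.
        for i in starts:
            subset = changes[i:i + chunk]
            if passes(subset):
                changes, n, reduced = subset, 2, True
                break
        # Then try removing each chunk in turn.
        if not reduced:
            for i in starts:
                complement = changes[:i] + changes[i + chunk:]
                if passes(complement):
                    changes, n, reduced = complement, max(n - 1, 2), True
                    break
        if not reduced:
            if n >= len(changes):
                break  # single-change granularity reached: one-minimal
            n = min(len(changes), n * 2)  # refine the granularity
    return changes
```

As a usage example, if a primary repair has eight changes numbered 0 to 7 and only changes 2 and 5 are actually needed, `one_minimal(list(range(8)), lambda s: {2, 5} <= set(s))` returns `[2, 5]`.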

