REPRESENTATIONS AND OPERATORS FOR IMPROVING EVOLUTIONARY SOFTWARE REPAIR
Claire Le Goues, Westley Weimer, Stephanie Forrest



Claire Le Goues, GECCO 2012
PROBLEM: BUGGY SOFTWARE
“Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005
Annual cost of software errors in the US: $59.5 billion (0.6% of GDP).
Average time to fix a security-critical error: 28 days.
Chart: 90% maintenance, 10% everything else.

APPROACH: EVOLUTIONARY COMPUTATION

GENETIC PROGRAMMING SEARCH
Input: source code, specification.
Output: repaired version of the program.

SEARCH SPACE

PROGRAM REPAIR
  Learning: patches or repaired programs
  Population: 40
  Iterations: 10
  Max variant size: unbounded
  Largest benchmark program: 2.8 million lines of C code
OTHER GP PROBLEMS (GECCO GP-TRACK BEST PAPERS)
  Learning: expression trees or lists
  Population: 64 – 2500
  Iterations: 50 –
  Max variant size: 16 operations, 48 operations, 17 levels, 11 levels

SEARCH SPACE
EC-based repair starts with a large genome.
The starting individual is mostly correct.


OUR GOAL
In-depth study of representation and operators for evolutionary program repair.

[Diagram: genetic programming search loop. Stages shown: INPUT; EVALUATE FITNESS; DISCARD / ACCEPT; CROSSOVER, MUTATE, SELECT; OUTPUT.]
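The loop in this diagram can be sketched roughly as follows. This is a minimal illustration, not GenProg's implementation: the function name `repair_search`, its parameters, and the rule of discarding zero-fitness variants are assumptions made for the example. The defaults (population 40, 10 iterations, tournament size 2) are taken from the slides.

```python
import random

def repair_search(initial, fitness, mutate, crossover, max_fitness,
                  pop_size=40, generations=10, seed=0):
    """Hypothetical evaluate / discard-or-accept / vary loop.

    Returns the first variant whose fitness reaches max_fitness, else None.
    """
    rng = random.Random(seed)
    # INPUT: seed the population with mutated copies of the original program.
    population = [mutate(initial, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # EVALUATE FITNESS for every variant.
        scored = [(fitness(ind), ind) for ind in population]
        for score, ind in scored:
            if score >= max_fitness:
                return ind  # ACCEPT -> OUTPUT
        # DISCARD variants with zero fitness (assumed rule); keep all if none survive.
        viable = [pair for pair in scored if pair[0] > 0] or scored

        def select():
            # SELECT: tournament of size 2.
            a, b = rng.choice(viable), rng.choice(viable)
            return a[1] if a[0] >= b[0] else b[1]

        # CROSSOVER and MUTATE to build the next generation.
        next_population = []
        while len(next_population) < pop_size:
            child_a, child_b = crossover(select(), select(), rng)
            next_population.append(mutate(child_a, rng))
            next_population.append(mutate(child_b, rng))
        population = next_population[:pop_size]
    return None
```

In a real repair setting, `fitness` would run the test suite on a variant and `max_fitness` would mean that every test passes; here both are left to the caller.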



REPRESENTATION
Input: [program figure]
Patch: Delete(3) Replace(3,5)
AST/WP: [figure]
Delete(3) Insert(2,4) Replace(5,1) Insert(5,4) Insert(3,3) Delete(4) …
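A patch such as Delete(3) Replace(3,5) can be read as a list of edits over numbered statements. The sketch below shows one way to apply such a list. The semantics are assumptions for illustration: Replace(d, s) puts a copy of original statement s in slot d, Insert(d, s) adds a copy of s after slot d, and Delete(d) empties slot d. The names `Edit` and `apply_patch` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Edit:
    op: str                    # "delete", "insert", or "replace"
    dst: int                   # id of the statement slot being edited
    src: Optional[int] = None  # id of the statement copied in (insert/replace)

def apply_patch(program: Dict[int, str], patch: List[Edit]) -> List[str]:
    """Apply edits in order; source statements always come from the original program."""
    # Each original statement id owns a slot holding zero or more statements.
    slots = {sid: [stmt] for sid, stmt in program.items()}
    for e in patch:
        if e.op == "delete":
            slots[e.dst] = []
        elif e.op == "replace":
            slots[e.dst] = [program[e.src]]
        elif e.op == "insert":
            slots[e.dst] = slots[e.dst] + [program[e.src]]
        else:
            raise ValueError(f"unknown edit operation: {e.op}")
    # Flatten the slots back into a statement sequence, in id order.
    return [stmt for sid in sorted(slots) for stmt in slots[sid]]

program = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
patched = apply_patch(program, [Edit("delete", 3), Edit("replace", 3, 5)])
# patched == ["a", "b", "e", "d", "e"]
```

The point of the patch form is that a variant stores only its short edit list rather than a full copy of a large program.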

GENETIC OPERATORS
Mutation operators (swap, insert, delete):
  Manipulate only existing genetic material.
  Semantic checking improves probability that mutation is viable.
Crossover:
  One-point: on the weighted path or edit list.
  Patch-subset: uniform, on the edit list.
Aside: mutation operator selection matters!
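The two crossover styles named on this slide can be sketched on edit lists as below. This is an illustration under assumptions, not the tool's code: `one_point` picks an independent cut point in each parent, and `patch_subset` lets each child keep every edit of the concatenated parents independently with probability `keep` (0.5 by default). Both function names and the `keep` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def one_point(p1: Sequence[T], p2: Sequence[T],
              rng: random.Random) -> Tuple[List[T], List[T]]:
    """Cut each parent once and swap the tails."""
    c1 = rng.randint(0, len(p1))  # randint is inclusive on both ends
    c2 = rng.randint(0, len(p2))
    return list(p1[:c1]) + list(p2[c2:]), list(p2[:c2]) + list(p1[c1:])

def patch_subset(p1: Sequence[T], p2: Sequence[T], rng: random.Random,
                 keep: float = 0.5) -> Tuple[List[T], List[T]]:
    """Uniform crossover: each child is a random subset of both parents' edits."""
    child1 = [e for e in list(p1) + list(p2) if rng.random() < keep]
    child2 = [e for e in list(p2) + list(p1) if rng.random() < keep]
    return child1, child2
```

One-point crossover preserves the total number of edits across the two children, while the subset variant can shrink or grow a patch, which changes how edits from the two parents recombine.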

[Figure: input program with statements marked by fault likelihood. Legend: likely faulty (high change probability); maybe faulty (low change probability); not faulty (not changed). Default: 10 : 1 ratio.]
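The weighting in this legend can be sketched as follows. The code assumes that "likely faulty" means executed only by failing tests, "maybe faulty" means executed by both failing and passing tests, and that the 10 : 1 default corresponds to weights 1.0 and 0.1. The function names and the coverage-set inputs are hypothetical.

```python
import random
from typing import Dict, Set

def statement_weights(failing_cov: Set[int], passing_cov: Set[int],
                      high: float = 1.0, low: float = 0.1) -> Dict[int, float]:
    """Weight statements on the failing path; others are never changed."""
    weights = {}
    for sid in failing_cov:
        # Also executed by passing tests -> low weight; failing-only -> high weight.
        weights[sid] = low if sid in passing_cov else high
    return weights

def pick_statement(weights: Dict[int, float], rng: random.Random) -> int:
    """Sample a statement id to mutate, proportionally to its weight."""
    ids = sorted(weights)
    return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]

weights = statement_weights(failing_cov={2, 3, 4}, passing_cov={1, 2})
# weights == {2: 0.1, 3: 1.0, 4: 1.0}; statement 1 is never selected
```

Changing `high` and `low` is all it takes to try a different ratio.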


EXPERIMENTAL SETUP
Benchmarks [1]: 105 bugs in 8 real-world programs.
  5 million lines of C code, 10,000 test cases.
  Bugs correspond to human-written repairs for regression test failures.
Default parameters, for comparison:
  Patch representation.
  Mutation operators selected with equal random probability.
  1 mutation, 1 crossover/individual/iteration.
  Population size: 40. 10 iterations or 12 wall-clock hours, whichever comes first.
  Tournament size: 2.
[1] Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer, “A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 Each.” International Conference on Software Engineering (ICSE), 2012, pp. 3 – 13.

EXPERIMENTAL SETUP
55/105 bugs repaired using default parameters.
Some bugs are more difficult to repair than others!
  Easy: 100% success rate on default parameters.
  Medium: 50 – 100% success rate on default parameters.
  Hard: 1 – 50% success rate on default parameters.
  Unfixed: 0% success.
Metrics: number of fitness evaluations to a repair; GP success rate.
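The difficulty buckets above amount to a threshold rule on the GP success rate. A small sketch follows; the slide does not say which bucket a rate of exactly 50% falls into, so placing it in "medium" is an assumption, as are the function names.

```python
from typing import Sequence

def success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of independent GP runs that found a repair."""
    return 100.0 * sum(outcomes) / len(outcomes)

def difficulty(rate_percent: float) -> str:
    """Map a success rate (in percent) to the slide's difficulty categories."""
    if rate_percent >= 100.0:
        return "easy"
    if rate_percent >= 50.0:   # boundary placement is an assumption
        return "medium"
    if rate_percent > 0.0:
        return "hard"
    return "unfixed"

runs = [True, False, False, True, False, False, False, False, False, False]
label = difficulty(success_rate(runs))  # 20% of runs succeeded -> "hard"
```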

RESEARCH QUESTIONS
Representation: Which representation choice gives better results? Which representation features contribute most to success?
Crossover: Which crossover operator is best?
Operators: Which operators contribute the most to success? How should they be selected?
Search space: How should the representation weight program statements to best define the search space?

REPRESENTATION: RESULTS
Procedure: Compare AST/WP to PATCH on original benchmarks with default parameters. For both representations, test effectiveness of:
  1. Crossover.
  2. Semantic check.
Results:
  1. Patch outperforms AST/WP (14 – 30%).
  2. Semantic check strongly influences success rate of both representations.
  3. Crossover also improves results.


CROSSOVER: RESULTS
Crossover Operator    Success Rate    Fitness evaluations to repair
None                  54.4%           82.43
Default/“Uniform”     61.1%
One-Point/AST-WP      63.7%
One-Point/Patch       65.2%           118.20


SEARCH SPACE: SETUP
Hypothesis: statements executed only by the failing test case(s) should be mutated more often than those also executed by the passing test cases.
Procedure: examine that ratio in actual repairs.
Result: Expected: 10 : 1 vs. Actual: 1 : 1.85
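The procedure on this slide boils down to counting, over the edits in successful repairs, how many touch failing-only statements versus statements also covered by passing tests. A rough sketch is below; the function names are hypothetical and the statement ids and counts are made up purely to show the arithmetic, not taken from the study.

```python
from typing import Iterable, Set, Tuple

def count_edit_sites(edited_ids: Iterable[int],
                     failing_only: Set[int]) -> Tuple[int, int]:
    """Return (edits on failing-only statements, edits on shared statements)."""
    edited = list(edited_ids)
    only = sum(1 for sid in edited if sid in failing_only)
    return only, len(edited) - only

def format_ratio(only: int, shared: int) -> str:
    """Express the counts as '1 : x', normalised to the failing-only count."""
    return f"1 : {shared / only:.2f}"

# Illustrative data: ten repair edits, four of them on failing-only statements.
only, shared = count_edit_sites([7, 7, 9, 12, 3, 3, 5, 8, 8, 2], failing_only={7, 9, 12})
ratio = format_ratio(only, shared)  # "1 : 1.50"
```

A value above 1 on the right-hand side, as in the reported 1 : 1.85, means repairs touched shared statements more often than failing-only ones, the opposite of what a 10 : 1 weighting presumes.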

SEARCH SPACE: REPAIR TIME

SEARCH SPACE: SUCCESS RATE

CONCLUSIONS/DISCUSSION
This EC problem is atypical; atypical problems warrant study.
We studied representation and operators for EC-based bug repair.
These choices matter, especially for difficult bugs.
Incorporating all recommendations, GenProg repairs 5 new bugs; repair time decreases by 17–43% on difficult bugs.
We don’t know why some of these things are true, but:
  We now have lots of interesting data to dig into!
  We are currently (as we sit here) doing more and bigger runs with the new parameters on the as-yet unfixed bugs.

PLEASE ASK QUESTIONS

REPRESENTATION BENCHMARKS
Program      LOC    Tests  Bug             Description
gcd          22     6      infinite loop   Example
uniq-utx     1146   6      segfault        Text processing
look-utx     1169   6      segfault        Dictionary lookup
look-svr     1363   6      infinite loop   Dictionary lookup
units-svr    1504   6      segfault        Metric conversion
deroff-utx   2236   6      segfault        Document processing
nullhttpd    5575   7      buffer exploit  Webserver
indent       9906   6      infinite loop   Source code processing
flex                       segfault        Lexical analyzer generator
atris                      buffer exploit  Graphical tetris game
Total

SEARCH SPACE BENCHMARKS
Program     LOC         Tests   Bugs  Description
fbc         97,…                      Language (legacy)
gmp         145,…                     Multiple precision math
gzip        491,000     12      5     Data compression
libtiff     77,…                      Image manipulation
lighttpd    62,…                      Web server
php         1,046,000   8,471   44    Language (web)
python      407,…                     Language (general)
wireshark   2,814,…                   Network packet analyzer
Total       5,139,…
55/105 bugs repaired using default parameters.