1
Coevolutionary Automated Software Correction
Josh Wilkerson, PhD Candidate in Computer Science, Missouri S&T
2
Technical Background: Evolutionary Algorithms (EAs)
- Subfield of evolutionary computation (in artificial intelligence)
- Based on biological evolution
- Uses mutation, reproduction, and selection
- Population composed of candidate solutions
- Needed:
  - Solution representation
  - Fitness function
- Applicable to a wide variety of fields
- Makes no assumptions about the problem space (ideally)
3
Technical Background: EA Operation
- Start with an initial population
- Each generation:
  - Create new individuals and evaluate them
  - Population competition (survival of the fittest)
- Mutation and reproduction
  - Explore the problem space
  - Bring in new genetic material
- Selection
  - Applies pressure to individuals
  - More fit individuals are selected for mutation and reproduction more often
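The generational loop described on this slide can be sketched as follows. This is a minimal illustration, not the CASC implementation: the function names, the binary tournament used for parent selection, and the bit-string toy problem are all assumptions made for the example.

```python
import random

def evolve(fitness, new_individual, mutate, crossover,
           pop_size=20, offspring=20, generations=50, seed=0):
    """Minimal generational EA: select parents, vary, then let the population compete."""
    rng = random.Random(seed)
    # Start with an initial population.
    population = [new_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        while len(children) < offspring:
            # Selection pressure (assumed scheme): binary tournaments favor fitter parents.
            a = max(rng.sample(population, 2), key=fitness)
            b = max(rng.sample(population, 2), key=fitness)
            # Reproduction and mutation bring in new genetic material.
            children.append(mutate(crossover(a, b, rng), rng))
        # Competition: only the fittest pop_size individuals survive.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)

# Hypothetical toy problem: maximize the number of 1 bits in a 16-bit string.
def new_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def flip_one(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

best = evolve(sum, new_bits, flip_one, one_point)
```

Because survivors are drawn from parents and children together, the best fitness in the population never decreases from one generation to the next in this sketch.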
4
Technical Background: Genetic Programming and Coevolution
- Genetic Programming
  - Type of EA
  - Evolves tree representations, e.g., computer program parse trees
- Coevolution
  - Extension of standard EA
  - Fitness dependency between individuals
  - Dependency can be either cooperative or competitive
  - CASC system uses competitive coevolution
  - Evolutionary arms race
5
High Level View of CASC
- View the problem space as all possible software artifacts
- Use the given software artifact as a starting point
- Use the correct version(s) of the given software artifact as the goal point
- Generate test cases to guide the evolution of the software artifact
- Evolve the software artifacts in order to handle the test cases better (and ultimately perform better)
- Evolve the test cases in order to better find flaws in the software artifacts
- A competitive coevolutionary arms race results
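One way to picture the fitness dependency between the two populations is a pairwise scoring scheme like the one below. The slides do not specify how CASC scores individuals, so this is only an assumed illustration: programs are rewarded for passing test cases, and test cases are rewarded for making programs fail. The names `coevolve_fitness` and `passes` are hypothetical.

```python
def coevolve_fitness(programs, tests, passes):
    """Return (program_scores, test_scores) from every program/test encounter.

    Assumed scoring: a program's score is the fraction of tests it passes,
    and a test's score is the fraction of programs it causes to fail.
    """
    program_scores = [sum(passes(p, t) for t in tests) / len(tests) for p in programs]
    test_scores = [sum(not passes(p, t) for p in programs) / len(programs) for t in tests]
    return program_scores, test_scores

# Hypothetical toy: the "programs" are candidate abs() implementations,
# the "test cases" are inputs, and abs() itself plays the correct version.
programs = [abs, lambda x: x, lambda x: -x]
tests = [3, -2, 0]

def passes(program, test):
    return program(test) == abs(test)

program_scores, test_scores = coevolve_fitness(programs, tests, passes)
```

In this toy, the input 0 exposes no flaw in any candidate and so scores lowest among the test cases, which is the pressure that pushes test cases toward inputs that reveal faults.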
6
CASC Evolutionary Model
7
CASC Evolutionary Model
8
CASC Evolutionary Model
- Evaluate all individuals, assign fitness
- Details of evaluation are discussed later
9
CASC Evolutionary Model
10
Reproduction Phase: Programs
- Randomly select a genetic operation to perform
  - Probability of operation selection is configurable
- Perform operation, generate new program(s)
- Add new individuals to population
- Repeat until specified number of individuals has been created
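The reproduction loop above can be sketched as a weighted random choice among operators. This is an assumed illustration; the operator names and the weights are placeholders, not CASC's actual configuration.

```python
import random

def reproduce(population, operators, weights, n_new, rng):
    """Create n_new individuals by repeatedly applying a weighted-random operator."""
    created = []
    while len(created) < n_new:
        # Operator selection probability is configurable through `weights`.
        operator = rng.choices(operators, weights=weights, k=1)[0]
        # An operator may yield one offspring (copy) or two (crossover).
        created.extend(operator(population, rng))
    return created[:n_new]

# Hypothetical stand-in operators working on plain integers.
def copy_op(population, rng):
    return [rng.choice(population)]

def pair_op(population, rng):
    a, b = rng.choice(population), rng.choice(population)
    return [a + b, a - b]

rng = random.Random(1)
population = [1, 2, 3, 4]
new_individuals = reproduce(population, [copy_op, pair_op], [0.3, 0.7], 5, rng)
population.extend(new_individuals)  # add new individuals to the population
```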
11
Reproduction Phase: Programs
- Genetic Operations
  - Reset
  - Copy
  - Crossover
    - Two individuals are randomly selected based on fitness
    - Randomly select and exchange compatible sub-trees
    - Generates two new programs
  - Mutation
    - Randomly select an individual based on fitness
    - Randomly select and change a mutable node
    - Generate a new sub-tree (if necessary)
  - Architecture Altering Operations
- Reselection is allowed for all operators
- Only specific nodes are considered for mutation (critical points)
  - Numeric constants and unmodified variables
  - Identified dynamically
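Sub-tree crossover and constant mutation can be sketched on a toy expression tree. The nested-list representation, the helper names, and the simplification that every sub-tree is compatible with every other are assumptions for this example; real parse trees need type and scope checks before sub-trees can be exchanged.

```python
import copy
import random

# Assumed representation: a tree is a leaf (number or variable name)
# or a list [operator, left, right].
def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node in the tree."""
    yield prefix
    if isinstance(tree, list):
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))

def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def put(tree, path, subtree):
    """Return the tree with the node at `path` replaced by `subtree`."""
    if not path:
        return subtree
    get(tree, path[:-1])[path[-1]] = subtree
    return tree

def crossover(a, b, rng):
    """Exchange one randomly chosen sub-tree between copies of a and b."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    path_a = rng.choice(list(paths(a)))
    path_b = rng.choice(list(paths(b)))
    sub_a, sub_b = get(a, path_a), get(b, path_b)
    return put(a, path_a, sub_b), put(b, path_b, sub_a)

def mutate_constant(tree, rng):
    """Change one numeric constant; only these nodes are treated as mutable here."""
    tree = copy.deepcopy(tree)
    spots = [p for p in paths(tree) if isinstance(get(tree, p), (int, float))]
    if not spots:
        return tree
    spot = rng.choice(spots)
    return put(tree, spot, get(tree, spot) + rng.choice([-1, 1]))

rng = random.Random(0)
parent_a = ["+", "x", ["*", 2, "y"]]
parent_b = ["-", ["+", "x", 1], 3]
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate_constant(["+", "x", 2], rng)
```

Both operators work on deep copies, so the parents stay in the population unchanged and remain available for reselection.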
12
Reproduction Phase: Test Cases
- Reproduction employs uniform crossover
- Each offspring has a chance to mutate
  - Genes to mutate are selected at random
  - Mutated gene is randomly adjusted
  - The amount adjusted is selected from a Gaussian distribution
- Only specific nodes are considered for mutation (critical points)
  - Numeric constants and unmodified variables
  - Identified dynamically
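Uniform crossover followed by Gaussian mutation can be sketched for a test case encoded as a list of numbers. The encoding and the rates and `sigma` defaults are assumptions chosen for illustration.

```python
import random

def uniform_crossover(a, b, rng):
    """Each gene of the child comes from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def gaussian_mutate(genes, rng, offspring_rate=0.5, gene_rate=0.2, sigma=1.0):
    """With probability offspring_rate, adjust randomly chosen genes by Gaussian noise."""
    if rng.random() >= offspring_rate:
        return list(genes)  # this offspring is not mutated
    return [g + rng.gauss(0.0, sigma) if rng.random() < gene_rate else g
            for g in genes]

rng = random.Random(42)
parent_a = [1.0, 2.0, 3.0, 4.0]
parent_b = [10.0, 20.0, 30.0, 40.0]
child = gaussian_mutate(uniform_crossover(parent_a, parent_b, rng), rng)
```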
13
CASC Evolutionary Model
14
CASC Evolutionary Model
15
CASC Evolutionary Model
- Trim populations back down to the specified size
  - Reverse tournament selection
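Reverse tournament selection can be sketched as repeatedly drawing a few individuals at random and removing the least fit of them until the population reaches its target size. The tournament size `k` and the function name are assumptions.

```python
import random

def reverse_tournament_trim(population, fitness, target_size, rng, k=2):
    """Shrink the population by eliminating the loser of random k-way tournaments."""
    survivors = list(population)
    while len(survivors) > target_size:
        # Draw distinct contenders by index so duplicates are handled correctly.
        contenders = rng.sample(range(len(survivors)), min(k, len(survivors)))
        loser = min(contenders, key=lambda i: fitness(survivors[i]))
        survivors.pop(loser)
    return survivors

rng = random.Random(0)
trimmed = reverse_tournament_trim([5, 1, 4, 2, 3], fitness=lambda x: x,
                                  target_size=3, rng=rng)
```

With a small `k`, weak individuals still have some chance of surviving, which preserves diversity. Setting `k` to the population size always removes the current worst individual.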
16
CASC Evolutionary Model
E.g., number of generations, goal fitness reached, population converged on a maximum (low diversity), etc.
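A check combining the three example conditions might look like the sketch below. The thresholds are placeholders, and using the spread of fitness values as a proxy for diversity is an assumption made for this example.

```python
def should_stop(generation, fitnesses, max_generations=100,
                goal_fitness=1.0, min_spread=1e-6):
    """Return the reason to stop, or None to keep evolving."""
    if generation >= max_generations:
        return "generation limit"
    if max(fitnesses) >= goal_fitness:
        return "goal fitness reached"
    # Assumed diversity proxy: nearly identical fitness values across the population.
    if max(fitnesses) - min(fitnesses) < min_spread:
        return "population converged"
    return None
```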
17
CASC Implementation Details
- Adaptive parameter control
  - EAs typically have many control parameters
  - Difficult to find optimal settings for these parameters
- In CASC, genetic operator probabilities are adaptive parameters
  - Rewarded/punished based on performance
  - If one operator generates improved individuals more often than the others, it becomes more likely to be used
  - Allows the system to adapt to the different phases in the search
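One simple reward scheme moves each operator's probability toward its share of recent improvements. The slides do not give CASC's update rule, so the blending rate, the probability floor, and the function name below are assumptions.

```python
def update_operator_probs(probs, improved, rate=0.1, floor=0.05):
    """Shift probability toward operators whose offspring improved on their parents.

    probs:    operator name -> current selection probability
    improved: operator name -> count of improved offspring this generation
    """
    total = sum(improved.values())
    blended = {}
    for op, p in probs.items():
        # With no improvements at all, keep the current probability.
        share = improved.get(op, 0) / total if total else p
        # The floor keeps every operator available for later phases of the search.
        blended[op] = max(floor, (1 - rate) * p + rate * share)
    norm = sum(blended.values())
    return {op: p / norm for op, p in blended.items()}

probs = {"crossover": 0.5, "mutation": 0.5}
probs = update_operator_probs(probs, {"crossover": 8, "mutation": 2})
```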
18
CASC Implementation Details
- Parallel Computation
  - Computational complexity is generally a problem for EAs
  - CASC writes, compiles, and executes hundreds (or even thousands) of C++ programs in a given run
  - To reduce run times, this is done in parallel (on the NIC cluster here on campus)
    - Main node: responsible for generating and writing programs
    - Worker nodes: responsible for compiling and executing programs
  - Dramatically speeds up execution
  - Investigating new options for this (discussed later)
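The main/worker split can be illustrated on a single machine with a thread pool that launches one subprocess per candidate. This is a stand-in, not the cluster setup described above: the candidates here are small Python scripts so the sketch runs anywhere, whereas a worker for C++ candidates would first invoke a compiler and then run the resulting executable. All function names are hypothetical.

```python
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def write_programs(sources, directory):
    """Main-node role: write each candidate program to its own file."""
    files = []
    for i, source in enumerate(sources):
        path = Path(directory) / f"candidate_{i}.py"
        path.write_text(source)
        files.append(path)
    return files

def run_program(path, timeout=10):
    """Worker role: execute one candidate and report (exit code, stdout)."""
    try:
        done = subprocess.run([sys.executable, str(path)],
                              capture_output=True, text=True, timeout=timeout)
        return done.returncode, done.stdout.strip()
    except subprocess.TimeoutExpired:
        return None, ""  # treat a hung candidate as a failed evaluation

def evaluate_all(sources, workers=4):
    """Write all candidates, then run them concurrently, preserving input order."""
    with tempfile.TemporaryDirectory() as tmp:
        files = write_programs(sources, tmp)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(run_program, files))

results = evaluate_all(["print(1 + 1)", "print(2 * 3)", "raise SystemExit(3)"])
```

The timeout matters for evolved programs in particular, since a mutation can easily produce an infinite loop.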
19
Current and Future Work
- Fitness Function Design
  - For each new problem, CASC needs a new fitness function
  - Fitness function design can often be difficult
- Developing a guide for fitness function design
  - Starts from the program specifications
  - Walks through the thought process for designing a fitness function for the problem
- Long term goal: automate fitness function creation
20
Current and Future Work
- File system slowdown
  - CASC writes and compiles many, many programs each run
  - I.e., many, many files in the file system each run
  - File system access is bottlenecking the speed of the CASC system
- Currently reworking the system to store program files and executables in RAM
  - Uses a virtually mounted hard disk that stores data in RAM
  - Expecting a dramatic speedup (fingers crossed…)
- Other option: distributed computing (like BOINC, etc.)
21
Current and Future Work
- Scalability
  - As program size increases, so does the problem space
    - Many more modifications possible
    - More genetic material
  - Investigating options to allow CASC to scale with problem size
- Current idea: break the program up into pieces
  - Multiple program populations
  - Each population is based on a piece of the original program
  - Each population has its own objective
  - Cooperative coevolution
22
Current and Future Work
Add new diagram
23
Questions?