The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models Gary D. Boetticher Univ. of Houston.




Presentation transcript:

The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher, Univ. of Houston - Clear Lake, Houston, TX, USA
Kim Kaminsky, Univ. of Houston - Clear Lake, Houston, TX, USA
IEEE International Conference on Information Reuse and Integration

About the Author: Gary D. Boetticher
- Ph.D. in Machine Learning and Software Engineering: A neural network-based software reuse economic model
- Executive member of IEEE Reuse Standard Committees (1990s)
- Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
- Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA
- Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics

Motivating Questions
- Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
- If so, how could these insights be utilized to make better breeding decisions?

Genetic Program Overview
X, Y, and Z → RESULT?

X    Y    Z    RESULT
:    :    :    :

1) Create a population of equations

Eq#    Equation
1      X+Y
2      (Z-X)*Y+X
:      :
1000   (X*X)-Z

2) Determine the fitness for each (1 / Standard Error)
3) Breed equations: parents X + Y and (Z-X) * Y+X produce offspring (Z-X) + Y and X * Y+X
4) Generate new populations and breed until a solution is found
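The fitness and breeding steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the nested-tuple tree encoding, the helper names (`evaluate`, `fitness`, `crossover`), the sample rows, and the reading of "standard error" as root-mean-square residual are all assumptions. The crossover call reproduces the breeding example from the slide.

```python
import math

# Hypothetical encoding: an equation is a variable name or a tuple (op, left, right).
def evaluate(expr, row):
    if isinstance(expr, str):
        return row[expr]
    op, left, right = expr
    a, b = evaluate(left, row), evaluate(right, row)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator: {op}")

def fitness(expr, rows):
    # Assumption: "standard error" is taken as the root-mean-square residual.
    squared = [(evaluate(expr, r) - r["RESULT"]) ** 2 for r in rows]
    se = math.sqrt(sum(squared) / len(squared))
    return float("inf") if se == 0 else 1.0 / se

def get_subtree(expr, path):
    for i in path:
        expr = expr[i]
    return expr

def replace_subtree(expr, path, new):
    if not path:
        return new
    node = list(expr)
    node[path[0]] = replace_subtree(expr[path[0]], path[1:], new)
    return tuple(node)

def crossover(a, path_a, b, path_b):
    # Swap the subtree of a at path_a with the subtree of b at path_b.
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)

parent1 = ("+", "X", "Y")                          # X + Y
parent2 = ("+", ("*", ("-", "Z", "X"), "Y"), "X")  # (Z-X) * Y + X
# Swap the X in parent1 with the (Z-X) in parent2.
child1, child2 = crossover(parent1, (1,), parent2, (1, 1))

# Made-up rows in which RESULT happens to equal X + Y.
rows = [
    {"X": 1, "Y": 2, "Z": 3, "RESULT": 3},
    {"X": 2, "Y": 2, "Z": 5, "RESULT": 4},
]
print(child1, child2, fitness(child1, rows))
```

Here `child1` encodes (Z-X) + Y and `child2` encodes X * Y + X, matching the offspring shown on the slide.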

Genetic Program Overview

Generation N
Equation               Fitness
(X+Y)                  87
(X - Z) * (Y * Y)      86
ZYZY                   75
:                      :
Y                      22
Y - X                  18

Generation N+1
Equation               Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y

Why discard legacy information?

Goal: Examine fitness patterns over time

The slide shows the same ranked table under three headings: Generation 1, Generation 2, Generation 3.

Equation                          Fitness
(X+Y)                             87
(X - Z) * (Y * Y)                 86
ZY                                85
(X - Z) * (Y * Y)                 84
Y                                 79
Y - X                             75
Z + Y                             75
(X - Z) * (Y * Y)                 75
Y                                 73
Y - X                             71
(X - Z) * (Y * Y) + W + W         68
Y - X                             67
ZY                                66
(X - Z) * (Y * Y)                 66
Y                                 65
Y - X                             65
(X - Z) * (Y * Y) + W + W         64
Y - X                             64
Z - Y                             62
(X - Z) * (Y * Y)                 59
Y                                 58
Y - X                             55
(X - Z) * (Y * Y) + W + W         44

Localized? Volatile?

Proof of Concept Experiments
Experiments using synthetic equations:
- Z = W + X + Y
- Z = 2 * X + Y - W
- Z = X / Y
- Z = X^3
- Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence

Genetic Program
- 1000 Chromosomes (Equations)
- 50 Generations
- Breeding based on fitness rank
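A minimal sketch of how slightly perturbed synthetic data for these equations might be generated. The slides do not say how the perturbation was applied, so the multiplicative Gaussian noise, the 1% noise level, the input range, the row count, and the names `TARGETS` and `make_dataset` are all assumptions made for illustration.

```python
import random

# The five synthetic target equations from the slide, as functions of W, X, Y.
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}

def make_dataset(target, n_rows=100, noise=0.01, seed=0):
    """Sample inputs, compute Z, then perturb Z by a small relative amount.

    Assumption: inputs are drawn uniformly from [1, 10], which also keeps
    Y away from zero for the X / Y target.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y)
        rows.append({"W": w, "X": x, "Y": y, "Z": z * (1.0 + rng.gauss(0.0, noise))})
    return rows

data = make_dataset(TARGETS["Z = 2 * X + Y - W"])
print(len(data), data[0])
```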

Proof of Concept Experiments - 2
For the 1000 Chromosomes:
- Divide into 5 groups of 200 (by fitness)
- Focus on the best, middle, and worst groups
- See where each group's offspring occur in the next generation
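The grouping and tracking steps above could be sketched as follows. The slides do not describe how lineage was recorded, so this assumes each child stores the index of one parent from the previous generation; `rank_groups` and `offspring_placement` are hypothetical names, and the toy example uses 10 chromosomes in place of 1000.

```python
from collections import Counter

def rank_groups(fitnesses, n_groups=5):
    """Return a group index per chromosome (0 = best fifth, 4 = worst fifth)."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    size = max(1, len(fitnesses) // n_groups)
    groups = [0] * len(fitnesses)
    for rank, idx in enumerate(order):
        groups[idx] = min(rank // size, n_groups - 1)
    return groups

def offspring_placement(parent_fitnesses, child_parent, child_fitnesses, n_groups=5):
    """Count how many children of each parent group land in each child group.

    child_parent[j] is the index (in the previous generation) of a parent of
    child j. Tracking a single parent per child is an assumption of this sketch.
    """
    parent_groups = rank_groups(parent_fitnesses, n_groups)
    child_groups = rank_groups(child_fitnesses, n_groups)
    counts = Counter()
    for j, p in enumerate(child_parent):
        counts[(parent_groups[p], child_groups[j])] += 1
    return counts

# Toy example: 10 chromosomes, so each of the 5 groups holds 2.
parent_fit = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
child_parent = list(range(10))            # child j descends from parent j
child_fit = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
placement = offspring_placement(parent_fit, child_parent, child_fit)
print(placement[(0, 0)], placement[(0, 4)])
```

A count concentrated on the diagonal, as in this toy case, would correspond to offspring staying in their parent's fitness group from one generation to the next.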

Results for Z = W + X + Y [figure panels: Best, Middle, Worst]

Results for Z = 2 * X + Y - W [figure panels: Best, Middle, Worst]

Results for Z = X / Y [figure panels: Best, Middle, Worst]

Results for Z = X^3 [figure panels: Best, Middle, Worst]

Results for Z = W^2 + W * X - Y [figure panels: Best, Middle, Worst]

Applied Experiments
Best class produces best offspring. Now what?
Compare 2 Genetic Programs (GPs):
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times

Genetic Program
- 1000 Chromosomes (Equations)
- 50 Generations
- 20 Trials

Equations to model
- Z = Sin(W) + Sin(X) + Sin(Y)
- Z = log10(W^X) + (Y * Z)
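One possible reading of the lineage-based variant is sketched below: keep the top 20% of the population by fitness and copy that elite five times, which restores the original population size before breeding. The slides do not spell out the mechanism, so this interpretation, the function name `lineage_breeding_pool`, and its parameters are assumptions.

```python
def lineage_breeding_pool(population, fitnesses, top_fraction=0.2, copies=5):
    """Return a breeding pool built only from the fittest chromosomes.

    Assumption: the top fraction is selected by fitness rank and replicated
    `copies` times; with 20% and 5 copies the pool matches the population size.
    """
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    n_top = max(1, round(len(population) * top_fraction))
    elite = [population[i] for i in ranked[:n_top]]
    return elite * copies

# Toy example: 10 labelled chromosomes in place of 1000 equations.
population = list("ABCDEFGHIJ")
fitnesses = [1, 9, 3, 10, 5, 2, 8, 4, 7, 6]
pool = lineage_breeding_pool(population, fitnesses)
print(pool)
```

With 1000 chromosomes the same call would keep 200 and return a pool of 1000.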

Results for Z = Sin(W) + Sin(X) + Sin(Y)
[Table comparing Vanilla-Based GP and Lineage-Based GP on: Average Fitness, Average r, Ave. Generations needed to complete]

Results for Z = log10(W^X) + (Y * Z)
[Table comparing Vanilla-Based GP and Lineage-Based GP on: Average Fitness, Average r, Ave. Generations needed to complete]

Conclusions
- Proof of concept experiments demonstrate the viability of considering lineage in GPs
- Applied experiments show that lineage-based GP modeling produces better results faster