CTU Prague, Faculty of Electrical Engineering, Department of Computer Science Pavel Kordík GMDH and FAKE GAME: Evolving ensembles of inductive models.

2/67 Outline of the tutorial
– Introduction, FAKE GAME framework, architecture of GAME
– GAME – niching genetic algorithm
– GAME – ensemble methods: trying to improve accuracy
– GAME – ensemble methods: models' quality, credibility estimation

3/67 Theory in background
Knowledge discovery, data preprocessing, data mining, neural networks, inductive models, continuous optimization, ensembles of models, information visualization.

4/67 Problem statement
– Knowledge discovery (from databases) is a time-consuming task demanding expert skills.
– Data preprocessing is an extremely time-consuming process.
– Data mining involves many experiments to find proper methods and to adjust their parameters.
– Black-box data mining methods such as neural networks are not considered credible.
– Knowledge extraction from data mining methods is often very complicated.

5/67 Solutions proposed – FAKE GAME concept

6/67 Detailed view of proposed solutions

7/67 The GAME engine for automated data mining What is hidden inside?

8/67 Background (GMDH)
Group Method of Data Handling (GMDH) – introduced by prof. Ivakhnenko in 1968.
The principle of INDUCTION employed: an inductive model grows from data; it decomposes a multivariate problem into subproblems, solves them in a space of lower dimensionality (two dimensions for units with 2 inputs), and combines the partial solutions to obtain the global solution of the problem.
MIA GMDH is the most commonly used algorithm. A unit combines two inputs, for instance: y = a·x1 + b·x2 + c.
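Fitting a single two-input unit can be sketched as follows. This is an illustrative sketch, not code from FAKE GAME: it fits the slide's simple linear form y = a·x1 + b·x2 + c by least squares, and the function names are our own.

```python
import numpy as np

def fit_gmdh_unit(x1, x2, y):
    """Fit one two-input GMDH unit y = a*x1 + b*x2 + c by least squares.

    Note: the classic MIA GMDH unit is usually a quadratic polynomial
    (with x1*x2, x1**2, x2**2 terms); the slide's linear form is used here.
    """
    A = np.column_stack([x1, x2, np.ones_like(x1)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # (a, b, c)

def unit_output(coeffs, x1, x2):
    a, b, c = coeffs
    return a * x1 + b * x2 + c
```

GMDH then keeps the best-performing units, feeds their outputs into the next layer, and repeats until adding layers stops improving validation error.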

9/67 Our improvements
– Non-heuristic search
– Exhaustive search
– GA + DC

10/67 Recent improvements (GAME)
Group of Adaptive Models Evolution (GAME):
– Heterogeneous units
– Several competing training methods
– Interlayer connections; the allowed number of inputs increases with the layer
– Niching genetic algorithm (will be explained) employed in each layer to optimize the topology of GAME networks
– Ensemble of models generated (will be explained)

11/67 Heterogeneous units in GAME

12/67 Optimization of coefficients (learning)
We have inputs x1, x2, …, xn and a target output y in the training data set.
Example – a unit with a Gaussian transfer function (GaussianNeuron):
y' = a_{n+1} · exp(−(a_0 + Σ_{i=1..n} a_i·x_i)²) + a_{n+2}
We are looking for optimal values of the coefficients a_0, a_1, …, a_{n+2}: the difference between the unit output y' and the target value y should be minimal over all vectors from the training data set.

13/67 How to derive the analytic gradient?
– Error of the unit (energy surface)
– Gradient of the error
– Unit with a Gaussian transfer function
– Partial derivative of the error in the direction of coefficient a_i

14/67 Partial derivatives of the Gauss unit
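Assuming the Gaussian unit takes the form y' = a_{n+1}·exp(−s²) + a_{n+2} with s = a_0 + Σ a_i·x_i (an assumption for illustration; the exact form used in GAME may differ), the unit error E = ½(y' − y)² and its analytic gradient can be computed like this:

```python
import math

def gauss_unit(a, x):
    """Output of the assumed Gaussian unit for coefficients a[0..n+2]."""
    n = len(x)
    s = a[0] + sum(a[i + 1] * x[i] for i in range(n))
    return a[n + 1] * math.exp(-s * s) + a[n + 2]

def gauss_unit_grad(a, x, y):
    """Analytic gradient of the error E = 0.5*(y' - y)**2 w.r.t. each a_j."""
    n = len(x)
    s = a[0] + sum(a[i + 1] * x[i] for i in range(n))
    g = math.exp(-s * s)
    y_hat = a[n + 1] * g + a[n + 2]
    d = y_hat - y                             # dE/dy'
    grad = [0.0] * (n + 3)
    grad[0] = d * a[n + 1] * g * (-2.0 * s)   # dE/da_0 (chain rule through s)
    for i in range(n):
        grad[i + 1] = grad[0] * x[i]          # dE/da_i = dE/da_0 * x_i
    grad[n + 1] = d * g                       # dE/da_{n+1}
    grad[n + 2] = d                           # dE/da_{n+2}
    return y_hat, grad
```

With such analytic gradients, a gradient-based optimizer can train the coefficients far more efficiently than a derivative-free search.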

15/67 Optimization of their coefficients
– Unit does not provide an analytic gradient, just its error
– Unit provides an analytic gradient as well as its error

16/67 Very efficient gradient-based training for hybrid networks developed!

17/67 Outline of the talk
– Introduction, FAKE GAME framework, architecture of GAME
– GAME – niching genetic algorithm
– GAME – ensemble methods: trying to improve accuracy
– GAME – ensemble methods: models' quality, credibility estimation

18/67 Genetic algorithm

19/67 Example of encoding into chromosome

20/67 The process of GAME models evolution – genetic algorithm
Encoding of a GAME unit into the genotype (first layer): inputs, type, transfer function, other parameters.
With a standard GA (no niching), all individuals end up connected to the most significant feature; selecting the N best individuals therefore extracts REDUNDANT information.

21/67 A better way is to use worse, but non-correlated units
By using less significant features we gain more additional information than by using several best individuals connected to the most significant feature. The outputs of units A and B are correlated, so combining them does not improve performance. How to do it?

22/67 Niching Genetic Algorithms
Canonical GAs vs. niching GAs: a niching method extends the genetic algorithm to locate multiple solutions by promoting the formation and maintenance of stable subpopulations.
Many methods exist for preserving the diversity of individuals (niching methods):
– Overspecification – useful for dynamic systems
– Ecological genetic algorithms – multiple environments
– Heterozygote advantage – diploid genome; fitness increased by the distance of the parents
– Crowding (restricted replacement) – replacement of similar individuals within a subpopulation
– Restricted competition – tournament among compatible individuals (threshold)
– Fitness sharing – sharing methods: when a niche is overcrowded, search elsewhere
– Immune system models – antigens

23/67 Niching GA – Deterministic Crowding
Deterministic Crowding: S. W. Mahfoud (1995). Subpopulations (niches) are based on the distance between individuals. For units in the GAME network, the distance of units within one population is based on their phenotypic difference.
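Mahfoud's replacement rule can be sketched generically as below (an illustrative sketch, not tied to GAME's unit encoding; `fitness`, `distance`, `crossover` and `mutate` are caller-supplied functions):

```python
import random

def deterministic_crowding_step(pop, fitness, distance, crossover, mutate):
    """One generation of Deterministic Crowding (Mahfoud, 1995) replacement.

    Each child competes only with the more similar of its two parents,
    so distinct niches survive instead of one solution taking over.
    Assumes an even population size; the last individual is dropped otherwise.
    """
    pop = list(pop)
    random.shuffle(pop)
    next_pop = []
    for p1, p2 in zip(pop[0::2], pop[1::2]):
        c1, c2 = crossover(p1, p2)
        c1, c2 = mutate(c1), mutate(c2)
        # Pair each child with its closer parent.
        if distance(p1, c1) + distance(p2, c2) <= distance(p1, c2) + distance(p2, c1):
            pairs = [(p1, c1), (p2, c2)]
        else:
            pairs = [(p1, c2), (p2, c1)]
        for parent, child in pairs:
            next_pop.append(child if fitness(child) > fitness(parent) else parent)
    return next_pop
```

Because replacement happens only within a parent/child pair, individuals sitting in different niches never compete directly, which is exactly what preserves multiple subpopulations.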

24/67 Distance of GAME units

25/67 The process of GAME models evolution – GA with DC
Encoding of a GAME unit into the genotype (first layer): inputs, type, transfer function, other parameters.
With the niching GA (Deterministic Crowding), individuals connected to less significant features survive, so we gain more information. Selecting the best individual from each niche extracts the MAXIMUM information.

26/67 Experimental results: standard GA versus GA + Deterministic Crowding
Single uniform population vs. three subpopulations (niches). Units connected to the same input belong to one niche.

27/67 Experimental results: standard GA versus GA + Deterministic Crowding
Results: a statistical test showed that, at the 95% significance level, GA+DC performs better than the simple GA for the energy and hot-water consumption models.

28/67 Experimental results: standard GA versus GA + Deterministic Crowding
Average RMS error of 20 models on a small, simple dataset (simple GA vs. GA+DC): except for the models of the fire radius attribute, all models perform significantly better with Deterministic Crowding enabled.

29/67 Results: the best way is to combine genotypic and phenotypic distance

30/67 Distance of units can be monitored during the training phase

31/67 Another visualization of diversity

32/67 GA or niching GA?
Genetic search is more effective than exhaustive search. The use of a niching genetic algorithm (e.g. with Deterministic Crowding) is beneficial:
– More accurate models are generated
– Feature ranking can be derived
– Non-correlated units (active neurons) can be evolved
The crossover of the transfer functions of active neurons brings further improvements.

33/67 Remember? Units with various transfer functions compete in the niching GA

34/67 Natural selection of units really works! The best performance is achieved when units of all types compete in the construction process.

35/67 Optimization methods available in GAME

36/67 Experimental results of competing training methods on the Building data set
RMS error on testing data sets (Building data), averaged over 5 runs.

37/67 RMS error on the Boston data set

38/67 Classification accuracy [%] on the Spiral data set

39/67 Evaluation on diverse data sets

40/67 Remember the Genetic algorithm optimizing the structure of GAME?

41/67 Outline of the talk
– Introduction, FAKE GAME framework, architecture of GAME
– GAME – niching genetic algorithm
– GAME – ensemble methods: trying to improve accuracy
– GAME – ensemble methods: models' quality, credibility estimation

42/67 Ensemble models – why use them?
A hot topic in the machine learning community, successfully applied in several areas such as:
– Face recognition [Gutta96, Huang00]
– Optical character recognition [Drucker93, Mao98]
– Scientific image analysis [Cherkauer96]
– Medical diagnosis [Cunningham00, Zhou02]
– Seismic signal classification [Shimshoni98]
– Drug discovery [Langdon02]
– Feature extraction [Brown02]

43/67 Ensemble models – what are they?
A collection of models (e.g. neural networks) is trained for the same task; the outputs of models 1..N are then combined into the ensemble output.

44/67 Bagging
– Sample with replacement several training sets of size n (instead of having just one training set of size n).
– Build a model/classifier for each training set.
– Combine the classifiers' predictions by averaging or voting into the ensemble output.
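The three steps above can be sketched as follows (a generic illustration; the learner is pluggable, and `bagging_predict` averages, which suits regression):

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one bootstrap replicate)."""
    return [rng.choice(data) for _ in data]

def bagging_train(data, learn, n_models, seed=0):
    """Train n_models learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [learn(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Combine predictions by averaging (use majority voting for classes)."""
    return sum(m(x) for m in models) / len(models)
```

For classification, the averaging in the last step is replaced by a majority vote over the predicted class labels.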

45/67 We employed Bagging in GAME
Group of Adaptive Models Evolution (GAME) is a method we are working on in our department (see next slide): the training data is sampled with replacement into samples 1..M, a GAME model is built on each sample, and the GAME ensemble output is obtained by averaging or voting.

46/67 Results of Bagging GAME models
The weighted ensemble of GAME models has an apparent tendency to overfit the data: while its performance is superior on the training and validation data, on the testing data several individual models perform better.

47/67 GAME ensembles are not statistically significantly better than the best individual GAME model! Why? Are GAME models diverse?
– Vary input data → bagging
– Vary input features → using a subset of features
– Vary initial parameters → random initialization of weights
– Vary model architecture → heterogeneous units used
– Vary training algorithm → heterogeneous units used
– Use a stochastic method when building the model → niching GA
Yes, they use several methods to promote diversity!

48/67 Ensemble models – why do they work?
Bias–variance decomposition (a theoretical tool to study how the training data affects the performance of models/classifiers):
total expected error of a model = variance + bias²
– variance: the part of the expected error due to the nature of the training set
– bias: the part of the expected error caused by the fact that the model is not perfect
Ensembling reduces variance, and can also reduce bias. (G. Brown: Diversity in Neural Network Ensembles, 2004)
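The variance-reduction half of the decomposition can be illustrated with a toy simulation in which each "model" predicts the true value plus independent noise standing in for training-set variability (an idealized assumption; real ensemble members are correlated, so the reduction is smaller in practice):

```python
import random
import statistics

def simulate_variance_reduction(n_models=25, n_trials=200, noise=1.0, seed=0):
    """Compare the squared error of one model against an ensemble average.

    Each 'model' predicts the true value 0.0 plus independent Gaussian
    noise. Averaging n_models such predictions shrinks the variance term
    by roughly a factor of 1/n_models.
    """
    rng = random.Random(seed)
    single_err, ens_err = [], []
    for _ in range(n_trials):
        preds = [rng.gauss(0.0, noise) for _ in range(n_models)]
        single_err.append(preds[0] ** 2)              # one model alone
        ens_err.append(statistics.fmean(preds) ** 2)  # averaged ensemble
    return statistics.fmean(single_err), statistics.fmean(ens_err)
```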

49/67 GAME ensembles are not statistically significantly better than the best individual GAME model! Why?
GAME models grow until they reach optimal complexity (from a minimal form up to the required complexity), therefore their bias cannot be reduced further. Bagging slightly reduces variance, but not significantly (for the datasets we have experimented with). Ensembling reduces bias only for non-optimal models.

50/67 Results of Bagging GAME models
Is the simple ensemble significantly better than any of the individual models? For suboptimal GAME models: YES. For optimal GAME models: NO.

51/67 Outline of the talk
– Introduction, FAKE GAME framework, architecture of GAME
– GAME – niching genetic algorithm
– GAME – ensemble methods: trying to improve accuracy
– GAME – ensemble methods: models' quality, credibility estimation

52/67 Ensembles can estimate credibility
We found that ensembling GAME models does not further improve modelling accuracy. Why use an ensemble then? There is one big advantage that ensemble models can provide: using an ensemble of models, we can estimate credibility for any input vector. Ensembles are starting to be used in this manner in several real-world applications.

53/67 Visualization of the model's behavior (GAME)

54/67 Estimating credibility using an ensemble of GAME models
While x2 varies from min to max and the other inputs stay constant, the output of the model shows its sensitivity to variable x2 in the configuration x1, x3 = const. We need to know when the response of a model is based on training data and when it is just random guessing – then we can estimate the credibility of the model. In regions where we have enough data, all models in the ensemble give a consistent response – they are credible; where the responses of the models differ, the models are not credible (dark background).
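The dispersion-based credibility idea can be sketched as follows (the threshold value and the use of standard deviation as the dispersion measure are our assumptions for illustration, not taken from GAME):

```python
import statistics

def credibility(models, x, threshold=0.5):
    """Ensemble response plus a dispersion-based credibility flag.

    Where the members agree (small spread of outputs), the input lies in a
    region covered by training data and the response is taken as credible;
    a large spread signals extrapolation / random guessing.
    """
    outputs = [m(x) for m in models]
    mean = statistics.fmean(outputs)
    spread = statistics.pstdev(outputs)
    return mean, spread, spread <= threshold
```

Sweeping `x` over a range and plotting `spread` alongside `mean` reproduces the shaded "not credible" regions shown on the slide.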

55/67 Artificial data set – partial definition
Credibility criterion: the dispersion of the models' responses.
Advantages:
– No need for the training data set
– The success of the modeling method is taken into account
– The importance of inputs is taken into account

56/67 Practical example – nuclear data
A. G. Ivakhnenko – experiments with GMDH on an extremely sparse data set: 6 dimensions, 10 data vectors.

57/67 Typical application of soft-computing methods in industry: Building data set
A black-box GAME ensemble provides models of hot-water, cold-water and energy consumption. Looking for the optimal configuration of the GAME simulator – Pavel Staněk.

58/67 Building data set
The data set is problem A of the "Great Energy Predictor Shootout" contest: The First Building Data Analysis and Prediction Competition; ASHRAE Meeting; Denver, Colorado; June 1993.

59/67 Credibility of models in classification

60/67 Credibility of models in classification: before / after

61/67 Other visualization techniques

62/67 Automated retrieval of 3D plots showing interesting behaviour
A genetic algorithm with a special fitness function is used to adjust all other inputs (dimensions).

63/67 Conclusion
– New quality in modeling: GAME models adapt to the complexity of the data set.
– Hybrid models are more accurate than models with a single type of unit.
– A consistent response of the ensemble indicates valid system states.
– A black-box model gives an estimate of its own accuracy (2.3 ± 0.1).
– Automated retrieval of "interesting plots" of system behavior.
– Java application developed – real-time simulation of models.

64/67 Benchmarks of GAME: classification of a very complex data set
The spiral data classification problem solved by GAME. Note: the inner cross-validation mechanism that prevents overfitting was disabled for this problem.

65/67 Internet advertisements database
BackProp best result: 87.97%; GAME best result: 93.13%

66/67 Other benchmarking problems
Classification accuracy [%]:
– Pima Indians data set: 10-fold cross-validation results
– Stock value prediction (S&P DEP RECEIPTS), change-in-direction prediction ratio: backpropagation with momentum: 82.4%; GAME: 91.3%
– Zoo database, accuracy of classification into genotypes: backpropagation: 80.77%; GAME: 93.5%

67/67 Thank you!