Neuro-Computing Lecture 5 Committee Machine
Suggested Reading: Neural Network Ensembles
- Haykin: Chapter 7
- Kuncheva: Combining Pattern Classifiers: Methods and Algorithms, Wiley, 2004
Pattern Classification
[Figure: pattern classification pipeline with preprocessing, feature extraction and classification stages, illustrated on distinguishing a fork from a spoon]
The power of Parliament
- Many 'experts' arguing about the same problem
- The 'experts' can locally disagree, but overall all of them try to solve the same global problem
- The collective answer is usually more stable to fluctuations in the intelligence/emotions/biases of the individual 'experts'
- There exist many formalizations of this idea
Combining neural classifiers
Averaging Results: Mean Error for Each Network
Suppose we have L trained experts with outputs $y_i(x)$ for a regression problem approximating $h(x)$, each with an error $e_i(x)$. Then we can write:
$y_i(x) = h(x) + e_i(x)$
Thus the sum-of-squares error for network $y_i$ is:
$E_i = \mathcal{E}\big[(y_i(x) - h(x))^2\big] = \mathcal{E}\big[e_i^2\big]$
where $\mathcal{E}[\cdot]$ denotes the expectation (average or mean value). Thus the average error for the networks acting individually is:
$E_{AV} = \frac{1}{L}\sum_{i=1}^{L} E_i = \frac{1}{L}\sum_{i=1}^{L}\mathcal{E}\big[e_i^2\big]$
Averaging Results: Mean Error for Committee
Suppose instead we form a committee by averaging the outputs $y_i$ to get the committee prediction:
$y_{COM}(x) = \frac{1}{L}\sum_{i=1}^{L} y_i(x)$
This estimate will have error:
$E_{COM} = \mathcal{E}\Big[\big(y_{COM}(x) - h(x)\big)^2\Big] = \mathcal{E}\Big[\Big(\frac{1}{L}\sum_{i=1}^{L} e_i\Big)^2\Big]$
Thus, by Cauchy's inequality:
$E_{COM} \le E_{AV}$
Indeed, if the errors are uncorrelated, $E_{COM} = E_{AV}/L$, but this is unlikely in practice as errors tend to be correlated.
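The inequality can be checked numerically. Below is a minimal sketch on synthetic data (the target function, noise levels and number of experts are all illustrative assumptions), showing that the committee error never exceeds the average individual error and only reaches $E_{AV}/L$ when the errors are uncorrelated.

```python
# Numerical check on synthetic data: committee error E_COM <= average error E_AV.
import numpy as np

rng = np.random.default_rng(0)
L, N = 5, 1000                                    # number of experts, test points

h = np.sin(np.linspace(0, 2 * np.pi, N))          # true function h(x)
# Expert errors e_i(x): independent part plus a shared (correlated) component.
errors = rng.normal(0.0, 0.2, size=(L, N)) + 0.1 * rng.normal(size=N)
y = h + errors                                    # expert outputs y_i(x) = h(x) + e_i(x)

E_av = np.mean(errors ** 2)                       # average individual squared error
e_com = errors.mean(axis=0)                       # committee error (1/L) * sum_i e_i(x)
E_com = np.mean(e_com ** 2)

print(f"E_AV  = {E_av:.4f}")
print(f"E_COM = {E_com:.4f}  (<= E_AV; equals E_AV/L only for uncorrelated errors)")
```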
Methods for constructing Diverse Neural Classifiers Different learning machines Different representation of patterns Partitioning the training set Different labeling in learning
Methods for constructing Diverse Neural Classifiers: Different learning machines
1. Using different learning algorithms. For example, we can use a multi-layer perceptron, a radial basis function network and decision trees in an ensemble to make a composite system (Kittler et al. 1997), as in the sketch below.
2. Using one algorithm with different complexity. For example, using networks with different numbers of nodes and layers, or nearest-neighbour classifiers with different numbers of prototypes.
3. Using different learning parameters. For example, in an MLP, different initial weights, number of epochs, learning rate and momentum, or cost function can affect the generalization of the network (Windeatt and Ghaderi 1998).
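A minimal sketch of point 1, assuming scikit-learn is available; the dataset and hyperparameters are illustrative, and a k-NN classifier stands in for the RBF network mentioned in the slide.

```python
# Ensemble of structurally different learners combined by majority vote.
from sklearn.datasets import load_digits
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

committee = VotingClassifier(
    estimators=[
        ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)),
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",                      # majority vote over predicted class labels
)
committee.fit(X_tr, y_tr)
print("committee accuracy:", committee.score(X_te, y_te))
```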
Methods for constructing Diverse Neural Classifiers: Different representation of patterns
1. Using different feature sets or rule sets. This method is particularly useful when more than one source of information is available (Kittler et al. 1997).
2. Using decimation techniques. Even when only one feature set exists, we can produce different feature sets (Tumer and Ghosh 1996) to be used in training the different classifiers by removing different parts of this set (see the sketch after this slide).
3. Feature set partitioning in a multi-net system. This method can be useful if patterns include parts which are independent, for example different parts of an identification form or of a human face. In this case, patterns can be divided into sub-patterns, each of which can be used to train a classifier. One of the theoretical properties of neural networks is that they do not need special feature extraction for classification: the feature set can simply be the raw real-valued measurements (like grey levels in image processing). The number of these measurements can be high, so if we apply all of them to a single network, the curse of dimensionality will occur. To avoid this problem, they can be divided into independent parts, each used as the input of a sub-network (Rowley 1999, Rowley et al. 1998).
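A sketch of decimation (point 2), assuming scikit-learn: each member is trained on a different random subset of the features, and the members' class posteriors are averaged. The dataset, subset size and number of members are illustrative assumptions.

```python
# Feature decimation: each member sees only half of the features, chosen at random.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

members, subsets = [], []
for i in range(5):
    cols = rng.choice(X.shape[1], size=X.shape[1] // 2, replace=False)
    clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=i)
    clf.fit(X_tr[:, cols], y_tr)
    members.append(clf)
    subsets.append(cols)

# Combine by averaging the class posteriors of the feature-decimated members.
probs = np.mean([m.predict_proba(X_te[:, c]) for m, c in zip(members, subsets)], axis=0)
print("ensemble accuracy:", np.mean(probs.argmax(axis=1) == y_te))
```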
Methods for constructing Diverse Neural Classifiers: Partitioning the training set
1. Random partitioning, such as:
- Cross validation (Friedman 1994) is a statistical method in which the training set is divided into k subsets of approximately equal size. A network is trained k times, each time with one of the subsets "left out", so by changing the "left-out" subset we obtain k differently trained networks. If k equals the number of samples, it is called "leave-one-out".
- Bootstrapping, in which the mechanism is essentially the same as cross validation, but subsets (patterns) are chosen randomly with replacement (Jain et al. 1987, Efron 1983). This method has been used in the well-known technique of Bagging (Breiman 1999, Bauer and Kohavi 1999, Maclin and Opitz 1997, Quinlan 1996), sketched below. It has yielded significant improvement in many real problems.
2. Partitioning based on spatial similarities. Mixtures of Experts is based on the divide-and-conquer strategy (Jacobs 1995, Jacobs et al. 1991), in which the training set is partitioned according to spatial similarity rather than randomly as in the former methods. During training, different classifiers (experts) try to deal with different parts of the input space. A competitive gating mechanism, implemented by a gating network, localizes the experts.
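A sketch of Bagging under the same assumptions as the previous examples (scikit-learn, illustrative dataset and ensemble size): each network is trained on a bootstrap sample drawn with replacement, and the members are combined by majority vote.

```python
# Bagging: bootstrap resampling with replacement, then majority vote.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

members = []
for i in range(5):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))   # sample with replacement
    clf = MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=i)
    clf.fit(X_tr[idx], y_tr[idx])
    members.append(clf)

# Majority vote over the bootstrap-trained members.
votes = np.array([m.predict(X_te) for m in members])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("bagged accuracy:", np.mean(pred == y_te))
```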
Methods for constructing Diverse Neural Classifiers: Partitioning the training set
3. Partitioning based on experts' ability
- Boosting is a general technique for constructing a classifier with good performance and small error from individual classifiers whose performance is only slightly better than random guessing (Drucker 1993).
a. Boosting by filtering. This approach involves filtering the training examples by different versions of a weak learning algorithm. It assumes the availability of a large (in theory, infinite) source of examples, with the examples being either discarded or kept during training. An advantage of this approach is its small memory requirement compared to the other two approaches.
b. Boosting by subsampling. This second approach works with a training sample of fixed size. The examples are "resampled" according to a given probability distribution during training. The error is calculated with respect to the fixed training sample.
c. Boosting by reweighting. This third approach also works with a fixed training sample, but it assumes that the weak learning algorithm can receive "weighted" examples. The error is calculated with respect to the weighted examples (see the reweighting sketch below).
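A sketch of boosting by reweighting in the AdaBoost style, assuming scikit-learn weak learners that accept per-example weights; the binary dataset, number of rounds and stump depth are illustrative assumptions.

```python
# Boosting by reweighting: misclassified examples get larger weights each round.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = 2 * y - 1                                      # relabel classes to {-1, +1}
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

w = np.full(len(X_tr), 1.0 / len(X_tr))            # uniform initial weights
stumps, alphas = [], []
for t in range(20):
    stump = DecisionTreeClassifier(max_depth=1, random_state=t)
    stump.fit(X_tr, y_tr, sample_weight=w)         # weak learner sees the weights
    pred = stump.predict(X_tr)
    err = np.sum(w * (pred != y_tr)) / np.sum(w)   # weighted training error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    w *= np.exp(-alpha * y_tr * pred)              # up-weight misclassified examples
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final classifier: weighted vote of the weak learners.
F = sum(a * s.predict(X_te) for a, s in zip(alphas, stumps))
print("boosted accuracy:", np.mean(np.sign(F) == y_te))
```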
Methods for constructing Diverse Neural Classifiers: Different labeling in learning
If we redefine the classes, for example by giving the same label to a group of classes, the classifiers built from the different class definitions can generalize diversely. Methods like Error-Correcting Output Codes, ECOC (Dietterich 1991, Dietterich and Bakiri 1991), binary clustering (Wilson 1996) and pairwise coupling (Hastie and Tibshirani 1996) use this technique.
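A minimal ECOC sketch, assuming scikit-learn: each member solves a different binary regrouping of the original classes, and predictions are decoded against the code matrix. Dataset, base learner and code size are illustrative assumptions.

```python
# Error-Correcting Output Codes: relabel the multi-class problem into several
# binary problems, one per code bit, and decode at prediction time.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ecoc = OutputCodeClassifier(
    estimator=MLPClassifier(hidden_layer_sizes=(30,), max_iter=500, random_state=0),
    code_size=1.5,           # number of binary problems = 1.5 * n_classes
    random_state=0,
)
ecoc.fit(X_tr, y_tr)
print("ECOC accuracy:", ecoc.score(X_te, y_te))
```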
Combination strategies Static combiners Dynamic combiners
Combination strategies
Static combiners:
- Non-trainable: Voting, Sum, Product, Max, Min, Averaging, Borda counts
- Trainable: Weighted averaging {DP-DT, Genetic, ...}, Stacked Generalization
Dynamic combiners: Mixture of Experts
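A sketch of the non-trainable fixed rules on toy class posteriors (the posterior values are made up for illustration): given the outputs of L members, the Sum, Product, Max and Min rules fuse the posteriors element-wise before taking the arg-max.

```python
# Fixed (non-trainable) combination rules over member class posteriors.
import numpy as np

def combine(posteriors, rule="sum"):
    """posteriors: array of shape (L, n_samples, n_classes)."""
    if rule == "sum":
        fused = posteriors.sum(axis=0)
    elif rule == "product":
        fused = posteriors.prod(axis=0)
    elif rule == "max":
        fused = posteriors.max(axis=0)
    elif rule == "min":
        fused = posteriors.min(axis=0)
    else:
        raise ValueError(rule)
    return fused.argmax(axis=1)            # predicted class index per sample

# Toy posteriors from 3 members on 2 samples with 3 classes.
p = np.array([[[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]],
              [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
              [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]])
for rule in ("sum", "product", "max", "min"):
    print(rule, combine(p, rule))
```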
Stacked Generalization Wolpert (1992) - In stacked generalization, the output pattern of an ensemble of trained experts serves as an input to a second-level expert
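A minimal sketch of stacked generalization, assuming scikit-learn: the level-0 experts' out-of-fold predictions become the inputs of a level-1 (meta) expert. The base learners, meta-learner and dataset are illustrative assumptions.

```python
# Stacked generalization: level-0 expert outputs feed a second-level expert.
from sklearn.datasets import load_digits
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # second-level expert
    cv=5,            # level-1 inputs come from out-of-fold level-0 predictions
)
stack.fit(X_tr, y_tr)
print("stacking accuracy:", stack.score(X_te, y_te))
```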
Mixture of Experts Jacobs et al. (1991)
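An architectural sketch of the mixture of experts: a gating network assigns input-dependent soft weights to the experts' outputs. This is a forward pass only with random linear experts (training is omitted); the dimensions and parameterization are illustrative assumptions, not the original formulation's details.

```python
# Mixture of experts forward pass: gate-weighted combination of expert outputs.
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 4, 3                           # input dimension, number of experts

W_experts = rng.normal(size=(n_experts, d))   # one linear expert per row
W_gate = rng.normal(size=(n_experts, d))      # gating network parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x):
    expert_outputs = W_experts @ x            # each expert's prediction for x
    gate = softmax(W_gate @ x)                # input-dependent mixing weights
    return gate @ expert_outputs, gate        # gate-weighted combination

x = rng.normal(size=d)
y_hat, gate = moe_forward(x)
print("gate weights:", np.round(gate, 3), "output:", round(float(y_hat), 3))
```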
Combining classifiers for face recognition Zhao, Huang, Sun (2004) Pattern Recognition Letters
Combining classifiers for face recognition Jang et al. (2004) LNCS
Combining classifiers for view-independent face recognition Kim, Kittler (2006) IEEE Trans. Circuits and Systems for Video Technology