Study on Ensemble Learning By Feng Zhou

Contents
- Introduction
- A Statistical View of the M³ Network
- Future Work

Introduction

Ensemble learning:
- Combines a group of classifiers rather than designing a new one.
- The decisions of multiple hypotheses are combined to produce more accurate results.

Problems with traditional learning algorithms:
- Statistical problem
- Computational problem
- Representation problem

Related work:
- Resampling techniques: bagging, boosting
- Approaches for extending to multi-class problems: one-vs-one, one-vs-all
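To make the resampling idea concrete, here is a minimal sketch using scikit-learn (the library, dataset, and parameters are illustrative assumptions, not part of the original slides): bagging and boosting both build an ensemble of simple base classifiers from resampled or reweighted data and combine their votes.

```python
# Illustrative only: dataset and hyperparameters are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
base = DecisionTreeClassifier(max_depth=3)  # a weak base learner

for ens in (BaggingClassifier(base, n_estimators=50),
            AdaBoostClassifier(n_estimators=50)):
    # Each ensemble usually beats a single tree on held-out folds.
    print(type(ens).__name__, cross_val_score(ens, X, y).mean())
```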

Min-Max-Modular (M³) Network (Lu, IEEE TNN 1999)

Steps:
- Dividing the training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
- Training pairwise classifiers
- Integrating the outcomes (Zhao, IJCNN 2005) via the min process and the max process
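A minimal sketch of these three steps, assuming scikit-learn with logistic-regression base modules (both are illustrative choices, not from the slides): each (positive subset, negative subset) pair gets its own small classifier, and prediction combines their outputs with min, then max.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_m3(X_pos_parts, X_neg_parts):
    """Train one module per (positive-subset, negative-subset) pair."""
    modules = []
    for Xp in X_pos_parts:
        row = []
        for Xn in X_neg_parts:
            X = np.vstack([Xp, Xn])
            y = np.r_[np.ones(len(Xp)), np.zeros(len(Xn))]
            row.append(LogisticRegression().fit(X, y))
        modules.append(row)
    return modules

def predict_m3(modules, X):
    # MIN over negative subsets within each row (min process),
    # then MAX over positive subsets (max process).
    scores = np.array([[clf.predict_proba(X)[:, 1] for clf in row]
                       for row in modules])   # shape: (n_pos, n_neg, n_samples)
    return scores.min(axis=1).max(axis=0)     # estimated P(w+ | x)
```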

A Statistical View

Assumption: each pairwise classifier outputs a probabilistic value, obtained by fitting a sigmoid to its raw score f (J.C. Platt, ALMC 1999):

P(w⁺ | f) = 1 / (1 + exp(A f + B))

Decisions are then made according to Bayesian decision theory.
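A sketch of how the sigmoid parameters A and B can be fit by maximum likelihood on held-out (score, label) pairs; the optimizer and variable names here are assumptions for illustration, and Platt's original procedure additionally smooths the target labels.

```python
import numpy as np
from scipy.optimize import minimize

def fit_platt(scores, labels):
    """Fit P(w+ | f) = 1 / (1 + exp(A*f + B)) by maximum likelihood."""
    def nll(params):
        A, B = params
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        eps = 1e-12  # guard against log(0)
        return -np.sum(labels * np.log(p + eps)
                       + (1 - labels) * np.log(1 - p + eps))
    A, B = minimize(nll, x0=[-1.0, 0.0]).x
    return lambda f: 1.0 / (1.0 + np.exp(A * f + B))
```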

A Simple Discrete Example

P(w|x) (the slide's table; the remaining entries are missing in the source):

  x     w⁺     w⁻
  x₁    1/2
  x₂    2/5
  x₃
  x₄    1/5

A Simple Discrete Example (II)

Classifier 0 (w⁺ : w⁻), Classifier 1 (w⁺ : w₁⁻), Classifier 2 (w⁺ : w₂⁻):

  P_c0(w⁺ | x = x₂) = 1/3
  P_c1(w⁺ | x = x₂) = 1/2
  P_c2(w⁺ | x = x₂) = 1/2

so P_c0 < min(P_c1, P_c2).
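The inequality holds because each pairwise classifier sees only part of the negative evidence. A small numeric check that reproduces the slide's numbers (the even split of the negative mass across the two sub-classes is an assumption made to match them):

```python
# P(w+ | x2) = 1/3, so the negative mass P(w- | x2) = 2/3,
# split evenly across the two negative sub-classes (assumed).
p_pos = 1.0 / 3.0
p_neg_parts = [1.0 / 3.0, 1.0 / 3.0]

def pairwise_posterior(p_pos, p_neg_j):
    """Posterior of w+ in the two-class subproblem (w+ : wj-)."""
    return p_pos / (p_pos + p_neg_j)

p_c0 = pairwise_posterior(p_pos, sum(p_neg_parts))                # 1/3
p_partials = [pairwise_posterior(p_pos, q) for q in p_neg_parts]  # 1/2, 1/2

# Dropping part of the negative evidence inflates the estimate for w+,
# so the full posterior is bounded above by the min of the partial ones.
assert p_c0 <= min(p_partials)
```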

A More Complicated Example

Each time one more classifier is taken into account, the evidence that x belongs to w⁺ shrinks: P_global(w⁺) < min(P_partial(w⁺)). The classifier reporting the minimum value contains the most information about w⁻ (the minimization principle); if P_partial(w⁺) = 1, it contains no information about w⁻ at all.

Classifier 1 (w⁺ : w₁⁻), Classifier 2 (w⁺ : w₂⁻), … : the information about w⁻ is increasing.

Analysis

The analysis proceeds at three levels (see the combination rule sketched below):
- for each classifier c_ij
- for each sub-positive class w_i⁺
- for the positive class w⁺
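Written out in Lu's min-max notation (the exact formulas on this slide were lost, so this is a reconstruction of the standard M³ combination rule, with c_ij denoting the classifier trained on the pair (w_i⁺, w_j⁻)):

$$P(w^+ \mid x) \;\approx\; \max_{i} \; \min_{j} \; P_{c_{ij}}(w^+ \mid x)$$

The inner min aggregates each negative sub-class's evidence against w_i⁺ (the min process), and the outer max restores the union over the positive sub-classes (the max process).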

Analysis (II)
- Decomposition of a complex problem
- Restoration to the original resolution

Composition of Training Sets

The slide shows a grid pairing the positive class w⁺ and its sub-classes w₁⁺, …, w_{n+}⁺ against the negative class w⁻ and its sub-classes w₁⁻, …, w_{n-}⁻; each pairing is marked as one of "have been used", "trivial set, useless", or "not used yet".

Another Way of Combination

The same grid of positive sets w⁺, w₁⁺, …, w_{n+}⁺ and negative sets w⁻, w₁⁻, …, w_{n-}⁻, combined along a different pattern. The slide also compares the training and testing time (the expressions are not recoverable).

Experiments: Synthetic Data

Experiments: Text Categorization (20 Newsgroups corpus)

Experimental setup:
- Word removal: stemming; stop words; words occurring fewer than 30 times
- Naïve Bayes as the elementary classifier
- Probabilities estimated with a sigmoid function
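A modern reconstruction of this setup with scikit-learn (the slides predate the library, so the pipeline below is an approximation, not the original code; min_df=30 mirrors the frequency cut-off, and stemming would need an extra tool such as NLTK):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

# Stop-word removal and the < 30 occurrence cut-off, then naive Bayes.
clf = make_pipeline(CountVectorizer(stop_words='english', min_df=30),
                    MultinomialNB())
clf.fit(train.data, train.target)
print(clf.score(test.data, test.target))
```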

Future Work

Handling the situation with noise:
- The nature of the problem: to recover the underlying distribution
- Independent parameters of the model:
- Constraints we obtain:
- Finding the best estimate via the Kullback-Leibler distance (T. Hastie, Ann Statist 1998)
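The citation points at the pairwise-coupling criterion of Hastie & Tibshirani [1]; one reading of "best estimation under a Kullback-Leibler distance" (my reading, not spelled out on the slide) is their weighted KL fit between the observed pairwise posteriors r_ij and the model values μ_ij = p_i / (p_i + p_j):

$$\ell(p) = \sum_{i<j} n_{ij} \left[ r_{ij} \log\frac{r_{ij}}{\mu_{ij}} + (1 - r_{ij}) \log\frac{1 - r_{ij}}{1 - \mu_{ij}} \right]$$

minimized over the class-probability vector p, where n_ij is the number of training examples in the pair (i, j).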

References

[1] T. Hastie & R. Tibshirani, "Classification by pairwise coupling," Annals of Statistics, 1998.
[2] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Advances in Large Margin Classifiers (ALMC), 1999.
[3] B. Lu & M. Ito, "Task decomposition and module combination based on class relations: a modular neural network for pattern classification," IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, "Equal clustering makes min-max modular support vector machines more efficient," ICONIP 2005.
[5] H. Zhao & B. Lu, "On efficient selection of binary classifiers for min-max modular classifier," IJCNN 2005.
[6] K. Chen & B. Lu, "Efficient classification of multi-label and imbalanced data using min-max modular classifiers," IJCNN 2006.