Combining Multiple Learners
Lecture Slides for Introduction to Machine Learning 2e
Ethem Alpaydın © The MIT Press, 2010

Rationale
No Free Lunch Theorem: there is no single algorithm that is always the most accurate. Instead, generate a group of base-learners which, when combined, achieves higher accuracy. Learners can be made to differ in their:
- Algorithms
- Hyperparameters
- Representations / modalities / views
- Training sets
- Subproblems
The key trade-off is diversity vs. accuracy.

Voting
The simplest way to combine learners is voting, a linear combination of their outputs:
y = Σ_j w_j d_j,  with w_j ≥ 0 and Σ_j w_j = 1
For classification, each learner j outputs a support d_ji for class C_i, and the combined support is
y_i = Σ_j w_j d_ji
Simple (plurality) voting is the special case w_j = 1/L.
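As an illustration of this linear-combination rule (a minimal NumPy sketch, not from the slides), where `d[j, i]` is learner j's support for class i:

```python
import numpy as np

def weighted_vote(d, w):
    """Combine base-learner outputs by a convex linear combination.

    d : shape (L, K) -- d[j, i] is learner j's support for class i.
    w : shape (L,)   -- non-negative weights summing to 1.
    Returns the combined support y_i = sum_j w_j * d_ji for each class.
    """
    d = np.asarray(d, dtype=float)
    w = np.asarray(w, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return w @ d  # shape (K,)

# Three learners, two classes; simple voting uses equal weights w_j = 1/L.
d = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
y = weighted_vote(d, [1 / 3, 1 / 3, 1 / 3])
print(y.argmax())  # class chosen by the ensemble
```

Since each learner's supports sum to 1 and the weights are convex, the combined supports also sum to 1.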

Bayesian perspective: consider y = (1/L) Σ_j d_j. If the d_j are iid:
E[y] = E[d_j], so the bias does not change;
Var(y) = (1/L²) · L · Var(d_j) = Var(d_j)/L, so the variance decreases by a factor of L.
If the learners are dependent,
Var(y) = (1/L²) [ Σ_j Var(d_j) + 2 Σ_j Σ_{i<j} Cov(d_j, d_i) ],
so the error increases with positive correlation between learners.
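A quick numerical check of the iid case (an illustrative simulation, not from the slides): averaging L independent outputs leaves the mean unchanged and shrinks the variance by a factor of about L:

```python
import numpy as np

rng = np.random.default_rng(0)
L, n = 10, 200_000

# L iid base-learner outputs, each with mean 0 and variance 1.
d = rng.normal(loc=0.0, scale=1.0, size=(L, n))
y = d.mean(axis=0)  # ensemble output: average of the L learners

print(round(y.var(), 3))   # close to 1/L = 0.1 (variance shrinks by L)
print(round(y.mean(), 3))  # close to 0.0     (bias unchanged)
```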

Fixed Combination Rules
With no trained weights, common fixed rules for computing y_i from the d_ji are:
- Sum: y_i = (1/L) Σ_j d_ji
- Weighted sum: y_i = Σ_j w_j d_ji  (w_j ≥ 0, Σ_j w_j = 1)
- Median: y_i = median_j d_ji
- Minimum: y_i = min_j d_ji
- Maximum: y_i = max_j d_ji
- Product: y_i = Π_j d_ji

Error-Correcting Output Codes (Dietterich and Bakiri, 1995)
With K classes, define L binary problems. The code matrix W (K × L, entries ±1) codes the classes in terms of the base-learners:
- One-per-class: L = K
- Pairwise: L = K(K−1)/2

The full code has L = 2^(K−1) − 1 columns. For a reasonable L, find a W such that the Hamming distance between rows, and between columns, is maximized. Decisions are then combined by a voting scheme. Note that the binary subproblems may be more difficult than the original one-per-class problems.
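A minimal sketch of ECOC decoding (illustrative; the helper names are my own): with ±1 codes, picking the class whose code row maximizes the dot product with the learner outputs is equivalent to minimizing Hamming distance:

```python
import numpy as np

def one_per_class_code(K):
    """K x K code matrix: +1 on the diagonal, -1 elsewhere (L = K learners)."""
    return 2 * np.eye(K, dtype=int) - 1

def ecoc_decode(W, d):
    """Pick the class whose code row is closest to the learner outputs d.

    For +/-1 codes, maximizing the dot product W @ d is equivalent to
    minimizing the Hamming distance between d and a row of W.
    """
    return int(np.argmax(W @ np.asarray(d, dtype=float)))

W = one_per_class_code(4)   # 4 classes, 4 dichotomizers
d = [-1, +1, -1, -1]        # learner 1 fires, the others do not
print(ecoc_decode(W, d))    # -> 1
```

The same decoder works unchanged for pairwise or full codes; only the matrix W differs.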

Bagging (Breiman, 1996)
Use bootstrapping to generate L training sets and train one base-learner on each. Combine by voting (average or median for regression). Unstable algorithms, whose output changes considerably with small changes in the training set, profit most from bagging.
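A bagging sketch under stated assumptions (not from the slides): `train` stands in for any unstable base-learner; the toy one-dimensional stump below, which thresholds at the bootstrap sample's mean, is purely for illustration:

```python
import random
from collections import Counter

def bagging_fit(data, labels, train, L=11, seed=0):
    """Train L base learners, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(data)
    models = []
    for _ in range(L):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train([data[i] for i in idx], [labels[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Majority vote over the base learners."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

def train_stump(xs, ys):
    """Toy 1-D learner: threshold at the sample mean, label each side by
    the majority class of the training points falling on that side."""
    t = sum(xs) / len(xs)
    left = [y for x, y in zip(xs, ys) if x < t]
    right = [y for x, y in zip(xs, ys) if x >= t]
    rmaj = Counter(right).most_common(1)[0][0]
    lmaj = Counter(left).most_common(1)[0][0] if left else rmaj
    return lambda x: lmaj if x < t else rmaj

data = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
models = bagging_fit(data, labels, train_stump)
print(bagging_predict(models, 0.15), bagging_predict(models, 0.85))
```

Because each stump sees a different bootstrap sample, its threshold varies; the vote smooths out that instability.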

AdaBoost (Freund and Schapire, 1996)
Generate a sequence of base-learners, each focusing on the errors of the previous ones: instances misclassified by earlier learners are given higher weight when training the next learner, and the learners' decisions are combined by weighted voting.
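A compact AdaBoost.M1-style sketch with one-dimensional threshold stumps (illustrative; the function names are my own and the clipping constant is an implementation convenience):

```python
import math

def adaboost(X, y, T=10):
    """AdaBoost sketch with 1-D threshold stumps; labels are in {-1, +1}.

    Each round fits the stump with the lowest weighted error, then
    increases the weights of the points that stump misclassifies.
    """
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(T):
        # stump h(x) = polarity if x >= threshold else -polarity
        best = None
        for t in sorted(set(X)):
            for pol in (+1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        if err >= 0.5:
            break  # no better-than-chance stump left
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, t, pol))
        # up-weight mistakes, down-weight correct answers, renormalize
        w = [wi * math.exp(-alpha * yi * (pol if xi >= t else -pol))
             for xi, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of the stumps."""
    score = sum(a * (pol if x >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1

X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [-1, -1, 1, -1, 1, 1]  # not separable by any single stump
H = adaboost(X, y)
print([predict(H, x) for x in X])  # all six training points recovered
```

No single stump classifies this set perfectly, but the weighted vote of boosted stumps does, which is the point of the reweighting.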

Mixture of Experts (Jacobs et al., 1991)
Voting where the weights are input-dependent, computed by a gating model. Both the experts and the gating can be nonlinear.
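A minimal mixture-of-experts sketch (illustrative; the linear softmax gate is an assumption here, since any gating model would do):

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def mixture_of_experts(x, experts, gating_params):
    """Input-dependent voting: the gate maps x to weights via softmax,
    then blends the expert outputs with those weights."""
    weights = softmax([m @ x for m in gating_params])
    outputs = np.array([f(x) for f in experts])
    return weights @ outputs

# Two toy experts on 1-D input (with a bias term appended to x): one
# answers -1, one +1; the gate shifts the weight between them with x.
experts = [lambda x: -1.0, lambda x: +1.0]
gate = [np.array([-5.0, 0.0]), np.array([5.0, 0.0])]  # expert 0 for x < 0

print(mixture_of_experts(np.array([-1.0, 1.0]), experts, gate))  # close to -1
print(mixture_of_experts(np.array([+1.0, 1.0]), experts, gate))  # close to +1
```

In a trained mixture, both the expert parameters and the gate are learned jointly; here they are fixed only to show the input-dependent weighting.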

Stacking (Wolpert, 1992)
The combiner f() is itself another learner, trained on the outputs of the base-learners for data unused in training them.
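A stacking sketch (illustrative; `stack_fit`/`stack_predict` are names of my own): the combiner is a least-squares linear layer fit on base-learner outputs for held-out data:

```python
import numpy as np

def stack_fit(base_preds, y):
    """Fit the combiner f() -- here a least-squares linear layer -- on the
    base learners' outputs for held-out data (f must not be trained on the
    base learners' own training set, or it rewards overfit learners)."""
    D = np.column_stack([base_preds, np.ones(len(y))])  # add bias column
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef

def stack_predict(coef, d):
    d = np.append(np.asarray(d, dtype=float), 1.0)
    return float(d @ coef)

# Two base learners on a validation set; learner 0 tracks the target and
# learner 1 is constant noise -- stacking learns to rely on learner 0.
base = np.array([[0.0, 0.5], [1.0, 0.5], [2.0, 0.5], [3.0, 0.5]])
y = np.array([0.0, 1.0, 2.0, 3.0])
coef = stack_fit(base, y)
print(stack_predict(coef, [1.5, 0.5]))  # close to 1.5
```

Unlike a fixed rule, the trained combiner can discover which base learners to trust and by how much.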

Fine-Tuning an Ensemble
Given an ensemble of dependent classifiers, do not use it as is; try to obtain independence:
1. Subset selection: forward (growing) / backward (pruning) approaches to improve accuracy, diversity, and independence.
2. Train metaclassifiers: from the outputs of correlated classifiers, extract new combinations that are uncorrelated; using PCA, we get "eigenlearners."
This is analogous to the distinction between feature selection and feature extraction.

Cascading
Use learner d_j only if the preceding learners are not confident enough. Cascade the learners in order of increasing complexity, so that cheap learners handle the easy, frequent cases and costly learners are consulted only for the rest.
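A cascading sketch (illustrative, my own naming): each stage returns a label and a confidence, and control passes down the cascade only when confidence is low:

```python
def cascade_predict(x, learners, threshold=0.9):
    """Apply learners in order of increasing complexity; stop at the first
    one that is confident enough, letting the last learner decide otherwise.

    Each learner returns (label, confidence) with confidence in [0, 1].
    """
    for learner in learners[:-1]:
        label, conf = learner(x)
        if conf >= threshold:
            return label          # early, cheap decision
    return learners[-1](x)[0]     # the costly last stage always decides

# Toy cascade: a cheap rule that is only sure far from the boundary at
# 0.5, followed by a costlier fallback that always answers.
cheap = lambda x: (int(x >= 0.5), abs(x - 0.5) * 2)
costly = lambda x: (int(x >= 0.5), 1.0)

print(cascade_predict(0.05, [cheap, costly]))  # cheap learner decides: 0
print(cascade_predict(0.52, [cheap, costly]))  # deferred to costly stage: 1
```

The average cost stays low because most inputs never reach the expensive stages.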

Combining Multiple Sources
- Early integration: concatenate all features and train a single learner.
- Late integration: train one learner per feature set, then combine their decisions with a fixed rule or by stacking.
- Intermediate integration: compute one kernel per feature set, then use a single SVM with multiple kernels.
In short: combining features vs. decisions vs. kernels.