CS639: Data Management for Data Science Lecture 19: Unsupervised Learning/Ensemble Learning Theodoros Rekatsinas
Today: Unsupervised Learning/Clustering, K-Means, Ensembles and Gradient Boosting
What is clustering?
What do we need for clustering?
Distance (dissimilarity) measures
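To make the idea concrete, here is a minimal NumPy sketch of two common dissimilarity measures (the specific choices of Euclidean and cosine distance are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def euclidean(a, b):
    # Straight-line distance between two feature vectors.
    return np.sqrt(np.sum((a - b) ** 2))

def cosine_dissimilarity(a, b):
    # 1 - cosine similarity: 0 when the vectors point in the same direction.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 2.0, 1.0])
print(euclidean(x, y), cosine_dissimilarity(x, y))
```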
Cluster evaluation (a hard problem)
How many clusters?
Clustering techniques
K-means
K-means algorithm
K-means convergence (stopping criterion)
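A minimal sketch of the K-means (Lloyd's) loop, stopping when the centroids stop moving; the tolerance, iteration cap, and random initialization below are illustrative assumptions:

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    # Alternate assignment and centroid update until the centroids
    # stop moving (the stopping criterion) or max_iters is reached.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = kmeans(X, k=2)
```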
K-means clustering example: step 1
K-means clustering example: step 2
K-means clustering example: step 3
K-means clustering example
Why use K-means?
Weaknesses of K-means
Outliers
Dealing with outliers
Sensitivity to initial seeds
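A common way to reduce this sensitivity (an illustration, not prescribed by the slides) is to run K-means from several random initializations and keep the run with the lowest within-cluster sum of squares; scikit-learn's KMeans does this via n_init:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
# n_init=10 runs 10 random initializations and keeps the solution
# with the lowest inertia (within-cluster sum of squares).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.inertia_, km.cluster_centers_)
```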
K-means summary
Tons of clustering techniques
Summary: clustering
Ensemble learning: Fighting the Bias/Variance Tradeoff
Ensemble methods: combine different models together to minimize variance (Bagging, Random Forests) or to minimize bias (Functional Gradient Descent / Boosting, Ensemble Selection).
Bagging. Goal: reduce variance. Ideal setting: many training sets S', each sampled independently from P(x,y); train a model on each S' and average the predictions. Variance reduces linearly, bias is unchanged. The expected error decomposes as E_S[(h(x|S) - y)^2] = E_S[(Z - Z̄)^2] + Z̄^2, where Z = h(x|S) - y and Z̄ = E_S[Z]: the first term is the variance, the second the (squared) bias. "Bagging Predictors" [Leo Breiman, 1994], http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf
Bagging = Bootstrap Aggregation. Goal: reduce variance. In practice: resample S' from S with replacement, train a model on each S', and average the predictions. Variance reduces sub-linearly (because the S' are correlated), and bias often increases slightly. Same decomposition: E_S[(h(x|S) - y)^2] = E_S[(Z - Z̄)^2] + Z̄^2 with Z = h(x|S) - y and Z̄ = E_S[Z]. "Bagging Predictors" [Leo Breiman, 1994], http://statistics.berkeley.edu/sites/default/files/tech-reports/421.pdf
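A minimal bagging sketch with decision trees as the base model (the base learner, number of models, and synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_predict(X_train, y_train, X_test, n_models=25, seed=0):
    # Bootstrap aggregation: train each model on a resample S' of S
    # (drawn with replacement), then average the predictions.
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # S' ~ resample of S
        model = DecisionTreeRegressor(random_state=0).fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)  # averaging reduces variance

X = np.random.randn(200, 3)
y = X[:, 0] ** 2 + 0.1 * np.random.randn(200)
print(bagged_predict(X[:150], y[:150], X[150:])[:5])
```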
Random Forests. Goal: reduce variance; bagging can only do so much, since resampling the training data asymptotes. Random Forests sample both data and features: draw S' with replacement, train a decision tree on each S', at each node consider only a random subset of the features (typically sqrt of the total), and average the predictions. Feature sampling further de-correlates the trees. "Random Forests – Random Features" [Leo Breiman, 1997], http://oz.berkeley.edu/~breiman/random-forests.pdf
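In practice this is usually a library call; a sketch using scikit-learn's RandomForestClassifier (the dataset and hyperparameters are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
# Each tree is trained on a bootstrap resample, and each split considers
# only a random subset of features (sqrt of the total), which further
# de-correlates the trees before their predictions are combined.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)
print(rf.score(X, y))
```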
Ensemble methods (recap): combine different models together to minimize variance (Bagging, Random Forests) or to minimize bias (Functional Gradient Descent / Boosting, Ensemble Selection).
Gradient Boosting. Build an additive model h(x) = h1(x) + h2(x) + … + hn(x): fit h1 on S' = {(x, y)}, fit h2 on the residuals S' = {(x, y - h1(x))}, and in general fit hn on S' = {(x, y - h1(x) - … - hn-1(x))}.
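A minimal sketch of this residual-fitting loop for squared loss, using shallow regression trees as the individual models (tree depth, number of rounds, and data are assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=50, depth=2):
    # Each new tree is fit to the residual y - (h1(x) + ... + h_{n-1}(x)),
    # so the additive model h(x) = h1(x) + ... + hn(x) keeps shrinking the bias.
    models, residual = [], y.astype(float).copy()
    for _ in range(n_rounds):
        h = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        residual = residual - h.predict(X)
        models.append(h)
    return lambda X_new: np.sum([h.predict(X_new) for h in models], axis=0)

X = np.random.randn(200, 3)
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(200)
predict = gradient_boost(X, y)
print(predict(X[:5]))
```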
Boosting (AdaBoost). Build a weighted combination h(x) = a1h1(x) + a2h2(x) + … + anhn(x): train h1 on S' = {(x, y, u1)}, re-weight the examples to get u2, train h2 on S' = {(x, y, u2)}, and so on. Here u is the weighting on the data points and a is the weight in the linear combination. Stop when validation performance plateaus (will discuss later). https://www.cs.princeton.edu/~schapire/papers/explaining-adaboost.pdf
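A sketch of the AdaBoost weight updates with decision stumps as the weak learners, assuming labels in {-1, +1} (the weak learner and number of rounds are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=25):
    # u: per-example weights; a: coefficients of h(x) = sum_t a_t * h_t(x).
    n = len(y)
    u = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=u)
        pred = stump.predict(X)
        eps = np.sum(u * (pred != y)) / np.sum(u)           # weighted error
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))   # weight of this model
        u = u * np.exp(-alpha * y * pred)                   # re-weight the data
        u = u / u.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return lambda X_new: np.sign(
        np.sum([a * h.predict(X_new) for a, h in zip(alphas, stumps)], axis=0))

X = np.random.randn(300, 2)
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)
predict = adaboost(X, y)
print((predict(X) == y).mean())
```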
Ensemble Selection. Split S into a training set S' and a validation set V'. Let H = {2000 models trained using S'}. Maintain the ensemble as a combination of models from H, h(x) = h1(x) + h2(x) + … + hn(x), and repeatedly add the model hn+1 from H that maximizes performance on V'. Models are trained on S'; the ensemble is built to optimize V'. "Ensemble Selection from Libraries of Models", Caruana, Niculescu-Mizil, Crew & Ksikes, ICML 2004
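A minimal sketch of the greedy selection step, assuming the library of models has already been trained on S' and scored on V'; the model library below is synthetic and purely illustrative:

```python
import numpy as np

def ensemble_selection(val_preds, y_val, n_rounds=10):
    # val_preds: dict mapping model name -> predicted probabilities on V'.
    # Greedily add (with replacement) the model whose inclusion most
    # improves the averaged ensemble's accuracy on the validation set V'.
    chosen, ensemble_sum = [], np.zeros_like(y_val, dtype=float)
    for _ in range(n_rounds):
        def accuracy_if_added(name):
            avg = (ensemble_sum + val_preds[name]) / (len(chosen) + 1)
            return ((avg > 0.5).astype(int) == y_val).mean()
        best = max(val_preds, key=accuracy_if_added)
        chosen.append(best)
        ensemble_sum += val_preds[best]
    return chosen

# Hypothetical library: random "probability" outputs standing in for
# the predictions of 20 pre-trained models on the validation set.
y_val = np.random.randint(0, 2, size=100)
library = {f"model_{i}": np.random.rand(100) for i in range(20)}
print(ensemble_selection(library, y_val))
```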
Summary

| Method | Minimize Bias? | Minimize Variance? | Other Comments |
| Bagging | Complex model class (deep DTs) | Bootstrap aggregation (resampling training data) | Does not work for simple models |
| Random Forests | Complex model class (deep DTs) | Bootstrap aggregation + bootstrapping features | Only for decision trees |
| Gradient Boosting (AdaBoost) | Optimize training performance | Simple model class (shallow DTs) | Determines which model to add at run-time |
| Ensemble Selection | Optimize validation performance | Optimize validation performance | Pre-specified dictionary of models learned on training set |