Games of Prediction or Things get simpler as Yoav Freund Banter Inc.

Slides:

Advertisements

Similar presentations

An Introduction to Boosting Yoav Freund Banter Inc.

Advertisements

Ensemble Learning Reading: R. Schapire, A brief introduction to boosting.

On-line learning and Boosting

Lectures 17,18 – Boosting and Additive Trees Rice ECE697 Farinaz Koushanfar Fall 2006.

Ensemble Methods An ensemble method constructs a set of base classifiers from the training data Ensemble or Classifier Combination Predict class label.

Boosting Approach to ML

Boosting Ashok Veeraraghavan. Boosting Methods Combine many weak classifiers to produce a committee. Resembles Bagging and other committee based methods.

Paper by Yoav Freund and Robert E. Schapire

FilterBoost: Regression and Classification on Large Datasets Joseph K. Bradley 1 and Robert E. Schapire 2 1 Carnegie Mellon University 2 Princeton University.

CMPUT 466/551 Principal Source: CMU

Longin Jan Latecki Temple University

Introduction to Boosting Slides Adapted from Che Wanxiang( 车万翔 ) at HIT, and Robin Dhamankar of Many thanks!

Machine Learning Week 2 Lecture 2.

Sparse vs. Ensemble Approaches to Supervised Learning

Ensemble Learning what is an ensemble? why use an ensemble?

2D1431 Machine Learning Boosting.

A Brief Introduction to Adaboost

Ensemble Learning: An Introduction

1 How to be a Bayesian without believing Yoav Freund Joint work with Rob Schapire and Yishay Mansour.

Introduction to Boosting Aristotelis Tsirigos SCLT seminar - NYU Computer Science.

Machine Learning: Ensemble Methods

Sparse vs. Ensemble Approaches to Supervised Learning

Experts and Boosting Algorithms. Experts: Motivation Given a set of experts –No prior information –No consistent behavior –Goal: Predict as the best expert.

Boosting Main idea: train classifiers (e.g. decision trees) in a sequence. a new classifier should focus on those cases which were incorrectly classified.

Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)

Ensemble Learning (2), Tree and Forest

Radial Basis Function Networks

Online Learning Algorithms

Machine Learning CS 165B Spring 2012

Issues with Data Mining

Chapter 10 Boosting May 6, Outline Adaboost Ensemble point-view of Boosting Boosting Trees Supervised Learning Methods.

CSSE463: Image Recognition Day 27 This week This week Last night: k-means lab due. Last night: k-means lab due. Today: Classification by “boosting” Today:

The Multiplicative Weights Update Method Based on Arora, Hazan & Kale (2005) Mashor Housh Oded Cats Advanced simulation methods Prof. Rubinstein.

CS 391L: Machine Learning: Ensembles

Benk Erika Kelemen Zsolt

Boosting of classifiers Ata Kaban. Motivation & beginnings Suppose we have a learning algorithm that is guaranteed with high probability to be slightly.

Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.

BOOSTING David Kauchak CS451 – Fall Admin Final project.

Today Ensemble Methods. Recap of the course. Classifier Fusion

Ensemble Learning Spring 2009 Ben-Gurion University of the Negev.

1 Prune-and-Search Method 2012/10/30. A simple example: Binary search sorted sequence : (search 9) step 1  step 2  step 3  Binary search.

Ensemble Learning (1) Boosting Adaboost Boosting is an additive model

ISQS 6347, Data & Text Mining1 Ensemble Methods. ISQS 6347, Data & Text Mining 2 Ensemble Methods Construct a set of classifiers from the training data.

Tony Jebara, Columbia University Advanced Machine Learning & Perception Instructor: Tony Jebara.

Learning with AdaBoost

E NSEMBLE L EARNING : A DA B OOST Jianping Fan Dept of Computer Science UNC-Charlotte.

Linear Models for Classification

CSSE463: Image Recognition Day 33 This week This week Today: Classification by “boosting” Today: Classification by “boosting” Yoav Freund and Robert Schapire.

Machine Learning Concept Learning General-to Specific Ordering

Ensemble Methods in Machine Learning

Combining multiple learners Usman Roshan. Decision tree From Alpaydin, 2010.

Giansalvo EXIN Cirrincione unit #4 Single-layer networks They directly compute linear discriminant functions using the TS without need of determining.

Boosting ---one of combining models Xin Li Machine Learning Course.

CSE 446: Ensemble Learning Winter 2012 Daniel Weld Slides adapted from Tom Dietterich, Luke Zettlemoyer, Carlos Guestrin, Nick Kushmerick, Padraig Cunningham.

Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.

1 Machine Learning Lecture 8: Ensemble Methods Moshe Koppel Slides adapted from Raymond J. Mooney and others.

By Subhasis Dasgupta Asst Professor Praxis Business School, Kolkata Classification Modeling Decision Tree (Part 2)

Machine Learning: Ensemble Methods

Boosting and Additive Trees (2)

Boosting and Additive Trees

Ensemble Learning Introduction to Machine Learning and Data Mining, Carla Brodley.

ECE 5424: Introduction to Machine Learning

Introduction to Boosting

Implementing AdaBoost

Ensemble learning.

Model Combination.

Ensemble learning Reminder - Bagging of Trees Random Forest

Support Vector Machines 2

Presentation transcript:

Games of Prediction or Things get simpler as Yoav Freund Banter Inc.

The Chip game Game ends when this is empty

Plan of the talk Why is the chip game is interesting? Letting the number of chips go to infinity. The boosting game. Drifting games (let chip=sheep). Letting the number of rounds go to infinity Brownian motion and generalized boosting (nice pictures!)

Combining expert advice Binary sequence:1,0,0,1,1,0,? N experts make predictions: expert 1:1,0,1,0,1,1,1 expert 2:0,0,0,1,1,0,1 … expert N:1,0,0,1,1,1,0 Algorithm makes prediction:1,0,1,1,0,1,1 Assumption: there exists an expert which makes at most k mistakes Goal: make least mistakes under the assumption (no statistics!) Chip = expert, bin = number of mistakes made Game step = algorithm’s prediction is incorrect.

Chip game for expert advice Experts predicting 1 Experts predicting 0 No. of mistakes Made by expert 0 1 KK+1 Game ends when there is one chip here (rest have fallen off)

21 questions with k lies Player 1 chooses secret x from S1,S2,…,SN Player 2 asks “is x in the set A?” Player 1 can lie k times. Game ends when player 2 can determine x. A smart player 1 aims to have more than one consistent x for as long as possible. Chip = element (Si) Bin = number of lies made regarding element

Chip game for 21 questions element not in A element in A No. of lies made assuming element is the secret 0 1 KK+1 Game ends when there is only one chip on this side

Simple case – all chips in bin 1 21 questions without lies Combining experts where one is perfect Splitting a cookie Optimal strategies: –Player 1: split chips into two equal parts –Player 2: choose larger part Note problem when number of chips is odd

Number of chips to infinity Replace individual chips by chip mass Optimal splitter strategy: Split each bin into two equal parts

Binomial weights strategies What should chooser do if parts are not equal? Assume that on following iterations splitter will play optimally = split each bin into 2 equal parts. Future configuration independent of chooser’s choice Potential: Fraction of bin i that will remain in bins 1..k when m iterations remain Choose part so that next configuration will have maximal (or minimal) potential. Solve for m:

Optimality of strategy If the chips are infinitely divisible then the solution is min/max optimal [Spencer95] Enough for optimality if number of chips is at least

Equivalence to a random walk Both sides playing optimally is equivalent to each chip performing an independent random walk. Potential = probability of a chip in bin i ending in bins 1..K after m iteration Weight = difference between the potentials of a chip in its two possible locations on the following iteration. Chooser’s optimal strategy: choose set with smaller (larger) weight

Example potential and weight m=5; k=10

Boosting A method for improving classifier accuracy Weak Learner: a learning algorithm generating rules slightly better than random guessing. Basic idea: re-weight training examples to force weak learner into different parts of the space. Combine weak rules by a majority vote.

Boosting as a chip game Chip = training example Booster assigns a weight to each chip, weights sum to 1. Learner selects a subset with weight –Selected set moves a step right (correct) –Unselected set moves a step left (incorrect) Booster wins examples on right of origin. –Implies that majority vote is correct.

The boosting chip game h(x) = y h(x) <> y.2.1 Majority vote is correct

Binomial strategies for boosting Learner moves chip right independently with probability ½+  The potential of example that is correctly classified r times after i out of k iterations Booster assigns to each example a weight proportional to the difference between its possible next-step potentials.

Drifting games (in 2d) Target region

Target region Drifting games (in 2d)

The allowable steps B = the set of all allowable step Normal B = minimal set that spans the space. (~basis) Regular B = a symmetric regular set. (~orthonormal basis) Regular step set in R 2

The min/max solution A potential defined by a min/max recursion Shepherd’s strategy [Schapire99]

Increasing the number of steps Consider T steps in a unit time Drift  should scale like 1/T Step size O(1/T) gives game to shepherd Step size keeps game balanced

The solution when The local slope becomes the gradient The recursion becomes a PDE Same PDE describes time development of Brownian motion with drift

Special Cases Target func. = PDE boundary condition at time=1. Target func. = exponential –potential and weight are exponential at all times –Adaboost. –Exponential weights algs for online learning. Target func. = step function –Potential is the error function, weight is the normal distribution. –Potential and weight change with time. –Brownboost.

Potential applications Generalized boosting –Classification for >2 labels –Regression –Density estimation –Variational function fitting Generalized online learning –Continuous predictions (instead of binary)

Boosting for variational optimization Minimize f(x) x y