On-Line Algorithms in Machine Learning
By: Waleed Abdulwahab Yahya Al-Gobi, Muhammad Burhan Hafez, Kim Hyeongcheol, He Ruidan, Shang Xindi


Overview
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice
   - Weighted Majority Algorithm: Simple Version
   - Weighted Majority Algorithm: Randomized Version
3. Mistake Bound Model
   - Learning a Concept Class C
   - Learning Monotone Disjunctions (Simple Algorithm, Winnow Algorithm)
   - Learning Decision Lists
4. Conclusion
5. Q & A

Part 1: Intro to Machine Learning - Offline Learning, Online Learning (presented by Waleed Abdulwahab Yahya Al-Gobi)

Machine Learning | Definition
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [Mitchell, 1997]
A more concrete example:
- Task T: predicting traffic patterns at a busy intersection.
- Experience E: historical (past) traffic patterns.
- Performance measure P: accuracy of predicting future traffic patterns.
The learned model (i.e. target function): y = h(x)

Machine Learning | Offline Learning vs Online Learning
Offline Learning:
- Learning phase: the learning algorithm is trained on a pre-defined set of training examples to create a hypothesis (Training Examples -> Learning Algorithm -> h(x)).
- Testing phase: the hypothesis is then used to predict the label of new, unseen data.
- Example: MRI brain image classification (training images and training labels -> image features -> learning algorithm -> learned model h(x)).

Machine Learning | Offline Learning vs Online Learning
Online Learning:
- In contrast to offline learning, which fits the predictor h(x) to the entire training set at once.
- Online learning is commonly used where it is computationally infeasible to train on the entire dataset at once.
- Online learning is a method of ML in which data becomes available in sequential order and is used to update the predictor h(x) at each step.

Machine Learning | Offline Learning vs Online Learning
Examples of online learning:
- Stock price prediction: the data is generated as a function of time, so online learning can dynamically adapt to new patterns in the incoming data.
- Spam filtering: the data is generated in response to the output of the learning algorithm (the spam detector), so online learning can dynamically adapt to new patterns and minimize losses.

Machine Learning | Offline Learning vs Online Learning
Online learning example: stock price prediction. At each time step, Training Examples -> Learning Algorithm -> h(x): the model predicts the stock price from the current data features, receives the true value, updates the hypothesis h(x), and the cycle repeats on the next example.

Machine Learning | Offline Learning vs Online Learning
- Offline: two-phase learning. Online: multi-phase learning.
- Offline: the entire dataset is given at once. Online: one example is given at a time.
- Offline: learn the dataset to construct the target function h(x), then predict on incoming new data. Online: predict, receive the correct answer, and update the target function h(x) at each step.
- Offline: the learning phase is separate from the testing phase. Online: the learning phase is combined with the testing phase.
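To make the predict / receive-truth / update cycle concrete, here is a minimal sketch of a generic online learning loop in Python; the Learner interface (predict/update) and the mistake counter are illustrative assumptions of mine, not something defined in the slides.

class OnlineLearner:
    """Hypothetical interface for the online protocol described above."""
    def predict(self, x):
        raise NotImplementedError
    def update(self, x, y_true):
        raise NotImplementedError

def run_online(learner, stream):
    """Run the predict -> receive truth -> update loop and count mistakes."""
    mistakes = 0
    for x, y_true in stream:            # examples arrive one at a time
        y_pred = learner.predict(x)     # 1. predict on the new example
        if y_pred != y_true:            # 2. receive the correct answer
            mistakes += 1
        learner.update(x, y_true)       # 3. update the hypothesis h(x)
    return mistakes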

Part 2: Predicting from Expert Advice - Basic Flow, An Example (presented by Waleed Abdulwahab Yahya Al-Gobi)

Predicting from Expert Advice | Basic Flow
In each round, the algorithm (1) receives a prediction from each expert, (2) makes its own prediction, and (3) is told the correct answer. Assumption: predictions are in {0, 1}.

Predicting from Expert Advice | An Example
- Task: predict whether it will rain today.
- Input: the advice of n experts, each in {1 (yes), 0 (no)}.
- Output: 1 or 0.
- Goal: make the least number of mistakes.
(The slide shows a table of the predictions of Expert 1, Expert 2, Expert 3 and the truth for 21-25 Jan.)
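The bounds in the following slides compare the algorithm against the best single expert in hindsight, whose mistake count is the m appearing in the formulas. The small helper below (with made-up advice values, since the table's entries are not reproduced here) computes that benchmark.

# Counting each expert's mistakes to find the best expert in hindsight.
# The advice/truth values are made up for illustration only.
advice = [            # one row per day, one column per expert (1 = rain)
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 0],
]
truth = [1, 0, 1, 1, 0]

n_experts = len(advice[0])
expert_mistakes = [sum(row[i] != t for row, t in zip(advice, truth))
                   for i in range(n_experts)]
best = min(range(n_experts), key=lambda i: expert_mistakes[i])
print(expert_mistakes, "best expert:", best + 1)   # this count is the benchmark m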

Part 3: The Weighted Majority Algorithm - Simple Version, Randomized Version (presented by Waleed Abdulwahab Yahya Al-Gobi)

The Weighted Majority Algorithm (simple version):
1. Initialize the weights w_1, ..., w_n of all experts to 1.
2. Given a set of predictions {x_1, ..., x_n} by the experts, predict 1 if the total weight of experts predicting 1 is at least the total weight of experts predicting 0, and predict 0 otherwise.
3. Receive the correct answer and multiply the weight of each mistaken expert by 1/2. Go to 2.

The Weighted Majority Algorithm | Example
(The slide works through a table: for each date from 21 to 25 Jan it lists the experts' advice x_1, x_2, x_3, the weights w_1, w_2, w_3, the summed weight of the experts with x_i = 0 and with x_i = 1, the algorithm's prediction, and the correct answer.)

The Weighted Majority Algorithm | Mistake Bound
Proof:
- Let M := the number of mistakes made by the Weighted Majority algorithm, and W := the total weight of all experts (initially W = n).
- On a mistaken prediction, at least W/2 of the weight predicted incorrectly, so in step 3 the total weight is reduced by at least 1/4 (= 1/2 W x 1/2). Hence W <= n(3/4)^M.
- Suppose the best expert made m mistakes. Then its weight is (1/2)^m, so W >= (1/2)^m.
- Therefore (1/2)^m <= n(3/4)^M, which gives M <= 2.41(m + lg n).
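For concreteness, here is a short Python sketch of the simple Weighted Majority Algorithm analyzed above; the tie-breaking rule (ties go to 1) and the input format are my own choices.

# A sketch of the simple Weighted Majority Algorithm: predict with the
# weighted majority vote, then halve the weight of every mistaken expert.
def weighted_majority(expert_advice, truths):
    """expert_advice: one list of n predictions in {0, 1} per round.
    truths: the correct answer for each round. Returns the mistake count."""
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0
    for advice, truth in zip(expert_advice, truths):
        weight_for_1 = sum(w for w, x in zip(weights, advice) if x == 1)
        weight_for_0 = sum(w for w, x in zip(weights, advice) if x == 0)
        prediction = 1 if weight_for_1 >= weight_for_0 else 0
        if prediction != truth:
            mistakes += 1
        # Penalize every expert that was wrong this round.
        weights = [w / 2 if x != truth else w
                   for w, x in zip(weights, advice)]
    return mistakes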

Part 4: The Randomized Weighted Majority Algorithm (RWMA) (presented by Muhammad Burhan Hafez)

The Randomized Weighted Majority Algorithm (RWMA)
M_WMA <= 2.41(m + lg n). Suppose n = 10, m = 20, and we run 100 prediction trials: the bound only guarantees M_WMA <= 56 mistakes, even though the best expert made just 20. Can we do better?

The Randomized Weighted Majority Algorithm (RWMA)
Two modifications:
1. View the weights as probabilities.
2. Replace "multiply by 1/2" with "multiply by β".

The Randomized Weighted Majority Algorithm (RWMA)
The algorithm:
1. Initialize the weights w_1, ..., w_n of all experts to 1.
2. Given a set of predictions {x_1, ..., x_n} by the experts, output x_i with probability w_i / W, where W is the total weight.
3. Receive the correct answer l and penalize each mistaken expert by multiplying its weight by β. Go to 2.
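A minimal Python sketch of the three steps above; using random.choices for the weight-proportional draw and fixing a seed are implementation choices of mine.

import random

# A sketch of RWMA: follow expert i with probability w_i / W, then multiply
# the weight of every mistaken expert by beta.
def randomized_weighted_majority(expert_advice, truths, beta=0.5, seed=0):
    rng = random.Random(seed)
    n = len(expert_advice[0])
    weights = [1.0] * n
    mistakes = 0
    for advice, truth in zip(expert_advice, truths):
        # Pick an expert with probability proportional to its weight
        # and output its prediction.
        i = rng.choices(range(n), weights=weights, k=1)[0]
        if advice[i] != truth:
            mistakes += 1
        # Penalize every mistaken expert.
        weights = [w * beta if x != truth else w
                   for w, x in zip(weights, advice)]
    return mistakes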

The Randomized Weighted Majority Algorithm (RWMA)
RWMA in action (β = 1/2), with six experts E1-E6: all weights start at 1; after each round, the weights of the experts whose advice was wrong are halved. In the slide's run the weights go from (1, 1, 1, 1, 1, 1) to (1, 1, 1/2, 1/2, 1/2, 1/2) after the first round, and to (1, 1/2, 1/4, 1/4, 1/4, 1/2) after the second.

The Randomized Weighted Majority Algorithm (RWMA)
Mistake bound: define F_i to be the fraction of the total weight that is on the wrong answer at the i-th trial. The probability that RWMA makes a mistake on the i-th trial is exactly F_i, so if we have seen t examples, the expected number of mistakes so far is M = F_1 + F_2 + ... + F_t. On the i-th trial the total weight changes to W <- W(1 - (1 - β)F_i), because the F_i fraction of the weight that was wrong is multiplied by β. Hence after t trials W = n(1 - (1 - β)F_1)(1 - (1 - β)F_2)...(1 - (1 - β)F_t).

The Randomized Weighted Majority Algorithm (RWMA)
If the best expert makes m mistakes, its weight is β^m, so β^m <= W. Taking logarithms and using ln(1 - x) <= -x gives m ln(1/β) >= (1 - β)(F_1 + ... + F_t) - ln n = (1 - β)M - ln n, i.e.
M <= (m ln(1/β) + ln n) / (1 - β).

The Randomized Weighted Majority Algorithm (RWMA)
The relation between β and M:
- β = 1/4: M <= 1.85m + 1.33 ln(n)
- β = 1/2: M <= 1.39m + 2 ln(n)
- β = 3/4: M <= 1.15m + 4 ln(n)
When β = 1/2: the simple algorithm gives M <= 2.41(m + lg n), while RWMA gives M <= 1.39m + 2 ln(n).
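These rows follow from the bound M <= (m ln(1/β) + ln n) / (1 - β) derived above; the short check below (my own, not part of the slides) recomputes the coefficients.

import math

# Coefficients of m and ln(n) in M <= (m ln(1/beta) + ln n) / (1 - beta).
for beta in (0.25, 0.5, 0.75):
    coef_m = math.log(1 / beta) / (1 - beta)
    coef_logn = 1 / (1 - beta)
    print(f"beta={beta}: M <= {coef_m:.2f} m + {coef_logn:.2f} ln(n)")
# beta=0.25: M <= 1.85 m + 1.33 ln(n)
# beta=0.5:  M <= 1.39 m + 2.00 ln(n)
# beta=0.75: M <= 1.15 m + 4.00 ln(n)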

The Randomized Weighted Majority Algorithm (RWMA)
Other advantages of RWMA:
1. Consider the case where only 51% of the experts are mistaken. WMA directly follows this majority and predicts accordingly, resulting in a wrong prediction. In RWMA there is still roughly a 50/50 chance of predicting correctly.
2. Consider the case where the predictions are strategies that cannot easily be combined. WMA is hard to apply, since the strategies are generally all different and we cannot combine the experts who predicted the same strategy. RWMA can be applied directly, because it does not depend on summing the weights of experts who gave the same prediction; it only uses the individual weights of the experts.

Part 5: Learning a Concept Class in the Mistake Bound Model - A Concept Class, the Mistake Bound Model, and the Definition of Learning a Class in the Mistake Bound Model (presented by Kim Hyeongcheol)

Quick Review | What we covered so far
- Input: Yes/No advice from the "experts" (e.g. weather experts asked "Will it rain tomorrow?").
- Output: the algorithm makes its own Yes/No prediction.
- Experts are penalized according to the correctness of their advice.
- A simple algorithm and a better randomized algorithm.

# Questions WWhat is a concept class C? WWhat is Mistake Bound Model? WWhat do we mean by learning a concept class in Mistake Bound Model? 28 On line learning a concept class C Mistake Bound Model in Learn a Concept Class

A Concept Class C
A concept class C is a set of Boolean functions (concepts) over the input variables, for example:
- Disjunction: a ∨ b
- Conjunction: a ∧ b

Mistake Bound Model
- On-line learning iteration: the algorithm receives an unlabeled example, predicts its label, and is then given the true label; the algorithm is penalized for each incorrect prediction.
- Mistake bound: the number of mistakes made by the algorithm is bounded by M (ideally, we hope M is as small as possible).

Learning a Concept Class in Mistake Bound Model
Assumption: the examples are labeled by some unknown target concept c belonging to the class C, and they may arrive in any (even adversarial) order.

Condition: for every such target c ∈ C and every sequence of examples, the total number of mistakes made by the algorithm is bounded by poly(n, size(c)).

Learning a Concept Class in Mistake Bound Model
- An algorithm that satisfies this condition under this assumption is said to learn the class C in the mistake bound learning model.
- In particular, if the number of mistakes made is only poly(size(c)) · polylog(n), the algorithm is robust to the presence of many additional irrelevant variables: it is called attribute-efficient.

Examples of Learning
Some examples of learning classes in the Mistake Bound Model:
- Monotone disjunctions
  - Simple algorithm
  - The Winnow algorithm
- Decision lists

Part 6: Learning Monotone Disjunctions - Simple Algorithm, Winnow Algorithm (presented by Kim Hyeongcheol)

Learning Monotone Disjunctions | Problem Definition
The examples are Boolean vectors x ∈ {0, 1}^n, and the target concept is a monotone disjunction of some subset of the variables, c(x) = x_{i1} ∨ x_{i2} ∨ … ∨ x_{ir} (no negated variables). The goal is to learn c in the mistake bound model.

Simple Algorithm
Start with the hypothesis h(x) = x_1 ∨ x_2 ∨ … ∨ x_n and predict with h on each example. When a mistake is made on a negative example, remove from h every variable x_i that is set to 1 in that example. (Relevant variables are never removed, so mistakes occur only on negative examples, and each such mistake removes at least one variable.)

Simple Algorithm | An example
Suppose the target concept is c(x) = x_2 ∨ x_3 and n = 6, so h starts as x_1 ∨ … ∨ x_6. The slide steps through a sequence of negative examples: red marks a mistake on a negative example (after which the variables set to 1 in it are removed from h), and green marks a correct prediction.

Simple Algorithm | An example (continued)
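A minimal Python sketch of the elimination algorithm described above, assuming examples arrive as (bit-vector, label) pairs; representing the hypothesis as a set of variable indices is my choice.

# A sketch of the simple (elimination) algorithm for monotone disjunctions:
# start with h = x1 v ... v xn and drop the variables that are set to 1 in
# negative examples we get wrong.
def simple_disjunction_learner(examples):
    """examples: list of (x, label) pairs, x a tuple of n bits, label in {0, 1}.
    Returns (hypothesis_variables, mistake_count)."""
    n = len(examples[0][0])
    hypothesis = set(range(n))            # h(x) = OR of x[i] for i in hypothesis
    mistakes = 0
    for x, label in examples:
        prediction = 1 if any(x[i] == 1 for i in hypothesis) else 0
        if prediction != label:
            mistakes += 1
            if label == 0:                # mistake on a negative example
                hypothesis -= {i for i in range(n) if x[i] == 1}
    return hypothesis, mistakes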

Part 6 (continued): Learning Monotone Disjunctions - Simple Algorithm, Winnow Algorithm (presented by He Ruidan)

Learning the Class of Disjunctions | Winnow Algorithm
- The simple algorithm learns the class of disjunctions with at most n mistakes.
- The Winnow algorithm achieves a much smaller mistake bound.

Winnow Algorithm | Basic Concept
- Each input vector is x = (x_1, x_2, …, x_n) with x_i ∈ {0, 1}.
- Assume the target function is the disjunction of r relevant variables, i.e. c(x) = x_{t1} ∨ x_{t2} ∨ … ∨ x_{tr}.
- The Winnow algorithm maintains a linear separator (a weighted threshold function) as its hypothesis.

Winnow Algorithm | Work Flow
- Initialize the weights w_1 = w_2 = … = w_n = 1.
- Iterate:
  - Receive an example vector x = (x_1, x_2, …, x_n).
  - Predict: output 1 if w_1 x_1 + w_2 x_2 + … + w_n x_n >= n, and output 0 otherwise.
  - Get the true label.
  - Update if a mistake was made:
    - Predicted negative on a positive example: for each x_i = 1, set w_i = 2 w_i.
    - Predicted positive on a negative example: for each x_i = 1, set w_i = w_i / 2.
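The work flow above translates directly into a short Python sketch (mine), with the threshold n and the doubling/halving updates as listed.

# A sketch of the Winnow algorithm: threshold n, double the weights of the
# active variables on a false negative, halve them on a false positive.
def winnow(examples):
    """examples: list of (x, label) pairs with x a tuple of n bits.
    Returns (weights, mistake_count)."""
    n = len(examples[0][0])
    weights = [1.0] * n
    mistakes = 0
    for x, label in examples:
        score = sum(w for w, xi in zip(weights, x) if xi == 1)
        prediction = 1 if score >= n else 0
        if prediction == label:
            continue
        mistakes += 1
        if label == 1:    # predicted negative on a positive example: promote
            weights = [w * 2 if xi == 1 else w for w, xi in zip(weights, x)]
        else:             # predicted positive on a negative example: demote
            weights = [w / 2 if xi == 1 else w for w, xi in zip(weights, x)]
    return weights, mistakes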

Winnow Algorithm | Mistake Bound
- Theorem: the Winnow algorithm learns the class of disjunctions in the Mistake Bound model, making at most 2 + 3r(1 + lg n) mistakes when the target concept is a disjunction of r variables.
- Attribute-efficient: the number of mistakes is only poly(r) · polylog(n).
- Particularly good when the number of relevant variables r is much smaller than the total number of variables n.

Winnow Algorithm | Proof of Mistake Bound
- u: # of mistakes made on positive examples (output 0 while the true label is 1).
- v: # of mistakes made on negative examples (output 1 while the true label is 0).
- Proof 1: u <= r(1 + lg n).
- Proof 2: v < 2(u + 1).
- Therefore the total # of mistakes is u + v < 3u + 2, which is bounded by 2 + 3r(1 + lg n).

Winnow Algorithm | Proof of Mistake Bound
- u: # of mistakes made on positive examples; v: # of mistakes made on negative examples.
- Proof 1: u <= r(1 + lg n).
  - Claim 1: any mistake made on a positive example doubles at least one of the weights of the target (relevant) variables.

Winnow Algorithm | Proof of Mistake Bound
- Claim 1: any mistake made on a positive example doubles at least one of the weights of the target variables.
  - Suppose h(X) = negative while c(X) = positive for an example X. c(X) = positive means at least one target variable is 1 in X.
  - When the hypothesis predicts a positive example as negative, the algorithm doubles the weights of all variables that are 1 in the example, so the weight of at least one target variable is doubled.

Winnow Algorithm | Proof of Mistake Bound
- u: # of mistakes made on positive examples; v: # of mistakes made on negative examples.
- Proof 1: u <= r(1 + lg n).
  - Claim 1: any mistake made on a positive example doubles at least one of the weights of the target variables.
  - Claim 2: the weights of target variables are never halved.

Winnow Algorithm | Proof of Mistake Bound
- Claim 2: the weights of target variables are never halved.
  - Weights are halved only when h(X) = positive while c(X) = negative, and only for the variables that are 1 in X.
  - c(X) = negative means no target variable is 1 in X, so no target variable's weight is halved.

Winnow Algorithm | Proof of Mistake Bound
- u: # of mistakes made on positive examples; v: # of mistakes made on negative examples.
- Proof 1: u <= r(1 + lg n).
  - Claim 1: any mistake made on a positive example doubles at least one of the weights of the target variables.
  - Claim 2: the weights of target variables are never halved.
  - Claim 3: each target variable's weight can be doubled at most 1 + lg n times.

Winnow Algorithm | Proof of Mistake Bound
- Claim 3: each target variable's weight can be doubled at most 1 + lg n times.
  - A target variable's weight can only be doubled, never halved.
  - Once the weight of a target variable is n or larger, the hypothesis predicts positive on every example in which that variable is 1.
  - Weights are doubled only when the hypothesis predicts negative on a positive example, so once a target variable's weight reaches n it is never doubled again.
  - Starting from 1, such a weight can therefore be doubled at most 1 + lg n times.

Winnow Algorithm | Proof of Mistake Bound
- Proof 1: u <= r(1 + lg n).
  - Any mistake made on a positive example doubles at least one of the weights of the target variables.
  - The weights of target variables are never halved, since a target variable is never 1 in a negative example.
  - Each target variable's weight can be doubled at most 1 + lg n times, since only weights smaller than n can be doubled.
  - Therefore u <= r(1 + lg n), since there are r variables in the target function.

Winnow Algorithm | Proof of Mistake Bound
- Proof 2: v < 2(u + 1).
  - Initially, the total weight is W = n.
  - Each mistake on a positive example increases W by less than n (the doubled weights sum to less than n, since the prediction was negative).
  - Each mistake on a negative example decreases W by at least n/2 (the halved weights sum to at least n, since the prediction was positive).
  - Since W is never negative, 0 <= W < n + un - v(n/2), which gives v < 2(u + 1).
- Total: # of mistakes = u + v < 3u + 2, so # of mistakes < 2 + 3r(1 + lg n).

Part 7: Learning Decision Lists in the Mistake Bound Model (presented by Shang Xindi)

Decision List
A decision list is an ordered list of if-then rules: "if l_1 then b_1, else if l_2 then b_2, …, else if l_r then b_r, else b_{r+1}", where each l_i is a literal and each b_i ∈ {0, 1}. Levels 1 through r hold the r rules and level r+1 is the default output.

Decision List | Example
For example: if x_1 = 1 then output 1, else if x_3 = 1 then output 0, else output 1.

Decision List vs Disjunction
A monotone disjunction x_1 ∨ x_2 ∨ … ∨ x_r is the special case "if x_1 then 1, else if x_2 then 1, …, else if x_r then 1, else 0", so decision lists are a strictly richer class than disjunctions.

Learning Decision List
Goal: learn the class of decision lists in the mistake bound model.

Learning Decision List | Algorithm
One standard approach: keep every candidate rule ("if x_i = b then output c", plus the two constant default rules) in a leveled list, with all rules starting at level 1. To predict on an example, find the first level containing a rule that applies to the example and output according to one such rule. On a mistake, move every rule in that level that applied and predicted wrongly down one level.

Learning Decision List | Example
(The slide steps through the algorithm on a small example over several rounds.)

Learning Decision List | Mistake Bound
Each mistake moves at least one rule down a level, rules are only ever demoted, and no rule needs to be demoted more than r + 1 times; with O(n) candidate rules, the total number of mistakes is O(rn) for a target decision list of length r over n variables.
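Since the algorithm and example slides are mostly lost in this transcript, the sketch below shows one standard level/demotion scheme for learning decision lists in the mistake bound model, consistent with the O(rn) bound above; the rule encoding and the arbitrary tie-breaking are my own choices and may differ in detail from the presenters' slides.

# A sketch of a level/demotion scheme for learning decision lists in the
# mistake bound model. A rule is (index, value, output), meaning
# "if x[index] == value then predict output"; index is None for the two
# default rules that always apply.
def make_rules(n):
    rules = [(i, v, b) for i in range(n) for v in (0, 1) for b in (0, 1)]
    rules += [(None, None, b) for b in (0, 1)]
    return rules

def applies(rule, x):
    index, value, _ = rule
    return index is None or x[index] == value

def learn_decision_list(examples, n):
    """examples: list of (x, label) pairs; returns the number of mistakes."""
    level = {rule: 1 for rule in make_rules(n)}
    mistakes = 0
    for x, label in examples:
        firing = [r for r in level if applies(r, x)]
        first = min(level[r] for r in firing)     # default rules always apply
        prediction = next(r for r in firing if level[r] == first)[2]
        if prediction != label:
            mistakes += 1
            # Demote every rule in that level that applied and was wrong.
            for r in firing:
                if level[r] == first and r[2] != label:
                    level[r] += 1
    return mistakes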

Summary
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice
   - Weighted Majority Algorithm: Simple Version
   - Weighted Majority Algorithm: Randomized Version
3. Mistake Bound Model
   - Learning a Concept Class C
   - Learning Monotone Disjunctions (Simple Algorithm, Winnow Algorithm)
   - Learning Decision Lists
4. Demo of online learning

Demo: Learning to Swing-Up and Balance

Q & A