1
On-Line Algorithms in Machine Learning By: WALEED ABDULWAHAB YAHYA AL-GOBI MUHAMMAD BURHAN HAFEZ KIM HYEONGCHEOL HE RUIDAN SHANG XINDI
2
Overview
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice: Weighted Majority Algorithm (simple version); Weighted Majority Algorithm (randomized version)
3. Mistake Bound Model: Learning a Concept Class C; Learning Monotone Disjunctions (Simple Algorithm, Winnow Algorithm); Learning Decision List
4. Conclusion
5. Q & A
3
Intro to Machine Learning Offline Learning Online Learning 1 WALEED ABDULWAHAB YAHYA AL-GOBI
4
Definition: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [Mitchell, 1997]. A more concrete example: Task T: predicting traffic patterns at a busy intersection. Experience E: historical (past) traffic patterns. Performance measure P: accuracy of predicting future traffic patterns. Learned model (i.e. target function): y = h(x). Machine Learning | Definition
5
Offline Learning: Learning phase: the learning algorithm is trained on a pre-defined set of training examples to create a hypothesis. Testing phase: the hypothesis is used to predict labels for new data. Example: MRI brain image classification. (Diagram: training images and training labels feed the learning algorithm, which outputs the learned model h(x); at test time, image features are fed to h(x).) Machine Learning | Offline Learning vs Online Learning
6
Online Learning: In contrast to offline learning, which fits the predictor h(x) on the entire training set at once, online learning is a method of ML in which data becomes available in sequential order and is used to update our predictor h(x) at each step. It is a common technique in areas of ML where it is computationally infeasible to train on the entire dataset at once. Machine Learning | Offline Learning vs Online Learning
7
Examples of Online Learning. Stock price prediction: the data is generated as a function of time, so online learning can dynamically adapt to new patterns in the incoming data. Spam filtering: the data is generated based on the output of the learning algorithm (the spam detector), so online learning can dynamically adapt to new patterns to minimize our losses. Machine Learning | Offline Learning vs Online Learning
8
Online Learning Example: Stock Price Prediction. (Diagram: at each point in time, training examples (data features) are fed to the learning algorithm, which outputs a hypothesis h(x) used for stock price prediction; after receiving the truth, the hypothesis h(x) is updated and the cycle repeats.) Machine Learning | Offline Learning vs Online Learning
9
Offline Learning vs Online Learning:
Offline Learning: two-phase learning. The entire dataset is given at once; the algorithm learns the dataset to construct the target function h(x), then predicts on incoming new data. The learning phase is separated from the testing phase.
Online Learning: multi-phase learning. One example is given at a time; at each step the algorithm predicts, receives the correct answer, and updates the target function h(x). The learning phase is combined with the testing phase.
Machine Learning | Offline Learning vs Online Learning
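To make the protocol concrete, here is a minimal sketch (not from the slides) of the generic online-learning loop just described: predict on one incoming example, receive the correct answer, update the hypothesis. `Learner`, `predict`, and `update` are hypothetical placeholder names; any concrete algorithm from the following slides would plug in its own rules.

```python
class Learner:
    """Hypothetical interface for an online learner."""

    def predict(self, x):
        """Return a label prediction for the incoming example x."""
        raise NotImplementedError

    def update(self, x, y_true):
        """Update the hypothesis h(x) after the true label is revealed."""
        raise NotImplementedError


def run_online(learner, stream):
    """Feed (x, y) pairs to the learner one at a time; count mistakes."""
    mistakes = 0
    for x, y_true in stream:            # examples arrive sequentially
        y_pred = learner.predict(x)     # 1. predict on the new example
        if y_pred != y_true:            # 2. the correct answer is revealed
            mistakes += 1
        learner.update(x, y_true)       # 3. update the hypothesis h(x)
    return mistakes
```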
10
Predicting from Expert Advice Basic Flow An Example 2 WALEED ABDULWAHAB YAHYA AL-GOBI
11
Combining Expert Advice. Basic flow: (1) receive a prediction from each expert, (2) make the algorithm's own prediction, (3) be told the correct answer. Assumption: every prediction ∈ {0, 1}. Predicting from Expert Advice | Basic Flow
12
Task: predicting whether it will rain today. Input: the advice of n experts, each ∈ {1 (yes), 0 (no)}. Output: 1 or 0. Goal: make the least number of mistakes.

Date        | Expert 1 | Expert 2 | Expert 3 | Truth
21 Jan 2013 |    1     |    0     |    1     |   1
22 Jan 2013 |    0     |    1     |    0     |   1
23 Jan 2013 |    1     |    0     |    1     |   1
24 Jan 2013 |    0     |    1     |    1     |   1
25 Jan 2013 |    1     |    0     |    1     |   1

Predicting from Expert Advice | An Example
13
The Weighted Majority Algorithm Simple Version Randomized Version 3 WALEED ABDULWAHAB YAHYA AL-GOBI
14
14 The Weighted Majority Algorithm
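A minimal Python sketch of the simple Weighted Majority Algorithm, assuming the standard formulation that the worked example and proof on the following slides rely on: keep a weight per expert (initially 1), predict with the weighted majority vote, and halve the weight of every expert that was wrong.

```python
def weighted_majority(expert_advice, truths):
    """Simple Weighted Majority Algorithm (sketch).

    expert_advice: list of rounds, each a list of n expert predictions in {0, 1}.
    truths: list of correct answers, one per round.
    Returns (algorithm predictions, final weights, number of mistakes).
    """
    n = len(expert_advice[0])
    w = [1.0] * n                                   # 1. initialize all weights to 1
    predictions, mistakes = [], 0
    for advice, truth in zip(expert_advice, truths):
        weight_1 = sum(wi for wi, xi in zip(w, advice) if xi == 1)
        weight_0 = sum(wi for wi, xi in zip(w, advice) if xi == 0)
        pred = 1 if weight_1 >= weight_0 else 0     # 2. weighted majority vote
        predictions.append(pred)
        if pred != truth:
            mistakes += 1
        # 3. penalize every expert that predicted incorrectly
        w = [wi / 2 if xi != truth else wi for wi, xi in zip(w, advice)]
    return predictions, w, mistakes


# The rain example from the earlier slide (3 experts, 5 days, truth always 1):
advice = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 1, 1], [1, 0, 1]]
print(weighted_majority(advice, [1, 1, 1, 1, 1]))   # one mistake (22 Jan)
```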
15
Date        | Advice (x1, x2, x3) | Weights (w1, w2, w3) | Σw_i (x_i = 0) | Σw_i (x_i = 1) | Prediction | Correct answer
21 Jan 2013 | 1, 0, 1 | 1, 1, 1          | 1    | 2    | 1 | 1
22 Jan 2013 | 0, 1, 0 | 1, 0.50, 1       | 2    | 0.50 | 0 | 1
23 Jan 2013 | 1, 0, 1 | 0.50, 0.50, 0.50 | 0.50 | 1    | 1 | 1
24 Jan 2013 | 0, 1, 1 | 0.50, 0.25, 0.50 | 0.50 | 0.75 | 1 | 1
25 Jan 2013 | 1, 0, 1 | 0.25, 0.25, 0.50 | 0.25 | 0.75 | 1 | 1

The Weighted Majority Algorithm
16
Proof: Let M := # of mistakes made by the Weighted Majority algorithm, and W := total weight of all experts (initially W = n). On every mistaken prediction, at least half of the total weight W was on the wrong answer, and that weight is then halved, so the total weight drops by at least a factor of ¼: W becomes at most ¾ W. Hence W ≤ n(¾)^M. If the best expert made m mistakes, its weight alone is (½)^m, so W ≥ (½)^m. Combining: (½)^m ≤ n(¾)^M, which gives M ≤ 2.41(m + lg n). The Weighted Majority Algorithm
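Spelling out the last step as a worked inequality (this is where the constant 2.41 ≈ 1/lg(4/3) comes from):

```latex
\[
\left(\tfrac{1}{2}\right)^{m} \le W \le n\left(\tfrac{3}{4}\right)^{M}
\;\Longrightarrow\;
-m \le \lg n - M \lg\tfrac{4}{3}
\;\Longrightarrow\;
M \le \frac{m + \lg n}{\lg\tfrac{4}{3}} \approx 2.41\,(m + \lg n).
\]
```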
17
MUHAMMAD BURHAN HAFEZ Randomized Weighted Majority Algorithm (RWMA) Simple Version Randomized Version 4
18
M_WMA ≤ 2.41 (m + lg n). Suppose n = 10, m = 20, and we run 100 prediction trials. The bound only guarantees M_WMA ≤ 2.41 (20 + lg 10) ≈ 56 mistakes out of 100! Can we do better? The Randomized Weighted Majority Algorithm (RWMA)
19
Two modifications: 1. View weights as probabilities. 2. Replace "multiply by ½" with "multiply by β". The Randomized Weighted Majority Algorithm (RWMA)
20
The algorithm: 1. Initialize the weights w1, …, wn of all experts to 1. 2. Given a set of predictions {x1, …, xn} by the experts, output xi with probability wi/W, where W = w1 + … + wn. 3. Receive the correct answer ℓ and penalize each mistaken expert by multiplying its weight by β. Go to step 2. The Randomized Weighted Majority Algorithm (RWMA)
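A minimal Python sketch of these three steps (β = ½ and the fixed random seed are illustrative choices, not taken from the slides):

```python
import random


def randomized_weighted_majority(expert_advice, truths, beta=0.5, seed=0):
    """Randomized Weighted Majority Algorithm (sketch).

    Instead of a weighted majority vote, follow expert i with probability
    w_i / W; every mistaken expert is penalized by the factor beta.
    """
    rng = random.Random(seed)
    n = len(expert_advice[0])
    w = [1.0] * n                                   # 1. initialize weights to 1
    mistakes = 0
    for advice, truth in zip(expert_advice, truths):
        # 2. output expert i's prediction with probability w_i / W
        i = rng.choices(range(n), weights=w)[0]
        if advice[i] != truth:
            mistakes += 1
        # 3. receive the correct answer and penalize every mistaken expert
        w = [wi * beta if xi != truth else wi for wi, xi in zip(w, advice)]
    return mistakes, w
```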
21
RWMA in action (β = ½):

          | E1 | E2 | E3 | E4 | E5 | E6 | Prediction | Correct answer
Weights   | 1  | 1  | 1  | 1  | 1  | 1  |            |
Advice    | 1  | 1  | 0  | 0  | 0  | 0  |     0      |       1
Weights   | 1  | 1  | ½  | ½  | ½  | ½  |            |
Advice    | 0  | 1  | 1  | 1  | 1  | 0  |     1      |       0
Weights   | 1  | ½  | ¼  | ¼  | ¼  | ½  |            |

The Randomized Weighted Majority Algorithm (RWMA)
22
On the i-th trial, define F_i to be the fraction of the total weight that is on the wrong answer. Since we follow each expert with probability proportional to its weight, F_i is exactly the probability that we make a mistake on trial i, and the update shrinks the total weight by the factor (1 − (1 − β)F_i). Say we have seen t examples, and let M be our expected number of mistakes so far, so M = F_1 + F_2 + … + F_t. Mistake bound: The Randomized Weighted Majority Algorithm (RWMA)
23
After t trials the total weight is W = n · ∏_{i=1..t} (1 − (1 − β)F_i), and the best expert (with m mistakes) contributes β^m ≤ W. Taking logarithms and using ln(1 − z) ≤ −z: m ln β ≤ ln n − (1 − β)(F_1 + … + F_t) = ln n − (1 − β)M, which rearranges to M ≤ (m ln(1/β) + ln n) / (1 − β). The Randomized Weighted Majority Algorithm (RWMA)
24
The relation between β and the bound on M:

β  | Bound on M
¼  | 1.85 m + 1.3 ln(n)
½  | 1.39 m + 2 ln(n)
¾  | 1.15 m + 4 ln(n)

When β = ½:

The simple algorithm   | RWMA
M ≤ 2.41 (m + lg n)    | M ≤ 1.39 m + 2 ln(n)

The Randomized Weighted Majority Algorithm (RWMA)
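To see where the table's entries come from, plug concrete values of β into the bound from the previous slide; for example, with β = ½:

```latex
\[
M \le \frac{m\ln\tfrac{1}{\beta} + \ln n}{1-\beta},
\qquad
\beta=\tfrac{1}{2}:\quad
M \le \frac{m\ln 2 + \ln n}{1/2} = (2\ln 2)\,m + 2\ln n \approx 1.39\,m + 2\ln n .
\]
```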
25
Other advantages of RWMA: 1. Consider the case where just over 51% of the experts were mistaken. In WMA, we directly follow this majority and predict accordingly, resulting in a wrong prediction. In RWMA, there is still roughly a 50/50 chance that we predict correctly. 2. Consider the case where predictions are strategies that cannot easily be combined. WMA needs to sum the weights of experts who gave the same prediction, which is not meaningful when the strategies are all different. RWMA can be applied directly, because it does not depend on combining experts who gave the same strategy; it only uses the individual weights to pick one expert to follow. The Randomized Weighted Majority Algorithm (RWMA)
26
Learning a Concept Class in Mistake Bound Model A Concept Class Mistake Bound Model Definition of learning a class in Mistake Bound Model 5 KIM HYEONGCHEOL
27
What we covered so far: Input: Yes/No advice from the "experts" (e.g., weather experts asked "Will it rain tomorrow?"). Output: the algorithm makes its own Yes/No prediction, is told the correct answer, and penalizes experts according to their correctness. We saw a simple algorithm and a better randomized algorithm. Quick Review
28
Questions: What is a concept class C? What is the Mistake Bound Model? What do we mean by learning a concept class in the Mistake Bound Model? On-line Learning a Concept Class C | Mistake Bound Model
29
A Concept Class C: a concept is a boolean function of the input variables, and a concept class C is a set of such functions, e.g. the class of disjunctions or conjunctions. * Disjunction: a ∨ b. * Conjunction: a ∧ b. A Concept Class C
30
On-line learning iteration: the algorithm receives an unlabeled example, predicts its label, is then given the true label, and is penalized if the prediction was wrong. Mistake Bound: the number of mistakes made by the algorithm is bounded by M (ideally, we hope M is as small as possible). Mistake Bound Model
31
Learning a Concept Class in Mistake Bound Model 31
32
Learning a Concept Class in Mistake Bound Model 32
33
If, for any target concept c ∈ C and any sequence of examples labeled by c, the number of mistakes the algorithm makes is bounded by a polynomial in size(c) and n (with each prediction made in polynomial time), we say that it learns class C in the mistake bound learning model. In particular, if the number of mistakes made is only poly(size(c)) ∙ polylog(n), the algorithm is robust to the presence of many additional irrelevant variables: it is called attribute-efficient. Learning a Concept Class in Mistake Bound Model
34
Some examples of learning concept classes in the Mistake Bound Model: Monotone disjunctions (the simple algorithm, the Winnow algorithm); Decision lists. Examples of Learning
35
Learning Monotone Disjunctions Simple Algorithm Winnow Algorithm 6 KIM HYEONGCHEOL
36
Learning Monotone Disjunctions | Problem Definition: examples are vectors x ∈ {0, 1}^n, and the target concept is a monotone disjunction of some of the variables (no negated variables), e.g. c(x) = x1 ∨ x3 ∨ x7.
37
Simple Algorithm: start with the hypothesis h(x) = x1 ∨ x2 ∨ … ∨ xn. Predict using h; each time a mistake is made on a negative example, remove from h every variable that is set to 1 in that example. The hypothesis never errs on a positive example, and each variable can be removed at most once, so at most n mistakes are made.
38
Simple Algorithm | An example: n = 6, target concept c(x) = x2 ∨ x3. (Diagram: the hypothesis h is run on a sequence of negative examples; red marks a mistake on a negative example, green marks a correct prediction.)
39
39 Simple Algorithm | An example
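A Python sketch of the simple algorithm traced in this example, assuming the standard formulation: start with the disjunction of all n variables and, on each mistake on a negative example, delete every variable that is 1 in that example. The toy data below is made up for illustration; only the target c(x) = x2 ∨ x3 and n = 6 come from the slide.

```python
def learn_disjunction(examples):
    """Simple mistake-bound learner for monotone disjunctions (sketch).

    examples: list of (x, label) pairs, x a 0/1 tuple, label in {0, 1}.
    Returns (indices kept in the hypothesis, number of mistakes).
    """
    n = len(examples[0][0])
    h = set(range(n))               # hypothesis: x_0 v x_1 v ... v x_{n-1}
    mistakes = 0
    for x, label in examples:
        pred = 1 if any(x[i] == 1 for i in h) else 0
        if pred != label:
            mistakes += 1
            if label == 0:          # mistake on a negative example:
                # delete every variable that is 1 in this example
                h -= {i for i in range(n) if x[i] == 1}
    return h, mistakes


# Hypothetical run with n = 6 and target c(x) = x2 v x3 (0-based indices 1, 2):
examples = [((1, 0, 0, 1, 0, 0), 0),   # negative: x1 and x4 get deleted
            ((0, 1, 0, 0, 1, 0), 1),   # positive: predicted correctly
            ((1, 0, 0, 0, 0, 1), 0)]   # negative: x6 gets deleted
print(learn_disjunction(examples))
```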
40
Learning Monotone Disjunctions Simple Algorithm Winnow Algorithm 6 HE RUIDAN
41
The simple algorithm learns the class of disjunctions with the number of mistakes bounded by n. Winnow algorithm: an algorithm that makes fewer mistakes (when the number of relevant variables is small). Learning the class of disjunctions | Winnow Algorithm
42
Winnow Algorithm: each input vector x = (x1, x2, …, xn), with xi ∈ {0, 1}. Assume the target function is a disjunction of r relevant variables, i.e. c(x) = x_t1 ∨ x_t2 ∨ … ∨ x_tr. The Winnow algorithm maintains a linear separator (a weighted threshold function). Winnow Algorithm | Basic Concept
43
Initialize: weights w1 = w2 = … = wn = 1. Iterate: receive an example vector x = (x1, x2, …, xn). Predict: output 1 if w1x1 + w2x2 + … + wnxn ≥ n, output 0 otherwise. Get the true label. Update only if a mistake was made: if we predicted negative on a positive example, then for each xi = 1 set wi = 2·wi; if we predicted positive on a negative example, then for each xi = 1 set wi = wi/2. Winnow Algorithm | Work Flow
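A minimal Python sketch of this workflow, using the threshold n in the prediction step (the same threshold the proof slides below rely on):

```python
def winnow(examples):
    """Winnow algorithm for monotone disjunctions (sketch).

    examples: list of (x, label) pairs, x a 0/1 tuple, label in {0, 1}.
    Returns (final weights, number of mistakes).
    """
    n = len(examples[0][0])
    w = [1.0] * n                                   # w_1 = w_2 = ... = w_n = 1
    mistakes = 0
    for x, label in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        pred = 1 if score >= n else 0               # predict 1 iff w . x >= n
        if pred != label:
            mistakes += 1
            if label == 1:      # predicted negative on a positive example
                w = [wi * 2 if xi == 1 else wi for wi, xi in zip(w, x)]
            else:               # predicted positive on a negative example
                w = [wi / 2 if xi == 1 else wi for wi, xi in zip(w, x)]
    return w, mistakes
```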
44
Theorem: the Winnow Algorithm learns the class of disjunctions in the Mistake Bound model, making at most 2 + 3r(1 + lg n) mistakes when the target concept is a disjunction of r variables. Attribute-efficient: the number of mistakes is only poly(r) · polylog(n). It is particularly good when the number of relevant variables r is much less than the total number of variables n. Winnow Algorithm | Mistake Bound
45
u: # of mistakes made on positive examples (we output 0 while the true label is 1). v: # of mistakes made on negative examples (we output 1 while the true label is 0). Proof 1: u ≤ r(1 + lg n). Proof 2: v < 2(u + 1). Therefore, the total number of mistakes is u + v < 3u + 2, which is bounded by 2 + 3r(1 + lg n). Winnow Algorithm | Proof of Mistake Bound
46
u: # of mistakes made on positive examples. v: # of mistakes made on negative examples. Proof 1: u ≤ r(1 + lg n). Any mistake made on a positive example must double at least one of the weights of the variables in the target function. Winnow Algorithm | Proof of Mistake Bound
47
Any mistake made on a positive example must double at least one of the weights of the variables in the target function. Suppose that for an example X we have h(X) = negative but c(X) = positive. Since c(X) = positive, at least one target variable is 1 in X. According to the algorithm, when the hypothesis predicts a positive example as negative, the weights of all variables equal to 1 in the example are doubled; therefore the weight of at least one target variable is doubled. Winnow Algorithm | Proof of Mistake Bound
48
u: # of mistakes made on positive examples. v: # of mistakes made on negative examples. Proof 1: u ≤ r(1 + lg n). Any mistake made on a positive example must double at least one of the weights of the target variables. The weights of target variables are never halved. Winnow Algorithm | Proof of Mistake Bound
49
The weights of target variables are never halved. According to the algorithm, weights are halved only when h(X) = positive while c(X) = negative, and only for the variables equal to 1 in X. But if c(X) = negative, then no target variable is 1 in X, so no target variable's weight is halved. Winnow Algorithm | Proof of Mistake Bound
50
u: # of mistakes made on positive examples. v: # of mistakes made on negative examples. Proof 1: u ≤ r(1 + lg n). Any mistake made on a positive example must double at least one of the weights of the target variables. The weights of target variables are never halved. Each target variable's weight can be doubled at most 1 + lg n times. Winnow Algorithm | Proof of Mistake Bound
51
Each target variable's weight can be doubled at most 1 + lg n times. A target variable's weight is only ever doubled, never halved. Once the weight of a target variable is n or larger, the hypothesis always predicts positive whenever that variable is 1 (since then w·x ≥ n). Weights are doubled only when the hypothesis predicts negative on a positive example, so a weight that has reached n is never doubled again. Starting from 1, each target variable's weight can therefore be doubled at most 1 + lg n times. Winnow Algorithm | Proof of Mistake Bound
52
u: # of mistakes made on positive examples. v: # of mistakes made on negative examples. Proof 1: u ≤ r(1 + lg n). Any mistake made on a positive example doubles at least one target variable's weight. Target variables' weights are never halved, since whenever a target variable is 1 the example cannot be negative. Each target variable's weight can be doubled at most 1 + lg n times, since only weights less than n can be doubled. Therefore u ≤ r(1 + lg n), since there are r variables in the target function. Winnow Algorithm | Proof of Mistake Bound
53
u: # of mistakes made on positive examples. v: # of mistakes made on negative examples. Proof 2: v < 2(u + 1). Initially, the total weight is W = n. A mistake on a positive example increases W by less than n (the doubled weights summed to less than the threshold n). A mistake on a negative example decreases W by at least n/2 (the halved weights summed to at least n). Therefore, after all mistakes, 0 < W < n + u·n − v·(n/2), which gives v < 2(u + 1). Winnow Algorithm | Proof of Mistake Bound. # mistakes = u + v < 3u + 2, so # mistakes < 2 + 3r(1 + lg n).
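The same accounting, written as one chain of inequalities and combined with Proof 1:

```latex
\[
0 < W_{\mathrm{final}} < n + u\,n - v\,\tfrac{n}{2}
\;\Longrightarrow\;
v < 2(u+1)
\;\Longrightarrow\;
u + v < 3u + 2 \le 3r(1+\lg n) + 2 .
\]
```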
54
Learning Decision List in Mistake Bound Model 7 SHANG XINDI
55
Decision List: an ordered sequence of if-then rules, "if ℓ1 then b1; else if ℓ2 then b2; …; else if ℓr then br; else b_{r+1}", where each ℓi is a literal and each bi ∈ {0, 1}. (Diagram: rules at levels 1 through r, with a default answer at level r + 1.)
56
Decision List | Example 56
57
Decision List vs Disjunction: a disjunction x_t1 ∨ … ∨ x_tr is the special case of a decision list "if x_t1 then 1; else if x_t2 then 1; …; else 0", so decision lists form a more general concept class.
58
58 Learning Decision List
59
59 Learning Decision List | Algorithm
60
60 Learning Decision List | Example
61
61 Learning Decision List | Example
62
62 Learning Decision List | Example 0011 0100 1000 1100
63
63 Learning Decision List | Mistake Bound
64
Summary
1. Introduction: online learning vs. offline learning
2. Predicting from Expert Advice: Weighted Majority Algorithm (simple version); Weighted Majority Algorithm (randomized version)
3. Mistake Bound Model: Learning a Concept Class C; Learning Monotone Disjunctions (Simple Algorithm, Winnow Algorithm); Learning Decision List
4. Demo of online learning
65
Learning to Swing-Up and Balance 65
66
Q & A