Winnowing Algorithm
CSL758 Instructors: Naveen Garg, Kavitha Telikepalli
Scribe: Neha Dahiya
March 7, 2008

Concept Class
A sequence of instances, each described by n binary attributes and labelled (+) or (-), is presented to the algorithm one at a time; the algorithm must predict each label, and the goal is an adaptive prediction strategy that makes few mistakes. It is assumed that a disjunction of r literals exactly describes whether an instance belongs to the target group. For example, if x1, x2, x3, x4 and x5 are the attributes of an instance, with xi = 1 if attribute i is present, then n = 5. If x1 ∨ x2 ∨ x5 exactly determines whether the instance is in the target group, then r = 3.
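As a concrete illustration of this setup, here is a minimal Python sketch; the attribute count, the particular disjunction x1 ∨ x2 ∨ x5, and the random instance generator are hypothetical choices made for this example, not part of the lecture.

```python
import random

n = 5                     # number of binary attributes
relevant = {1, 2, 5}      # hidden target disjunction x1 v x2 v x5, so r = 3

def true_label(x):
    """An instance is (+) exactly when some relevant attribute is present."""
    return '+' if any(x[i - 1] == 1 for i in relevant) else '-'

# A stream of labelled instances, presented to the learner one at a time.
stream = []
for _ in range(20):
    x = [random.randint(0, 1) for _ in range(n)]
    stream.append((x, true_label(x)))
```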

The Winnowing Algorithm
Initialize the weights w1 = 1, w2 = 1, …, wn = 1. For each input instance:
If ∑ wi·xi >= n, declare the current example (+); otherwise declare it (-).
Then check the actual result:
If our prediction matches the actual result, make no change.
If we declared (+) and the actual result was (-), halve the weights of the attributes present in the current example.
If we declared (-) and the actual result was (+), double the weights of the attributes present in the current example.
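A minimal Python sketch of the update rule as stated above, assuming 0/1 attribute vectors and the threshold n from this slide; the class and method names are my own.

```python
class Winnow:
    def __init__(self, n):
        self.n = n
        self.w = [1.0] * n          # w1 = w2 = ... = wn = 1

    def predict(self, x):
        """Declare (+) if the weighted sum of present attributes reaches the threshold n."""
        total = sum(wi * xi for wi, xi in zip(self.w, x))
        return '+' if total >= self.n else '-'

    def update(self, x, actual):
        """Predict, then adjust the weights of the attributes present if the prediction was wrong."""
        predicted = self.predict(x)
        if predicted == actual:
            return predicted                      # correct prediction: no change
        factor = 2.0 if actual == '+' else 0.5    # Type 1 mistake: double; Type 2 mistake: halve
        for i, xi in enumerate(x):
            if xi == 1:
                self.w[i] *= factor
        return predicted
```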

Upper Bound on Number of Mistakes
We now find an upper bound on the total number of mistakes the Winnowing algorithm can make. The mistakes can be of two types:
Type 1 mistake: we declare an example (-) and it was actually (+).
Type 2 mistake: we declare an example (+) and it was actually (-).

Bound on Type 1 Mistakes
On a Type 1 mistake, we increase the weights of the attributes present in the current example. No relevant attribute (an attribute present in the disjunction) ever gets its weight reduced: weights are reduced only when an example containing the attribute causes a Type 2 mistake, and an instance containing a relevant attribute can never cause a Type 2 mistake, since such an instance is always a (+) example. Moreover, the weight of a relevant attribute stops being doubled once it reaches n: if wi >= n and that attribute is present, then ∑ wj·xj >= n is always satisfied (weights are always positive), so the instance is never declared (-).
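A worked version of this step as a chain of inequalities, in the notation of the slides:

```latex
% Weights are doubled only on a Type 1 mistake, i.e. when the example was declared (-),
% which requires \sum_j w_j x_j < n; so every attribute present at that moment has weight < n.
\[
  w_i \ge n \ \text{and}\ x_i = 1
  \;\Longrightarrow\;
  \sum_{j=1}^{n} w_j x_j \;\ge\; w_i \;\ge\; n
  \;\Longrightarrow\; \text{the example is declared } (+)\text{, so } w_i \text{ is not doubled.}
\]
```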

Bound on Type 1 Mistakes
So the weight of any given relevant attribute is doubled at most log2 n times, and there are r such attributes. Since every Type 1 mistake doubles the weight of at least one relevant attribute (the example is (+), so some relevant attribute is present), we cannot make more than r·log2 n Type 1 mistakes.

Bound on Type 2 Mistakes
Let the number of Type 2 mistakes be C. We do an amortized analysis of the function W = ∑ wi, whose initial value is n.
On a Type 1 mistake, W increases by at most n: ∑ wi·xi was less than n before that instance, so doubling the weights of the attributes present in the current instance increases W by at most n.
On a Type 2 mistake, W decreases by at least n/2: ∑ wi·xi >= n for that instance, so halving the weights of the attributes present decreases W by at least n/2.

Bound on Type 2 Mistakes
W is positive at every point in time, so the total value subtracted from W must be less than the initial value plus the total value added to W.
Initial value = n.
Value added to W <= n · (number of Type 1 mistakes) <= n·r·log2 n.
Value subtracted from W >= (n/2) · (number of Type 2 mistakes) = C·n/2.
Therefore C·n/2 <= n·r·log2 n + n, i.e. C <= 2·r·log2 n + 2. So the upper bound on the number of Type 2 mistakes is 2·r·log2 n + 2.
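The same argument written as a chain of inequalities, with C the number of Type 2 mistakes as above:

```latex
\[
  \frac{n}{2}\,C
  \;\le\; \text{total decrease in } W
  \;<\; W_{\text{initial}} + \text{total increase in } W
  \;\le\; n + n\,r\log_2 n,
\]
\[
  \text{hence}\qquad C \;\le\; 2\,r\log_2 n + 2 .
\]
```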

Upper Bound on Number of Mistakes
Total number of mistakes = (number of Type 1 mistakes) + (number of Type 2 mistakes) <= r·log2 n + 2·r·log2 n + 2 = 3·r·log2 n + 2. So the total number of mistakes made by the Winnowing algorithm is O(r log n).
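Putting the earlier sketches together, the bound can be checked empirically on the hypothetical stream defined above (Winnow, stream, n and relevant are the names introduced in those sketches):

```python
import math

# Run the learner over the labelled stream and count its prediction mistakes.
learner = Winnow(n)
mistakes = sum(1 for x, actual in stream if learner.update(x, actual) != actual)

bound = 3 * len(relevant) * math.log2(n) + 2     # 3*r*log2(n) + 2
print(f"mistakes = {mistakes}, bound = {bound:.1f}")
assert mistakes <= bound
```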