Goal of Learning Algorithms
 The early learning algorithms were designed to find as accurate a fit to the training data as possible.
 A classifier is said to be consistent if it correctly classifies all of the training data.
 The ability of a classifier to correctly classify data not in the training set is known as its generalization.
 Bible code? 1994 Taipei Mayor election?
 The goal is to predict the real future, NOT to fit the data already in hand or to "predict" the results we desire.

Probably Approximately Correct (PAC) Learning Model
 Key assumption: training and testing data are generated i.i.d. according to a fixed but unknown distribution D.
 When we evaluate the "quality" of a hypothesis (classification function) h, we should take the unknown distribution D into account (i.e., the "average error" or "expected error" made by h).
 We call such a measure the risk functional and denote it by R(h).

Generalization Error of the PAC Model
 Let S be a set of l training examples chosen i.i.d. according to D.
 Treat the generalization error err_D(h_S) as a random variable depending on the random selection of S.
 Find a bound on the tail of the distribution of this random variable in the form err_D(h_S) ≤ ε(l, H, δ), where δ is the confidence level of the error bound and is chosen by the learner.

Probably Approximately Correct
 We assert: with probability at least 1 − δ over the random choice of the training set S, the error made by the hypothesis h_S will be less than the error bound ε(l, H, δ).
 The bound ε(l, H, δ) does not depend on the unknown distribution D.
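
To make the assertion concrete, here is a small sketch using the standard PAC bound for a finite hypothesis space (an illustrative choice on my part; the slides do not specify this particular bound). With probability at least 1 − δ, any hypothesis consistent with l i.i.d. training examples has true error at most (ln|H| + ln(1/δ)) / l, a quantity that depends only on l, |H| and δ, never on D.

```python
import math

def pac_error_bound(num_examples, hypothesis_space_size, delta):
    """Standard PAC bound for a *finite* hypothesis space (illustrative choice,
    not necessarily the bound meant on this slide): with probability >= 1 - delta,
    any hypothesis consistent with the training set has true error <= epsilon.
    The bound depends only on l, |H| and delta -- never on the unknown D."""
    return (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / num_examples

# Example: |H| = 10,000 hypotheses, l = 5,000 examples, confidence 1 - delta = 0.95.
eps = pac_error_bound(num_examples=5000, hypothesis_space_size=10_000, delta=0.05)
print(f"With probability >= 0.95, err_D(h) <= {eps:.4f}")
```

Doubling the number of training examples halves this bound; the distribution D never enters the computation.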

Find the Hypothesis with Minimum Expected Risk?
 Let the training examples (x_i, y_i) be chosen i.i.d. according to D with probability density p(x, y).
 The expected misclassification error made by h is R(h) = ∫ ½ |y − h(x)| dp(x, y).
 The ideal hypothesis h* should have the smallest expected risk: R(h*) ≤ R(h) for every h in the hypothesis space. Unrealistic!!! (p(x, y) is unknown.)
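
When p(x, y) is known, which only happens in a synthetic setting (and is exactly why the slide calls the ideal hypothesis unrealistic), the expected risk can be approximated by a Monte Carlo average over fresh samples. A rough sketch; the distribution and the hypothesis below are my own illustrative choices, not anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_D(n):
    """Illustrative synthetic distribution: x ~ Uniform[0, 1]; the noise-free label
    is +1 when x > 0.5 and -1 otherwise, with 10% of the labels flipped."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.where(x > 0.5, 1, -1)
    flip = rng.uniform(size=n) < 0.10
    y[flip] = -y[flip]
    return x, y

def h(x, threshold=0.4):
    """Illustrative hypothesis: a threshold classifier with output in {-1, +1}."""
    return np.where(x > threshold, 1, -1)

# Monte Carlo approximation of R(h) = E_{(x,y)~D}[ (1/2)|y - h(x)| ].
x, y = sample_from_D(1_000_000)
risk = np.mean(0.5 * np.abs(y - h(x)))
print(f"Estimated expected risk R(h) ~= {risk:.3f}")   # about 0.18 for this synthetic D
```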

Empirical Risk Minimization (ERM)
 Replace the expected risk over p(x, y) by an average over the training examples (D and p(x, y) are not needed).
 The empirical risk: R_emp(h) = (1/l) Σ_{i=1}^{l} ½ |y_i − h(x_i)|.
 Find the hypothesis h with the smallest empirical risk: R_emp(h*) ≤ R_emp(h) for every h.
 Only focusing on the empirical risk will cause overfitting.
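
A minimal ERM sketch over a deliberately small hypothesis space (threshold classifiers on a grid), reusing the same synthetic data-generating process as in the previous sketch; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training sample S = {(x_i, y_i)}, i = 1..l, drawn i.i.d. (same synthetic D as above).
l = 200
x = rng.uniform(0.0, 1.0, size=l)
y = np.where(x > 0.5, 1, -1)
flip = rng.uniform(size=l) < 0.10
y[flip] = -y[flip]

def empirical_risk(h, x, y):
    """R_emp(h) = (1/l) * sum_i (1/2)|y_i - h(x_i)| -- needs only the sample, not D."""
    return np.mean(0.5 * np.abs(y - h(x)))

# Finite hypothesis space: threshold classifiers h_t(x) = sign(x - t) on a grid of t.
thresholds = np.linspace(0.0, 1.0, 101)
risks = [empirical_risk(lambda x, t=t: np.where(x > t, 1, -1), x, y) for t in thresholds]

best = int(np.argmin(risks))
print(f"ERM picks threshold t = {thresholds[best]:.2f}, R_emp = {risks[best]:.3f}")
```

With a richer hypothesis space the same procedure can drive R_emp to zero while the expected risk stays high, which is the overfitting warned about in the last bullet.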

VC Confidence (the Bound Between R_emp(h) and R(h))
 The following inequality holds with probability 1 − δ:
 R(h) ≤ R_emp(h) + √( [ v (ln(2l/v) + 1) − ln(δ/4) ] / l ), where v is the VC-dimension of the hypothesis space.
 C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2(2) (1998), pp. 121–167.
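
The VC confidence term in this bound is easy to compute once l, the VC-dimension v, and the confidence δ are fixed. A small sketch (function and variable names are mine):

```python
import math

def vc_confidence(l, vc_dim, delta):
    """VC confidence term from the bound above:
    sqrt( ( v*(ln(2l/v) + 1) - ln(delta/4) ) / l ).
    Holds with probability 1 - delta and is independent of the unknown distribution D."""
    return math.sqrt((vc_dim * (math.log(2.0 * l / vc_dim) + 1.0)
                      - math.log(delta / 4.0)) / l)

# Example: 10,000 training examples, VC-dimension 50, 95% confidence.
print(f"VC confidence = {vc_confidence(l=10_000, vc_dim=50, delta=0.05):.3f}")
```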

Why We Maximize the Margin? (Based on Statistical Learning Theory)
 The Structural Risk Minimization (SRM) principle:
 The expected risk will be less than or equal to the empirical risk (training error) + the VC (error) bound: R(h) ≤ R_emp(h) + VC confidence.
 Maximizing the margin restricts the capacity (VC-dimension) of the separating hyperplanes, which shrinks the VC bound and hence the bound on the expected risk.
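
A minimal SRM sketch, assuming a nested sequence of model classes whose VC-dimensions and training errors are already known; the numbers below are made up purely to illustrate the trade-off, and vc_confidence is the same helper as in the previous sketch:

```python
import math

def vc_confidence(l, vc_dim, delta):
    """Same VC confidence term as in the previous sketch."""
    return math.sqrt((vc_dim * (math.log(2.0 * l / vc_dim) + 1.0)
                      - math.log(delta / 4.0)) / l)

# Structural Risk Minimization: among nested classes H1 (small capacity) through
# H3 (large capacity), pick the one minimizing (empirical risk) + (VC confidence),
# i.e. the bound on the expected risk. All numbers below are illustrative.
candidate_models = [
    {"name": "H1 (large margin, small capacity)", "vc_dim": 20,  "emp_risk": 0.06},
    {"name": "H2 (medium margin)",                "vc_dim": 100, "emp_risk": 0.03},
    {"name": "H3 (small margin, large capacity)", "vc_dim": 500, "emp_risk": 0.01},
]

l, delta = 10_000, 0.05
for m in candidate_models:
    m["bound"] = m["emp_risk"] + vc_confidence(l, m["vc_dim"], delta)
    print(f'{m["name"]:36s}  R_emp = {m["emp_risk"]:.2f}  bound = {m["bound"]:.3f}')

best = min(candidate_models, key=lambda m: m["bound"])
print("SRM selects:", best["name"])
```

In this toy setup the smallest-capacity (largest-margin) class wins even though its training error is the highest, because its VC confidence term is much smaller.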

Capacity (Complexity) of the Hypothesis Space H: VC-dimension
 A given training set S is shattered by H if and only if, for every labeling of S, there is some hypothesis in H consistent with this labeling.
 Example: three (linearly independent) points can be shattered by hyperplanes (lines) in R^2.

Shattering Points with Hyperplanes in R^n
 Theorem: Consider some set of m points in R^n. Choose any one of the points as the origin. Then the m points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining points are linearly independent.
 Can you always shatter three points with a line in R^2?
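
A quick numerical check of the theorem using only numpy (the specific points are arbitrary illustrative choices): pick one point as the origin and test whether the position vectors of the remaining points are linearly independent.

```python
import numpy as np

def shatterable_by_hyperplanes(points):
    """Per the theorem above: m points in R^n can be shattered by oriented hyperplanes
    iff, choosing one of them (here the first) as origin, the position vectors of the
    remaining m - 1 points are linearly independent."""
    points = np.asarray(points, dtype=float)
    rest = points[1:] - points[0]
    return np.linalg.matrix_rank(rest) == len(rest)

# Three points in R^2: shatterable in general position, but not when collinear --
# so the answer to the question above is "no, not always".
print(shatterable_by_hyperplanes([[0, 0], [1, 0], [0, 1]]))   # True
print(shatterable_by_hyperplanes([[0, 0], [1, 1], [2, 2]]))   # False (collinear)
```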

Definition of VC-dimension (a Capacity Measure of the Hypothesis Space H)
 The Vapnik–Chervonenkis dimension, VC(H), of a hypothesis space H defined over the input space X is the size of the largest finite subset of X shattered by H.
 If arbitrarily large finite subsets of X can be shattered by H, then VC(H) = ∞.
 Example: let H be the set of oriented hyperplanes in R^n; then VC(H) = n + 1.
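
The definition can also be verified by brute force for very small point sets: enumerate every ±1 labeling and check, via a linear-programming feasibility problem, whether some oriented hyperplane realizes it. A sketch assuming scipy is available; it confirms that 3 points in general position in R^2 are shattered while this particular set of 4 points is not (in fact no 4 points in the plane can be), consistent with VC = n + 1 = 3.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Feasibility check: does some oriented hyperplane (w, b) satisfy
    y_i * (w . x_i + b) >= 1 for all i?  Solved as a linear program with zero objective."""
    points = np.asarray(points, dtype=float)
    n = points.shape[1]
    # Variables: [w_1, ..., w_n, b]; constraints: -y_i * (w . x_i + b) <= -1.
    A_ub = -np.asarray(labels)[:, None] * np.hstack([points, np.ones((len(points), 1))])
    b_ub = -np.ones(len(points))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return res.success

def shattered(points):
    """True iff every +1/-1 labeling of the points is realized by some hyperplane."""
    return all(linearly_separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

# n + 1 = 3 points in general position in R^2 are shattered ...
print(shattered([[0, 0], [1, 0], [0, 1]]))               # True
# ... but the 4 vertices of a square are not (the XOR labeling fails).
print(shattered([[0, 0], [1, 0], [0, 1], [1, 1]]))       # False
```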