Goal of Learning Algorithms
 The early learning algorithms were designed to find as accurate a fit to the training data as possible.
 A classifier is said to be consistent if it correctly classifies all of the training data.
 The ability of a classifier to correctly classify data not in the training set is known as its generalization.
 Bible code? 1994 Taipei Mayor election?
 The goal is to predict the real future, NOT to fit the data already in hand or to "predict" the results we desire.

Probably Approximately Correct (PAC) Learning Model
 Key assumption: training and testing data are generated i.i.d. according to a fixed but unknown distribution D.
 When we evaluate the "quality" of a hypothesis (classification function) h, we should take the unknown distribution D into account (i.e., the "average error" or "expected error" made by h).
 We call such a measure the risk functional and denote it by R(h).

Generalization Error of the PAC Model
 Let S be a set of l training examples chosen i.i.d. according to D.
 Treat the generalization error err_D(h_S) as a random variable depending on the random selection of S.
 Find a bound on the tail of the distribution of this random variable in the form err_D(h_S) ≤ ε(l, H, δ), where δ is the confidence level of the error bound and is chosen by the learner.

Probably Approximately Correct
 We assert: with probability at least 1 − δ over the random choice of the training set S, the error made by the hypothesis h_S will be less than the error bound ε(l, H, δ).
 The bound ε(l, H, δ) does not depend on the unknown distribution D.
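
To make the assertion concrete, here is a small sketch using the standard PAC bound for a finite hypothesis space (an illustrative choice on my part; the slides do not specify this particular bound). With probability at least 1 − δ, any hypothesis consistent with l i.i.d. training examples has true error at most (ln|H| + ln(1/δ)) / l, a quantity that depends only on l, |H| and δ, never on D.

```python
import math

def pac_error_bound(num_examples, hypothesis_space_size, delta):
    """Standard PAC bound for a *finite* hypothesis space (illustrative choice,
    not necessarily the bound meant on this slide): with probability >= 1 - delta,
    any hypothesis consistent with the training set has true error <= epsilon.
    The bound depends only on l, |H| and delta -- never on the unknown D."""
    return (math.log(hypothesis_space_size) + math.log(1.0 / delta)) / num_examples

# Example: |H| = 10,000 hypotheses, l = 5,000 examples, confidence 1 - delta = 0.95.
eps = pac_error_bound(num_examples=5000, hypothesis_space_size=10_000, delta=0.05)
print(f"With probability >= 0.95, err_D(h) <= {eps:.4f}")
```

Doubling the number of training examples halves this bound; the distribution D never enters the computation.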

Find the Hypothesis with Minimum Expected Risk?
 Let the training examples (x_i, y_i) be chosen i.i.d. according to D with probability density p(x, y).
 The expected misclassification error made by h is R(h) = ∫ ½ |y − h(x)| dp(x, y).
 The ideal hypothesis h* should have the smallest expected risk: R(h*) ≤ R(h) for every h in the hypothesis space. Unrealistic!!! (p(x, y) is unknown.)
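
When p(x, y) is known, which only happens in a synthetic setting (and is exactly why the slide calls the ideal hypothesis unrealistic), the expected risk can be approximated by a Monte Carlo average over fresh samples. A rough sketch; the distribution and the hypothesis below are my own illustrative choices, not anything from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_D(n):
    """Illustrative synthetic distribution: x ~ Uniform[0, 1]; the noise-free label
    is +1 when x > 0.5 and -1 otherwise, with 10% of the labels flipped."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.where(x > 0.5, 1, -1)
    flip = rng.uniform(size=n) < 0.10
    y[flip] = -y[flip]
    return x, y

def h(x, threshold=0.4):
    """Illustrative hypothesis: a threshold classifier with output in {-1, +1}."""
    return np.where(x > threshold, 1, -1)

# Monte Carlo approximation of R(h) = E_{(x,y)~D}[ (1/2)|y - h(x)| ].
x, y = sample_from_D(1_000_000)
risk = np.mean(0.5 * np.abs(y - h(x)))
print(f"Estimated expected risk R(h) ~= {risk:.3f}")   # about 0.18 for this synthetic D
```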

Empirical Risk Minimization (ERM)
 Replace the expected risk over p(x, y) by an average over the training examples (D and p(x, y) are not needed).
 The empirical risk: R_emp(h) = (1/l) Σ_{i=1}^{l} ½ |y_i − h(x_i)|.
 Find the hypothesis h with the smallest empirical risk: R_emp(h*) ≤ R_emp(h) for every h.
 Only focusing on the empirical risk will cause overfitting.
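
A minimal ERM sketch over a deliberately small hypothesis space (threshold classifiers on a grid), reusing the same synthetic data-generating process as in the previous sketch; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Training sample S = {(x_i, y_i)}, i = 1..l, drawn i.i.d. (same synthetic D as above).
l = 200
x = rng.uniform(0.0, 1.0, size=l)
y = np.where(x > 0.5, 1, -1)
flip = rng.uniform(size=l) < 0.10
y[flip] = -y[flip]

def empirical_risk(h, x, y):
    """R_emp(h) = (1/l) * sum_i (1/2)|y_i - h(x_i)| -- needs only the sample, not D."""
    return np.mean(0.5 * np.abs(y - h(x)))

# Finite hypothesis space: threshold classifiers h_t(x) = sign(x - t) on a grid of t.
thresholds = np.linspace(0.0, 1.0, 101)
risks = [empirical_risk(lambda x, t=t: np.where(x > t, 1, -1), x, y) for t in thresholds]

best = int(np.argmin(risks))
print(f"ERM picks threshold t = {thresholds[best]:.2f}, R_emp = {risks[best]:.3f}")
```

With a richer hypothesis space the same procedure can drive R_emp to zero while the expected risk stays high, which is the overfitting warned about in the last bullet.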

VC Confidence (the Bound Between R_emp(h) and R(h))
 The following inequality holds with probability 1 − δ:
 R(h) ≤ R_emp(h) + √( [ v (ln(2l/v) + 1) − ln(δ/4) ] / l ), where v is the VC-dimension of the hypothesis space.
 C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2(2) (1998), pp. 121–167.
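
The VC confidence term in this bound is easy to compute once l, the VC-dimension v, and the confidence δ are fixed. A small sketch (function and variable names are mine):

```python
import math

def vc_confidence(l, vc_dim, delta):
    """VC confidence term from the bound above:
    sqrt( ( v*(ln(2l/v) + 1) - ln(delta/4) ) / l ).
    Holds with probability 1 - delta and is independent of the unknown distribution D."""
    return math.sqrt((vc_dim * (math.log(2.0 * l / vc_dim) + 1.0)
                      - math.log(delta / 4.0)) / l)

# Example: 10,000 training examples, VC-dimension 50, 95% confidence.
print(f"VC confidence = {vc_confidence(l=10_000, vc_dim=50, delta=0.05):.3f}")
```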

Why We Maximize the Margin? (Based on Statistical Learning Theory)
 The Structural Risk Minimization (SRM) principle:
 The expected risk will be less than or equal to the empirical risk (training error) + the VC (error) bound: R(h) ≤ R_emp(h) + VC confidence.
 Maximizing the margin restricts the capacity (VC-dimension) of the separating hyperplanes, which shrinks the VC bound and hence the bound on the expected risk.
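
A minimal SRM sketch, assuming a nested sequence of model classes whose VC-dimensions and training errors are already known; the numbers below are made up purely to illustrate the trade-off, and vc_confidence is the same helper as in the previous sketch:

```python
import math

def vc_confidence(l, vc_dim, delta):
    """Same VC confidence term as in the previous sketch."""
    return math.sqrt((vc_dim * (math.log(2.0 * l / vc_dim) + 1.0)
                      - math.log(delta / 4.0)) / l)

# Structural Risk Minimization: among nested classes H1 (small capacity) through
# H3 (large capacity), pick the one minimizing (empirical risk) + (VC confidence),
# i.e. the bound on the expected risk. All numbers below are illustrative.
candidate_models = [
    {"name": "H1 (large margin, small capacity)", "vc_dim": 20,  "emp_risk": 0.06},
    {"name": "H2 (medium margin)",                "vc_dim": 100, "emp_risk": 0.03},
    {"name": "H3 (small margin, large capacity)", "vc_dim": 500, "emp_risk": 0.01},
]

l, delta = 10_000, 0.05
for m in candidate_models:
    m["bound"] = m["emp_risk"] + vc_confidence(l, m["vc_dim"], delta)
    print(f'{m["name"]:36s}  R_emp = {m["emp_risk"]:.2f}  bound = {m["bound"]:.3f}')

best = min(candidate_models, key=lambda m: m["bound"])
print("SRM selects:", best["name"])
```

In this toy setup the smallest-capacity (largest-margin) class wins even though its training error is the highest, because its VC confidence term is much smaller.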

Capacity (Complexity) of the Hypothesis Space H: VC-dimension
 A given training set S is shattered by H if and only if, for every labeling of S, there is some hypothesis in H consistent with this labeling.
 Example: three (linearly independent) points can be shattered by hyperplanes (lines) in R^2.

Shattering Points with Hyperplanes in R^n
 Theorem: Consider some set of m points in R^n. Choose any one of the points as the origin. Then the m points can be shattered by oriented hyperplanes if and only if the position vectors of the remaining points are linearly independent.
 Can you always shatter three points with a line in R^2?
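
A quick numerical check of the theorem using only numpy (the specific points are arbitrary illustrative choices): pick one point as the origin and test whether the position vectors of the remaining points are linearly independent.

```python
import numpy as np

def shatterable_by_hyperplanes(points):
    """Per the theorem above: m points in R^n can be shattered by oriented hyperplanes
    iff, choosing one of them (here the first) as origin, the position vectors of the
    remaining m - 1 points are linearly independent."""
    points = np.asarray(points, dtype=float)
    rest = points[1:] - points[0]
    return np.linalg.matrix_rank(rest) == len(rest)

# Three points in R^2: shatterable in general position, but not when collinear --
# so the answer to the question above is "no, not always".
print(shatterable_by_hyperplanes([[0, 0], [1, 0], [0, 1]]))   # True
print(shatterable_by_hyperplanes([[0, 0], [1, 1], [2, 2]]))   # False (collinear)
```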

Definition of VC-dimension (a Capacity Measure of the Hypothesis Space H)
 The Vapnik–Chervonenkis dimension, VC(H), of a hypothesis space H defined over the input space X is the size of the largest finite subset of X shattered by H.
 If arbitrarily large finite subsets of X can be shattered by H, then VC(H) = ∞.
 Example: let H be the set of oriented hyperplanes in R^n; then VC(H) = n + 1.
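
The definition can also be verified by brute force for very small point sets: enumerate every ±1 labeling and check, via a linear-programming feasibility problem, whether some oriented hyperplane realizes it. A sketch assuming scipy is available; it confirms that 3 points in general position in R^2 are shattered while this particular set of 4 points is not (in fact no 4 points in the plane can be), consistent with VC = n + 1 = 3.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Feasibility check: does some oriented hyperplane (w, b) satisfy
    y_i * (w . x_i + b) >= 1 for all i?  Solved as a linear program with zero objective."""
    points = np.asarray(points, dtype=float)
    n = points.shape[1]
    # Variables: [w_1, ..., w_n, b]; constraints: -y_i * (w . x_i + b) <= -1.
    A_ub = -np.asarray(labels)[:, None] * np.hstack([points, np.ones((len(points), 1))])
    b_ub = -np.ones(len(points))
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))
    return res.success

def shattered(points):
    """True iff every +1/-1 labeling of the points is realized by some hyperplane."""
    return all(linearly_separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

# n + 1 = 3 points in general position in R^2 are shattered ...
print(shattered([[0, 0], [1, 0], [0, 1]]))               # True
# ... but the 4 vertices of a square are not (the XOR labeling fails).
print(shattered([[0, 0], [1, 0], [0, 1], [1, 1]]))       # False
```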