Computational Learning Theory

Computational Learning Theory. Pardis Noorzad, Department of Computer Engineering and IT, Amirkabir University of Technology. Ordibehesht 1390

Introduction: For the analysis of data structures and algorithms and their limits we have computational complexity theory, i.e., the analysis of time and space complexity (e.g., should we use Dijkstra's algorithm or Bellman-Ford?). For the analysis of machine learning algorithms there are other questions we need to answer, and these are the subject of computational learning theory and statistical learning theory.

Computational learning theory (Wikipedia): Probably approximately correct learning (PAC learning) -- Leslie Valiant; inspired boosting. VC theory -- Vladimir Vapnik; led to SVMs. Bayesian inference -- Thomas Bayes. Algorithmic learning theory -- E. M. Gold. Online machine learning -- Nick Littlestone.

Today: the PAC model of learning (sample complexity, computational complexity); the VC dimension (hypothesis space complexity); SRM (model estimation); examples throughout … yay! We will try to cover these topics while avoiding difficult formalizations.

PAC learning (L. Valiant, 1984)  

PAC learning: measures of error  
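A sketch of the two standard error measures (the notation follows Mitchell, 1997, and is assumed here rather than taken from the slide): for a target concept c, a hypothesis h, and an instance distribution D,

  \mathrm{error}_D(h) = \Pr_{x \sim D}[\, c(x) \neq h(x) \,]                          (true error / actual risk)
  \mathrm{error}_S(h) = \tfrac{1}{m} \sum_{i=1}^{m} \mathbf{1}[\, c(x_i) \neq h(x_i) \,]   (training error on a sample S of m examples)

The learner only observes error_S; PAC learning asks when a small error_S implies, with high probability, a small error_D.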

PAC-learnability: definition  
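A sketch of the standard definition (Valiant, 1984; Mitchell, 1997, ch. 7): a concept class C is PAC-learnable by learner L using hypothesis space H if, for every c \in C, every distribution D over X, and all 0 < \epsilon < 1/2 and 0 < \delta < 1/2, the learner L, given examples drawn i.i.d. from D and labeled by c, outputs with probability at least 1 - \delta a hypothesis h \in H with \mathrm{error}_D(h) \le \epsilon, in time polynomial in 1/\epsilon, 1/\delta, n, and \mathrm{size}(c).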

Sample complexity  
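For a finite hypothesis space H and a consistent learner, the usual bound (Mitchell, 1997, eq. 7.2) on the number of i.i.d. training examples is

  m \ge \frac{1}{\epsilon}\left( \ln|H| + \ln\frac{1}{\delta} \right)

With that many examples, with probability at least 1 - \delta every hypothesis consistent with the training data has true error at most \epsilon.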

Sample complexity: example  
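A worked instance of the finite-|H| bound, sketched in Python; the example (conjunctions of up to n boolean literals, so |H| = 3^n) is the classic one from Mitchell, and the concrete numbers eps = 0.1, delta = 0.05, n = 10 are illustrative choices, not taken from the slide.

  import math

  def sample_complexity(h_size, eps, delta):
      # m >= (1/eps) * (ln|H| + ln(1/delta)) for a consistent learner over a finite H
      return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / eps)

  # Conjunctions of up to n boolean literals: each variable appears positively,
  # negatively, or not at all, so |H| = 3**n.
  n = 10
  print(sample_complexity(3 ** n, eps=0.1, delta=0.05))  # -> 140 examples suffice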

VC theory (V. Vapnik and A. Chervonenkis, 1960-1990): the VC dimension and structural risk minimization.

VC dimension (V. Vapnik and A. Chervonenkis, 1968, 1971)  
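A sketch of the standard definitions: a set of points S is shattered by a hypothesis space H if, for every possible labeling (dichotomy) of S, some h \in H realizes it. The VC dimension of H is the size of the largest set that can be shattered,

  VC(H) = \max \{\, |S| : S \subseteq X \text{ is shattered by } H \,\}

and it is defined to be infinite if arbitrarily large sets can be shattered.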

For our purposes, it is enough to find one set of three points that can be shattered; it is not necessary to be able to shatter every possible set of three points in two dimensions.
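A brute-force check of that claim, sketched in Python; the three points, the random search over candidate lines, and the helper name can_realize are illustrative choices, not from the slides. It verifies that every one of the 2^3 labelings of three non-collinear points is realized by some oriented line, so oriented lines in the plane have VC dimension at least 3.

  import itertools
  import numpy as np

  def can_realize(points, labels, n_trials=20000, seed=0):
      # Sample random oriented lines sign(w . x + b) and report whether any of them
      # reproduces the requested labeling exactly.
      rng = np.random.default_rng(seed)
      for _ in range(n_trials):
          w, b = rng.normal(size=2), rng.normal()
          if np.all(np.sign(points @ w + b) == labels):
              return True
      return False

  points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # three non-collinear points
  shattered = all(can_realize(points, np.array(labels))
                  for labels in itertools.product([-1, 1], repeat=3))
  print(shattered)  # True: all 8 labelings are achieved, so VC(oriented lines) >= 3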

VC dimension: continued  

VC dimension: intuition. A high-capacity rule: "it is not a tree, because it is different from any tree I have seen (e.g., it has a different number of leaves)." A low-capacity rule: "if it is green then it is a tree."

Low VC dimension: the VC dimension is pessimistic. If we use oriented lines as our hypothesis class, it only guarantees learning datasets of 3 points! This is because the VC dimension is computed independently of the distribution of the instances. In reality, nearby instances tend to have the same label, so there is no need to worry about all possible labelings; there are many datasets containing more points that are still learnable by a line.

Infinite VC dimension: a lookup table has infinite VC dimension, but so does the nearest-neighbor classifier, because any number of points, labeled arbitrarily (without overlaps), will be learned successfully. Yet nearest neighbor performs well in practice, so infinite capacity does not guarantee poor generalization performance.

Examples of calculating VC dimension  
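A few standard worked values (representative examples, not necessarily the ones on the original slides): thresholds on the real line, h_\theta(x) = \mathrm{sign}(x - \theta), have VC dimension 1; intervals on the real line have VC dimension 2 (two points can be shattered, but for x_1 < x_2 < x_3 the labeling +, -, + is impossible); axis-aligned rectangles in the plane have VC dimension 4; oriented hyperplanes in R^d have VC dimension d + 1.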

Examples of calculating VC dimension: continued  

VC dimension of SVMs with polynomial kernels  
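The standard statement (Burges, 1998): the inhomogeneous polynomial kernel K(x, y) = (x \cdot y + 1)^p on R^d corresponds to a feature space spanned by all monomials of degree at most p, whose dimension is \binom{d+p}{p}, so the induced class of separating hyperplanes has VC dimension at most \binom{d+p}{p} + 1 (Burges computes the analogous exact value for homogeneous kernels). Even for modest d and p this is astronomically large, which is why margin-based capacity arguments, rather than the raw VC dimension, are used to explain why SVMs generalize.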

Sample complexity and the VC dimension (Blumer et al., 1989)  
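The bound, in the form given in Mitchell (1997, eq. 7.7): for a consistent learner over a hypothesis space H of finite VC dimension, it suffices that

  m \ge \frac{1}{\epsilon}\left( 4 \log_2\frac{2}{\delta} + 8\, VC(H)\, \log_2\frac{13}{\epsilon} \right)

so the VC dimension plays the role of \ln|H| and the bound applies to infinite hypothesis spaces as well.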

Structural risk minimization (V. Vapnik and A. Chervonenkis, 1974): a function that fits the training data and comes from a class of low VC dimension will generalize well. SRM provides a quantitative characterization of the tradeoff between the complexity of the approximating functions and the quality of the fit to the training data. First, we will talk about a certain bound…

A bound: it relates the true mean error (the actual risk) to the empirical risk.

A bound: continued. The added term is the VC confidence, and the resulting right-hand side is the risk bound.
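The bound these two slides annotate, in the form given by Burges (1998): with probability at least 1 - \eta, for a training set of size \ell and a function class of VC dimension h,

  R(\alpha) \;\le\; R_{emp}(\alpha) + \sqrt{ \frac{ h \left( \log\frac{2\ell}{h} + 1 \right) - \log\frac{\eta}{4} }{ \ell } }

where R(\alpha) is the actual risk, R_{emp}(\alpha) the empirical risk, the square-root term the VC confidence, and the whole right-hand side the risk bound.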

SRM: continued. To minimize the true error (actual risk), both the empirical risk and the VC confidence term should be minimized. The VC confidence term depends on the chosen class of functions, whereas the empirical risk and the actual risk depend on the one particular function chosen by training.

SRM: continued  

SRM: continued. For a given data set, optimal model estimation with SRM consists of the following steps: (1) select an element of a structure (model selection); (2) estimate the model from this element (statistical parameter estimation).
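A sketch of the structure SRM assumes (notation assumed, following Vapnik): a nested sequence of hypothesis classes

  S_1 \subset S_2 \subset \dots \subset S_k \subset \dots, \qquad h_1 \le h_2 \le \dots \le h_k \le \dots

where h_k is the VC dimension of S_k. Step (1) picks the element S_k whose risk bound (empirical risk plus VC confidence) is smallest, and step (2) fits the parameters of a particular function within that element.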

SRM: continued  

Support vector machine (Vapnik, 1982)  
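The separable (hard-margin) formulation, sketched in its standard form rather than copied from the slide: given training pairs (x_i, y_i) with y_i \in \{-1, +1\},

  \min_{w, b} \; \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \; i = 1, \dots, m

The margin is 2/\|w\|, so this maximizes the margin; per the margin argument in Burges (1998), the capacity of margin-M separators of data contained in a ball of radius R is bounded by roughly R^2 / M^2, which is the sense in which large margins imply small VC dimension.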

Support vector machine (Vapnik, 1995)  
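The soft-margin formulation (Cortes and Vapnik, 1995), again sketched in standard form: slack variables \xi_i allow margin violations at cost C,

  \min_{w, b, \xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0

and replacing the inner products x_i \cdot x_j in the dual with a kernel K(x_i, x_j) gives the nonlinear SVM.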

What we said today: PAC bounds only apply to finite hypothesis spaces. The VC dimension is a measure of complexity that can also be applied to infinite hypothesis spaces. Training neural networks uses only ERM, whereas SVMs use SRM. Large margins imply small VC dimension.

References: Machine Learning by T. M. Mitchell; A Tutorial on Support Vector Machines for Pattern Recognition by C. J. C. Burges; http://www.svms.org/; conversations with Dr. S. Shiry and M. Milanifard. Thanks! Questions?