Statistical Learning: Pattern Classification, Prediction, and Control Peter Bartlett August 2002, UC Berkeley CIS.

Statistical Learning
– Pattern classification (key issue: balancing complexity)
– Large margin classifiers
– Multiclass classification
– Prediction
– Control

Pattern Classification Given training data (X1,Y1), …, (Xn,Yn), find a prediction rule f that minimizes the probability of misclassification, P(f(X) ≠ Y). For example:
– Recognizing objects in an image.
– Detecting cancer from a blood serum mass spectrogram.

Pattern Classification The key issues:
– Approximation error
– Estimation error
– Computation
[Figure: serum samples → mass spectrometer → classifier → normal or cancer]

Pattern Classification Techniques
– Linearly parameterized (perceptron)
– Nonparametric (nearest neighbor)
– Nonlinearly parameterized: decision trees, neural networks
– Large margin classifiers: kernel methods, boosting
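As a concrete illustration of the nonparametric approach listed above, a minimal nearest-neighbor classifier can be sketched in a few lines. The function name and the toy data are illustrative, not from the talk:

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, x):
    """Predict the label of x by copying the label of its nearest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

# Toy 2-D data: class +1 clusters near (1, 1), class -1 near (-1, -1).
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.9, -1.1]])
y_train = np.array([1, 1, -1, -1])

print(nearest_neighbor_predict(X_train, y_train, np.array([0.9, 1.1])))    # → 1
print(nearest_neighbor_predict(X_train, y_train, np.array([-1.1, -0.8])))  # → -1
```

Note the trade-off the slides describe: this rule has no parameters to fit (good approximation properties), but its complexity grows with the data, which is exactly what the theory below quantifies.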

Pattern Classification A key problem in pattern classification is balancing complexity. A very complex model class has:
– good approximation properties, but
– poor statistical properties.
It is important to know how model complexity and sample size affect a classifier's performance. Motivation: the analysis and design of learning algorithms.

Minimax Theory for Pattern Classification Optimize performance in the worst case over classification problems (probability distributions). The minimax performance of a learning system is characterized by the capacity of its model class (VC-dimension). For many model classes, this is closely related to the number of parameters, d:
– Linearly parameterized (= d)
– Decision trees, neural networks (≈ d log d)
– Kernel methods, boosting (can be infinite)
(Vapnik and Chervonenkis, et al.)
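The role of VC-dimension can be made concrete with one standard textbook form of the VC generalization bound, which controls the gap between true and empirical error in terms of capacity d and sample size n. This particular formula is a standard one, not taken from the slides:

```python
import math

def vc_generalization_gap(n, d, delta=0.05):
    """One standard VC bound on the gap between true and empirical error,
    holding with probability at least 1 - delta:
        sqrt((d * (ln(2n/d) + 1) + ln(4/delta)) / n)."""
    return math.sqrt((d * (math.log(2 * n / d) + 1) + math.log(4 / delta)) / n)

# The gap shrinks as the sample size n grows relative to the capacity d.
print(vc_generalization_gap(n=1000, d=10))
print(vc_generalization_gap(n=100000, d=10))
```

This is why an infinite VC-dimension (as for some kernel classes) is problematic for worst-case analysis, and why the data-dependent estimates below can do better.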

Regularization Choose the model to minimize (empirical error) + (complexity penalty). [Figure: as model complexity increases, the approximation error decreases while the estimation error/penalty increases.]
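A minimal sketch of this recipe is ridge regression, where the empirical error is the squared loss and the complexity penalty is the squared norm of the weights. The code is illustrative (it uses squared loss rather than classification error for simplicity, and the names are mine):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w||^2 (empirical error + L2 complexity penalty).
    Closed form: w = (X^T X + lam * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_small = ridge_fit(X, y, lam=0.01)   # light penalty: close to least squares
w_large = ridge_fit(X, y, lam=100.0)  # heavy penalty: coefficients shrink toward 0
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

Increasing lam trades approximation error (a more constrained fit) for estimation error (a more stable one), which is the balance the slide describes.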

Data-Dependent Complexity Estimates More refined: use data to measure (and penalize) complexity. Performance can be significantly better than minimax.

Data-Dependent Complexity Estimates Minimizing the regularized criterion is hard. Large margin classifiers solve a simpler version of this optimization problem. For example, –Support vector machines –Adaboost

Large Margin Classifiers Two-class classification: Y ∈ {±1}. Aim to choose a real-valued function f to minimize the risk, P(sign(f(X)) ≠ Y). Consider the margin, Yf(X):
– Positive if the sign of f is correct.
– Magnitude indicates the confidence of the prediction.
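The margin itself is trivial to compute. This sketch (function name and values are mine) shows how its sign and magnitude encode correctness and confidence:

```python
import numpy as np

def margins(f_values, y):
    """Margins Y * f(X): positive iff sign(f) matches the label; magnitude = confidence."""
    return y * f_values

f_values = np.array([2.0, -0.5, 0.1])  # real-valued classifier outputs f(X)
y = np.array([1, 1, -1])               # true labels in {+1, -1}

# First example: correct and confident; second and third: wrong sign.
print(margins(f_values, y))
```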

Large Margin Classifiers Choose a convex margin cost function φ, and choose f to minimize the φ-risk, E φ(Yf(X)).

Large Margin Classifiers
– Adaboost: φ(α) = exp(−α).
– Support vector machines: φ(α) = max(0, 1 − α).
– Other kernel methods: φ(α) = max(0, 1 − α)².
– Neural networks: φ(α) = (1 − α)².
– Logistic regression: φ(α) = log(1 + exp(−2α)).
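These cost functions can be written down directly. The sketch below (the dictionary keys are my labels, not from the talk) evaluates each at a few margin values to show that all of them penalize small or negative margins:

```python
import math

# The margin cost functions listed above, each as a function of the margin a = y * f(x).
losses = {
    "adaboost":  lambda a: math.exp(-a),                     # exponential loss
    "svm_hinge": lambda a: max(0.0, 1.0 - a),                # hinge loss
    "kernel_sq": lambda a: max(0.0, 1.0 - a) ** 2,           # squared hinge
    "neural_sq": lambda a: (1.0 - a) ** 2,                   # squared error
    "logistic":  lambda a: math.log(1.0 + math.exp(-2.0 * a)),
}

# Every loss is large at margin -1, moderate at 0, and small (or zero) at +2,
# except the squared error, which also penalizes overconfident correct answers.
for name, phi in losses.items():
    print(name, phi(-1.0), phi(0.0), phi(2.0))
```

Note a design difference: the hinge loss is exactly zero past margin 1 (which is what makes support vector solutions sparse), while the squared error grows again for large positive margins.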

Large Margin Classifiers: Results Optimizing the (convex) φ-risk:
– Computation becomes tractable,
– Statistical properties are improved, but
– Approximation properties are worse.
Universal consistency (Steinwart). Optimal estimation rates; with low noise, better than minimax.

More Complex Decision Problems Two-class classification is only the beginning. There are many challenges in more complex decision problems:
– Data analysis: multiclass pattern classification, anomaly detection, ranking, clustering, etc.
– Prediction
– Control

Multiclass Pattern Classification In many pattern classification problems, the number of classes is large, and different mistakes have different costs. For example, –Computer vision: Recognizing objects in an image. –Bioinformatics: Predicting gene function from expression profiles.

Multiclass Pattern Classification The most successful approach in practice is to convert a multiclass classification problem into a set of binary classification problems. Issues:
– Code design.
– Simultaneously optimizing the code and the classifiers. (There is a minimal statistical penalty for choosing the code after seeing the data.)
– Optimization? A more direct approach?
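A minimal sketch of this reduction uses one-vs-rest, the simplest output code: one binary classifier per class, each trained to separate that class from the rest. The binary learner here (a least-squares linear score) and all names are hypothetical, chosen only to keep the example self-contained:

```python
import numpy as np

def one_vs_rest_train(X, y, classes, fit_binary):
    """Reduce multiclass to binary: one classifier per class (class k vs. the rest)."""
    return {k: fit_binary(X, np.where(y == k, 1.0, -1.0)) for k in classes}

def one_vs_rest_predict(scorers, x):
    """Predict the class whose binary scorer is most confident."""
    return max(scorers, key=lambda k: scorers[k](x))

# A stand-in binary learner: least-squares fit of a linear score w.x to +/-1 labels.
def fit_least_squares(X, y_pm):
    w, *_ = np.linalg.lstsq(X, y_pm, rcond=None)
    return lambda x, w=w: float(w @ x)

# Three classes clustered near (1,0), (0,1), and (-1,-1).
X = np.array([[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9], [-1, -1], [-0.9, -1.1]])
y = np.array([0, 0, 1, 1, 2, 2])

scorers = one_vs_rest_train(X, y, classes=[0, 1, 2], fit_binary=fit_least_squares)
print(one_vs_rest_predict(scorers, np.array([0.95, 0.0])))  # → 0
```

The "code design" issue on the slide generalizes this: one-vs-rest is just one column coding; error-correcting output codes choose the binary problems more carefully.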

Prediction with Structured Data These problems arise, for example, in –Natural language processing, –WWW data analysis, –Bioinformatics (gene/protein networks), and –Analysis of other spatiotemporal signals. Simple heuristics (n-grams, windows) are limited. Challenge: automatically extract from the data relevant structure that is useful for prediction.

Control: Sequential Decision Problems Examples: –Robotics. –Choosing the right medical treatment. –In drug discovery, choosing a suitable sequence of candidate drugs to investigate. Approximation/Estimation/Computation: –Complexity of a model class? –Can it be measured from data and experimentation?

Control: Sequential Decision Problems Reinforcement learning + control theory: –Adaptive control schemes with performance, stability guarantees? For control problems with discrete action spaces, are there analogs of large margin classifiers, with similar advantages - improving estimation properties by sacrificing approximation properties?

Statistical Learning We understand something about two-class pattern classification. More complex decision problems: –prediction –control