
Chapter 4 CONCEPTS OF LEARNING, CLASSIFICATION AND REGRESSION Cios / Pedrycz / Swiniarski / Kurgan

Outline
- Main Modes of Learning
- Types of Classifiers
- Approximation, Generalization and Memorization

Main Modes of Learning
- Unsupervised learning
- Supervised learning
- Reinforcement learning
- Learning with knowledge hints and semi-supervised learning

Unsupervised Learning
Unsupervised learning, e.g., clustering, is concerned with the automatic discovery of structure in data without any supervision. Given a dataset X = {x_1, x_2, …, x_N} of N patterns, where each x_k is characterized by a set of attributes, determine the structure of X, i.e., identify and describe the groups (clusters) present within it.
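To make this concrete, here is a minimal sketch of one popular clustering algorithm, k-means, in Python; it is an illustration rather than the chapter's prescribed method, and all function names are ours.

```python
import random

def kmeans(X, k, iters=50):
    """Minimal k-means: X is a list of numeric tuples, k the number of clusters."""
    centers = random.sample(X, k)  # initialize centers from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach each pattern to its nearest center (squared Euclidean).
        clusters = [[] for _ in range(k)]
        for x in X:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centers[i])))
            clusters[j].append(x)
        # Update step: move each center to the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centers[i] = tuple(sum(vals) / len(c) for vals in zip(*c))
    return centers, clusters

centers, clusters = kmeans([(0, 0), (0, 1), (5, 5), (6, 5)], k=2)
print(centers)  # two centers, one near (0, 0.5) and one near (5.5, 5)
```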

Examples of Clusters
[Figure: geometry of clusters (groups) and four ways of grouping patterns]

Defining Distance/Closeness of Data
The distance function d(x, y) plays a pivotal role when grouping data. Conditions for a distance metric:
- d(x, x) = 0
- d(x, y) = d(y, x) (symmetry)
- d(x, z) + d(z, y) ≥ d(x, y) (triangle inequality)

Examples of Distance Functions
- Hamming distance: d(x, y) = Σ_i |x_i − y_i| (for binary attributes, the number of positions at which x and y differ)
- Euclidean distance: d(x, y) = (Σ_i (x_i − y_i)²)^(1/2)
- Tchebyschev distance: d(x, y) = max_i |x_i − y_i|

Hamming/Euclidean/Tchebyschev Distances
[Figure comparing the three distance functions]
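The three distance functions above translate directly into code. A minimal Python sketch (function names are ours):

```python
import math

def hamming(x, y):
    """Sum of absolute coordinate differences; counts mismatches for binary vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def tchebyschev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1.0, 2.0, 3.0), (4.0, 0.0, 3.0)
print(hamming(x, y), euclidean(x, y), tchebyschev(x, y))  # 5.0 3.605... 3.0
```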

Supervised Learning
We are given a collection of data (patterns) whose target outputs come in one of two forms:
- discrete labels, in which case we have a classification problem
- values of a continuous variable, in which case we have a regression (approximation) problem

Examples of Classifiers
- Linear classifier
- Piecewise-linear classifier
- Nonlinear classifier

Reinforcement Learning
Reinforcement learning is guided by less detailed information (a weaker supervision mechanism) than supervised learning. The guidance comes in the form of reinforcement information (a reinforcement signal). For instance, given c classes, the reinforcement signal r(ω) could be binary, e.g., indicating only whether the pattern belongs to some combination (subset) of the classes rather than identifying the class itself.

Reinforcement Learning
[Figure: reinforcement in classification, i.e., partial guidance through class combination]

Reinforcement Learning
[Figure: reinforcement in regression, i.e., a thresholded version of the target signal]

Reinforcement Learning
[Figure: reinforcement in regression, i.e., partial guidance through an aggregate (average) of the signal]

Semi-supervised Learning
Often we possess some domain knowledge when clustering. It may come in the form of a small portion of the data being labeled.

Learning with Proximity Hints
Instead of class labels, we may have pairs of data for which proximity levels have been provided. Advantages:
- the number of classes is not required
- only some selected pairs of data need to be considered

Classification Problem
Classifiers are algorithms that discriminate between classes of patterns. Depending on the number of classes in the problem, we talk about two-class and many-class classifiers. The design of a classifier depends on the character of the data, the number of classes, the learning algorithm, and the validation procedures. A classifier can be regarded as a mapping F from the feature space to the class space:
F: X → {ω_1, ω_2, …, ω_c}

Two-Class Classifier and Output Coding
[Figure]

Multi-Class Classifier
Maximum-of-class-membership rule: select the class i_0 for which
i_0 = arg max {y_1, y_2, …, y_c}
i.e., the index of the largest class membership value y_i.
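A minimal Python sketch of the maximum-membership rule, assuming the classifier produces one membership value per class (names are ours):

```python
def max_membership_class(y):
    """Return the index i_0 of the largest class membership value in y = [y_1, ..., y_c]."""
    return max(range(len(y)), key=lambda i: y[i])

print(max_membership_class([0.1, 0.7, 0.2]))  # 1, i.e., the second class wins
```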

Multi-Class Dichotomic Classifier
We can split a c-class problem into a set of two-class problems. In each, we consider one class, say ω_1, while the other class is formed by all the patterns that do not belong to ω_1. Binary (dichotomic) decision:
φ_1(x) ≥ 0 if x belongs to ω_1
φ_1(x) < 0 if x does not belong to ω_1

Multi-Class Dichotomic Classifier
Dichotomic decision:
φ_1(x) ≥ 0 if x belongs to ω_1
φ_1(x) < 0 if x does not belong to ω_1
Possible cases:
- exactly one classifier generates a nonnegative value: the pattern is assigned to that class
- several classifiers identify the pattern as belonging to their own class: conflicting class assignment
- no classifier issues a classification decision: undefined class assignment
A sketch of this one-vs-rest scheme follows.
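A minimal one-vs-rest sketch in Python under the conventions above, using toy linear decision functions; all names and data are illustrative:

```python
def one_vs_rest(classifiers, x):
    """classifiers: list of decision functions phi_i; returns a class index,
    'conflict' (several nonnegative outputs), or 'undefined' (none)."""
    positive = [i for i, phi in enumerate(classifiers) if phi(x) >= 0]
    if len(positive) == 1:
        return positive[0]  # exactly one classifier claims the pattern
    return "conflict" if positive else "undefined"

# Two toy dichotomic classifiers on a one-dimensional input
phi0 = lambda x: 1.0 - x   # nonnegative for x <= 1
phi1 = lambda x: x - 2.0   # nonnegative for x >= 2
print(one_vs_rest([phi0, phi1], 0.5))  # 0
print(one_vs_rest([phi0, phi1], 1.5))  # 'undefined'
```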

Multi-Class Dichotomic Classifier
[Figure]

Classification vs. Regression
In contrast to classification, in regression we have a continuous output variable, and the objective is to build a model (regressor) such that a certain approximation error is minimized. For a data set formed by input-output pairs (x_k, y_k), k = 1, 2, …, N, where y_k ∈ R, the regression model (regressor) has the form of some mapping F(x) such that for any x_k the value F(x_k) is as close to y_k as possible.
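As one concrete (illustrative, not prescribed by the slides) way to build such a regressor, a minimal linear least-squares fit for a single input variable:

```python
def fit_line(pairs):
    """Least-squares fit of F(x) = a*x + b to input-output pairs (x_k, y_k)."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    b = (sy - a * sx) / n                          # intercept
    return a, b

a, b = fit_line([(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8)])
print(a, b)  # a ≈ 1.94, b ≈ 1.09
```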

Examples of Regression Models
[Three figure panels: linearly distributed data with high dispersion; nonlinearly distributed data with low dispersion; linearly distributed data with low dispersion]

Main Categories of Classifiers
Explicit and implicit characterization of classifiers:
(a) explicit: an explicitly specified function, such as a linear, polynomial, or neural network model
(b) implicit: no formula, but rather a description, such as a decision tree or a nearest-neighbor classifier

Nearest-Neighbor Classifier
Classify x according to the class of its nearest neighbor:
L = arg min_k ||x − x_k||
The class of x is the same as the class to which x_L belongs.
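A minimal Python sketch of the nearest-neighbor rule, using squared Euclidean distance (names and data are illustrative):

```python
def nearest_neighbor(train, x):
    """train: list of (pattern, label) pairs; returns the label of the pattern nearest to x."""
    def dist2(a, b):
        # Squared Euclidean distance; its minimizer is the same as for the true distance.
        return sum((u - v) ** 2 for u, v in zip(a, b))
    _, label = min(train, key=lambda pair: dist2(pair[0], x))
    return label

train = [((0.0, 0.0), "omega_1"), ((1.0, 1.0), "omega_1"), ((5.0, 5.0), "omega_2")]
print(nearest_neighbor(train, (4.0, 4.5)))  # 'omega_2'
```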

Decision Trees
The decision boundaries are always parallel to the coordinate axes.

Linear Classifiers
A linear function of the features (variables):
φ(x) = w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n
Parameters of the classifier: w_0, w_1, …, w_n
Geometry: line, plane, hyperplane
Linear separability of data

Linear Classifiers
Linear classifiers can be described in a compact form by using vector notation:
φ(x) = w^T x̃
where w = [w_0 w_1 … w_n]^T and x̃ = [1 x_1 x_2 … x_n]^T. Note that x̃ is defined in an extended (augmented) input space, i.e., x̃ = [1 x]^T.
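A minimal Python sketch of evaluating a linear classifier in the augmented input space (names are ours):

```python
def linear_classifier(w, x):
    """Evaluate phi(x) = w^T x_aug, where x_aug = [1, x_1, ..., x_n] is the augmented input."""
    x_aug = [1.0] + list(x)  # prepend 1 so that w[0] acts as the bias w_0
    return sum(wi * xi for wi, xi in zip(w, x_aug))

w = [-1.0, 2.0, 0.5]  # w_0 (bias), w_1, w_2
print(linear_classifier(w, (1.0, 2.0)))  # -1 + 2*1 + 0.5*2 = 2.0, i.e., on the positive side
```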

Nonlinear Classifiers
Polynomial classifiers, e.g.,
φ(x) = w_0 + w_1 x_1 + w_2 x_2 + … + w_n x_n + w_{n+1} x_1^2 + w_{n+2} x_2^2 + … + w_{2n} x_n^2 + w_{2n+1} x_1 x_2 + …
have nonlinear boundaries, formed at the expense of an increased dimensionality of the feature space.
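The idea translates into code as a feature expansion followed by an ordinary linear classifier in the enlarged space. A minimal sketch for second-order terms (names are ours):

```python
from itertools import combinations

def quadratic_features(x):
    """Map [x_1, ..., x_n] to [1, x_i..., x_i^2..., x_i*x_j...] for a polynomial classifier."""
    x = list(x)
    squares = [v * v for v in x]
    cross = [a * b for a, b in combinations(x, 2)]
    return [1.0] + x + squares + cross

print(quadratic_features([2.0, 3.0]))  # [1.0, 2.0, 3.0, 4.0, 9.0, 6.0]
```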

Performance Assessment
Loss function: L(ω_1, ω_2) and L(ω_2, ω_1) quantify the losses incurred by the two kinds of misclassification; the correct-classification losses L(ω_1, ω_1) and L(ω_2, ω_2) are typically set to zero.

Performance Assessment
A performance index is used to measure the quality of the classifier. For the k-th data point it can be expressed as the loss incurred on that point, e.g., e_k = L(F(x_k), ω_k), where ω_k is the true class of x_k. We sum these expressions over all data to express the total cumulative error, Q = Σ_{k=1}^{N} e_k.
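A minimal sketch of the cumulative error, assuming a 0-1 loss (one unit of loss per misclassified pattern); names and data are illustrative:

```python
def cumulative_error(classifier, data):
    """Total 0-1 loss: the number of (x_k, label_k) pairs the classifier gets wrong."""
    return sum(1 for x, label in data if classifier(x) != label)

data = [((0.0,), "omega_1"), ((1.0,), "omega_1"), ((5.0,), "omega_2")]
classifier = lambda x: "omega_1" if x[0] < 3.0 else "omega_2"
print(cumulative_error(classifier, data))  # 0
```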

Generalization Aspects of Classification/Regression Models
Performance is assessed with regard to unseen data. Typically, the available data are split into two or three disjoint subsets:
- training
- validation
- testing
The training set is used to complete the training (learning) of the classifier. All optimization activities are guided by the performance index, and its changes are reported on the training data.
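A minimal Python sketch of such a three-way split; the 60/20/20 proportions are an illustrative choice, not a recommendation from the slides:

```python
import random

def split_data(data, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle and split data into disjoint training, validation, and testing subsets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n_train = int(train_frac * len(data))
    n_valid = int(valid_frac * len(data))
    return (data[:n_train],                    # training
            data[n_train:n_train + n_valid],   # validation
            data[n_train + n_valid:])          # testing (remainder)

train, valid, test = split_data(range(10))
print(len(train), len(valid), len(test))  # 6 2 2
```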

Overtraining and Validation Sets
The validation set is essential in selecting the structure of a classifier. Consider polynomial classifiers: by using the validation set, we can determine an optimal order of the polynomial, as the sketch below illustrates.
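A minimal sketch of validation-based order selection for a polynomial regressor, using numpy's polyfit/polyval; data and names are illustrative:

```python
import numpy as np

def choose_polynomial_order(train, valid, max_order=8):
    """Fit polynomials of increasing order on the training set and return the
    order with the smallest mean squared error on the validation set."""
    (xt, yt), (xv, yv) = train, valid
    best_order, best_err = None, float("inf")
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(xt, yt, order)                  # fit on training data only
        err = np.mean((np.polyval(coeffs, xv) - yv) ** 2)   # validation error
        if err < best_err:
            best_order, best_err = order, err
    return best_order

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 1 + 2 * x - 3 * x**2 + rng.normal(0, 0.1, 100)  # noisy quadratic data
print(choose_polynomial_order((x[:60], y[:60]), (x[60:], y[60:])))  # typically 2
```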

Approximation, Generalization and Memorization
The approximation-generalization dilemma: excellent performance on the training set, but unacceptable performance on the testing set. Memorization effect: the data become memorized (including the noisy data points), and the classifier thus exhibits poor generalization ability.

Approximation, Generalization and Memorization
[Figure: a nonlinear classifier that produces zero classification error on the training data but has poor generalization ability]

References
Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
Duda, R.O., Hart, P.E. and Stork, D.G. 2001. Pattern Classification, 2nd edition. J. Wiley.
Kaufmann, L. and Rousseeuw, P.J. 1990. Finding Groups in Data: An Introduction to Cluster Analysis. Wiley.
Soderstrom, T. and Stoica, P. 1989. System Identification. Wiley.
Webb, A. 2002. Statistical Pattern Recognition, 2nd edition. Wiley.