Machine Learning Supervised Learning Classification and Regression K-Nearest Neighbor Classification Fisher’s Criteria & Linear Discriminant Analysis Perceptron:


Machine Learning
Supervised Learning
- Classification and Regression
- K-Nearest Neighbor Classification
- Fisher's Criteria & Linear Discriminant Analysis
- Perceptron: Linearly Separable
- Multilayer Perceptron & EBP & Deep Learning, RBF Network
- Support Vector Machine
- Ensemble Learning: Voting, Boosting (AdaBoost)
Unsupervised Learning
- Principal Component Analysis
- Independent Component Analysis
- Clustering: K-means
Semi-supervised Learning & Reinforcement Learning

General Classification Process

Machine learning: Classification

Machine learning: Regression

k-Nearest Neighbors Algorithm Handwritten Digit Example Some of the training examples of the digits zero, one, and seven. Do As Your Neighbor Does

k-Nearest Neighbors Algorithm Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 (solid line circle) it is assigned to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the first class (3 squares vs. 2 triangles inside the outer circle).

k-Nearest Neighbors Algorithm In pattern recognition, the k-Nearest Neighbors algorithm (or k-NN for short) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:
- In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
- In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.
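As a concrete companion to the majority-vote rule described above, here is a minimal k-NN classification sketch in Python/NumPy; the names (knn_predict, X_train, y_train) and the toy data are illustrative, not from the slides.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify one query point by majority vote among its k nearest neighbors."""
    # Euclidean distances from the query to every training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training examples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

# Toy data mirroring the squares-vs-triangles figure: class 0 vs. class 1
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
y_train = np.array([0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([2.8, 2.8]), k=3))  # -> 1
```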

k-Nearest Neighbors Algorithm k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms. For both classification and regression, it can be useful to assign weights to the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor. [2] The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm has nothing to do with, and is not to be confused with, k-means, another popular machine learning technique.
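A sketch of the 1/d weighting scheme mentioned above, assuming Euclidean distance; the helper name weighted_knn_predict and the eps guard against zero distances are my own additions.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3, eps=1e-12):
    """Classify by summing 1/d vote weights over the k nearest neighbors."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = {}
    for idx in nearest:
        # Nearer neighbors get larger weights; eps avoids division by zero
        weights[y_train[idx]] = weights.get(y_train[idx], 0.0) + 1.0 / (dists[idx] + eps)
    # Return the class with the largest accumulated weight
    return max(weights, key=weights.get)
```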

k-NN: Algorithm The training examples are vectors in a multidimensional feature space, each with a class label. The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples. In the classification phase, k is a user-defined constant, and an unlabeled vector (a query or test point) is classified by assigning the label that is most frequent among the k training samples nearest to that query point. A commonly used distance metric for continuous variables is Euclidean distance. For discrete variables, such as in text classification, another metric can be used, such as the overlap metric (or Hamming distance). A drawback of the basic "majority voting" classification occurs when the class distribution is skewed: examples of a more frequent class tend to dominate the prediction of the new example, because they tend to be common among the k nearest neighbors due to their large number. [4] One way to overcome this problem is to weight the classification, taking into account the distance from the test point to each of its k nearest neighbors: the class (or value, in regression problems) of each of the k nearest points is multiplied by a weight proportional to the inverse of the distance from that point to the test point. Another way to overcome skew is by abstraction in the data representation. [4]
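The two distance metrics named above can be written out directly; a minimal sketch, with illustrative function names.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance for continuous feature vectors."""
    return np.sqrt(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def hamming_distance(a, b):
    """Overlap (Hamming) distance for discrete features: number of mismatched positions."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
print(hamming_distance("karolin", "kathrin"))      # 3
```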

k-NN: Parameter selection The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification [5] but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques (see hyperparameter optimization). The special case where the class is predicted to be the class of the closest training sample (i.e., when k = 1) is called the nearest neighbor algorithm. [5] The accuracy of the k-NN algorithm can be severely degraded by the presence of noisy or irrelevant features, or if the feature scales are not consistent with their importance. Much research effort has been put into selecting or scaling features to improve classification. A particularly popular approach is the use of evolutionary algorithms to optimize feature scaling. Another popular approach is to scale features by the mutual information of the training data with the training classes. In binary (two-class) classification problems, it is helpful to choose k to be an odd number, as this avoids tied votes. One popular way of choosing the empirically optimal k in this setting is via the bootstrap method.
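One simple way to select k, in the spirit of the heuristics above, is a cross-validated grid search; the sketch below assumes scikit-learn is available, uses one of its bundled binary datasets purely as an example, and restricts the grid to odd k to avoid tied votes.

```python
# Assumes scikit-learn; a sketch of choosing k by cross-validation
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)         # an example two-class dataset
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}  # odd k values avoid tied votes
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Cross-validation here plays the same role as the bootstrap mentioned in the slide: it estimates out-of-sample accuracy for each candidate k and keeps the best one.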

k-NN: Feature Extraction When the input data to an algorithm are too large to be processed and are suspected to be redundant (e.g., the same measurement in both feet and meters), the input data are transformed into a reduced representation: a set of features (also called a feature vector). Transforming the input data into this set of features is called feature extraction. If the extracted features are carefully chosen, it is expected that the feature set will capture the relevant information in the input data, so that the desired task can be performed using this reduced representation instead of the full-size input. Feature extraction is performed on the raw data, and the k-NN algorithm is then applied to the transformed data in feature space.
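As a toy illustration of the idea above, the sketch below reduces a raw 1-D measurement sequence to a small feature vector before any k-NN step; the particular features (mean, spread, range) are an arbitrary choice made only for illustration.

```python
import numpy as np

def extract_features(raw_signal):
    """Reduce a raw 1-D measurement sequence to a compact feature vector."""
    x = np.asarray(raw_signal, dtype=float)
    return np.array([x.mean(), x.std(), x.max() - x.min()])

raw = [0.1, 0.4, 0.35, 0.9, 0.2]   # raw input data
features = extract_features(raw)    # reduced representation fed to k-NN
print(features)
```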

k-NN: Dimension reduction For high-dimensional data (e.g., more than about 10 dimensions), dimension reduction is usually performed prior to applying the k-NN algorithm in order to avoid the effects of the curse of dimensionality. The curse of dimensionality in the k-NN context essentially means that Euclidean distance is unhelpful in high dimensions because all vectors are almost equidistant from the query vector (imagine many points lying more or less on a circle with the query point at the center; the distance from the query to all data points in the search space is almost the same). Feature extraction and dimension reduction can be combined in one step using principal component analysis (PCA), linear discriminant analysis (LDA), or canonical correlation analysis (CCA) as a pre-processing step, followed by k-NN on the feature vectors in the reduced-dimension space. In machine learning this process is also called low-dimensional embedding.
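A sketch of the PCA-then-k-NN preprocessing described above, assuming scikit-learn; the example dataset, the number of principal components, and the choice of k are all arbitrary here.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)        # 64-dimensional raw inputs
pipeline = make_pipeline(
    StandardScaler(),                      # put features on a common scale
    PCA(n_components=10),                  # low-dimensional embedding
    KNeighborsClassifier(n_neighbors=5),   # k-NN in the reduced space
)
pipeline.fit(X, y)
print(pipeline.score(X, y))                # training accuracy, for illustration only
```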

k-NN Regression In k-NN regression, the k-NN algorithm is used for estimating continuous variables. One such algorithm uses a weighted average of the k nearest neighbors, weighted by the inverse of their distance. This algorithm works as follows:
1. Compute the Euclidean or Mahalanobis distance from the query example to the labeled examples.
2. Order the labeled examples by increasing distance.
3. Find a heuristically optimal number k of nearest neighbors, based on RMSE. This is done using cross-validation.
4. Calculate an inverse-distance-weighted average with the k nearest multivariate neighbors.
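A minimal sketch of the steps above, using Euclidean distance and a fixed k rather than the cross-validated choice in step 3; the names and toy data are illustrative.

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=3, eps=1e-12):
    """Inverse-distance-weighted k-NN regression for one query point."""
    # Step 1: distances from the query to every labeled example (Euclidean here)
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Step 2: order the labeled examples by increasing distance
    nearest = np.argsort(dists)[:k]   # step 3 assumed done: k is fixed instead of tuned by RMSE
    # Step 4: inverse-distance-weighted average of the k nearest target values
    w = 1.0 / (dists[nearest] + eps)
    return np.sum(w * y_train[nearest]) / np.sum(w)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 4.0, 9.0])
print(knn_regress(X_train, y_train, np.array([1.5]), k=2))  # roughly 2.5
```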