Data Classification: Gaussian Mixture Models, k-Nearest Neighbor, Neural Networks, and Topological Data Analysis
Salvatore Giorgi, ECE 8110 Machine Learning, 5/12/2014

Gaussian Mixture Models
- An iterative clustering method
- Formed by combining multivariate normal density components
- The Matlab function we use fits the data with an Expectation-Maximization (EM) algorithm
(Figures taken from Duda and Hart)

Gaussian Mixture Models: Algorithm
- First, compute the sample mean of each class in the training data
- Fit a mixture model to the full training data with the fitgmdist function, specifying a regularization parameter and a number of mixtures
- The number of mixtures is always a multiple of the number of classes: 11 for data set 1 and 5 for data set 2
- The regularization parameter ensures that the estimated covariance matrices are positive definite
- Find the smallest distance between each mixture and the class sample means, and assign to each mixture the class attaining this minimum distance
- Use the cluster function to cluster the test data and count the number of incorrect classifications (a sketch of the full procedure is given below)
- Probability of error = number of incorrect class assignments / number of test vectors
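To make these steps concrete, here is a minimal Matlab sketch of the procedure above. It is a sketch under stated assumptions, not the exact script behind the results: it assumes training features Xtrain with labels ytrain and test features Xtest with labels ytest are already in the workspace, and the mixtures-per-class setting is illustrative.

% Minimal sketch of the GMM classification procedure described above.
% Xtrain/ytrain and Xtest/ytest are assumed to be loaded; values are illustrative.
numClasses  = numel(unique(ytrain));
mixPerClass = 2;                                  % a multiple of the class count
numMixtures = mixPerClass * numClasses;

gm = fitgmdist(Xtrain, numMixtures, 'RegularizationValue', 1e-3);

% Sample mean of each class in the training data
classMeans = zeros(numClasses, size(Xtrain, 2));
for c = 1:numClasses
    classMeans(c, :) = mean(Xtrain(ytrain == c, :), 1);
end

% Map each mixture component to the class whose sample mean is closest
mixToClass = zeros(numMixtures, 1);
for m = 1:numMixtures
    d = sum(bsxfun(@minus, classMeans, gm.mu(m, :)).^2, 2);   % squared Euclidean distances
    [~, mixToClass(m)] = min(d);
end

% Cluster the test data, convert mixture indices to class labels, and score
idx    = cluster(gm, Xtest);
yhat   = mixToClass(idx);
Perror = mean(yhat ~= ytest);     % probability of error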

Gaussian Mixture Models: Results, Data Set 1
- Minimum error = 54.4%
- Number of mixtures per class = 21
- Regularization parameter = 0.001

Gaussian Mixture Models: Results, Data Set 2
- Minimum error = 27.1%
- Number of mixtures per class = 11
- Regularization parameter = 1

k-Nearest Neighbor
- A non-parametric classification method
- An object is classified by a majority vote over the class assignments of its k closest training elements
- We use a Euclidean distance metric
(Figures taken from Wikipedia)

k-Nearest Neighbor: Algorithm
- Call the knnsearch function with the training data, the test data, and k
- It returns a matrix in which each row contains the indices of the k nearest training vectors for the corresponding test vector
- Since we know the class of every training vector, these indices can be mapped to class labels
- We then take a majority vote over the classes of the k neighbors (see the sketch below)
- The majority vote is compared to the actual class of each test vector
- Probability of error = number of incorrect class assignments / number of test vectors
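A minimal Matlab sketch of this procedure, under the same assumptions as before (Xtrain/ytrain and Xtest/ytest already loaded; the value of k is illustrative):

% Minimal k-NN sketch using knnsearch (Euclidean distance by default)
k = 6;

% Indices of the k nearest training vectors for each test vector
idx = knnsearch(Xtrain, Xtest, 'K', k);

% Look up the neighbors' class labels and take a majority vote per row
neighborLabels = ytrain(idx);             % N-by-k matrix of labels
yhat           = mode(neighborLabels, 2); % majority vote for each test vector

Perror = mean(yhat ~= ytest);             % probability of error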

k-Nearest Neighbor: Results, Data Set 1
- Minimum error = 39.3%
- k = 6

k-Nearest Neighbor: Results, Data Set 2
- Minimum error = 24.9%
- k = 45, 47, and 48

Neural Networks
- A computational model inspired by neuroscience
- A large number of simple computational units are interconnected
- It has been proven that a feed-forward network with a hidden layer of sigmoidal units can approximate any continuous N-dimensional function to arbitrary accuracy
(Figures taken from Duda and Hart)

Neural Networks: Algorithm
- Architecture and neuron functions are kept constant: a single hidden layer with a tansig transfer function and a single output layer with a softmax transfer function
- Vary the number of neurons in the hidden layer: [1, 5, 10, 100, 1000, 10000]
- The training data are split into three sets: a training set, a validation set, and a test set
- Vary the percentage of data used for training: [60, 70, 80, 90, 95]; the remaining data are split 50/50 between the validation and test sets
- Vary the training function: [trainlm, trainbr, trainscg, trainrp]
(A sketch of one configuration is given below.)
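As an illustration of one configuration from the grid above, here is a minimal Matlab sketch using patternnet from the Neural Network Toolbox; the variable names, the 100-neuron hidden layer, and the 60/20/20 split are assumptions for the example, not the exact script behind the results.

% Minimal sketch of one network configuration (trainscg, 100 hidden neurons)
hiddenSize = 100;
net = patternnet(hiddenSize, 'trainscg');     % Scaled Conjugate Gradient training

% Make the architecture described in the slides explicit
net.layers{1}.transferFcn = 'tansig';         % hidden layer
net.layers{2}.transferFcn = 'softmax';        % output layer

% 60% training; the remainder split 50/50 between validation and test
net.divideParam.trainRatio = 0.60;
net.divideParam.valRatio   = 0.20;
net.divideParam.testRatio  = 0.20;

% The toolbox expects one column per sample and one-hot target vectors
X = Xtrain';
T = full(ind2vec(ytrain'));
net = train(net, X, T);

% Classify held-out data and estimate the probability of error
scores    = net(Xtest');
[~, yhat] = max(scores, [], 1);
Perror    = mean(yhat(:) ~= ytest(:));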

Neural Networks: Results, Data Set 1 (error, %)

Neurons \ Training %      60      70      80      90      95
1                       71.0    72.3    71.2    72.6    70.7
5                       56.5    58.8    50.9    56.2      -
10                      48.6    54.9    55.4    43.8    50.4
100                     40.4    44.1    42.5    43.0    42.0
1000                    48.8    45.9    45.1    44.6    46.4
10000                   69.1    77.0    68.3    87.9      -

These results are for the Scaled Conjugate Gradient Back Propagation (trainscg) training method, which is the default setting.

Neural Networks: Results, Data Set 2 (error, %)

Neurons \ Training %      60      70      80      90      95
1                       58.0    57.7    56.3    56.6      -
5                       24.3    25.1    25.7    32.6    26.0
10                      21.7    22.3    22.6    24.0    31.4
100                     24.6    24.9    23.4    25.4      -
1000                    28.0    30.0    28.3      -       -
10000                   30.3      -       -       -       -

These results are for the Scaled Conjugate Gradient Back Propagation (trainscg) training method, which is the default setting.

Comparison of GMM, kNN, and NN (minimum error)
Data Set 1: GMM 54.4%, kNN 39.3%, NN 40.4%
Data Set 2: GMM 27.1%, kNN 24.9%, NN 21.7%

Topological Data Analysis
- How does one visualize high-dimensional data?
- Can one infer high-dimensional structure from low-dimensional representations?
- How can one infer global (possibly continuous) structure from local discrete points?
- Tools from algebraic topology can attempt to answer these questions; we use the JavaPlex software within Matlab
(Image taken from Robert Ghrist, 'Barcodes: The Persistent Topology of Data')

Topological Data Analysis: Preliminaries
- Simplicial complex: a space formed by gluing together points, lines, and faces.
- Homology group: to a space X and an integer k we assign a vector space Hk(X); a continuous map f: X → Y induces a map on homology groups Hk(f): Hk(X) → Hk(Y).
- Betti number: the rank of the homology group. Informally, the k-th Betti number counts the k-dimensional holes in a space.
(Image taken from Robert Ghrist, 'Barcodes: The Persistent Topology of Data')
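As a concrete reference for the 0-Betti numbers reported in the results, the standard definition (not specific to these slides) can be written as

\beta_k(X) = \operatorname{rank} H_k(X),

so that \beta_0 counts the connected components of the space and \beta_1 counts its independent loops (one-dimensional holes).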

Topological Data Analysis: Preliminaries (continued)
- Filtered complex: a collection of complexes ordered by containment.
- Persistent homology: the computation of topological features of a space at different spatial resolutions.
- Barcodes: a way of viewing how features persist as the spatial resolution increases.
(Image taken from Robert Ghrist, 'Barcodes: The Persistent Topology of Data')
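For a rough picture of how such barcodes are computed in practice, the sketch below follows the standard JavaPlex Matlab tutorial; it assumes the JavaPlex scripts are on the Matlab path and that point_cloud holds the feature vectors of one class, and the filtration parameters are illustrative rather than those used for the results that follow.

% Sketch of a persistent-homology computation with JavaPlex in Matlab,
% following the JavaPlex tutorial; parameter values are illustrative.
load_javaplex;                         % add the JavaPlex library to the Java path
import edu.stanford.math.plex4.*;

max_dimension        = 2;              % compute H0 and H1
max_filtration_value = 4;              % largest radius in the filtration
num_divisions        = 100;            % resolution of the filtration

% Build a Vietoris-Rips filtration on the point cloud (one class of training data)
stream = api.Plex4.createVietorisRipsStream(point_cloud, max_dimension, ...
    max_filtration_value, num_divisions);

% Compute persistence intervals with Z/2Z coefficients
persistence = api.Plex4.getModularSimplicialAlgorithm(max_dimension, 2);
intervals   = persistence.computeIntervals(stream);

% Plot the barcodes; the long bars in dimension 0 give the 0-Betti number
options.filename = 'class_barcodes';
options.max_filtration_value = max_filtration_value;
plot_barcodes(intervals, options);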

Topological Data Analysis: Results
Figures: total training set; class 1 in the training set; class 7 in the training set
Table: 0-Betti number for the total training set and for each of classes 1 through 11

Thank you. Questions?