CAMCOS Report Day, December 9th, 2015, San Jose State University. Project Theme: Classification

On Classification: An Empirical Study of Existing Algorithms Based on Two Kaggle Competitions
Team 1 (data set: Handwritten Digits): Wilson A. Florero-Salinas, Carson Sprook, Dan Li, Abhirupa Sen
Team 2 (data set: Springleaf): Xiaoyan Chong, Minglu Ma, Yue Wang, Sha Li
Team Advisor: Dr. Guanliang Chen

Outline
What is Classification?
The Two Kaggle Competitions
Data Preprocessing
Classification Algorithms
Summary
Conclusions
Future Work

What is Classification? We start with a data set whose categories are already known; this is called the training set. New data then becomes available whose categories are unknown; this is called the test set. Goal: use the training set to predict the label of a new data point.
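A minimal sketch of this train/test workflow in Python (scikit-learn and its small built-in digits data set are assumptions here, used only for illustration; the competitions used their own data):

```python
# Minimal classification workflow: fit on the labeled training set, then
# predict labels for held-out test points (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)              # small stand-in data set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)          # any classifier works here
clf.fit(X_train, y_train)                        # learn from labeled data
print("test accuracy:", clf.score(X_test, y_test))
```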

The Kaggle Competitions
Kaggle is an international platform that hosts data prediction competitions in which students and experts in data science compete. Our CAMCOS teams entered two competitions:
Team 1: Digit Recognizer (ends December 31st)
Team 2: Springleaf Marketing Response (ended October 19th)

Team 1: The MNIST data set. MNIST is a subset of data collected by NIST, the US National Institute of Standards and Technology.

Potential Applications
Banking: check deposits
Surveillance: license plates
Shipping: envelopes and packages

Initial Challenges and Work-arounds
Challenges: the data set is high dimensional (each image is stored as a 784-dimensional vector), which makes computation expensive, and digits are written differently by different people (e.g., left-handed vs. right-handed writers).
Work-arounds: preprocess the data set; reduce the dimension to increase computation speed; apply transformations to the images to enhance features important for classification.

Data Preprocessing Methods
In our experiments we have used the following methods:
Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
2D LDA
Nonparametric Discriminant Analysis (NDA)
Kernel PCA
t-Distributed Stochastic Neighbor Embedding (t-SNE)
Parametric t-SNE
Kernel t-SNE
Deskewing

Deskewing (figure: example digit images before and after deskewing)
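One common way to deskew a digit image uses its second-order image moments; the sketch below is an illustrative NumPy/SciPy version under that assumption, not necessarily the team's exact code:

```python
# Illustrative moment-based deskewing of a single grayscale digit image
# (a common MNIST preprocessing trick; an assumption, not the team's code).
import numpy as np
from scipy.ndimage import affine_transform

def deskew(image):
    """Shear the image so that its principal axis becomes vertical."""
    total = image.sum()
    r, c = np.mgrid[:image.shape[0], :image.shape[1]]
    mr = (r * image).sum() / total               # centroid (row, column)
    mc = (c * image).sum() / total
    mu_rr = ((r - mr) ** 2 * image).sum() / total          # second-order
    mu_rc = ((r - mr) * (c - mc) * image).sum() / total    # central moments
    alpha = mu_rc / mu_rr                        # estimated skew
    shear = np.array([[1.0, 0.0],
                      [alpha, 1.0]])             # maps output coords to input coords
    center = np.array(image.shape) / 2.0
    offset = np.array([mr, mc]) - shear @ center
    return affine_transform(image, shear, offset=offset)

# Example usage: deskewed = deskew(img) for a 28x28 NumPy array img.
```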

Principal Component Analysis (PCA)
Using too many dimensions (784) can be computationally expensive.
PCA uses variance as the dimensionality-reduction criterion: the directions (principal components) with the least variance are thrown away.
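A hedged sketch of PCA-based dimensionality reduction (scikit-learn assumed; the number of retained components and the random stand-in data are illustrative choices):

```python
# PCA sketch: keep the directions of largest variance and drop the rest.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)                # stand-in: 1000 images, 784 pixels each
pca = PCA(n_components=50)                   # keep the 50 highest-variance directions
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (1000, 50)
print(pca.explained_variance_ratio_.sum())   # fraction of variance retained
```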

Linear Discriminant Analysis (LDA): a supervised method that projects the data onto directions that best separate the classes.

LDA and PCA on Pairwise Data (figure: 2-D projections of pairs of digit classes under PCA and under LDA)
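A sketch of how such a pairwise comparison might be produced (scikit-learn and its small digits set assumed; the digit pair 3 vs. 8 is an arbitrary example):

```python
# Project two digit classes to low dimension with PCA (unsupervised, maximizes
# variance) and with LDA (supervised, maximizes class separation).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)
mask = (y == 3) | (y == 8)                   # an example pairwise comparison
Xp, yp = X[mask], y[mask]

X_pca = PCA(n_components=2).fit_transform(Xp)
# With only two classes, LDA gives at most one discriminant direction.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(Xp, yp)
print(X_pca.shape, X_lda.shape)
```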

Classification Methods
In our experiments we have used the following methods:
Nearest Neighbor methods (instance based)
Naive Bayes (NB)
Maximum a Posteriori (MAP)
Logistic Regression
Support Vector Machines (linear classifier)
Neural Networks
Random Forests
XGBoost

K Nearest Neighbors (kNN)
The k nearest neighbors decide the group to which a test point belongs. Example with k = 5: the neighborhood of the test point includes both classes, but the majority of neighbors belong to class 1, so the prediction is class 1. The best result for our data set was 2.76% misclassification with k = 8.
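A sketch of kNN classification (scikit-learn's small digits set assumed; k = 8 mirrors the slide, but the 2.76% figure above comes from the team's full MNIST experiments, not this toy example):

```python
# kNN sketch: the k nearest training points vote on the label of a test point.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=8)    # the 8 nearest training points vote
knn.fit(X_tr, y_tr)
print("misclassification rate:", 1 - knn.score(X_te, y_te))
```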

K-Means (class centroids)
Situation 1: the data are well separated and each class has a centroid (average). The test point is closest to centroid 3, so it is predicted to belong to class 3.
Situation 2: the test point is closer to centroid 1 even though it actually belongs to cluster 2; it is predicted to belong to class 1 because it is closer to the global centroid of class 1.
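The "one centroid per class" idea above corresponds to nearest-centroid classification; a minimal illustration on synthetic 2-D data (scikit-learn assumed):

```python
# Nearest-centroid sketch: each class is summarized by one centroid, and a
# test point is assigned to the class whose centroid is nearest.
import numpy as np
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # class 0 around (0, 0)
               rng.normal(4, 1, (50, 2))])   # class 1 around (4, 4)
y = np.array([0] * 50 + [1] * 50)

clf = NearestCentroid().fit(X, y)
print(clf.centroids_)                        # one centroid per class
print(clf.predict([[3.5, 3.5]]))             # assigned to the nearer centroid
```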

Solution: Local K-Means
For every class, local centroids are calculated around the test point. The class whose local centroid is closest to the test point is the prediction.
Results: some of our best early results came from local k-means. With k = 14 the misclassification rate was 1.75%; local PCA + local k-means gave 1.53%; with deskewing this was reduced to 1.14%.
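A hedged NumPy sketch of local k-means as described: for each class, average the k nearest training points of that class around the test point, then predict the class with the closest local centroid (k = 14 follows the slide; other details are illustrative assumptions):

```python
# Local k-means prediction for a single test point.
import numpy as np

def local_kmeans_predict(X_train, y_train, x_test, k=14):
    best_class, best_dist = None, np.inf
    for cls in np.unique(y_train):
        Xc = X_train[y_train == cls]
        d = np.linalg.norm(Xc - x_test, axis=1)     # distances to this class
        nearest = Xc[np.argsort(d)[:k]]             # k closest points of the class
        centroid = nearest.mean(axis=0)             # local centroid around x_test
        dist = np.linalg.norm(centroid - x_test)
        if dist < best_dist:
            best_class, best_dist = cls, dist
    return best_class
```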

Support Vector Machines (SVM)
Suppose we are given a data set whose classes are known. The goal is to construct a linear decision boundary that classifies new observations.

Support Vector Machines (SVM)
The decision boundary is chosen to maximize the separation (margin) m between the classes.

SVM with Multiple Classes
SVM is a binary classifier, so how can we extend it to more than two classes? One method is one-vs-rest: construct one SVM model for each class, where each SVM separates one class from the rest.
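A sketch of the one-vs-rest strategy (scikit-learn assumed; OneVsRestClassifier trains one binary SVM per class):

```python
# One-vs-rest sketch: one binary SVM per class, each separating that class
# from all the others.
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
ovr = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=5000))
ovr.fit(X, y)
print(len(ovr.estimators_))                  # 10 binary SVMs, one per digit
print(ovr.predict(X[:5]), y[:5])
```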

Support Vector Machines (SVM)
What if the data cannot be separated by a line? Use a kernel SVM, which implicitly maps the data into a higher-dimensional space where a linear separation may be possible.
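A sketch contrasting a linear SVM with an RBF-kernel SVM on data that no line can separate (scikit-learn assumed; the concentric-circles data set is an illustrative stand-in):

```python
# Kernel SVM sketch: the RBF kernel handles data a linear boundary cannot.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0, C=1.0).fit(X, y)
print("linear SVM accuracy:", linear.score(X, y))    # poor: not linearly separable
print("RBF kernel SVM accuracy:", rbf.score(X, y))   # much better
```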

Parameter Selection for SVMs
It is not obvious what parameters to use when training multiple models. How can we obtain an approximate range of parameters for training? Within each class, compute a corresponding gamma; this gives a starting point for parameter selection.

An Alternative Approach
Our approach: separately apply PCA to each digit space. This extracts the patterns from each digit class, and we can use different parameters for each digit group. (Pipeline diagram: per-digit PCA followed by a one-vs-all SVM, giving Models 1 through 10, one per digit class.)
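A hedged sketch of this per-digit pipeline (scikit-learn assumed; the number of PCA components and the confidence-based prediction rule are illustrative assumptions, not the team's exact implementation):

```python
# Per-digit pipeline: fit PCA on each digit class alone, then train a
# one-vs-rest RBF SVM in that class's PCA space.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
models = {}
for digit in np.unique(y):
    pca = PCA(n_components=30).fit(X[y == digit])   # patterns of this digit only
    Xp = pca.transform(X)                           # project all data into that space
    svm = SVC(kernel="rbf", gamma="scale").fit(Xp, (y == digit).astype(int))
    models[digit] = (pca, svm)

def predict(x):
    # choose the digit whose one-vs-rest model is most confident
    scores = {d: svm.decision_function(pca.transform(x.reshape(1, -1)))[0]
              for d, (pca, svm) in models.items()}
    return max(scores, key=scores.get)

print(predict(X[0]), y[0])
```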

Some Challenges for Kernel SVM
Using kNN with k = 5, we computed gamma values for each class.
An error of 1.25% was obtained using a different kNN-based gamma on each model.
An error of 1.2% was achieved with the averaged gamma.
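The slides do not give the exact formula used to turn kNN distances into gamma values; the sketch below shows one plausible version (gamma = 1 / (2 d_k^2), with d_k the mean distance to the 5 nearest within-class neighbors) purely as an illustration:

```python
# Speculative sketch: derive a per-class gamma from kNN distances with k = 5,
# plus the averaged-gamma variant mentioned above.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)
gammas = {}
for cls in np.unique(y):
    Xc = X[y == cls]
    nn = NearestNeighbors(n_neighbors=6).fit(Xc)   # the point itself + 5 neighbors
    dists, _ = nn.kneighbors(Xc)
    d_k = dists[:, 1:].mean()                      # drop the zero self-distance
    gammas[cls] = 1.0 / (2.0 * d_k ** 2)           # assumed conversion to gamma

avg_gamma = np.mean(list(gammas.values()))         # the "averaged gamma" variant
print(gammas, avg_gamma)
```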

Known results using PCA and kernel SVM:

Method               | Error | Time
SVM                  | 1.40% | 743 sec
PCA + SVM            | 1.25% | 234 sec
Digit PCA + SVM      | 1.20% | 379 sec
LDA + local k-means  | 7.77% | 660 sec

Neural Networks
Inspired by biology, neural networks mimic the way humans learn. Our best result: 1.30% error.
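A minimal sketch of a feed-forward neural network classifier (scikit-learn's MLPClassifier assumed; the architecture is an illustrative choice, not the team's network):

```python
# Feed-forward neural network sketch on the small digits set.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
mlp.fit(X_tr, y_tr)                       # weights learned by backpropagation
print("error rate:", 1 - mlp.score(X_te, y_te))
```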

Neural Networks: Artificial Neuron

Neural Networks: Learning

Summary of Results

Conclusion UNDER CONSTRUCTION

Future work UNDER CONSTRUCTION