
1 Image Classification Painting and handwriting identification
Bits, Please! Garrett Neilson, Cody Rountree, Anthony Smith, Travis Tibbetts

2 Informal Objective
Classify a painting or handwritten digit correctly when presented only with the image.

3 Formal Objective Pre-processing stage
Define a set I = {w1, w2, …, wn} containing n vectors wi of size m, representing raw pixel data from images of paintings. Define a dimension-reduction transformation from the input space R^m, where set I resides, to the latent feature space of I, R^d: Δ : R^m → R^d, where d << m. Perform the transformation Δ on each element of set I to produce a set X = {x1, x2, …, xn} containing n feature vectors xi of size d. The transformation Δ is carried out by the encoding stage of a trained convolutional autoencoder. Assumption: empirical evidence suggests that neural network calculations can be thought of as continuous mappings of information from space to space (homeomorphic topological transformations). Δ can be thought of as the specific mapping that attempts to minimize dimensionality while preserving invariances in the information.
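A minimal sketch of what the encoding stage Δ could look like in Keras. The 256×256 input and the 8192-dimensional code come from later slides; the grayscale channel, filter counts, and kernel sizes are assumptions, since the slides do not specify the exact architecture.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Encoder: realizes Δ : R^m -> R^d  (m = 256*256 = 65536, d = 8192)
    inputs = keras.Input(shape=(256, 256, 1))                                       # assumed grayscale input
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)  # 128 x 128 x 32
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)       # 64 x 64 x 32
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)       # 32 x 32 x 32
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)       # 16 x 16 x 32
    code = layers.Flatten(name="latent")(x)                                         # 1 x 8192 feature vector

    # Decoder: only used while training the autoencoder to reconstruct its input
    x = layers.Reshape((16, 16, 32))(code)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    decoded = layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)

    autoencoder = keras.Model(inputs, decoded)
    encoder = keras.Model(inputs, code)            # this sub-model is the transformation Δ
    autoencoder.compile(optimizer="adam", loss="mse")

    # After autoencoder.fit(images, images, ...), the feature set X = {x1, ..., xn} is:
    #   X = encoder.predict(images)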

4 Formal Objective Classification stage
Given a set S = {P1, P2, …, Pn} containing n pairs, where each pair Pi = (xi, yi) consists of a feature vector of a painting, xi (from set X), and its corresponding style label, yi: correctly predict the style label yi when presented with the feature vector xi, for any arbitrarily chosen pair Pi ∈ S.

5 Disclaimer: Existence of vaguely labeled data
Example images whose assigned label disagrees with their apparent style: labeled Portrait, Landscape, and Cityscape but looking like Sketch & Study, Abstract, and Abstract; labeled Abstract, Sketch & Study, and Religious but looking like Sketch & Study, Sketch & Study, and Portrait.

6 Pre-processing: Convolutional Autoencoder
(Diagram of the convolutional autoencoder: the encoder compresses the input image to the latent code z; the decoder reconstructs the image from z.)

7 Post training
(Diagram: the untrained encoder, shown as "???", becomes the learned mapping Δ after training.)

8 Rebuilding results – The bad (1 epoch)

9 Rebuilding results – The good (300 epochs)

10 Visualizing the feature space
Approximate view of the feature space

11 First three algorithms:
(Diagram: a 256x256 input image is mapped by Δ to a 1x8192 feature vector, which is fed to the MLP NN, the SVM, and the kNN classifiers.)

12 The last algorithm: CNN
The only algorithm that allows a connection between the classification network (right) and the encoding network (left) for extra fine-tuning.

13 The Analysis

14 Data Pre-Reqs
The painting data is vaguely labeled and extremely large, so we used the MNIST data (40k images of handwritten digits for training and 10k for testing) to test time complexity and the efficacy of the pipeline.
The painting data consists of 40k paintings total (~32k training, ~8k test).
Every algorithm uses the same test data for both the MNIST and painting datasets.
Every algorithm uses the same images for the incremental values of n while testing time complexity.
Every reference to n in this PowerPoint refers to the number of images used to train/fit the algorithm.
All time-complexity graphs measure ONLY the time it takes to train/fit the algorithm and do not include the time it takes to predict outcomes (except for k-NN, where there is no train/fit step).
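A rough sketch of the timing protocol described above (helper and variable names are hypothetical): only the train/fit call is timed for each incremental value of n, and the same test set is used throughout.

    import time

    def time_fit_only(make_model, X_train, y_train, n_values):
        """Time ONLY the train/fit step for each training-set size n; prediction is excluded."""
        timings = []
        for n in n_values:
            model = make_model()                       # fresh, untrained model each round
            start = time.perf_counter()
            model.fit(X_train[:n], y_train[:n])        # the same images are reused as n grows
            timings.append((n, time.perf_counter() - start))
        return timings

    # Example: n = 4k, 8k, ..., 40k encoded images
    #   timings = time_fit_only(lambda: SomeClassifier(), X_features, labels, range(4000, 40001, 4000))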

15 Algorithm 1: Multi-layered Perceptron Neural Network
MLP NN: uses the feature space as its input, as if it doesn't know the encoding network exists.
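A minimal sketch of such an MLP in Keras, assuming the 1x8192 feature vector as input; the hidden-layer sizes are assumptions, while the ADAM optimizer, 100 epochs, and batch size 128 come from the results slide.

    from tensorflow import keras
    from tensorflow.keras import layers

    mlp = keras.Sequential([
        layers.Input(shape=(8192,)),             # feature vector produced by the (frozen) encoder Δ
        layers.Dense(512, activation="relu"),    # assumed hidden-layer sizes
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),  # 10 classes for MNIST (6 for the painting styles)
    ])
    mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # mlp.fit(X_features, y_train, epochs=100, batch_size=128,
    #         validation_data=(X_test_features, y_test))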

16 Big O analysis, confusion matrix, and accuracy
MLP NNs, at their core, use matrix multiplication, so the time complexity can be expressed as O(n³), where n is the dimension of the (theoretical) square matrices being multiplied. However, the n we used for analysis purposes is the number of images used to train the network; in that case the MLP NN scales linearly with n, i.e., Big O(n).

17 MLP Neural Network MNIST results
(Plots: MNIST Accuracy vs. N and MNIST Time vs. N.)
… > … > 6 neural network, learning rate …, ADAM optimizer, 100 epochs, batch size 128, 40k training examples, 10k testing examples.

18 MNIST Data Accuracy and Loss during Training
Algorithm Specific Metrics
Accuracy plots: training data (top), testing data (bottom). Loss plots: training data (top), testing data (bottom).

19 Painting Data Accuracy and Loss during Training
… > … > 6 neural network, learning rate …, ADAM optimizer, 100 epochs, batch size 50, 40k training examples, 10k testing examples.
Accuracy plots: training data (top), testing data (bottom). Loss plots: training data (top), testing data (bottom).

20 Painting Data Confusion Matrix
After 100 epochs. Test accuracy: …

21 Algorithm 2: Convolutional Neural Network
Uses the feature space as an intermediate step, as if it were any other trainable layer.
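A minimal sketch, reusing the hypothetical Keras encoder from the autoencoder sketch earlier, of how a classification head can be attached to the encoding layers so that both are fine-tuned together (head sizes are assumptions):

    from tensorflow import keras
    from tensorflow.keras import layers

    # 'encoder' is the trained encoding stage Δ from the autoencoder sketch above.
    head = keras.Sequential([
        layers.Dense(512, activation="relu"),    # assumed head sizes
        layers.Dense(10, activation="softmax"),
    ])
    cnn = keras.Model(encoder.input, head(encoder.output))   # encoder weights stay trainable
    cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # cnn.fit(raw_images, labels, epochs=100, batch_size=128)  # fine-tunes encoder and head jointly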

22 Big O analysis - CNN
There are NUMEROUS different big-O analyses of CNNs. When dealing with algorithms that may or may not converge to a solution after an arbitrary number of iterations, it is difficult to pin down a single big-O metric. However, the n we used for analysis purposes is the number of images used to train the network; in that case the CNN scales linearly with n and has Big O(n). This is the same reasoning used for the MLP network.

23 Convolutional Neural Network MNIST results
(Plots: MNIST Accuracy vs. N and MNIST Time vs. N.)
… > … > 6 neural network attached to the encoding layer of a convolutional autoencoder, learning rate …, ADAM optimizer, 100 epochs, batch size 128, 40k training examples, 10k testing examples. There are ~11M total trainable weights.

24 Confusion matrix on MNIST data for CNN
0.9822 total accuracy

25 MNIST Data Accuracy and Loss during Training
Algorithm Specific Metrics
Accuracy plots: training data (top), testing data (bottom). Loss plots: training data (top), testing data (bottom).

26 Painting Data Accuracy and Loss during Training
Network structure too large to describe here; learning rate …, ADAM optimizer, 100 epochs, batch size 50, 40k training examples, 10k testing examples.
Accuracy plots: training data (top), testing data (bottom). Loss plots: training data (top), testing data (bottom).

27 Confusion Matrix for Painting Data

28 Algorithm 3: Linear Support Vector Machine
Performed directly in the feature space we showed earlier 

29 SVM Big O
Architecture: creates an optimal hyperplane in the input space that separates the data such that the distance between the plane and the closest data points of two separate classes on either side is maximized.
Time complexity of one SVM: O(n²), because of two main for-loops in the algorithm; n is the number of images used to find the equation of the hyperplane.
The scikit-learn SVC library uses a "one-against-one" approach for multi-class classification. We are using 10 classes, and n_classes * (n_classes - 1) / 2 SVMs are created, so in total there are 45 SVMs. Therefore, Big O(n² + … + n² + n²) = Big O(n²).
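A minimal sketch with scikit-learn's SVC; the stand-in data is random, whereas in practice the encoded feature vectors from Δ are used.

    import numpy as np
    from sklearn.svm import SVC

    n_classes = 10
    n_binary_svms = n_classes * (n_classes - 1) // 2   # one-against-one: 10 * 9 / 2 = 45 SVMs

    # Stand-in for the encoded features; the real X comes from the encoder Δ (shapes assumed)
    X = np.random.rand(1000, 8192)
    y = np.random.randint(0, n_classes, size=1000)

    linear_svm = SVC(kernel="linear")    # SVC trains its multi-class problems one-vs-one internally
    linear_svm.fit(X, y)
    print(linear_svm.score(X, y))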

30 MNIST Data
Time vs. n of the Linear SVM on the MNIST handwritten digits data set: quadratic curve, Big O(n²).
Accuracy of the Linear SVM on the MNIST handwritten digits data set. Most accurate data point: 95.04% at n=36000.

31 Linear SVM on MNIST
The confusion matrix shows a count of what the numbers were classified as. It contains all of the test data; however, there is not an equal number of each class, because the test set is a random sample that is representative of the training data. The ideal matrix would show the maximum numbers on the diagonal. The most mistakes were made when classifying 4s and 9s.
Accuracy score: … Program run time: … seconds.

32 Algorithm Specific Metrics
We included a non-linear support vector machine for comparison purposes; we did not perform any in-depth analysis of its inner workings. It finds a non-linear separator in the feature space rather than a hyperplane.
(Plots: Time vs. n and Accuracy of the Non-Linear SVM on the MNIST handwritten digits data set. Most accurate data point: 94.7% at n=40000.)
The data was so well linearly separated due to the preprocessing that the Non-Linear SVM had nearly identical results to the Linear SVM; the main difference between the two was the longer runtime of the Non-Linear SVM (an increase of ~40 seconds at n=40k).
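A sketch of the comparison; the RBF kernel is an assumption (the slides only say "non-linear"), and the stand-in data is random.

    import time
    import numpy as np
    from sklearn.svm import SVC

    X = np.random.rand(2000, 8192)                 # stand-in for the encoded feature vectors
    y = np.random.randint(0, 10, size=2000)

    for kernel in ("linear", "rbf"):               # "rbf" is an assumed choice of non-linear kernel
        svm = SVC(kernel=kernel)
        start = time.perf_counter()
        svm.fit(X, y)
        print(kernel, f"fit: {time.perf_counter() - start:.1f}s", f"accuracy: {svm.score(X, y):.3f}")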

33 Non-Linear SVM on MNIST
Most mistakes were again made on 4s and 9s. Accuracy score: 0.947. Program run time: … seconds.

34 Linear SVM painting confusion matrix
Program run time: … seconds.

35 Algorithm 4: k-Nearest Neighbors
Performed directly in the feature space we showed earlier

36 K-Nearest Neighbor Type of learning: Supervised Architecture:
The algorithm does not learn, transform, or extrapolate the data as a function like the two neural networks. Rather, this technique operates locally within the vector space provided by the training-set vectors. k-NN uses the distance between vectors to measure similarity, where a distance of 0 would be a perfect copy and large distances indicate dissimilarity.
Time complexity of k-NN: O(n²). The algorithm has two main steps: for each new vector, calculate the distance to every vector in the training set, then classify by a vote among the k nearest. This naturally leads to Big O(n²), which is backed up by the graph below.
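A minimal sketch with scikit-learn (k = 3 as on the later slide; brute-force search mirrors the all-pairs distance step described above, and the stand-in data is random):

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.random.rand(1000, 8192)           # stand-in encoded training vectors (shapes assumed)
    y_train = np.random.randint(0, 10, size=1000)
    X_query = np.random.rand(5, 8192)

    knn = KNeighborsClassifier(n_neighbors=3, algorithm="brute")  # distance to every training vector
    knn.fit(X_train, y_train)                      # no real training: the vectors are just stored
    print(knn.predict(X_query))                    # majority vote among the 3 nearest neighbours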

37 K-NN MNIST Graphs
Time in seconds vs. n of the kNN on the MNIST handwritten digits data set: quadratic curve, Big O(n²).
Accuracy of the kNN on the MNIST handwritten digits data set.

38 KNN on MNIST
K=3. The confusion matrix shows a count of what the numbers were classified as. It contains all of the test data; however, there is not an equal number of each class, because the test set is a random sample that is representative of the training data. The ideal matrix would show the maximum numbers on the diagonal. The most mistakes were made when classifying 0s, 3s, and 9s.

39 Algorithm Specific Metrics
K values versus Accuracy (MNIST). K values versus Accuracy (Paintings).
Per-class precision on the painting data: Portrait 0.68, Landscape 0.43, Cityscape 0.20, Abstract 0.37, Religious 0.24, Sketch 0.13.
Painting confusion matrix (partially recovered):
[[249  83   9   …  43]
 [ 22   1 177   …  39]
 [  9  58  12   4  87  31]
 [ 48  44   4   …  16]
 [ 35  41  10   6 222  51]
 [ 10   9   4   2  63  26]]
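A sketch of how the k-versus-accuracy curves could be produced (function and variable names are hypothetical):

    from sklearn.neighbors import KNeighborsClassifier

    def accuracy_per_k(X_train, y_train, X_test, y_test, k_values):
        """Fit one k-NN model per value of k and record its test accuracy."""
        scores = {}
        for k in k_values:
            knn = KNeighborsClassifier(n_neighbors=k)
            knn.fit(X_train, y_train)
            scores[k] = knn.score(X_test, y_test)
        return scores

    # scores = accuracy_per_k(X_train_features, y_train, X_test_features, y_test, range(1, 16, 2))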

40 Comparative Accuracy of All Algorithms

41 Comparative Timing of All Algorithms

42 Questions
Besides reducing the size, what is the other main reason we chose to use a convolutional autoencoder to pre-process our images? Answer: The CAE helps to retain spatial information, something the classification algorithms otherwise couldn't do.
Why can there be such drastically different big-Os for algorithms like neural networks? Answer: Choosing n to represent different things within the algorithm can massively change the time complexity.
The convolutional neural network is a special type of neural network for use on what type of data? Answer: Images.
A linear support vector machine getting a high accuracy score is indicative of data that is ______. Answer: highly separated (i.e., linearly separable).
What parameters of a k-NN model can you choose? Answer: The number of neighbors, k, and the distance function.

