Image recognition DFE implementation

Presentation transcript:

Image recognition DFE implementation
Student: Milana Prodanov 3040/2015
Advisor: Professor Veljko Milutinović

Description of the Algorithm Used
Selection of images for training and test
Training of the model with a large enough training set (60,000 images)
Testing of the model with the test images (10,000 images)
The above setting achieves correct recognition in about 91% of the cases

Training and Test Images Used
Each image contains a single digit
Each image is 28*28 pixels in size
Each digit is described with only one parameter per pixel: the intensity
In the general case: intensity-related calculations start from standard RGB, converted into the CIE XYZ model, with only the Y component extracted
In the specific case of this application: there was no need to extract the Y component, since it is already extracted in the training and test sets used (an inherent characteristic of the MNIST database)
Intensity is represented with eight bits, ranging from white (0), over the entire gray range, to black (255)
Each image has a label attached; the labels of the example below are 5, 0, 4, 1
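
For completeness, a minimal sketch of how such images and labels could be loaded, assuming the standard MNIST IDX file layout from the site listed in the references (the file names and helper functions below are illustrative, not part of the original implementation):

import numpy as np

def load_mnist_images(path):
    # IDX image file: 16-byte header (magic, count, rows, cols), then one unsigned byte per pixel
    with open(path, "rb") as f:
        raw = f.read()
    count, rows, cols = np.frombuffer(raw, dtype=">i4", count=4)[1:4]
    pixels = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return pixels.reshape(count, rows * cols)   # one 784-element row per image

def load_mnist_labels(path):
    # IDX label file: 8-byte header (magic, count), then one unsigned byte per label
    with open(path, "rb") as f:
        raw = f.read()
    return np.frombuffer(raw, dtype=np.uint8, offset=8)

train_images = load_mnist_images("train-images-idx3-ubyte")
train_labels = load_mnist_labels("train-labels-idx1-ubyte")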

Training of the Model (1/4): The Cross Entropy
The first step: defining how good the model is
An equivalent view is the cost of the model, or how bad the model is
The cost of the model is measured by the cross entropy
Introduced for data compression (Huffman coding); used in machine learning (in a wide plethora of applications, from quantum physics all the way to popular gambling)
The generalized formula: H_y'(y) = -Σ_i y'_i * log(y_i)
y is the predicted probability distribution of the model used; y' is the actual probability distribution, which is the training target
The goal is to make the difference between y and y' as small as possible, in order to minimize the cross entropy cost
The end result of the training process is the set of parameters that produce the output y (which forms a distribution)
Generation and tuning of the parameters that produce the output y is the goal of the steps to follow
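
As an illustration, a minimal numpy sketch of the cross entropy between a predicted distribution and a one-hot training target (the values and variable names are illustrative only):

import numpy as np

def cross_entropy(y_predicted, y_actual):
    # H_y'(y) = -sum_i y'_i * log(y_i); a small epsilon avoids log(0)
    return -np.sum(y_actual * np.log(y_predicted + 1e-12))

# Example: the model assigns 70% probability to the correct class (digit 3)
y_actual = np.zeros(10)
y_actual[3] = 1.0
y_predicted = np.array([0.01, 0.02, 0.05, 0.70, 0.02, 0.05, 0.05, 0.04, 0.03, 0.03])
print(cross_entropy(y_predicted, y_actual))   # about 0.357; smaller is better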

Training of the Model (2/4): The Backpropagation
Distribution y is the only output of the computation graph
Parameter gradients are computed using the backpropagation algorithm (the alternative, forward-mode propagation, is considered less efficient)
An example of a computation graph: e = (a + b) * (b + 1)
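
A minimal sketch of forward evaluation and reverse-mode (backpropagation) differentiation of that example graph, following the Backprop post listed in the references (the input values are chosen arbitrarily for illustration):

# Computation graph: c = a + b, d = b + 1, e = c * d
a, b = 2.0, 1.0

# Forward pass: evaluate every node
c = a + b          # 3.0
d = b + 1.0        # 2.0
e = c * d          # 6.0

# Backward pass: propagate de/d(node) from the output back towards the inputs
de_dc = d                           # d(c*d)/dc = d
de_dd = c                           # d(c*d)/dd = c
de_da = de_dc * 1.0                 # c = a + b, so dc/da = 1
de_db = de_dc * 1.0 + de_dd * 1.0   # b feeds both c and d, so its contributions add up

print(e, de_da, de_db)              # 6.0 2.0 5.0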

Training of the Model (3/4): The Gradient Descent
The third step is tuning of the parameters (this minimizes the cross entropy); this is done using the gradient descent algorithm
In other words, the gradient descent algorithm searches for a local minimum, step by step, moving in the steepest descent direction of a multidimensional function
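
A minimal sketch of the idea on a one-dimensional cost function (the function and step size are illustrative only):

# Gradient descent on f(w) = (w - 3)^2, whose gradient is f'(w) = 2 * (w - 3)
w = 0.0                       # initial parameter value
learning_rate = 0.1

for step in range(100):
    gradient = 2.0 * (w - 3.0)
    w -= learning_rate * gradient   # move against the gradient (steepest descent)

print(w)                      # converges towards the minimum at w = 3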

Training of the Model (4/4): The Training Outcome
Results of the training include:
Weights
Blue pixels represent positive weights: the more of those, the higher the likelihood that the image belongs to a given class
Red pixels represent negative weights: the more of those, the lower the likelihood that the image belongs to a given class
Biases

Production on the Model (1/4): The Process Phases
Steps in determining the class of a test picture:
Calculating the weighted sum of pixel intensities
Applying softmax regression in order to convert weighted sums into probabilities
Classification by choosing the greatest probability
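
A minimal numpy sketch of these three phases for a single 28*28 test picture, assuming the trained weights W (one row per digit class) and biases b are already available (the placeholder values below are illustrative only):

import numpy as np

# Trained parameters: one weight row and one bias per class (digits 0..9)
W = np.random.randn(10, 784) * 0.01   # placeholder values; the real ones come from training
b = np.zeros(10)
x = np.random.rand(784)               # placeholder test picture, 28*28 intensities

# Phase 1: weighted sum of pixel intensities, plus bias
evidence = W.dot(x) + b               # 10 values, one per class

# Phase 2: softmax regression turns the weighted sums into probabilities
probabilities = np.exp(evidence) / np.sum(np.exp(evidence))

# Phase 3: classification by the greatest probability
predicted_digit = int(np.argmax(probabilities))
print(predicted_digit)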

Production on the Model (3/4): The Weighted Sum
The hypothesis that test picture x belongs to class i is given by: hypothesis_i = Σ_j (W_i,j * x_j) + b_i
j is an index for summing over the pixels of test picture x
W_i,j is the trained weight of pixel j for class i
b_i is the bias, or classification tolerance, of class i on the given inputs

Production on the Model (4/4): The Softmax Regression
Softmax regression is used when an object has more than two hypotheses, which is the case here (ten digit classes)
Softmax regression turns the hypotheses into values in the range [0, 1] that sum to 1; in other words, the hypotheses turn into probabilities: softmax(h)_i = exp(h_i) / Σ_j exp(h_j)
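
A short worked example with illustrative values: for weighted sums h = [2.0, 1.0, 0.1], exp(h) = [7.39, 2.72, 1.11], whose total is 11.21, so softmax gives approximately [0.66, 0.24, 0.10] — values that all lie in [0, 1] and sum to 1, with the largest weighted sum receiving the largest probability.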

The Hybrid Approach
TensorFlow on CPU for training; weighted sums on DFE for production
TensorFlow is used for training, one time only
The Maxeler DFE is used for picture classification
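
A minimal sketch of the one-time training step, following the TensorFlow MNIST beginners tutorial listed in the references and assuming the TensorFlow 1.x API (how W and b are handed to the DFE side is only indicated schematically):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x  = tf.placeholder(tf.float32, [None, 784])   # flattened 28*28 pictures
y_ = tf.placeholder(tf.float32, [None, 10])    # one-hot labels
W  = tf.Variable(tf.zeros([784, 10]))
b  = tf.Variable(tf.zeros([10]))
y  = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross entropy cost, minimized by gradient descent
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    weights, biases = sess.run([W, b])   # these constants are then passed to the DFE side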

Classification (1/7): Implementation Steps
The DFE performs the calculation of the weighted sums, which is based on matrix multiplication
The first matrix contains the pixels of all test pictures; each row corresponds to one test picture
The second matrix contains the weights of all classes; each row corresponds to one class (classes 0, 1, 2, ..., 9 respectively)
The CPU performs softmax regression over the DFE outputs
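
In numpy terms, the weighted-sum step the DFE implements corresponds to the following matrix product (a sketch with shapes matching the MNIST setting; the actual kernel computes it in the streaming, loop-tiled order described on the next slides, and the biases are applied later on the CPU side):

import numpy as np

pictures = np.random.rand(10000, 784)   # placeholder: one test picture per row
weights  = np.random.randn(10, 784)     # one row of trained weights per class 0..9

# Weighted sums for every (picture, class) pair
weighted_sums = pictures.dot(weights.T)   # shape (10000, 10)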

Classification (2/7): DFE Problem
The weighted sum performed on the DFE encounters a performance issue, since the current sum depends on the result of the previous iteration
This produces a need to stall the DFE for about 13 ticks, which is necessary for the previous sum to pass through the computation pipeline
The solution is to modify the order of the input data so that the next sum does not depend on the result of the previous iteration (the loop tiling method)

Classification (3/7): DFE Solution
Increasing the dependency distance by changing the input order
Black arrows show data dependencies; red arrows show the input order
Each input is sent to the kernel C ticks after the inputs it depends on have been processed, since the calculation lasts at least C ticks (C >= 13)
Problem: cycles in the kernel; solution: loop tiling
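
A plain-Python sketch of the idea behind this loop tiling, independent of the DFE code: instead of accumulating one picture's sum to completion (where every step depends on the previous one), C pictures are accumulated in an interleaved fashion, so consecutive operations touch different accumulators and the dependency distance becomes C (the function names are illustrative):

# Naive order: each addition depends on the immediately preceding one
def row_sums_naive(rows):
    return [sum(row) for row in rows]

# Tiled order: process C rows at a time, one element of each row per "tick",
# so an accumulator is only revisited after C - 1 other accumulators
def row_sums_tiled(rows, C):
    sums = [0.0] * len(rows)
    for tile_start in range(0, len(rows), C):
        tile = range(tile_start, min(tile_start + C, len(rows)))
        for x in range(len(rows[0])):      # pixel index
            for y in tile:                 # picture index within the tile
                sums[y] += rows[y][x]      # dependency distance is now C, not 1
    return sums

rows = [[float(i + j) for j in range(8)] for i in range(6)]
assert row_sums_tiled(rows, C=3) == row_sums_naive(rows)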

Classification (4/7): Kernel Code

// Init parameters
int picSize = X;
int romSize = picSize * REF_NUM;
int addrBits = MathUtils.bitsToAddress(romSize);
DFEVectorType<DFEVar> vectorType = new DFEVectorType<DFEVar>(floatType, REF_NUM);

// Input
DFEVar input = io.input("input", floatType);
CounterChain chain = control.count.makeCounterChain();
DFEVar x = chain.addCounter(X, 1);
// Set up counter for the innermost, y loop, except we count 0..C instead of yy..yy+C
chain.addCounter(C, 1); // yy

// Fill the ROM with the trained weights (written from the CPU)
Memory<DFEVar> mappedRom = mem.alloc(floatType, romSize);
mappedRom.mapToCPU("mappedRom");

// The loop itself: one carried partial sum per class, fed back with an offset of -C ticks
DFEVector<DFEVar> carriedSum = vectorType.newInstance(this);
DFEVector<DFEVar> sum = vectorType.newInstance(this);
DFEVector<DFEVar> newSum = vectorType.newInstance(this);

// Address for accessing the ROM
DFEVar addr = x.cast(dfeUInt(addrBits));

for (int i = 0; i < REF_NUM; i++) {
    DFEVar romOut = mappedRom.read(picSize * i + addr);
    sum[i] <== ((x === 0) ? 0.0 : carriedSum[i]);
    newSum[i] <== input * romOut + sum[i];
    carriedSum[i] <== stream.offset(newSum[i], -C);
}

// Controlled output: deliver the sums at the end of each picture row
io.output("output", newSum, vectorType, x === (X - 1));

Classification (5/7): Kernel Code Implemented on CPU

void DigitRecognitionCPU(float *input, double *rom, float *output)
{
    printf("DigitRecognitionCPU\n");
    int count = 0;
    // Same loop-tiled order as the kernel: tiles of C pictures, all pixels of each tile
    for (int yy = 0; count < X * Y; yy += C) {
        for (int x = 0; x < X; x += 1) {
            for (int y = yy; y < yy + C; y += 1) {
                if (x == 0) {
                    // First pixel of a picture: clear its per-class accumulators
                    for (int k = 0; k < REF_NUM; k++)
                        output[y * REF_NUM + k] = 0.0;
                }
                for (int k = 0; k < REF_NUM; k++) {
                    output[y * REF_NUM + k] += input[count] * rom[k * X + x];
                }
                count++;
            }
        }
    }
}

Classification (6/7): Softmax Layer and Final Step
The CPU invokes extractResult() on the DFE outputs
extractResult() puts the data through the softmax layer (applying the biases) and classifies each test picture according to the maximum probability
void convertToSoftmax(float* rawOutput, float* biases);
void extractResult(float* rawInput, int* output, float* biases);

Classification (7/7): The Final Kernel Graph

References
The MNIST Data: http://yann.lecun.com/exdb/mnist/
The Cross Entropy: http://colah.github.io/posts/2015-09-Visual-Information/
The Backpropagation: http://colah.github.io/posts/2015-08-Backprop/
The Gradient Descent: https://www.youtube.com/watch?v=ZgXjKa0ChDw
Understanding the Color Models: https://en.wikipedia.org/wiki/Grayscale
The TensorFlow Theory: https://www.tensorflow.org/versions/master/tutorials/mnist/beginners/index.html
The Maxeler Tutorials: http://home.etf.rs/~vm/os/vlsi/razno/maxcompiler-tutorial%20(3).pdf (Acceleration Tutorial: Loops and Pipelining)