1
Image recognition DFE implementation
Student: Milana Prodanov 3040/2015 Advisor: Professor Veljko Milutinović
2
Description of the Algorithm Used
Selection of images for training and testing
Training of the model with a sufficiently large training set (60,000 images)
Testing of the model with the test images (10,000 images)
With this setup, digits are recognized correctly in about 91% of the test cases
3
Training and Test Images Used
Each image contains a single digit
Each image is 28*28 pixels in size
Each pixel is described by only one parameter: the intensity
In the general case: intensity-related calculations are based on standard RGB, using the conversion into the CIE XYZ model, with only the Y component extracted
In the specific case of this app: there was no need to extract the Y component, since it was already extracted in the training and test sets used (an inherent characteristic of the MNIST database utilized)
Pixel intensity is represented with eight bits, ranging from white (0), over the entire gray range, to black (255)
Each image has a label attached; the labels of the example images on the slide are 5, 0, 4, 1
4
Training of the Model (1/4): The Cross Entropy
The first step: defining how good the model is
The usual alternative is to define the cost of the model, i.e. how bad the model is
The cost of the model is measured by the cross entropy
Cross entropy was introduced in the context of data compression (Huffman code) and is now used in machine learning and in a wide range of other applications, from quantum physics to gambling
The generalized formula: $H_{y'}(y) = -\sum_i y'_i \log(y_i)$
y is the probability distribution predicted by the model; y' is the actual probability distribution, which is the training target
The goal is to make the difference between y and y' as small as reasonably possible, which minimizes the cross-entropy cost
The end result of the training process is the set of parameters that produce the output y (which forms a distribution)
Generating and tuning these parameters is the goal of the steps that follow
5
Training of the Model (2/4): The Backpropagation
Distribution y is the only output of the computation graph
Parameters are generated using the backpropagation algorithm, which obtains the derivatives of the output with respect to every parameter in a single backward pass (an alternative is the forward propagation algorithm, which is considered less efficient)
An example of a computation graph: (a + b)*(b + 1)
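The example graph (a + b)*(b + 1) can be worked through by hand. The sketch below, in C, runs a forward pass and then a backward pass using the chain rule; the intermediate names c, d, e and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Computes e = (a + b) * (b + 1) and its derivatives with respect to a and b.
 * Intermediate nodes: c = a + b, d = b + 1, e = c * d. */
double backprop_example(double a, double b, double *de_da, double *de_db) {
    assert(de_da != NULL && de_db != NULL);

    /* Forward pass: evaluate every node of the graph. */
    double c = a + b;
    double d = b + 1.0;
    double e = c * d;

    /* Backward pass: start at the output and move toward the inputs. */
    double de_dc = d;                   /* derivative of c * d with respect to c */
    double de_dd = c;                   /* derivative of c * d with respect to d */
    *de_da = de_dc * 1.0;               /* a reaches e only through c */
    *de_db = de_dc * 1.0 + de_dd * 1.0; /* b reaches e through both c and d */
    return e;
}
```

For a = 2 and b = 1 the forward pass gives c = 3, d = 2, e = 6, and the backward pass gives de/da = 2 and de/db = 5.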
6
Training of the Model (3/4): The Gradient Descent
The third step is tuning the parameters so that the cross entropy is minimized; this is done using the gradient descent algorithm
In other words, the gradient descent algorithm searches for a local minimum of a multidimensional function, step by step, always moving in the direction of steepest descent
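A minimal gradient descent sketch in C. The cost function here, f(w) = (w0 - 3)^2 + (w1 + 1)^2, is a hypothetical stand-in chosen because its minimum (3, -1) is known; it is not the cross entropy of the model, and the learning rate and step count are likewise assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Gradient of the hypothetical cost f(w) = (w0 - 3)^2 + (w1 + 1)^2. */
static void example_gradient(const double *w, double *grad) {
    grad[0] = 2.0 * (w[0] - 3.0);
    grad[1] = 2.0 * (w[1] + 1.0);
}

/* Repeatedly moves w a small step against the gradient (steepest descent). */
void gradient_descent(double *w, double learning_rate, int steps) {
    assert(w != NULL && learning_rate > 0.0 && steps >= 0);
    for (int s = 0; s < steps; s++) {
        double grad[2];
        example_gradient(w, grad);
        for (size_t i = 0; i < 2; i++) {
            w[i] -= learning_rate * grad[i];
        }
    }
}
```

Starting from (0, 0) with a learning rate of 0.1, each step shrinks the distance to the minimum by a factor of 0.8, so a few hundred steps are enough to converge.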
7
Training of the Model (4/4): The Training Outcome
Results of the training include:
Weights
  Blue pixels represent positive weights: the more of those, the higher the likelihood that the image belongs to a given class
  Red pixels represent negative weights: the more of those, the lower the likelihood that the image belongs to a given class
Biases
8
Production on the model (1/4): The Process Phases
Steps in determining the class of a test picture:
Calculating the weighted sum of pixel intensities
Applying softmax regression in order to convert the weighted sums into probabilities
Classification by choosing the greatest probability
9
Production on the model (3/4): The Weighted Sum
The hypothesis that test picture x belongs to class i is given by the equation: $h_i = \sum_j W_{i,j} x_j + b_i$
W_{i,j} is the weight of pixel j for class i
j is an index for summing over the pixels of test picture x
b_i is the bias of class i (a per-class offset that is added regardless of the given inputs)
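The weighted sum for one test picture can be sketched in C as follows. The function name, the parameter names, and the row-major weight layout (one row of pixel weights per class) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* For every class i: evidence[i] = biases[i] + sum over j of W[i][j] * pixels[j].
 * weights is row-major: num_pixels consecutive values per class. */
void weighted_sums(const float *weights, const float *biases, const float *pixels,
                   size_t num_classes, size_t num_pixels, float *evidence) {
    assert(weights != NULL && biases != NULL && pixels != NULL && evidence != NULL);
    for (size_t i = 0; i < num_classes; i++) {
        float sum = biases[i];
        for (size_t j = 0; j < num_pixels; j++) {
            sum += weights[i * num_pixels + j] * pixels[j];
        }
        evidence[i] = sum;
    }
}
```

For the setup described in this presentation, num_classes would be 10 and num_pixels would be 28*28 = 784.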
10
Production on the model (4/4): The Softmax Regression
Softmax regression is used when an object can belong to more than two classes (hypotheses), which is the case here
Softmax regression turns the hypotheses into values in the range [0, 1] that sum to 1; in other words, the hypotheses turn into probabilities
11
The Hybrid Approach: TensorFlow on CPU for Training, Weighted Sums on DFE for Production
TensorFlow is used for training, one time only
Maxeler is used for picture classification
12
Classification (1/7): Implementation steps
DFE performs the calculation of weighted sums, which is based on matrix multiplication
The first matrix contains the pixels of all test pictures
  Each row corresponds to one test picture
The second matrix contains the weights of all classes
  Each row corresponds to one class (classes are 0, 1, 2, …, 9 respectively)
CPU performs softmax regression over the DFE outputs
13
Classification (2/7): DFE Problem
The weighted sum performed on the DFE encounters a performance issue, since the current sum depends on the result of the previous iteration
This produces a need to stall the DFE for about 13 ticks, which is the time the last sum needs to pass through the computation pipeline
The solution is to modify the order of the input data so that the next sum does not depend on the result of the previous iteration (the loop tiling method)
14
Classification (3/7): DFE Solution
Increasing the dependency distance by changing the input order
Black arrows show data dependencies
Red arrows show input order
Each input is sent to the kernel C ticks after the input it depends on has been processed, since the calculation lasts at least C ticks (C >= 13)
Problem: cycles in the kernel
Solution: loop tiling
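One way to picture the changed input order is the host-side sketch below, in C. It assumes the pictures are originally stored one after another (picture-major order) and that the number of pictures is a multiple of the tile size; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reorders pixels from picture-major order (pictures[y * pic_size + x]) into
 * tiled order: for each group of `tile` pictures, pixel 0 of every picture in
 * the group, then pixel 1 of every picture in the group, and so on. */
void reorder_for_tiling(const float *pictures, float *reordered,
                        size_t num_pics, size_t pic_size, size_t tile) {
    assert(pictures != NULL && reordered != NULL);
    assert(tile > 0 && num_pics % tile == 0);
    size_t count = 0;
    for (size_t yy = 0; yy < num_pics; yy += tile) {      /* one tile of pictures */
        for (size_t x = 0; x < pic_size; x++) {           /* pixel index */
            for (size_t y = yy; y < yy + tile; y++) {     /* picture in the tile */
                reordered[count++] = pictures[y * pic_size + x];
            }
        }
    }
}
```

In the reordered stream, two consecutive pixels of the same picture are `tile` positions apart, so choosing tile = C gives each partial sum C ticks to leave the pipeline before it is needed again.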
15
Classification (4/7): Kernel Code
// init parameters
int picSize = X;
int romSize = picSize * REF_NUM;
int addrBits = MathUtils.bitsToAddress(romSize);
DFEVectorType<DFEVar> vectorType = new DFEVectorType<DFEVar>(floatType, REF_NUM);

// Input
DFEVar input = io.input("input", floatType);

CounterChain chain = control.count.makeCounterChain();
DFEVar x = chain.addCounter(X, 1);
// Set up counter for innermost, y loop, except we count 0..C
// instead of yy..yy+C
chain.addCounter(C, 1); // yy

// Fill the rom
Memory<DFEVar> mappedRom = mem.alloc(floatType, romSize);
mappedRom.mapToCPU("mappedRom");

// Loop itself
DFEVector<DFEVar> carriedSum = vectorType.newInstance(this);
// The initializers of sum and newSum are cut off on the slide;
// source-less instances are assumed here, since both are connected with <== below
DFEVector<DFEVar> sum = vectorType.newInstance(this);
DFEVector<DFEVar> newSum = vectorType.newInstance(this);

// Address for accessing rom
DFEVar addr = x.cast(dfeUInt(addrBits));
for (int i = 0; i < REF_NUM; i++) {
    // Weight of pixel x for class i
    DFEVar romOut = mappedRom.read(picSize * i + addr);
    // Restart the sum at the first pixel of a picture
    sum[i] <== ((x === 0) ? 0.0 : carriedSum[i]);
    newSum[i] <== input * romOut + sum[i];
    // Feed the sum back C ticks later
    carriedSum[i] <== stream.offset(newSum[i], -C);
}

// We have a controlled output to deliver the sum at the end of each row
io.output("output", newSum, vectorType, x === (X - 1));
16
Classification (5/7): Kernel code implemented on CPU
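void DigitRecognitionCPU(float *input, double *rom, float *output) {
    int count = 0;
    /* yy: first picture of the current tile of C pictures */
    for (int yy = 0; count < X * Y; yy += C) {
        /* x: pixel index within a picture */
        for (int x = 0; x < X; x += 1) {
            /* y: picture within the current tile */
            for (int y = yy; y < yy + C; y += 1) {
                if (x == 0) {
                    for (int k = 0; k < REF_NUM; k++)
                        output[y * REF_NUM + k] = 0.0;
                }
                /* Add this pixel's contribution to the sum of every class */
                for (int k = 0; k < REF_NUM; k++) {
                    output[y * REF_NUM + k] += input[count] * rom[k * X + x];
                }
                count++;
            }
        }
    }
    printf("DigitRecognitionCPU\n");
}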
17
Classification (6/7): Softmax layer and final step
CPU invokes extractResult() on the DFE outputs
extractResult() puts the data through the softmax layer and classifies each test picture according to the maximum probability
void convertToSoftmax(float* rawOutput, float* biases);
void extractResult(float* rawInput, int* output, float* biases);
18
Classification (7/7): The Final Kernel graph
19
References
The MNIST Data: http://yann.lecun.com/exdb/mnist/
The Cross Entropy
The Backpropagation
The Gradient Descent
Understanding the Color Models
The TensorFlow Theory
The Maxeler Tutorials: Acceleration Tutorial, Loops and Pipelining