
1 Hardware Descriptions of Multi-Layer Perceptrons with Different Abstraction Levels
Paper by E.M. Ortigosa, A. Cañas, E. Ros, P.M. Ortigosa, S. Mota, and J. Díaz. Paper review by Ryan MacGowan. The paper was published in 2006.

2 Topics
What does the title even mean? Speech recognition. Artificial neural networks. Solutions. Results. Likes and dislikes. Conclusion. When I first read the title I was a little confused, so the first thing I am going to talk about is the actual meaning of the title of this paper. Next I will talk about speech recognition and why this solution is being developed, then about artificial neural networks and the neurons that make them what they are, followed by the solutions and results, my likes and dislikes, and the conclusion.

3 Multi-Layer Perceptron
A multi-layer perceptron (MLP) is simply a standard feed-forward neural network with at least three layers: an input layer, one or more hidden layers, and an output layer.

4 Abstraction Levels
"Different abstraction levels" just means that the design is realized using two different methods: one in low-level VHDL, and another in the higher-level Handel-C language.

5 Translation
"Hardware Descriptions of Multi-Layer Perceptrons with Different Abstraction Levels" really means: an FPGA implementation of a neural network, built twice, once in VHDL and once in Handel-C, to solve a problem. In this case, the problem being solved is speech recognition.

6 Speech Recognition
Due to the increasing power of FPGAs, speech-recognition solutions can be designed with an artificial neural network built right into the FPGA. This is useful for cars, GPS units, toys, and other embedded systems where voice control would be convenient. A computer samples the audio, and the waveform is converted into a feature vector using filter-bank and prediction analysis; this vector is what is sent to the neural network, which computes which word was spoken as its output. Because FPGAs and neural networks are both performance-oriented techniques, they lend themselves well to speech recognition.

7 Artificial Neural Network
All solutions are realized using the artificial neural network presented here. Ten vectors with 22 features each give 220 input data values, which are sent to the 24 hidden neurons for computation; the 24 hidden-neuron outputs are then passed to the output neurons, which classify the input and provide the result. If a spoken command falls in a class, we expect that class's output node to give a high value. At the top you can see the make-up of each neuron: the functional unit, and the activation function that provides the output.
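To make the data flow concrete, here is a minimal C sketch of the forward pass using the sizes given in the slides (220 inputs, 24 hidden neurons). The 10 output neurons, one per word in the 10-word vocabulary, are an assumption based on slide 18, and the float arithmetic is only for structure; the hardware uses fixed-point, as the next slides show.

```c
#include <math.h>

#define NUM_INPUT  220   /* 10 vectors x 22 features (from the slides) */
#define NUM_HIDDEN 24
#define NUM_OUTPUT 10    /* assumed: one class per word in the vocabulary */

static float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

/* Forward pass: each layer is a weighted sum followed by the
   sigmoid activation function. */
void mlp_forward(const float in[NUM_INPUT],
                 const float w1[NUM_HIDDEN][NUM_INPUT],
                 const float w2[NUM_OUTPUT][NUM_HIDDEN],
                 float out[NUM_OUTPUT])
{
    float hidden[NUM_HIDDEN];

    for (int h = 0; h < NUM_HIDDEN; h++) {
        float sum = 0.0f;
        for (int i = 0; i < NUM_INPUT; i++)
            sum += w1[h][i] * in[i];
        hidden[h] = sigmoid(sum);
    }
    for (int o = 0; o < NUM_OUTPUT; o++) {
        float sum = 0.0f;
        for (int h = 0; h < NUM_HIDDEN; h++)
            sum += w2[o][h] * hidden[h];
        out[o] = sigmoid(sum);  /* highest output marks the recognized word */
    }
}
```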

8 MLP Hardware Implementation
To find the most accurate network, a number of neural-network structures were tested. The best result, 96.3% accuracy, was obtained with 24 hidden neurons.

9 The Artificial Neuron (Functional Unit)
This is the functional unit used in the implementations. The first 8-bit input is the input value, and the second is the connection weight for that input. These are multiplied to give a 16-bit product, and the multiplier output is sign-extended because of the maximum possible size of the summation of weighted values in Eq. 1, Sum_j = Σ_i (w_ij · x_i). Since up to 220 input/weight products are added together, sign extension to 23 bits is required to make sure the sum cannot overflow.
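A minimal C sketch of that multiply-accumulate, with standard integer types standing in for the hardware word lengths:

```c
#include <stdint.h>

#define NUM_INPUT 220

/* Multiply-accumulate as in the functional unit: an 8-bit input times
   an 8-bit weight gives a 16-bit product. |in * w| <= 128*128 = 2^14,
   and 220 such products sum to at most about 2^21.8, so 23 bits
   (sign included) are enough, matching the paper's 23-bit accumulator.
   A 32-bit C int is used here as the nearest standard type. */
int32_t neuron_sum(const int8_t in[NUM_INPUT], const int8_t w[NUM_INPUT])
{
    int32_t acc = 0;
    for (int i = 0; i < NUM_INPUT; i++) {
        int16_t product = (int16_t)in[i] * (int16_t)w[i]; /* sign-extended */
        acc += product;
    }
    return acc;   /* fits within 23 bits */
}
```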

10 The Artificial Neuron (Activation Function)
The 23-bit output of the functional unit presented on the previous slide is passed into the sigmoid activation function, which produces an 8-bit output. That 8-bit output is either passed to another layer of hidden neurons or to the output neurons. The second part of the neuron computation is therefore the activation function: a sigmoid, f(x) = 1 / (1 + e^-x), whose curve is shown in the graph. A sigmoid is useful because it suits many applications and always provides an output between 0 and 1.
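The slides do not say how the sigmoid is realized in hardware, but a common approach is a precomputed lookup table; the sketch below is one plausible scheme, where the 256-entry table size and the [-8, 8) input range are assumptions, not values from the paper.

```c
#include <math.h>
#include <stdint.h>

#define LUT_BITS 8
#define LUT_SIZE (1 << LUT_BITS)

static uint8_t sigmoid_lut[LUT_SIZE];

/* Precompute the sigmoid over an assumed input range of [-8, 8);
   outputs are quantized to 8 bits (0..255 representing 0..1). */
void init_sigmoid_lut(void)
{
    for (int i = 0; i < LUT_SIZE; i++) {
        double x = -8.0 + 16.0 * i / LUT_SIZE;
        sigmoid_lut[i] = (uint8_t)(255.0 / (1.0 + exp(-x)));
    }
}

/* Map the 23-bit accumulator value (range [-2^22, 2^22)) to a table
   index by offsetting to unsigned and keeping the top LUT_BITS bits. */
uint8_t activation(int32_t sum23)
{
    uint32_t index = (uint32_t)(sum23 + (1 << 22)) >> (23 - LUT_BITS);
    return sigmoid_lut[index & (LUT_SIZE - 1)];
}
```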

11 Serial Design
Here we see the serial design. It uses a single weight RAM, a single functional unit, and a single activation function. Two counters are used: a 5-bit counter for the hidden-neuron address, and another for the input-value addresses. For the weight RAM, the two counter values are concatenated into one address so that both sets of weight values can be referenced. This serial design has a serious bottleneck at the RAM and functional unit, since each term of the sum must be calculated sequentially: for the hidden layer, the functional unit must loop 220 times for EACH neuron, and the neurons themselves are processed one after another. Once each neuron's sum is calculated, it is sent through the activation function and stored in the hidden-layer RAM.
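As a sketch of that address generation (the 8-bit width of the input counter is an assumption; 220 values fit in 8 bits):

```c
#include <stdint.h>

/* Weight-RAM address formed from the two counters:
   addr = { neuron_cnt[4:0], input_cnt[7:0] }.
   The 8-bit input counter width is assumed (220 <= 2^8). */
uint16_t weight_addr(uint8_t neuron_cnt /* 0..23  */,
                     uint8_t input_cnt  /* 0..219 */)
{
    return ((uint16_t)(neuron_cnt & 0x1F) << 8) | input_cnt;
}
```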

12 Parallel Design
We can improve the speed of the design by placing the RAMs containing the weights, and the functional units, in parallel. The output sums from the functional units are stored in registers and selected by a multiplexer to be sent to the activation function. This is only a partially parallel design, as the outputs of each layer are still computed sequentially. With the weight RAM for each neuron in parallel, along with 24 functional units which are also in parallel, some of the bottlenecks are eliminated: the sums for all 24 neurons are calculated simultaneously. Considering the hidden-neuron layer from the last slide, the calculation of the sums is done 24X quicker, since it now takes only 220 iterations to produce the sum for every neuron. In this design, 24 individual registers are used instead of a RAM block because registers allow quicker access. The results presented later show a massive speedup.

13 Handel-C Design
The Handel-C design is done in both serial and parallel form, as shown below. NumHidden is the number of hidden neurons (24), NumInput is the number of input values (220), W is the array containing the weights, In is the input array, and Sum holds the sums of the weights multiplied by the inputs. As we can see, Handel-C allows far easier parallelism thanks to its PAR statements.
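The code listings themselves are images and did not survive in this transcript, so the following is a hedged reconstruction of what the two loops most likely look like, using the identifiers the slide names. Handel-C is C-based; its replicated par construct instantiates one copy of the loop body per hidden neuron.

```c
unsigned int 5 h;   /* neuron index, 0..23  (Handel-C width syntax) */
unsigned int 8 i;   /* input index,  0..219                          */

/* Serial version: one multiply-accumulate at a time,
   NumHidden * NumInput = 24 * 220 sequential steps. */
for (h = 0; h < NumHidden; h++)
    for (i = 0; i < NumInput; i++)
        Sum[h] = Sum[h] + W[h][i] * In[i];

/* Parallel version: the replicated "par" turns all NumHidden inner
   loops into concurrent hardware, so the whole layer finishes in
   NumInput = 220 steps. */
par (h = 0; h < NumHidden; h++)
    for (i = 0; i < NumInput; i++)
        Sum[h] = Sum[h] + W[h][i] * In[i];
```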

14 Handel-C RAM Location
To explore more solutions and find an optimal one, three RAM configurations were tested in the Handel-C design: 1) only distributed RAM blocks, 2) a combination of embedded and distributed RAM blocks, and 3) only embedded RAM blocks. Distributed RAM makes use of the LUTs contained in the CLBs. In the combined configuration, the embedded RAM holds the weights for the large array (the 220x24 input weights), and distributed RAM is used for the rest.
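For reference, Handel-C chooses between the two memory types through a specification on the declaration. This sketch shows the idea, though the exact attribute (block = 1) is my assumption about the DK tool's syntax rather than something quoted from the paper.

```c
/* Distributed RAM: built from CLB lookup tables (the default). */
ram unsigned int 8 HiddenOut[24];

/* Embedded RAM: ask the compiler to map the large weight array onto
   the FPGA's embedded block RAM ("block = 1" is assumed here to be
   the DK specification for that). */
ram unsigned int 8 W[220 * 24] with { block = 1 };
```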

15 Results
Here are the results from both implementations. The top table shows the VHDL results; the bottom table shows the Handel-C results. As you can see, the parallel implementations universally provide a speedup of approximately 17X. The VHDL implementation has about 1.2X higher throughput than the Handel-C implementation, and uses at least 30% fewer resources. Among the Handel-C designs, the version using distributed RAM is the best, followed by the combination, and finally the embedded-RAM design. This is because the embedded RAM's capacity cannot be used efficiently: one neuron's weight entry takes slightly over 50% of a block, so a second neuron's weights cannot share the same block and nearly half of each block is wasted. As the resources and system gates decrease, a higher clock frequency can be obtained due to lower propagation delay; frequency is the major factor in the throughput differences between designs.

16 Throughput of the Designs
Looking at throughput, the VHDL design provides the best results in both the serial and parallel cases. Among the Handel-C variants, HC(a) provides the highest throughput because of the higher clock frequency it can achieve.

17 Performance Cost
The trend continues when we examine performance cost. The VHDL design is the best, as it provides the highest throughput while using the fewest gates; the Handel-C designs have at least a 1.6X higher performance cost.

18 Number of Neurons
These graphs show the linear relationship between the number of neurons in the hidden layer and the amount of resources used. Since this design uses only 24 hidden neurons, the resource usage is manageable; however, the design is also very small (a vocabulary of only 10 words). On the FPGA used in this study, at most a 64-neuron design could be implemented due to memory constraints.

19 Downsides of VHDL Design
While the tables and graphs above show that the VHDL design is superior in throughput and performance cost, there are drawbacks to consider: design time is about 10X longer for the VHDL design, and exploring different solutions in VHDL takes considerably longer because an entirely new control unit must be designed each time.

20 Hardware Used
The FPGA used is a Xilinx Virtex-E 2000. Its CLBs each contain four logic cells (LCs), each with a 4-input function generator, carry logic, and a storage element. The four LCs are arranged in two slices, and each slice can combine its LUTs to provide 5- or 6-input function generators. Each LC's 4-input LUT can also serve as a 16x1 memory block (distributed RAM). The device also contains embedded memory blocks.

21 Software Used
The VHDL was coded with the FPGA Advantage 5.3 tool from Mentor Graphics. The DK Design Suite was used for the Handel-C implementation. Both designs were placed and routed with the Xilinx ISE Foundation 3.5i tool.

22 Likes
The relevance to labs performed in the course. The comparison between parallel and serial versions for every implementation. A great description of the neural network covering all its aspects. The application is very practical in today's electronic culture.

23 Dislikes
More detail is needed on the actual voice-recognition system, specifically the computer used to preprocess the audio. The paper is not entirely modern (2006), and some of its sources are even older. The system is unrealistically small (only 10 different words), with no discussion of viability in a more complex environment. Not much information is given on the VHDL design: no pseudocode and no simulations are shown.

24 Conclusion
Using neural networks for speech recognition allows compact embedded systems to be developed. Parallel processing allows a speedup of up to 17X over a serial implementation. VHDL results in an implementation that is about 1.2X faster than the Handel-C implementation, while the Handel-C designs have a 1.6X higher performance cost. When using Handel-C, it is important to know the most optimized type of RAM for your application, which in this case is distributed RAM. Computation of an output takes only 13-16 ms.

25 Questions?

