1
HSDSL, Technion Spring 2014 Preliminary Design Review Matrix Multiplication on FPGA Project No.: 1998 Project B 044169 By: Zaid Abassi Supervisor: Rolf Hilgendorf April 2, 2014
2
Background and Motivation: 1. Naive matrix multiplication is prohibitively expensive, requiring on the order of n^3 multiply-accumulate operations for n x n matrices, so there is a need to research efficient, parallel algorithms for matrix multiplication.
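As a baseline, the naive algorithm the slide refers to can be sketched as follows (a plain software sketch, not the FPGA design; the function name is our own):

```python
# Naive matrix multiplication: three nested loops over n give n^3
# multiply-accumulate operations -- the cost the project aims to reduce.
def naive_matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):          # n iterations
        for j in range(n):      # x n iterations
            for k in range(n):  # x n iterations -> n^3 total
                C[i][j] += A[i][k] * B[k][j]
    return C
```

For example, `naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`.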
3
2. In application-specific designs (in this case, matrix multiplication), as opposed to broader architectural designs, the order and number of operations are known at design time, offering the potential to avoid overhead that a general-purpose design would incur.
4
3. Matrix multiplication is an elementary building block of more advanced linear-algebra operations on matrices, such as matrix inversion and linear transformations, so the need for efficient matrix multiplication is all the greater.
5
4. Over the years, the complexity of matrix multiplication in software has been improved with specialized algorithms and data structures, and we aim to research similarly inspired approaches for an FPGA implementation.
7
Our Goal To develop a matrix multiplication algorithm tailored to the FPGA, maximizing efficiency through parallel design while reducing power consumption as much as possible.
8
The System Top Level View
10
Processing Entity (PE)
11
PE unit
12
PE unit The controller for each PE is an FSM that regulates the PE's operations: storage, computation, and communication (broadcasting). The controller must autonomously manage the PE's operations, using handshakes and global communication that rely on implicit synchronization among all PEs.
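The per-PE controller FSM might look roughly like the sketch below. The state names and handshake events are assumptions for illustration; the actual RTL state machine may differ:

```python
# Hypothetical sketch of a PE controller FSM with the three operations
# named on the slide: storage, computation, and broadcast communication.
STORE, COMPUTE, BROADCAST, IDLE = "STORE", "COMPUTE", "BROADCAST", "IDLE"

class PEController:
    def __init__(self):
        self.state = STORE  # begin by loading matrix entries into local memory

    def step(self, event):
        # Transition table: each state advances only on its handshake
        # event, keeping all PEs implicitly synchronized.
        transitions = {
            (STORE, "loaded"): COMPUTE,        # entries stored -> start computing
            (COMPUTE, "done"): BROADCAST,      # partial result ready -> broadcast
            (BROADCAST, "acked"): COMPUTE,     # neighbors acked -> next update
            (BROADCAST, "finished"): IDLE,     # all updates done -> idle
        }
        self.state = transitions.get((self.state, event), self.state)
        return self.state
```

In hardware the same table would be a case statement clocked once per cycle; the dictionary form just makes the transitions explicit.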
13
PE unit Each PE is equipped with its own local memory for storing entries of the input matrices at start-up and for broadcasting them along its own row and column.
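One way the row/column broadcast scheme can compute the product is sketched below. This is a software simulation under our own assumptions (one PE per matrix entry, n broadcast steps), not the project's actual schedule:

```python
# Simulation of an n x n PE grid: PE(i,j) holds A[i][j] and B[i][j] locally.
# In step k, PE(i,k) broadcasts its A entry along row i and PE(k,j)
# broadcasts its B entry along column j; every PE accumulates their product,
# so after n steps PE(i,j) holds C[i][j] = sum_k A[i][k] * B[k][j].
def grid_matmul(A, B):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for k in range(n):                  # broadcast step k
        for i in range(n):
            a_bcast = A[i][k]           # broadcast along row i
            for j in range(n):
                b_bcast = B[k][j]       # broadcast along column j
                C[i][j] += a_bcast * b_bcast
    return C
```

Each PE only ever reads values broadcast on its own row and column, which is what makes the local-memory-plus-broadcast organization sufficient.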
14
Handling Larger Matrices To handle larger matrices, we break the input matrices down into a sequence of smaller updates using a hierarchical blocking of the input matrices. Each update in the hierarchy is called a "loop". Since there is no loop-carried dependency between updates, we aim to pipeline the outer loop so that the current cycle's computation overlaps with the previous cycle's write-back and the next cycle's prefetching of matrix blocks.
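The hierarchical blocking can be sketched as follows (assumptions: square n x n matrices and a block size b that divides n; each innermost b x b update corresponds to one "loop" that fits on the grid):

```python
# Blocked matrix multiplication: the full product is decomposed into a
# sequence of b x b block updates, each small enough for the PE grid.
def blocked_matmul(A, B, b):
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for bi in range(0, n, b):               # output block row
        for bj in range(0, n, b):           # output block column
            for bk in range(0, n, b):       # one "loop": a b x b update
                for i in range(bi, bi + b):
                    for k in range(bk, bk + b):
                        a = A[i][k]
                        for j in range(bj, bj + b):
                            C[i][j] += a * B[k][j]
    return C
```

Updates for distinct output blocks (bi, bj) touch disjoint parts of C, which is the absence of loop-carried dependency that makes pipelining the outer loop legal.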
15
A Problem With Larger Matrices Moving data in and out of the computational grid independently for each block in the hierarchy can be expensive, so the transfer cost must be amortized across the sequence of block updates.
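One standard way to amortize this cost, consistent with the prefetch/compute/write-back overlap mentioned above, is double buffering. The sketch below is our own illustration (the `compute` and `prefetch` callables are placeholders, not project APIs):

```python
# Double buffering: while the grid computes on one buffer, the next block
# is prefetched into the other, hiding transfer latency behind computation.
def pipelined_updates(blocks, compute, prefetch):
    buf = [None, None]
    buf[0] = prefetch(blocks[0])            # fill the first buffer
    for t in range(len(blocks)):
        cur = t % 2
        if t + 1 < len(blocks):
            buf[1 - cur] = prefetch(blocks[t + 1])  # fetch next block...
        compute(buf[cur])                           # ...while computing this one
    # In hardware the prefetch and compute proceed concurrently each
    # iteration; this sequential sketch only shows the buffer schedule.
```

Only the first prefetch is exposed; every later transfer overlaps a computation, which is the amortization the slide calls for.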