Paper Review Avelino Zepeda Martinez High Performance Reconfigurable Pipelined Matrix Multiplication Module Designer.

2.- Background
Usage
– Communication systems
– Signal and video processing
Issues
– The number of operations in square-matrix multiplication grows as n^3, which affects:
  • Area
  • Speed
  • Power
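To make the n^3 growth concrete, here is a minimal Python sketch (not from the paper) that counts the scalar operations performed by the textbook triple-loop algorithm. For two n x n matrices it performs n^3 multiplications and n^2(n - 1) additions.

```python
def naive_matmul_with_counts(a, b):
    """Multiply two n x n matrices with the textbook triple loop and count
    the scalar multiplications and additions performed."""
    n = len(a)
    mults = adds = 0
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = a[i][0] * b[0][j]      # first product needs no addition
            mults += 1
            for k in range(1, n):
                acc += a[i][k] * b[k][j]
                mults += 1
                adds += 1
            c[i][j] = acc
    return c, mults, adds


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        ones = [[1] * n for _ in range(n)]
        _, m, s = naive_matmul_with_counts(ones, ones)
        print(f"n={n:2d}: {m} multiplications (n^3 = {n ** 3}), {s} additions")
```

Doubling n multiplies the multiplication count by eight. At the larger matrix sizes the module targets, that growth is what makes area, speed, and power hard constraints.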

3.- Matrix Multiplication

4.- Matrix Multiplication (Cont.)

5.- Matrix Multiplication (Cont.)
Basic matrix multiplication block using d t
– Can perform any matrix multiplication
– Inefficient

6.- Error Analysis
Three types of errors
– Number representation
  • ADCs
  • Sampling rate
  • Available bits
– Rounding error
  • Round to Nearest Even (RNE)
  • Round Towards Zero, or Truncation (TRA)
  • Round Down (Floor)
  • Round Up (Ceiling)
  • Round Away from Zero
– Algorithm/design error
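The five rounding modes can be illustrated on fixed-point values. The Python sketch below is an illustration only, not the paper's RTL. It assumes a value is stored as an integer x with k fractional bits and reduces it to an integer with each mode. The function name round_shift and the mode strings are hypothetical.

```python
def round_shift(x: int, k: int, mode: str) -> int:
    """Drop k fractional bits from fixed-point integer x, i.e. return
    x / 2**k rounded to an integer using the named rounding mode."""
    if k == 0:
        return x
    q, r = divmod(x, 1 << k)   # q = floor(x / 2**k), 0 <= r < 2**k
    half = 1 << (k - 1)        # remainder value that means exactly .5
    if mode == "floor":        # Round Down
        return q
    if mode == "ceil":         # Round Up
        return q + 1 if r else q
    if mode == "trunc":        # Round Towards Zero (truncation)
        return q + 1 if r and x < 0 else q
    if mode == "away":         # Round Away from Zero
        return q + 1 if r and x > 0 else q
    if mode == "rne":          # Round to Nearest, ties to Even
        if r > half or (r == half and q & 1):
            return q + 1
        return q
    raise ValueError(f"unknown rounding mode: {mode!r}")


if __name__ == "__main__":
    # x holds 1 fractional bit, so 5 -> 2.5, -5 -> -2.5, 7 -> 3.5
    for x in (5, -5, 7):
        row = {m: round_shift(x, 1, m) for m in ("rne", "trunc", "floor", "ceil", "away")}
        print(x / 2, row)
```

In two's-complement hardware, simply discarding the low bits implements Round Down (floor). Round-towards-zero needs a correction for negative values, and RNE additionally has to detect the exact halfway case.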

7.- Error Analysis (Cont.)

8.- Design Overview
Reconfigurable Matrix Multiplication Module Designer (RMD)
– Written in the Perl scripting language
– Outputs:
  • RTL of the multiplication module
  • Testbench
  • MATLAB files
  • ModelSim verification files
– Designed to output RTL for both FPGA and VLSI targets
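RMD itself is a Perl script. To keep to a single language here, the idea of template-based RTL generation is sketched in Python. The generated module (a hypothetical two-stage block computing a0*b0 + a1*b1 on r-bit signed inputs), its name, and its ports are assumptions for illustration, not RMD's actual output.

```python
def gen_mmb2_verilog(r: int, name: str = "mmb2") -> str:
    """Return Verilog for a hypothetical two-stage pipelined block that
    computes y = a0*b0 + a1*b1 on r-bit signed inputs."""
    if r < 1:
        raise ValueError("r must be at least 1")
    pw = 2 * r        # width of one r x r product
    yw = pw + 1       # one extra bit for the sum of two products
    return f"""\
module {name} (
  input  wire                   clk,
  input  wire signed [{r - 1}:0]  a0, a1, b0, b1,
  output reg  signed [{yw - 1}:0] y
);
  reg signed [{pw - 1}:0] p0, p1;
  always @(posedge clk) begin
    p0 <= a0 * b0;   // stage 1: two multipliers
    p1 <= a1 * b1;
    y  <= p0 + p1;   // stage 2: one adder
  end
endmodule
"""


if __name__ == "__main__":
    print(gen_mmb2_verilog(16))
```

Changing r regenerates the module with consistent input, product, and output widths. This kind of parameter substitution is what a module designer automates.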

9.- RMD FPGA Design Flow
Three main sections
– Module Designer
– Area, Speed, and Error Analysis
– High Speed Memory Interface

10.- Module Designer
Main design outputs (RTL)
– Matrix Multiplication Processing Unit (MMPU)
– Memory Interface
– Control Unit (CU)

11.- Module Designer (MMPU)
RTL can be created for matrix sizes from 2x2 up to 2048x2048
Composed of:
– Matrix Multiplier Block (p-MMB)
– Internal logic
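One plausible way for the MMPU's internal logic to cover sizes from 2x2 to 2048x2048 with a fixed p-MMB is block (tiled) multiplication. The Python sketch below assumes that p divides n and that each call to the hypothetical block_mult_acc stands for one p x p block operation. The paper's actual scheduling may differ.

```python
def block_mult_acc(c, a, b, i0, j0, k0, p):
    """Stand-in for one p x p block operation:
    C[i0:i0+p][j0:j0+p] += A[i0:i0+p][k0:k0+p] * B[k0:k0+p][j0:j0+p]."""
    for i in range(i0, i0 + p):
        for j in range(j0, j0 + p):
            c[i][j] += sum(a[i][k] * b[k][j] for k in range(k0, k0 + p))


def tiled_matmul(a, b, p):
    """Compute C = A * B for n x n lists by reusing one p x p block routine.
    Returns (C, number_of_block_operations)."""
    n = len(a)
    if n % p:
        raise ValueError("this sketch assumes p divides n")
    c = [[0] * n for _ in range(n)]
    ops = 0
    for i0 in range(0, n, p):
        for j0 in range(0, n, p):
            for k0 in range(0, n, p):   # accumulate partial block products
                block_mult_acc(c, a, b, i0, j0, k0, p)
                ops += 1
    return c, ops
```

The number of block operations is (n/p)^3. For n = 2048 and p = 4, that would be 512^3 = 134,217,728 block operations.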

12.- Module Designer (p-MMB)
Bottom-up design approach
– Start with the 2-MMB (2x2), which is the pipelined version of d t
– Insert adders after the 2-MMB blocks
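The slides do not spell out what a p-MMB computes. Assuming it computes a p-element row-by-column product, the bottom-up construction amounts to p/2 copies of a 2-MMB followed by a binary tree of adders. In hardware, each tree level would typically be a pipeline register stage. The names mmb2 and mmb are hypothetical.

```python
def mmb2(a, b):
    """Hypothetical 2-MMB: two multipliers feeding one adder."""
    return a[0] * b[0] + a[1] * b[1]


def mmb(p, a, b):
    """Hypothetical p-MMB built bottom-up from p/2 copies of mmb2 followed
    by a binary adder tree. Returns (result, adder_levels_after_mmb2)."""
    if p < 2 or p & (p - 1):
        raise ValueError("p must be a power of two >= 2")
    partial = [mmb2(a[i:i + 2], b[i:i + 2]) for i in range(0, p, 2)]
    levels = 0
    while len(partial) > 1:
        # one tree level: pairwise adders (a register stage in a pipeline)
        partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
        levels += 1
    return partial[0], levels
```

For p = 8, the four 2-MMB outputs pass through two adder levels. In general the tree adds log2(p) - 1 levels after the 2-MMB stage.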

13.- Module Designer (p-MMB Cont.)

14.- Module Designer (Memory Interface)

15.- Module Designer (Control Unit)
– Can be created for fixed or variable operation size
– Designed as a finite state machine (FSM)
– For variable size, each operation size has its own sub-FSM
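A variable-size control unit can be sketched as a top-level FSM that hands off to a per-size sub-FSM. In this Python sketch, each sub-FSM is reduced to its number of COMPUTE cycles under a hypothetical model of one p x p block operation per cycle. The state names and the cycle model are illustrative assumptions, not taken from the paper.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD = auto()
    COMPUTE = auto()
    STORE = auto()
    DONE = auto()


def build_sub_fsms(sizes, p):
    """Variable-size CU: one sub-FSM per supported size, reduced here to its
    COMPUTE-cycle count under a hypothetical one-block-per-cycle model."""
    return {n: (n // p) ** 3 for n in sizes if n >= p and n % p == 0}


def run_control_unit(n, sub_fsms):
    """Step the top-level FSM for an n x n operation; return the state trace."""
    if n not in sub_fsms:
        raise ValueError(f"no sub-FSM for size {n}")
    state, remaining, trace = State.IDLE, 0, []
    while state is not State.DONE:
        trace.append(state)
        if state is State.IDLE:
            state = State.LOAD
        elif state is State.LOAD:
            remaining = sub_fsms[n]        # hand off to the size-specific sub-FSM
            state = State.COMPUTE
        elif state is State.COMPUTE:
            remaining -= 1
            if remaining == 0:
                state = State.STORE
        elif state is State.STORE:
            state = State.DONE
    trace.append(State.DONE)
    return trace
```

A fixed-size control unit corresponds to a table with a single entry. The variable-size version simply selects among several entries at run time.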

16.- Area, Speed, and Error Analysis
RMD also generates MATLAB and testbench files
– Improves the accuracy of the output matrix
– Reduces design and verification time
MATLAB creates the data files for the testbench
– Maximum input values supported:
  • Bit size: 64 bits
  • Matrix size: 2048 x 2048
  • Test vectors: 100
The data is tested on the testbench using ModelSim
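The MATLAB step, which produces random inputs plus reference results as data files for the testbench, can be approximated in Python. This sketch assumes unsigned inputs and a row-major, one-hex-word-per-line layout of the kind the Verilog $readmemh system task reads. The function names and file layout are hypothetical.

```python
import random


def make_test_vectors(n, bits, count, seed=0):
    """Return `count` (A, B, C) triples: random unsigned `bits`-bit n x n
    inputs and their reference product C = A * B."""
    rng = random.Random(seed)          # fixed seed -> reproducible vectors
    vectors = []
    for _ in range(count):
        a = [[rng.getrandbits(bits) for _ in range(n)] for _ in range(n)]
        b = [[rng.getrandbits(bits) for _ in range(n)] for _ in range(n)]
        c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        vectors.append((a, b, c))
    return vectors


def result_width(n, bits):
    """Bits needed for an entry of C: 2*bits for a product plus
    ceil(log2 n) for summing n of them."""
    return 2 * bits + (n - 1).bit_length()


def to_hex_lines(matrix, width_bits):
    """Row-major, one zero-padded hex word per line ($readmemh-style)."""
    digits = (width_bits + 3) // 4
    return [f"{v:0{digits}x}" for row in matrix for v in row]


if __name__ == "__main__":
    a, b, c = make_test_vectors(n=2, bits=8, count=1)[0]
    print("\n".join(to_hex_lines(a, 8)))
    print("\n".join(to_hex_lines(c, result_width(2, 8))))
```

Writing each list of lines to its own file gives one matrix per data file. The reference product is sized with result_width so no entry overflows its hex field.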

17.- Area, Speed, and Error Analysis (Cont.)

18.- Area, Speed, and Error Analysis (Cont.)
RMD calculates the estimated area:
– Area = Matrix Multiplier Block + Memory + Control
These calculations use:
– n: maximum matrix multiplication size
– r: input bits
– p: matrix multiplier block size
– M_r: r-bit multiplier
– A_r: r-bit adder
– R_r: r-bit register
– Mux_r: r-bit (2-to-1) mux
– RNE: (2r + k_max)-bit to r-bit rounding
– HA: half adder
– Mem_r: r-bit memory
– FF: flip-flops
– F_max: maximum frequency
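The top-level estimate is a sum over three parts, each a weighted count of the listed components. The Python sketch below keeps that structure but takes the per-component counts and unit areas as inputs. The numbers in the example are made up for illustration and are not the paper's cost model.

```python
def estimate_area(counts, unit_area):
    """Area = Matrix Multiplier Block + Memory + Control.
    counts maps each part to {component: how many}; unit_area maps each
    component (M_r, A_r, R_r, Mux_r, RNE, HA, Mem_r, FF, ...) to its cost.
    Returns (total, per-part breakdown)."""
    breakdown = {
        part: sum(qty * unit_area[comp] for comp, qty in comps.items())
        for part, comps in counts.items()
    }
    return sum(breakdown.values()), breakdown


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    unit_area = {"M_r": 100, "A_r": 10, "R_r": 5, "Mem_r": 1, "FF": 1}
    counts = {
        "Matrix Multiplier Block": {"M_r": 4, "A_r": 3, "R_r": 7},
        "Memory": {"Mem_r": 48},
        "Control": {"FF": 16},
    }
    total, parts = estimate_area(counts, unit_area)
    print(total, parts)
```

Keeping counts and unit costs separate lets the same estimator be reused for different targets (for example, two FPGA families) by swapping only the unit-area table.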

19.- Area, Speed, and Error Analysis (Cont.)

20.- Area, Speed, and Error Analysis (Cont.)

21.- Area, Speed, and Error Analysis (Cont.)

22.- High Speed Memory Interface
Two Native Port Interfaces
– Interface with DDR2 memory
– Width of 64 bits
– Support back-to-back transfers
– Transfer sizes:
  • Byte
  • Half-word
  • Word
  • 4-word and 8-word cache lines
  • 16-word, 32-word, and 64-word bursts
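To relate the listed transfer sizes to the 64-bit interface width, this Python sketch counts 64-bit beats per transfer and the number of 64-word bursts needed to move a whole matrix. It assumes a 32-bit word, so a 64-word burst is 256 bytes. That word size is an assumption and is not stated on the slide.

```python
WORD_BYTES = 4   # assumption: 32-bit word
PORT_BYTES = 8   # 64-bit interface width, from the slide

TRANSFER_BYTES = {
    "byte": 1,
    "half-word": WORD_BYTES // 2,
    "word": WORD_BYTES,
    "4-word cache line": 4 * WORD_BYTES,
    "8-word cache line": 8 * WORD_BYTES,
    "16-word burst": 16 * WORD_BYTES,
    "32-word burst": 32 * WORD_BYTES,
    "64-word burst": 64 * WORD_BYTES,
}


def beats(size_bytes):
    """64-bit data beats needed for one transfer (at least one)."""
    return max(1, -(-size_bytes // PORT_BYTES))   # ceiling division


def bursts_for_matrix(n, bits, burst_bytes=TRANSFER_BYTES["64-word burst"]):
    """Number of bursts needed to move an n x n matrix of `bits`-bit values."""
    total_bytes = -(-(n * n * bits) // 8)         # round up to whole bytes
    return -(-total_bytes // burst_bytes)


if __name__ == "__main__":
    for name, size in TRANSFER_BYTES.items():
        print(f"{name:18s} {size:4d} bytes, {beats(size):2d} beats")
    print(bursts_for_matrix(2048, 64))
```

Under these assumptions, a 2048 x 2048 matrix of 64-bit values (33,554,432 bytes) needs 131,072 such bursts.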

23.- Area Results (Spartan-3E)

24.- Area Results (Virtex-5)

25.- Time Results

26.- Conclusion
– Pipelined design
  • Increases throughput and reduces operation latency
– Incremental adder
  • Reduces area and increases accuracy
– Modifiable
  • Increased accuracy
  • Faster operation
  • Lower area