Hardware accelerator for PPC microprocessor. Final presentation. By: Kopitman Reem, Stolberg Dmitri. Instructor: Fiksman Evgeny.

Agenda
- Ways to implement an algorithm
- Starting with ASC
- HW architecture
- SW architecture
- System optimization
- Generic module (iDCT)
- Timing results

Abstract
Problem: complex functions (e.g. FFT) take a lot of CPU resources. The task is to consider the ways such functions can be implemented and to choose the best solution under the specified constraints.
Solutions:
- Pure SW implementation
- Pure HW implementation
- Combined HW + SW (ASC technology)

Abstract (cost vs. performance)
- SW: low cost, low performance
- HW: high cost, high performance
- Combined HW + SW: a compromise between the two

Project Goals
- Study ASC (A Stream Compiler)
- Study the functions in the PamDC library
- Implement an interface between a generic module and the CPU using ASC
- Implement a specific module to test the interface
- Implement the same module in SW and draw conclusions about performance

ASC - A Stream Compiler
- Combined (SW/HW) code
- Written in familiar C++
- Generates flexible HW
- Standard netlist output (EDIF)
- Supported by standard CAD tools
- Provides HW optimization
- UNIX oriented

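ASC – code example
The printf call runs in software ("Hello World"); the code between STREAM_START and STREAM_END is compiled to hardware.

```cpp
#include "asc.h"
#include <cstdio>

int main(int argc, char **argv)
{
    // Software part: runs on the CPU
    printf("Hello World\n");

    STREAM_START;            // ASC code start (hardware part)

    // Hardware variable declarations
    HWint in(IN);
    HWint out(OUT);
    HWint tmp(TMP);

    STREAM_LOOP(16);         // loop over a stream of 16 elements
    tmp = (in << 1) + 55;
    out = tmp;

    STREAM_END;              // ASC code end
    return 0;
}
```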

System components
- Memec evaluation board: Xilinx Virtex II Pro FPGA with PPC405, JTAG, LCD, serial port for debug
- SW tools: Xilinx EDK, Xilinx Platform Studio, ChipScope

Design Approach - general
(Block diagram: an FPGA module containing the PPC405 processor, memory with EDAC, a DRAM peripheral, the ASC peripheral module, other peripherals and monitor modules, all connected through the system bus (PLB).)

ASC interface (general view)
(Block diagram: DMA engine, DMA buffer, SerDes, generic module, PLB bus, interrupt controller, FIFO_in and FIFO_out; signals: Data, Addr, CTRL, Fifo_full, Data_in, Data_out.)

SW review – main algorithm (flowchart)
1. Start/reset.
2. Initialize the system blocks (FIFO, DMA, GPIO, LCD).
3. DMA busy? When the DMA is free, write data packets to the ASC application.
4. Calculation complete? When it is, read data packets back from the ASC application.

SW review – C code fundamentals
- DMA: control and data TX/RX functions
- LCD: setup and data TX functions
- Data size manipulation
- Timer control functions
- MASK definitions, for user-friendly code

iDCT abstract
The iDCT reconstructs an image or audio block from its discrete cosine transform.
Why iDCT? It is a complex iterative algorithm that takes a lot of CPU resources.

ASC design – iDCT module
Discrete Cosine Transform: this transform is used in current still-image (JPEG) and video (MPEG) compression standards.
The principle:
- Xm: matrix of discrete samples (the iDCT output)
- Tm: cosine coefficient matrix
- Fm: DCT coefficient matrix

ASC design – Optimization (1)
ASC supports optimization for latency, throughput, or area.
For large amounts of data, optimizing for throughput minimizes calculation time.

ASC design – Optimization (2)
Which optimization target: throughput, area, or latency?
- Area: 3 stream cycles
- Throughput: 9 stream cycles
- Latency: 1 stream cycle
(Timing diagram labels: max latency, 28 clk, 8 clk, stream cycle.)

ASC design – Optimization (3)
Optimization – area consumption. Absolute values refer to the Xilinx Virtex II Pro XC2VP7 FPGA (9,856 FFs and 9,856 4-input LUTs in total); percentages are of these totals.

| Resource | Latency | Throughput | Area |
|---|---|---|---|
| FFs used in total design | 2,905 | 4,440 | 3,883 |
| 4-input LUTs used in total design | 7,422 (75%) | 7,850 (79%) | 5,915 (60%) |
| FFs used for ASC | 264 (3%) | 1,799 (18%) | 1,242 (13%) |
| 4-input LUTs used for ASC | 3,509 (36%) | 3,937 (40%) | 2,002 (20%) |
| Total equivalent gate count for design | 1,387,906 | 1,442,767 | 1,370,908 |

(Chart: total gate count compared to an empty design.)

ASC design – Optimization (4)
Optimization – area consumption: latency optimization is the chosen option. It gives the best throughput and latency, with moderate area consumption.

Clock calculations (flowchart)
1. Get time 1.
2. Set DMA control.
3. Wait until the TX/RX data packet is complete.
4. Get time 2.
5. Calc_time = time2 - time1.
6. Write the data and the calculation time to the LCD.

iDCT running results – SW (1)
Calculation time grows linearly with data packet length, as expected for the iDCT. The basic packet size is 32 bytes; packet length is given in number of basic packets.

iDCT running results – SW (2)
With exponentially increasing data length, calculation time also grows exponentially.
(Plot on log-log axes: x = log(packet length), in units of 32 bytes; y = log(calculation time [µs]); series: exponential data increase.)
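The log-log plot follows from the linear SW result. Assuming time is proportional to packet length, t(n) ≈ a·n with n counted in 32-byte basic packets, and assuming the data length doubles at each step of the sweep (n_k = 2^k):

```latex
t(n) \approx a\,n, \qquad n_k = 2^{k}
\;\Longrightarrow\; t(n_k) \approx a\,2^{k},
\qquad \log t \approx \log a + \log n .
```

So exponentially growing data produces exponentially growing time, and on log-log axes the curve is a straight line with slope 1.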

iDCT running results – HW (1)
FIFO size influence (512-byte FIFO): calculation time is high compared with the time needed to write new data into the FIFO.

iDCT running results – HW (2)
FIFO size influence (512-byte FIFO): calculation time is high compared with the time needed to write new data into the FIFO. The basic packet size is 32 bytes; packet length is given in number of basic packets.

iDCT running results – SW vs. HW

Innovations
- Hard-code this generic interface and include it as an IP block in the FPGA development package. Development then reduces to C++ coding only, and the interconnection between the PPC and the generic module becomes transparent.
- Make the current design faster by using separate DMA channels for read and write.