Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Project performed by: Naor Huri Idan Shmuel.


Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Project performed by: Naor Huri Idan Shmuel Project supervised by: Ina Rivkin Winter 2008

 This project is part of a much larger project dealing with hardware acceleration of signal processing. In our project we created a hardware accelerator for a given algorithm and analyzed the advantages and disadvantages of such a system compared with a pure software implementation.

 Running the signal processing algorithm in software on a standard PC takes too much time.  The proposed solution: a system designed specifically for this task, using multiple processors and a management unit.

 Simulator- a program running on the host PC, responsible for generating data packets, sending them for processing, and retrieving the results.  Processing units, each running the same signal processing program on the incoming packets.  Switch- responsible for the correct transfer of data between the host PC and the processing units and vice versa.

[System block diagram: the Data Packages Generator on the host PC connects over the PCI bus to FIFO_IN and FIFO_OUT in board memory on the Gidel ProcStar II; the switch on the Stratix II FPGA connects the FIFOs to Processors I…N, each with its own on-chip memory.]

 Building the above system and understanding multiprocessor issues.  Learning the tools and techniques for building such complex systems.  Optimizing the system configuration, searching for the ideal Nios II type and number of CPUs.  Finding the optimal configuration that maximizes throughput.  Comparing the performance of the PC against that of the system.

 The project is an integration of two levels, software and hardware:  The software level is composed of Vitaly’s packet processing algorithm and the HOST program, which generates the data packets and retrieves the results.  The hardware level, implemented on the PROC board, includes the switch, the processing units and Gidel IPs.

 This program generates vectors of times of arrival (TOA), each made up of a basic chain with a specific period plus noise elements.  Every vector is wrapped with a header and tail used for identification, control signals and synchronization.  The packet structure:

 The algorithm’s job is to recognize such basic chains in the incoming vectors and to associate each TOA element with its chain.  The results are sent back to the simulator in the following packet structure:
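One common way to recover the chain period, in the spirit of the histogram buffers mentioned in the cluster-structure slide, is a histogram of pairwise TOA differences. This is our own sketch under that assumption, not the project's actual algorithm; note the O(n^2) pair loop, consistent with the measured quadratic computing time.

```c
/* Recover the dominant period of a TOA vector by histogramming
 * pairwise differences. Illustrative sketch; not the project's code. */
#define MAX_PERIOD 1024

static unsigned find_period(const unsigned *toa, int n)
{
    static unsigned hist[MAX_PERIOD];
    for (int i = 0; i < MAX_PERIOD; i++)
        hist[i] = 0;

    /* O(n^2) pass over all pairs: every true chain pair contributes
     * a multiple of the period, so the period bin dominates. */
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            unsigned d = toa[j] - toa[i];
            if (d < MAX_PERIOD)
                hist[d]++;
        }

    unsigned best = 1;
    for (unsigned d = 2; d < MAX_PERIOD; d++)
        if (hist[d] > hist[best])
            best = d;
    return best;
}
```

Once the period is known, each TOA can be assigned to the chain by checking its residue modulo the period within some tolerance.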

 The hardware level is implemented on a Gidel PROCStar II board:  4 powerful Stratix II FPGAs, each attached to 2 external DDR II memories.  The packets are sent to the processing units via the PCI bus.  The packets are stored in the 2 external memories, which are configured to act as FIFOs.

 The switch, designed by Oleg & Maxim, manages the data transfer between the host PC and the multiple processing units.  The switch is composed of the following main modules:  Input reader- reads packets from FIFO_IN to the processing units  Output writer- writes the answers from the processing units to FIFO_OUT  Main controller- as its name implies, issues all control signals required for the correct transfer of data  Clusters- a wrapper around the processing units, used to give another abstraction layer to the system.

 Management policy: FCFS for input packets, RR (round robin) for output packets  Statistics reporter  Error reporter  Up to 16 clusters.
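The round-robin policy for output packets can be modeled in a few lines of C. This is a software model of the arbitration the output writer performs in hardware, written for illustration; signal and function names are ours.

```c
/* Round-robin pick among clusters that have a ready output packet,
 * modeling the output writer's RR policy. Illustrative sketch. */
#define NUM_CLUSTERS 16

static int rr_last = NUM_CLUSTERS - 1;  /* last cluster served */

/* ready[i] != 0 means cluster i has a packet in its output buffer.
 * Returns the next cluster to serve, or -1 if none are ready. */
static int rr_pick(const int ready[NUM_CLUSTERS])
{
    for (int k = 1; k <= NUM_CLUSTERS; k++) {
        int c = (rr_last + k) % NUM_CLUSTERS;
        if (ready[c]) {
            rr_last = c;
            return c;
        }
    }
    return -1;
}
```

Starting the scan just past the last cluster served is what makes the policy fair: a cluster that was just drained goes to the back of the line.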

 The switch has up to 16 clusters inside.  The same cluster is duplicated many times to create a multi-Nios system  Switch ports:  Every cluster has one processing unit, as seen in the next slide  Cluster ports:

 1 Nios II CPU  12 KB on-chip memory for code, stack and heap  2 × 4 KB buffers used by the algorithm to build the histograms  4 KB buffer for input packets  dual port, also mastered by the switch  1 KB buffer for output packets  dual port, also mastered by the switch  Timer.

 input vector and output vector- the connection to the switch. Without their ports no ack/req protocols could be implemented  The modules’ “export” signals will be connected to the cluster, as seen in the “cluster structure” slide.

 Duplicating the clusters inside the switch creates a multi-Nios system.  The switch supports up to 16 clusters.  This example includes 14 Nios II processors  Logic utilization is only 20%  While almost all RAM blocks are used, memory utilization is only 33%, mainly because M-RAM cells are used ineffectively.

 Gidel IP- the MegaFIFO  provides a simple and convenient way to transfer data to/from the Gidel PROC board.  In this system there are two FIFOs: FIFO_IN for the incoming packets and FIFO_OUT for the processed packets.  To access those memories the host uses Gidel’s predefined HAL functions, while the hardware uses an ack/req protocol.  Gidel IP- Register  used to transfer data from hardware to software and vice versa.  In this system they are used for error and statistics reporting.
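The hardware-side ack/req handshake toward the MegaFIFO can be illustrated with a small cycle-level software model. This is purely our sketch of a generic req/ack transfer; the real MegaFIFO interface and signal names may differ.

```c
/* Cycle-level software model of a req/ack FIFO read handshake,
 * in the style the hardware uses toward the MegaFIFO.
 * Illustrative only; real signal names and timing may differ. */
struct fifo_model {
    unsigned data[64];
    int head;
    int count;
};

/* One "clock cycle": the reader asserts req; the FIFO asserts ack
 * only when it has data; on ack, exactly one word transfers.
 * Returns 1 when a word was transferred, 0 otherwise. */
static int fifo_cycle(struct fifo_model *f, int req, unsigned *out)
{
    int ack = req && f->count > 0;
    if (ack) {
        *out = f->data[f->head];
        f->head = (f->head + 1) % 64;
        f->count--;
    }
    return ack;
}
```

The key property of the protocol is that the reader may hold req high continuously and data only moves on cycles where the FIFO answers with ack, so an empty FIFO never produces a spurious word.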

 In the PROCWizard tool we define the top-level entity of the design. It generates the HDL code for the design and an H file for the host.  We can see here the definition of one IC (FPGA), two FIFOs, some registers and the LBS module.

 Basic system:  1 Nios II/s (“s” for standard) system  1 simulator we built  1 algorithm

 3 methods:  Timer module- inside the SOPC, used as a timestamp. Resolution: 10 us.  Statistics reporter module- counts packets entering and exiting the system. As long as the numbers are not identical it counts clock cycles, and the info register holds the value 128. Resolution: 0.01 us.  Software timer- run by the host from the moment it writes the data until the info register reads zero, indicating all packets have returned. Resolution: 1 us.  Later we will demonstrate how the 3 methods converge
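The host-side software timer amounts to taking a timestamp before writing the data and another once the info register reads zero. A minimal C sketch of the elapsed-time arithmetic, assuming POSIX `struct timespec` timestamps; the register-polling call itself goes through the Gidel HAL, whose API we do not show here.

```c
#include <time.h>

/* Microseconds elapsed between two timespec samples, as the host's
 * software timer would compute. The 1 us result granularity matches
 * the resolution quoted on the slide. */
static long elapsed_us(const struct timespec *start,
                       const struct timespec *end)
{
    return (end->tv_sec - start->tv_sec) * 1000000L +
           (end->tv_nsec - start->tv_nsec) / 1000L;
}
```

In use, the host would sample a start time, write all packets, spin on the info register until it reads zero, sample an end time, and report `elapsed_us(&start, &end)`.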

 Computing time as a function of the number of TOAs.  Computing time = O(n^2)  %absent = 0  %noise = 0

 Computing time as a function of %absent  Around 6% the algorithm finds more than one sequence, with doubled frequencies…  %noise = 0

 Computing time as a function of %noise

 According to the above results, we chose an average vector to check the different systems:  Vector length: 495  %absent = 4%  %noise = 25%

 In order to decide which Nios II configuration to use, we checked each of them with the same vector  The economy CPU needs little space on the FPGA, but has poor performance  The fast version has some advantage over the standard CPU, but needs a lot more FPGA resources, so we chose the standard one

 The CPUs are independent, so doubling their number doubles the performance  There is no major overhead for adding CPUs

 In order to reach final conclusions, we sent 10k random vectors to both the PC and the most powerful FPGA system  The PC does the job 7.64 times faster

 No!  The Nios CPUs we used are no match for the PC’s Pentium CPU, but there are a few ways to get better performance:  1. Increasing the system’s 100 MHz clock rate  2. Adding an accelerator unit to each CPU  3. Shrinking the code from 8.5 KB to 8 KB, thereby freeing RAM cells for more CPUs  4. Optimizing the utilization of M-RAM cells.

 Ina Rivkin  Lab staff- Eli Shoshan and Mony Orbach  Oleg and Maxim  Vitaly  Michael and Liran