Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Project performed by: Naor Huri Idan Shmuel.

1 Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Project performed by: Naor Huri Idan Shmuel Project supervised by: Ina Rivkin Winter 2008

2  This project is part of a much larger project dealing with hardware acceleration of signal processing. We built a hardware accelerator for a given algorithm and analyzed the advantages and disadvantages of such a system compared with a pure software implementation.

3  Running the signal processing algorithm in software on a standard PC takes too much time.  A system designed specifically for this task, using multiple processors and a management unit.

4  Simulator- a program running on the host PC, responsible for generating data packets, sending them for processing and retrieving the results.  Processing units, each running the same signal processing program on the incoming packets.  Switch- responsible for the correct transfer of data between the host PC and the processing units and vice versa.

5 [Block diagram. Labels: Data Packages Generator, PCI bus, Gidel ProcStar II, Stratix II, FIFO_IN (board memory), FIFO_OUT (board memory), switch, Processor I, Processor II, ... Processor N, each with on-chip memory.]

6  Building the above system and understanding multiprocessor issues.  Learning the tools and techniques for building such complex systems.  Optimizing the system configuration: searching for the ideal NiosII type and number.  Finding the configuration that maximizes throughput.  Comparing the performance of the PC with that of the system.

7  The project is an integration of two levels, software and hardware:  The software level is composed of Vitaly’s packet processing algorithm and the HOST program, which generates the data packets and retrieves the results.  The hardware level, implemented on the PROC board, includes the switch, the processing units and GIDEL IPs.

8  This program generates vectors of Times Of Arrival (TOA), each made up of a basic chain with a specific period, plus noise elements.  Every vector is wrapped with a header and a tail used for identification, control signals and synchronization.  The packet structure:
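Since the exact field layout is not given in the text, the following is only a minimal sketch of how such a vector could be generated and wrapped. The function names, the marker words `HEADER` and `TAIL`, the length field, and the way the noise and absent percentages are applied (relative to the chain length) are all assumptions made for illustration, not the simulator's actual code.

```python
import random

HEADER = 0xAAAA  # hypothetical header marker word
TAIL = 0x5555    # hypothetical tail marker word

def make_toa_vector(n_chain, period, start=0, pct_absent=0.0, pct_noise=0.0, seed=0):
    """Build a sorted TOA vector (integer time units): a periodic chain with
    some elements dropped, plus uniformly scattered noise elements."""
    rng = random.Random(seed)
    chain = [start + k * period for k in range(n_chain)]
    # Drop roughly pct_absent percent of the chain elements.
    kept = [t for t in chain if rng.random() >= pct_absent / 100.0]
    # Add noise TOAs; the count is taken relative to the chain length (assumption).
    n_noise = round(n_chain * pct_noise / 100.0)
    span_end = start + n_chain * period
    noise = [rng.randrange(start, span_end) for _ in range(n_noise)]
    return sorted(kept + noise)

def wrap_packet(packet_id, toas):
    """Wrap a TOA vector with a hypothetical header (marker, id, length) and tail."""
    return [HEADER, packet_id, len(toas)] + list(toas) + [TAIL]
```

For example, `wrap_packet(7, make_toa_vector(5, 10))` yields a clean five-element chain with period 10 between the two marker words.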

9  The algorithm's job is to recognize such basic chains in the incoming vectors and to associate each TOA element with its chain.  The results are sent back to the simulator in the following packet structure:
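The presentation does not describe how the algorithm works internally. As an illustration of the chain recognition task only, here is a generic difference-histogram sketch: it is not Vitaly's algorithm, and the function names and the `max_period` parameter are hypothetical. It histograms pairwise TOA differences to estimate the period, then keeps the elements that share the dominant phase.

```python
from collections import Counter

def find_period(toas, max_period):
    """Histogram all pairwise differences up to max_period and return the
    smallest difference among the most frequent ones (None if there are none)."""
    hist = Counter()
    for i in range(len(toas)):
        for j in range(i + 1, len(toas)):
            d = toas[j] - toas[i]
            if d > max_period:
                break  # toas are sorted, so later differences only grow
            if d > 0:
                hist[d] += 1
    if not hist:
        return None
    best = max(hist.values())
    return min(d for d, count in hist.items() if count == best)

def split_chain(toas, period):
    """Separate elements on the dominant phase (chain) from the rest (noise)."""
    phase = Counter(t % period for t in toas).most_common(1)[0][0]
    chain = [t for t in toas if t % period == phase]
    noise = [t for t in toas if t % period != phase]
    return chain, noise
```

The nested loop in `find_period` is quadratic in the vector length in the worst case, which is typical of pairwise-difference methods.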

10  The hardware level is implemented on a Gidel PROCStar II board.  4 powerful Stratix II FPGAs, each attached to 2 external DDR II memories.  The packets are sent to the processing units via the PCI bus.  The packets are stored in the 2 external memories, which are configured to act as FIFOs.

11  The switch, designed by Oleg & Maxim, manages the data transfer between the host PC and the multiple processing units.  The switch is composed of the following main modules:  Input reader- reads packets from FIFO_IN to the processing units  Output writer- writes the answers from the processing units to FIFO_OUT  Main controller- as its name implies, issues all control signals required for the correct transfer of data  Clusters- a wrapper around the processing units that adds another abstraction layer to the system.


13  Management policy: FCFS (first come, first served) for input packets, RR (round robin) for output packets  Statistics reporter  Error reporter  Up to 16 clusters.
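The two policies can be illustrated with a small software model. This is a sketch under assumptions, not the switch's HDL: the class `SwitchModel` and its method names are hypothetical, and each cluster is modeled as holding at most one packet in flight and one finished result.

```python
from collections import deque

class SwitchModel:
    """Toy model: FCFS dispatch of input packets, round-robin collection of results."""

    def __init__(self, n_clusters):
        assert 1 <= n_clusters <= 16  # the switch supports up to 16 clusters
        self.busy = [None] * n_clusters  # packet currently processed by each cluster
        self.done = [None] * n_clusters  # finished result waiting in each cluster
        self.fifo_in = deque()
        self.fifo_out = []
        self.rr_next = 0                 # cluster to poll first on the next collection

    def push(self, packet):
        self.fifo_in.append(packet)

    def dispatch(self):
        """FCFS: hand queued packets, oldest first, to idle clusters."""
        for c in range(len(self.busy)):
            if not self.fifo_in:
                break
            if self.busy[c] is None and self.done[c] is None:
                self.busy[c] = self.fifo_in.popleft()

    def finish(self, c, result):
        """Mark cluster c as finished with the given result."""
        self.busy[c] = None
        self.done[c] = result

    def collect_one(self):
        """RR: starting at rr_next, move the first ready result to fifo_out.
        Returns the index of the cluster served, or None if nothing is ready."""
        n = len(self.done)
        for k in range(n):
            c = (self.rr_next + k) % n
            if self.done[c] is not None:
                self.fifo_out.append(self.done[c])
                self.done[c] = None
                self.rr_next = (c + 1) % n
                return c
        return None
```

Round-robin polling on the output side keeps any single cluster from monopolizing FIFO_OUT, while FCFS on the input side preserves packet arrival order at dispatch.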

14  The switch has up to 16 clusters inside.  The same cluster is duplicated many times to create a multi-Nios system.  Switch ports:  Every cluster has one processing unit, as seen in the next slide.  Cluster ports:


16  1 NiosII CPU  12 KB on-chip memory for code, stack and heap  Two 4 KB buffers used by the algorithm to build the histograms  4 KB buffer for input packets  dual port, also mastered by the switch  1 KB buffer for output packets  dual port, also mastered by the switch  Timer.


18  Input vector and output vector- the connection to the switch. Without their ports, no ack/req protocols could be implemented.  The modules' “export” signals are connected to the cluster, as seen in the “cluster structure” slide.

19  Duplicating the clusters inside the switch creates a multi-Nios system.  The switch supports up to 16 clusters.  This example includes 14 NiosII s CPUs.  Logic utilization is only 20%.  While almost all RAM blocks are used, memory utilization is only 33%, mainly because the M-RAM cells are used inefficiently.

20  Gidel IP- the MegaFIFO  provides a simple and convenient way to transfer data to/from the Gidel PROC board.  In this system there are two FIFOs: FIFO_IN for the incoming packets and FIFO_OUT for the processed packets.  To access those memories the host uses Gidel's predefined HAL functions, while the hardware uses an ack/req protocol.  Gidel IP- Register  used to transfer data from hardware to software and vice versa.  In this system registers are used for error and statistics reporting.

21  In the PROCWizard tool we define the top-level entity of the design. It generates the HDL code for the design and an H file for the host.  We can see here the definition of one IC (FPGA), two FIFOs, some registers and the LBS module.

22  Basic system:  1 NiosII s (s for standard) system  1 simulator we built  1 algorithm

23  3 methods:  Timer module- inside the SOPC, used as a timestamp. Resolution: 10 us.  Statistics reporter module- counts packets entering and exiting the system. As long as the two counts differ, it counts clock cycles and the info register holds the value 128. Resolution: 0.01 us.  Software timer- run by the host from the moment it writes the data until the info register reads zero, indicating that all packets have returned. Resolution: 1 us.  Later we will demonstrate how the 3 methods converge.
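The 0.01 us resolution of the statistics reporter matches one clock cycle at the system's 100 MHz clock. A minimal sketch of the conversion, and of a check that two measurements agree within the coarser timer's resolution, follows; the function names are hypothetical.

```python
CLOCK_HZ = 100_000_000  # 100 MHz system clock, as stated in the presentation

def cycles_to_us(cycles, clock_hz=CLOCK_HZ):
    """Convert a clock-cycle count to microseconds."""
    return cycles * 1_000_000 / clock_hz

def agree(t_a_us, t_b_us, coarser_resolution_us):
    """True if two measurements differ by no more than the coarser resolution."""
    return abs(t_a_us - t_b_us) <= coarser_resolution_us
```

For instance, a count of 123456 cycles corresponds to 1234.56 us, which agrees with a 10 us timer reading of 1230 us.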

24  Computing time as a function of the number of TOAs.  Computing time = O(n^2)  % absent = 0  % noise = 0

25  Computing time as a function of % absent  Around 6% the algorithm finds more than 1 sequence, with double frequencies…  % noise = 0

26  Computing time as a function of %noise

27  According to the above results, we chose an average vector to test the different systems  Vector length: 495  % absent = 4%  % noise = 25%

28  To decide which Nios configuration to use, we tested each with the same vector.  The economy CPU needs little space on the FPGA but has poor performance.  The fast version has some advantage over the simple CPU but needs many more FPGA resources, so we chose the simple one.

29  The CPUs are independent and so doubling their number doubles the performance  There is no major overhead for adding CPUs

30  To reach final conclusions we sent 10k random vectors to both the PC and the most powerful FPGA system.  The PC does the job 7.64 times faster.
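As a rough back-of-the-envelope sketch, the 7.64 factor can be turned into the CPU count or clock rate that would be needed to match the PC. This assumes throughput scales linearly with the number of CPUs (as the scaling slide suggests) and, more speculatively, with clock rate. The figure of 14 CPUs is taken from the earlier utilization example and is an assumption here, since the text does not state how many CPUs the "most powerful" system had.

```python
import math

PC_SPEEDUP = 7.64  # from the presentation: the PC is 7.64 times faster

def cpus_to_match_pc(current_cpus, pc_speedup=PC_SPEEDUP):
    """CPUs needed to equal the PC if throughput grows linearly with CPU count."""
    return math.ceil(current_cpus * pc_speedup)

def clock_mhz_to_match_pc(current_mhz=100.0, pc_speedup=PC_SPEEDUP):
    """Clock rate needed if throughput grew linearly with clock (an assumption)."""
    return current_mhz * pc_speedup

needed = cpus_to_match_pc(14)  # 14 CPUs is an assumed starting point
```

Under these assumptions `needed` comes to 107 CPUs, far more than the 16 clusters a single switch supports, which is why raising per-CPU performance matters.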

31  No!  The Nios CPUs we used are no match for the PC's Pentium CPU, but there are a few ways to get better performance:  1. Increasing the system’s 100 MHz clock rate  2. Adding an accelerator unit to each CPU  3. Shrinking the code from 8.5 KB to 8 KB, thereby freeing RAM cells for more CPUs  4. Optimizing the utilization of the M-RAM cells.

32  Ina Rivkin  Lab staff- Eli Shoshan and Moni Orbach  Oleg and Maxim  Vitaly  Michael and Liran

