Final Presentation. Annual project (Part A), Winter semester 5770 (2009). Students: Oren Hyatt, Alex Dutov. Supervisor: Mony Orbach.

Presentation transcript:

Final Presentation. Annual project (Part A), Winter semester 5770 (2009). Students: Oren Hyatt, Alex Dutov. Supervisor: Mony Orbach.

Problem: A GPS system is not fast enough to meet the update requirements of high-speed systems.
Solution: Implement a system that integrates GPS, INS, and a particle filter.

Implement the resampling and regularization parts.
Interface with the other parts of the system.
Meet the hardware/software requirements (see specs).

Format notation: X.YY.ZZZ ~ sign ; number ; fraction (number of sign bits, integer bits, and fraction bits).
W [0.0.28], Index_out [0.16.0]
Reject insignificant particles.
Duplicate the remaining particles in proportion to their relative weight.
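The format notation above can be illustrated with a minimal sketch. The helper names `to_fixed` and `from_fixed` are hypothetical, and the two's complement encoding and saturating behaviour are assumptions; the slides only give the bit counts.

```python
def to_fixed(value, sign_bits, int_bits, frac_bits):
    """Quantize a real value to a raw integer code in [sign.int.frac] format."""
    raw = int(round(value * (1 << frac_bits)))
    top = (1 << (int_bits + frac_bits)) - 1      # largest raw code
    bottom = -(top + 1) if sign_bits else 0      # two's complement is an assumption
    return max(bottom, min(top, raw))            # saturate instead of wrapping


def from_fixed(raw, frac_bits):
    """Convert a raw integer code back to the real value it represents."""
    return raw / (1 << frac_bits)


w = to_fixed(0.3, 0, 0, 28)        # a weight in W[0.0.28]
idx = to_fixed(1234, 0, 16, 0)     # an index in Index_out[0.16.0]
q = to_fixed(-0.75, 1, 1, 22)      # a signed value in [1.1.22]
print(w, from_fixed(w, 28), idx, from_fixed(q, 22))
```

With 28 fraction bits and no integer bit, a weight of exactly 1.0 does not fit and is clipped to the largest code in this sketch.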

Logic: working
Speed: fmax (timing sim.) 187 [MHz]; N/fmax (timing sim.) 161 [usec]
Hardware resource usage:
Comb. ALU: 524 (<1%)
Dedicated logic regs.: 146 (<1%)
Total regs.: 146 (<1%)
Total block memory bits: 2,097,152 (25%)
DSP 18 bits: 4 (<1%)
Footnotes:
Rand() can run at 122 [MHz]; it is pre-called to boost performance.
The number of storage cells could not be decreased further.
To get exactly the same results as the C code, call Rand.enableseed(var_seed) before the iteration.

Signals (N(0,1) is the normal source):
sqp_in[23..0]***
epsilon[23..0]**
hopt[52..0]*
Xp_reg_new[105..0]
* hopt is a constant.
** epsilon is normally distributed.
*** Generated by the lab's project D1828.
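One way to read these signals is as the jitter step of a regularized particle filter, sketched below in floating point. This is an assumption: the slides do not give the formula, and the function name `regularize`, the argument `xp` and the addition to the old particle state are hypothetical. Only sqp, epsilon and hopt correspond to names on the slide.

```python
def regularize(xp, sqp, hopt, epsilon):
    """Return xp + hopt * (sqp @ epsilon) for one particle.

    xp      : particle state, list of n floats (hypothetical input)
    sqp     : n x n matrix, a square root of the particle covariance
    hopt    : constant bandwidth
    epsilon : n samples drawn from N(0, 1)
    """
    n = len(xp)
    # Matrix-vector product sqp * epsilon (the matrix multiplication step).
    jitter = [sum(sqp[i][j] * epsilon[j] for j in range(n)) for i in range(n)]
    # Scale by the constant bandwidth and add to the particle state.
    return [xp[i] + hopt * jitter[i] for i in range(n)]


identity = [[1.0, 0.0], [0.0, 1.0]]
print(regularize([1.0, 2.0], identity, 0.5, [2.0, -2.0]))
```

With hopt set to 0 the particle is returned unchanged, which is a quick sanity check for any hardware model built along these lines.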

Step I
Logic: working
Accuracy: 0.1%*
Speed: fmax (timing sim.) 23.8 [MHz]; N/fmax (timing sim.) 1.26 [usec]
Hardware resource usage:
Comb. ALU: 13378 (16%)
Dedicated logic regs.: 318 (<1%)
Total regs.: 318 (<1%)
Total block memory bits: 0
DSP 18 bits: 154 (17%)
norm dist: fmax value missing [MHz]; N/fmax 1.26 [usec]; resource figures as preserved, in order: 12810 (15%), 4775 (6%), (<1%), 32 (4%)
Footnotes:
* A high margin of safety was used.
Rand.norm() is problematic.
The throughput of the matrix multiplication step could be doubled at a low cost in resources (later).
According to D1828, sqp_in takes more than the allowed time, so the latency would be unacceptable.

The input is the output of Step I.
Limit to [-pi, pi]. [1.1.22]
All 3 angles are normalized: 1 = 2*pi [rad].
The range of both the angles and the q's is [-1, 1].
Convert Euler to quaternions: q1 [1.1.22], q2 [1.1.22], q3 [1.1.22], q4 [1.1.22].
qi=sign(q1)
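The conversion can be sketched in floating point as follows. The rotation order (yaw, pitch, roll, i.e. Z-Y-X), the argument order and the function name are assumptions, since the slide does not state them. The final sign step is one possible reading of "qi=sign(q1)": every component is multiplied by the sign of q1.

```python
import math


def euler_to_quaternion(phi_n, theta_n, psi_n):
    """Convert normalized Euler angles (1.0 == 2*pi rad) to (q1, q2, q3, q4)."""
    # Undo the normalization used on the slide: 1 = 2*pi [rad].
    phi, theta, psi = (2.0 * math.pi * a for a in (phi_n, theta_n, psi_n))
    cr, sr = math.cos(phi / 2), math.sin(phi / 2)      # roll half-angle
    cp, sp = math.cos(theta / 2), math.sin(theta / 2)  # pitch half-angle
    cy, sy = math.cos(psi / 2), math.sin(psi / 2)      # yaw half-angle
    q1 = cr * cp * cy + sr * sp * sy   # scalar part
    q2 = sr * cp * cy - cr * sp * sy
    q3 = cr * sp * cy + sr * cp * sy
    q4 = cr * cp * sy - sr * sp * cy
    if q1 < 0:
        # Assumed sign convention: keep the scalar part non-negative.
        q1, q2, q3, q4 = -q1, -q2, -q3, -q4
    return (q1, q2, q3, q4)


print(euler_to_quaternion(0.25, 0.0, 0.0))   # a quarter turn of roll only
```

A quaternion built from half-angle sines and cosines in this way already has unit norm, so this sketch needs no normalizing square root, and every component stays inside [-1, 1].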

Logic: simulation
Speed: fmax (timing sim.) 15.7 [MHz]; N/fmax (timing sim.) 1910 [usec]
Hardware resource usage (partially preserved, in order): (16%), 318 (<1%), (17%)
Footnotes:
Usage of SQRT() was successfully avoided, which saved both time and resources.
There is room for a trade-off between area and time in the use of the trig. functions.

1. The seed of the randomly generated numbers depends on the number of times the method has been called.
2. Normally distributed numbers are generated by an acceptance-rejection method.
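To show why an acceptance-rejection generator is awkward in hardware, here is a minimal sketch using the Marsaglia polar method. The choice of this particular method is an assumption, since the slides do not name the one used in the C code, and the attempt counter is added only for illustration.

```python
import math
import random


def normal_polar(rng):
    """Draw two N(0, 1) samples; also return how many attempts were needed."""
    attempts = 0
    while True:
        attempts += 1
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        s = u * u + v * v
        if 0.0 < s < 1.0:            # accept only points inside the unit circle
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor, attempts


rng = random.Random(1)
samples, worst = [], 0
for _ in range(10000):
    a, b, n = normal_polar(rng)
    samples += [a, b]
    worst = max(worst, n)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var, worst)
```

The number of attempts per sample is not fixed, so the latency varies from call to call. It also uses a logarithm and a square root.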

1. Uniformly distributed numbers are generated by exactly the same method as in C's library file. Each part has its own generator object.
2. The acceptance-rejection method could not be used, so it was dropped and an alternative method was used.

1. Can be used in synchronous logic.
2. Not necessarily more costly in time or in HW resources.
3. Working.

1. The generation is very costly in both time and logic (and could be even worse).
2. We rely on mathematical functions, and any use of them should be done carefully.
3. There is no reasonably faster way to generate the numbers, and the method is called a large number of times, so a bottleneck was created.

A thorough simulation with the regularization steps integrated: to be done.
An enveloping state machine to interact with the FIFO: to be done, approx. two weeks.
There is a way to double the regularization's speed for each particle at a low HW cost. A detailed guide will be added to our book.
Further analysis of the normal distribution component: the return on time spent is too low.

As time passes and problems arise, the Gantt chart should be modified.
A periodic meeting with the other teams, including a summary for all of them, could help a lot.

Thank you.


Index_out = [0,0,2,3,4,5,5,5,5,5] *
* Offset of 1, because MATLAB starts arrays at index 1.
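An index vector of this kind can be produced by a resampling routine such as the one sketched below. Systematic resampling is used here as an assumption; the slides do not say which scheme the project implements, and the function name and the `u0` parameter are hypothetical. The sketch is not meant to reproduce the Index_out values above.

```python
import random


def systematic_resample(weights, u0=None):
    """Return one 0-based source index per output particle."""
    n = len(weights)
    total = sum(weights)
    if u0 is None:
        u0 = random.random()          # single random offset in [0, 1)
    indices = []
    i = 0
    cumulative = weights[0] / total   # running sum of normalized weights
    for k in range(n):
        threshold = (k + u0) / n      # evenly spaced comb of thresholds
        while cumulative < threshold and i < n - 1:
            i += 1
            cumulative += weights[i] / total
        indices.append(i)             # heavy particles are picked repeatedly
    return indices


index_out = systematic_resample([1.0, 1.0, 2.0, 4.0], u0=0.25)
matlab_index_out = [i + 1 for i in index_out]   # MATLAB arrays start at 1
print(index_out, matlab_index_out)
```

Particles with small weights are skipped (rejected), and particles with large weights appear several times in the output. Adding 1 to each index gives the values a MATLAB reference model would report.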

Reg. 1
Columns: Performance (Pipe size, Clock (MHz), Mean accuracy (%)); Resource usage (Comb. ALU, Dedicated logic registers)
Trig block*: 2÷ ÷ ÷ ÷1381
SQRT block*:
ALTFP_INV_SQRT: Single Precision, 352, 1,392
ALTFP_LOG: 21360, Single Precision
* See the lab's project: D0928 - system architecture and math functions

Comparison table. Columns: Matlab, HW, Error [pct] (two column groups).


Total hardware report. Regularization’s matrix mult + xp_reg_new.

Logic simulation. Regularization's Step II (e2q_all -> end). Signals shown: Phy, Psi, Theta, q1, q2, q3, q, compared against the expected values.

Step I
Logic: working
Accuracy: 0.1%*
Speed: fmax (timing sim.) 23.8 [MHz]; N/fmax (timing sim.) 1.26 [usec]
Hardware resource usage:
Comb. ALU: 13378 (16%)
Dedicated logic regs.: 318 (<1%)
Total regs.: 318 (<1%)
Total block memory bits: 0
DSP 18 bits: 154 (17%)
norm dist: fmax value missing [MHz]; N/fmax 1.26 [usec]