Adaptive beamforming using QR in FPGA Richard Walke, Real-time System Lab Advanced Processing Centre S&E Division.

Slides:



Advertisements
Similar presentations
© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.
Advertisements

Intel Pentium 4 ENCM Jonathan Bienert Tyson Marchuk.
1 Reconfigurable Computing Lab UCLA FPGA Polyphase Filter Bank Study & Implementation Raghu Rao Matthieu Tisserand Mike Severa Prof. John Villasenor Image.
Characterization Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets Maxim Zavodchik
Applications of Systolic Array FTR, IIR filtering, and 1-D convolution. 2-D convolution and correlation. Discrete Furier transform Interpolation 1-D and.
A Systolic FFT Architecture for Real Time FPGA Systems.
Chapter 15 Digital Signal Processing
1 Pupil Detection and Tracking System Lior Zimet Sean Kao EE 249 Project Mentors: Dr. Arnon Amir Yoshi Watanabe.
Prototype SKA Technologies at Molonglo: 3. Beamformer and Correlator J.D. Bunton Telecommunications and Industrial Physics, CSIRO. Australia. Correlator.
MAPLD 2005 A High-Performance Radix-2 FFT in ANSI C for RTL Generation John Ardini.
Digital Kommunikationselektronik TNE027 Lecture 4 1 Finite Impulse Response (FIR) Digital Filters Digital filters are rapidly replacing classic analog.
Characterization Presentation Neural Network Implementation On FPGA Supervisor: Chen Koren Maria Nemets Maxim Zavodchik
Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Written by: Haim Natan Benny Pano Supervisor:
1 Real time signal processing SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.
Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.
1 A survey on Reconfigurable Computing for Signal Processing Applications Anne Pratoomtong Spring2002.
Using Programmable Logic to Accelerate DSP Functions 1 Using Programmable Logic to Accelerate DSP Functions “An Overview“ Greg Goslin Digital Signal Processing.
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
1 DSP Implementation on FPGA Ahmed Elhossini ENGG*6090 : Reconfigurable Computing Systems Winter 2006.
GPGPU platforms GP - General Purpose computation using GPU
FPGA Based Fuzzy Logic Controller for Semi- Active Suspensions Aws Abu-Khudhair.
© 2011 Xilinx, Inc. All Rights Reserved Intro to System Generator This material exempt per Department of Commerce license exception TSU.
Delevopment Tools Beyond HDL
Sub-Nyquist Sampling DSP & SCD Modules Presented by: Omer Kiselov, Daniel Primor Supervised by: Ina Rivkin, Moshe Mishali Winter 2010High Speed Digital.
03/12/20101 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010.
1 Design of an SIMD Multimicroprocessor for RCA GaAs Systolic Array Based on 4096 Node Processor Elements Adaptive signal processing is of crucial importance.
The French Aerospace Lab HYCAM : A Radar Cross Section Measurement and Analysis System for Time-Varying Targets Yoann Paichard.
Highest Performance Programmable DSP Solution September 17, 2015.
Floating Point vs. Fixed Point for FPGA 1. Applications Digital Signal Processing -Encoders/Decoders -Compression -Encryption Control -Automotive/Aerospace.
Telecommunications and Signal Processing Seminar Ravi Bhargava * Lizy K. John * Brian L. Evans Ramesh Radhakrishnan * The University of Texas at.
Efficient FPGA Implementation of QR
1 of 23 Fouts MAPLD 2005/C117 Synthesis of False Target Radar Images Using a Reconfigurable Computer Dr. Douglas J. Fouts LT Kendrick R. Macklin Daniel.
Research on Reconfigurable Computing Using Impulse C Carmen Li Shen Mentor: Dr. Russell Duren February 1, 2008.
NDA Confidential. Copyright ©2005, Nallatech.1 Implementation of Floating- Point VSIPL Functions on FPGA-Based Reconfigurable Computers Using High- Level.
Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.
HW/SW PARTITIONING OF FLOATING POINT SOFTWARE APPLICATIONS TO FIXED - POINTED COPROCESSOR CIRCUITS - Nalini Kumar Gaurav Chitroda Komal Kasat.
Floating-Point Reuse in an FPGA Implementation of a Ray-Triangle Intersection Algorithm Craig Ulmer June 27, 2006 Sandia is a multiprogram.
NTD/xNTD Signal Processing Presented by: John Bunton Signal Processing team: Joseph Pathikulangara, Jayasri Joseph, Ludi de Souza and John Bunton Plus.
J. Christiansen, CERN - EP/MIC
1 Fly – A Modifiable Hardware Compiler C. H. Ho 1, P.H.W. Leong 1, K.H. Tsoi 1, R. Ludewig 2, P. Zipf 2, A.G. Oritz 2 and M. Glesner 2 1 Department of.
J. Greg Nash ICNC 2014 High-Throughput Programmable Systolic Array FFT Architecture and FPGA Implementations J. Greg.
A Reconfigurable Low-power High-Performance Matrix Multiplier Architecture With Borrow Parallel Counters Counters : Rong Lin SUNY at Geneseo
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
© 2010 Altera Corporation—Public Modeling and Simulating Wireless Systems Using MATLAB ® and Simulink ® 2010 Technology Roadshow.
An FPGA Implementation of the Ewald Direct Space and Lennard-Jones Compute Engines By: David Chui Supervisor: Professor P. Chow.
High Speed Digital Systems Lab. Agenda  High Level Architecture.  Part A.  DSP Overview. Matrix Inverse. SCD  Verification Methods. Verification Methods.
1 Implementation in Hardware of Video Processing Algorithm Performed by: Yony Dekell & Tsion Bublil Supervisor : Mike Sumszyk SPRING 2008 High Speed Digital.
Floating-Point Divide and Square Root for Efficient FPGA Implementation of Image and Signal Processing Algorithms Xiaojun Wang, Miriam Leeser
Ariadne’s Thread Kristian Zarb Adami. Simulator Aims ۞ Provide the system architect a tool to visualise trade-offs in designs ۞ Provide the scientist.
LIST OF EXPERIMENTS USING TMS320C5X Study of various addressing modes of DSP using simple programming examples Sampling of input signal and display Implementation.
© 2002 ® Wireless Solution Update Asif Batada Marketing Manager, Wireless Business Unit Asif Batada Marketing Manager, Wireless Business Unit.
Hardware Benchmark Results for An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 29, 2004.
Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.
Spatiotemporal Saliency Map of a Video Sequence in FPGA hardware David Boland Acknowledgements: Professor Peter Cheung Mr Yang Liu.
Copyright © 2004, Dillon Engineering Inc. All Rights Reserved. An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs  Architecture optimized.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
Teaching Digital Logic courses with Altera Technology
1 Design of an MIMD Multimicroprocessor for DSM A Board Which turns PC into a DSM Node Based on the RM Approach 1 The RM approach is essentially a write-through.
Cray XD1 Reconfigurable Computing for Application Acceleration.
© 2009 Altera Corporation Floating Point Synthesis From Model-Based Design M. Langhammer, M. Jervis, G. Griffiths, M. Santoro.
EEL 5722 FPGA Design Fall 2003 Digit-Serial DSP Functions Part I.
An FFT for Wireless Protocols Dr. J. Greg Nash Centar ( HAWAI'I INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES Mobile.
VLSI SP Course 2001 台大電機吳安宇 1 Why Systolic Architecture ? H. T. Kung Carnegie-Mellon University.
K-Nearest Neighbor Digit Recognition ApplicationDomainConstraintsKernels/Algorithms Voice Removal and Pitch ShiftingAudio ProcessingLatency (Real-Time)FFT,
Fang Fang James C. Hoe Markus Püschel Smarahara Misra
Backprojection Project Update January 2002
Centar ( Global Signal Processing Expo
Lect5 A framework for digital filter design
The performance requirements for DSP applications continue to grow and the traditional solutions do not adequately address this new challenge Paradigm.
Presentation transcript:

Adaptive beamforming using QR in FPGA Richard Walke, Real-time System Lab Advanced Processing Centre S&E Division

3 Contents 1Architecture of adaptive beamformer 2FPGA components Digital receiver QR processor – for adaptive weight calculation 3Design methodology 4Demonstration overview 5Conclusions

Architecture of adaptive beamformer Section 1

5 Rx Beamformed signal Antenna Array Analogue receivers DRx Rx Digital receivers Adaptive Weight Calculation + W1W1 W2W2 WnWn Spatial filter Adaptive beamformer Architecture of adaptive beamformer

FPGA components Section 2

7 Runtime configuration parameters  2 Image reject Bandwidth control Complex equaliser Image reject Bandwidth control Image reject Bandwidth control Image reject Bandwidth control Radar receiver specification Array of programmable ‘tap-processors’ Software configurable FIR FPGA Components

8 Software programmable parameters include: –filter length –decimation ratio –complex/real arithmetic –number of channels –time varying filtering (inter & intra-pulse) Performance GOPS on XC2V –100 GOPS on Virtex2 Pro (2003) FPGA Components Software configurable FIR

9 Building block for a range of adaptive algorithms –Sample matrix inversion (SMI) –Soft constraints –ESMI –STAP Pre-processorPost-processing QR decomposition RISC/DSP FPGA Beamforming weights Antenna data Constraints Weight calculation using QR FPGA Components

10 QR decomposition FPGA Components r 2,2 r 2,3 x1x1 y r 1,1 u1u1 r 1,2 r 1,3 r 1,4 r 2,4 u2u2 u3u3 r 3,4 r 3,3 r 4,4 u4u4 x2x2 x3x3 x4x4 Input data Vectorise cell: 11 real floating- point operations Rotate cell: 16 real floating-point operations For SMI: Rw=u  (r,x) (r’,0) cos  = r r 2 +x 2 sin  = x r 2 +x 2  (u,y)(u’,y’) u’ y’ = u y cos  sin  -sin  cos 

11 Good numerical properties. Arithmetic choices: –CORDIC: shift-add –Fixed-point: multiply-add –Floating-point: Higher dynamic range, allows algorithms with fewer operations & lower wordlength. Smallest! Highly parallel (Givens rotations) –Suits FPGA –Need to reduce parallelism for many applications! Features of QR FPGA Components

12 1,11,31,21,41,5 2,2 2,42,32,5 3,33,53,4 4,44,5 1,6 2,6 3,6 4,6 5,55,6 2. Discrete mapping 67% utilisation 80% 60% 40% 20% 100% 83% 67% 50% 33% 100% 67% utilisation 1. Mixed mapping Obtaining lower-levels of parallelism FPGA Components

13 Linear systolic array Novel mapping of QR FPGA Components

14 FPGA implementation bit FP 160MHz = 22 GFLOPSNumber of Ops 8xFP multipliers 2 x complex adder Complex multiplier FP add FP divider vectorise processor rotate processor rotate processor rotate processor rotate processor rotate processor rotate processor rotate processor FP add Complex multiplier XCV3200E-8 FPGA Components

15 QR processor - main features FPGA Components 1 Boundary 3 Internal 32 23%6 GFLOPS2.24W 4 1 Boundary 12 Internal 82% GFLOPS8W 6 LUTS Rams 3 FFs Mults 34 15K 16K 22% 23% 22% 23% 4 GFLOPS70WPentium TM 4 2GHz 1 1 Estimated (based on data from Richard Linderman) 2 Estimated (design too large for PC) 3 Also depends upon number of inputs 4 Obtained via Xpower 5 For XC2V Extrapolated 101MHz 5 1 Boundary 9 Internal 74% 15 GFLOPS7W 6 14-bit mantissa 17-bit mantissa 14-bit mantissa 97MHz 100MHz 2 SizeUtilisation (XC2V6000)OperationsPower Mantissa wordlength Clock x50

Heterogeneous design methodology Section 3

17 GEDAE TM Heterogeneous design methodology Graphically specify system

18 GEDAE TM Heterogeneous design methodology Graphically specify system –primitive functions in ‘c’

19 GEDAE TM Heterogeneous design methodology Graphically specify system –primitive functions in ‘c’ –executable specification

20 GEDAE TM Heterogeneous design methodology Graphically specify system –primitive functions in ‘c’ –executable specification Auto-code generation –parallel programme constructed by GEDAE

21 GEDAE TM Heterogeneous design methodology Graphically specify system –primitive functions in ‘c’ –executable specification Auto-code generation –parallel programme constructed by GEDAE Currently no support for FPGA –highly compatible model

22 Core based methodology Heterogeneous design methodology Cores used for key functions –FFT, QR, FIR filter... –Build in parallelism (manually) –Parameterised Processor

23 Core based methodology Heterogeneous design methodology Cores used for key functions –FFT, QR, FIR filter... –Build in parallelism (manually) –Parameterised Automatically generated system –communications inserted FPGA Processor FFT core from library

24 Core based methodology Heterogeneous design methodology Cores used for key functions –FFT, QR, FIR filter... –Build in parallelism (manually) –Parameterised Automatically generated system –communications inserted Architectural exploration –Compaan gives Matlab NLP to VHDL –RTL output in future version FPGA Processor FFT core from library

Adaptive beamformer demonstration overview Section 4

26 System mapping Demonstration overview Delay Sample PCI IF Virtual channel IF Chan 0 Chan 1 FPGA (TransTech PMC) Chan 2 Chan 3 QR DRx Virtual Channel API PowerPC (DNA VQG4 VME) Back- substitution Host PC (laptop) Environment synthesis and system configuration Host PC (laptop) Verification and display

27 Conclusion FPGAs –Performance dependent upon level of optimisation –Floating-point is realistic –10x compute improvement –5 - 20x power improvement Design is main issue –Hardware design: High levels of parallelism required –Core-based design approaches offer interim solution –Architectural synthesis tools are emerging

28 Acknowledgements The project to develop a core-based design methodology is a collaboration between: –QinetiQ Ltd poc: –BAE SYSTEMS ATC, Gt Baddow poc: Contributions have been made by John McAllister under contract with the Queen’s University of Belfast. This work was sponsored by the United Kingdom Ministry of Defence Corporate Research Programme.