RAW 2014 Over-Clocking of Linear Projection Designs Through Device Specific Optimisations Rui Policarpo Duarte 1, Christos-Savvas Bouganis


Rui Policarpo Duarte¹, Christos-Savvas Bouganis
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
21st Reconfigurable Architectures Workshop, May 19-20, 2014, Phoenix, USA
¹ The first author would like to thank the Fundação para a Ciência e Tecnologia (Foundation for Science and Technology, Portugal) for its support through PhD grant SFRH/BD/

RAW 2014 Introduction  Ever increasing demand for DSP applications processing more data and faster  Linear Projection is a widely adopted algorithm in DSP applications  FPGAs offer high performance, low-power, reconfigurabillity and small size implementation 2

RAW 2014 Introduction  Linear Projection examples:  Data compression, face recognition, synthetic apperture radar (high-performance)  EEG, ECG (low-power) 3 Images from

RAW 2014 KLT Algorithm  Karhunen-Loéve Transform  Describe data from a higher dimensional space in a smaller one using an orthogonal basis matrix Λ.  N data points in original space:  Projected data points:  Recover data in the original space via:  Using the Λ that best describes the data by minimising the objective function: 4
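As a rough illustration of the KLT described above, the following pure-Python sketch estimates Λ as the leading eigenvectors of the sample covariance matrix (found by power iteration with deflation), projects each point, back-projects it and measures the reconstruction MSE. It subtracts the data mean first, a common convention the slide does not state; all function names are hypothetical, and a real design flow would use a numerical library rather than this loop-based code.

```python
import math
import random


def mean_center(data):
    """Subtract the per-dimension mean from every data point."""
    n, p = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - mu[j] for j in range(p)] for row in data], mu


def covariance(centered):
    """Sample covariance matrix (P x P) of mean-centred data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(p)]
            for a in range(p)]


def top_eigenvectors(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix via power iteration
    with Hotelling deflation (assumes distinct leading eigenvalues)."""
    rng = random.Random(seed)
    p = len(cov)
    c = [row[:] for row in cov]
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; remove it from c (deflation).
        cv = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(v[i] * cv[i] for i in range(p))
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
        basis.append(v)
    return basis


def klt(data, k):
    """Return (basis, mean); basis holds the k columns of Lambda as rows."""
    centered, mu = mean_center(data)
    return top_eigenvectors(covariance(centered), k), mu


def project(x, basis, mu):
    """f = Lambda^T (x - mu)."""
    return [sum(b[j] * (x[j] - mu[j]) for j in range(len(x))) for b in basis]


def reconstruct(f, basis, mu):
    """x_hat = Lambda f + mu."""
    return [mu[j] + sum(fk * b[j] for fk, b in zip(f, basis))
            for j in range(len(mu))]


def reconstruction_mse(data, basis, mu):
    """Per-element mean squared error between x and its back-projection."""
    total = 0.0
    for x in data:
        x_hat = reconstruct(project(x, basis, mu), basis, mu)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / (len(data) * len(data[0]))
```

For data that lie exactly in a two-dimensional subspace of a four-dimensional space, keeping K = 2 basis vectors reconstructs the points almost perfectly, whereas K = 1 leaves a clearly non-zero error.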

RAW 2014 KLT Implementation  Based on the dot-product operator  Architectures for the projection of 1 dimension Folded Unfolded Area savings Maximum performance 5
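The two dot-product architectures can be contrasted with a small behavioural model. This is a hypothetical sketch, not the authors' RTL: it assumes one multiply-accumulate per clock cycle for the folded operator and a fully parallel multiplier bank feeding a binary adder tree for the unfolded one, and it counts operators rather than actual FPGA resources.

```python
def folded_dot(coeffs, x):
    """Folded operator: one multiplier and one accumulator reused every
    clock cycle. Returns (result, cycles needed for one output)."""
    acc, cycles = 0, 0
    for c, xi in zip(coeffs, x):
        acc += c * xi  # one multiply-accumulate per cycle
        cycles += 1
    return acc, cycles


def unfolded_dot(coeffs, x):
    """Unfolded operator: P parallel multipliers feeding a binary adder tree.
    Returns (result, adder-tree depth); if every tree level is registered,
    the depth is the latency and a new input vector can enter every cycle."""
    if not coeffs:
        raise ValueError("empty dot product")
    level = [c * xi for c, xi in zip(coeffs, x)]  # all products at once
    depth = 0
    while len(level) > 1:
        # Add neighbouring pairs; an odd element passes through unchanged.
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


def resources(p, arch):
    """Operator count and throughput for a P-element dot product."""
    if arch == "folded":
        return {"multipliers": 1, "adders": 1, "cycles_per_output": p}
    if arch == "unfolded":
        return {"multipliers": p, "adders": p - 1, "cycles_per_output": 1}
    raise ValueError(arch)
```

For a 6-element dot product, the folded model uses one multiplier but needs six cycles per output, while the unfolded model uses six multipliers and five adders (a three-level tree) and can accept a new input vector every cycle.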

RAW 2014 Extreme Over-Clocking  Tools are conservative in their estimates  Go beyond error-free regime tested on the board  Applications that can tolerate some errors. 6

Low-Power / High-Performance
- Pipelining cannot always be applied
- What options exist for high-throughput constraints in latency-sensitive algorithms?
(Figure: Tool Fmax = 160 MHz; Test Freq = 260 MHz)

Optimisation Framework (OF)
(Figure: framework flow)
- Pre-characterisation of the arithmetic units under over-clocking, giving error and area models
- Problem parameters + input data
- Output: VHDL with values for the coefficients
- Generic RTL

Device Characterisation
- Uses the FPGA reconfiguration capability
- Over-clocked data-path under test, clocked via a PLL
- Supports other operators
- Many units on the same device, characterised simultaneously
- Constant operating conditions: voltage & temperature
- Limitations:
  - Placement & routing
  - Cyclone III, IV and V from Altera
(Figure: e.g. characterisation of a generic LUT-based multiplier)

Device Characterisation
- When operating in the error-prone regime, constant coefficients are not equally affected
- Performance gap of more than 60 MHz

Device Characterisation
(Figure: constant coefficient 222)
- Differences in the error profiles for both locations, due to variation in placement and routing and to process variation
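One way to picture how characterisation measurements could be turned into an error model is the sketch below. It assumes a hypothetical record format of (coefficient, frequency in MHz, expected output, observed output) per test vector and summarises each (coefficient, frequency) pair by mean error, error variance and error rate. The framework's actual error and area models may be structured differently, and the sample numbers in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def build_error_model(samples):
    """samples: iterable of (coefficient, freq_mhz, expected, observed).
    Returns {(coefficient, freq_mhz): (mean_error, error_variance, error_rate)}."""
    groups = defaultdict(list)
    for coeff, freq, expected, observed in samples:
        groups[(coeff, freq)].append(observed - expected)
    model = {}
    for key, errs in groups.items():
        rate = sum(1 for e in errs if e != 0) / len(errs)
        model[key] = (fmean(errs), pvariance(errs), rate)
    return model


def max_error_free_freq(model, coeff):
    """Highest tested frequency at which coeff showed no errors (None if none)."""
    freqs = [f for (c, f), (_, _, rate) in model.items()
             if c == coeff and rate == 0.0]
    return max(freqs, default=None)
```

With such a table, two constant coefficients can be compared directly: in the made-up data used for testing, coefficient 222 already fails at 300 MHz while coefficient 17 remains error-free there, which is the kind of per-coefficient difference the characterisation slides describe.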

RAW 2014 Design Generation  Bayesian Factor Analysis model assumes error terms are independent and multivariate normally distributed with zero mean  Probability for each observed case:  As a result of a linear projection:  The framework iteratively samples, from a posterior distribution, (Gibbs) projection vectors for different word- lengths,  Selects the ones that minimise the objective function (i.e. MSE of back-projection) 12
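The sampling-and-selection loop can be sketched for the simplest case of a single projection vector. This is an illustration under stated assumptions, not the authors' sampler: it uses one factor, N(0, 1) priors on the factors, N(0, τ²) priors on the coefficients, known noise variances ψ, a least-squares projection for the back-projection MSE, and a hypothetical per-word-length penalty `error_var` standing in for the over-clocking error model. Word-lengths are modelled as numbers of fractional bits.

```python
import math
import random


def gibbs_factor_analysis(data, psi, tau2=1.0, iters=200, seed=0):
    """Gibbs sampler for a single-factor model
        x_i = lam * f_i + eps_i,  eps_i ~ N(0, diag(psi)),
        f_i ~ N(0, 1),            lam_j ~ N(0, tau2).
    psi (noise variances) is treated as known. Returns sampled lam vectors."""
    rng = random.Random(seed)
    p = len(data[0])
    lam = [1.0] * p
    samples = []
    for _ in range(iters):
        # f_i | lam, x_i is Gaussian; its variance v is shared by all cases.
        v = 1.0 / (1.0 + sum(l * l / s for l, s in zip(lam, psi)))
        f = [rng.gauss(v * sum(l * xj / s for l, xj, s in zip(lam, x, psi)),
                       math.sqrt(v)) for x in data]
        # lam_j | f, x[:, j] is Gaussian; sample each coordinate in turn.
        sum_f2 = sum(fi * fi for fi in f)
        lam = []
        for j in range(p):
            vj = 1.0 / (1.0 / tau2 + sum_f2 / psi[j])
            mj = vj * sum(fi * x[j] for fi, x in zip(f, data)) / psi[j]
            lam.append(rng.gauss(mj, math.sqrt(vj)))
        samples.append(lam)
    return samples


def quantise(vec, frac_bits):
    """Round each coefficient to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return [round(v * scale) / scale for v in vec]


def backprojection_mse(data, lam):
    """Least-squares projection onto span(lam), back-projection, per-element MSE."""
    norm2 = sum(l * l for l in lam)
    if norm2 == 0.0:
        return math.inf
    total = 0.0
    for x in data:
        f = sum(l * xj for l, xj in zip(lam, x)) / norm2
        total += sum((xj - l * f) ** 2 for l, xj in zip(lam, x))
    return total / (len(data) * len(data[0]))


def select_design(data, samples, word_lengths, error_var=None, burn_in=50):
    """Return (cost, word_length, coefficients) with the lowest objective:
    back-projection MSE plus a hypothetical over-clocking error term
    per word-length (0 when not given)."""
    error_var = error_var or {}
    best = None
    for lam in samples[burn_in:]:
        for w in word_lengths:
            q = quantise(lam, w)
            cost = backprojection_mse(data, q) + error_var.get(w, 0.0)
            if best is None or cost < best[0]:
                best = (cost, w, q)
    return best
```

Giving one word-length a large penalty steers the selection towards the others, mimicking how an error model could favour a word-length whose quantisation error is higher but whose over-clocking errors are lower.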

Test Case
- Linear projection from $\mathbb{Z}^6$ to $\mathbb{Z}^3$, using the folded dot-product operator
- Data sets:
  - Model: 100 cases
  - Test: 5k cases
- Reference design: KLT
- KLT Fmax: 160 MHz
- Target clock frequency: 310 MHz (1.85x speedup)

Optimisation Results
(Figures: Model vs Actual; KLT vs Optimisation Framework)
- The OF achieves a 10x lower reconstruction MSE than the KLT
- The OF is able to model performance under extreme over-clocking

Optimisation Results
- Model: from the framework, using the error model
- Simulation*: characterisation with the problem data
- Actual: on the FPGA
* Used to evaluate the model generated by the optimisation framework

Conclusions
- Novel unified methodology for implementing extremely over-clocked Linear Projection designs on FPGAs
- It combines the problems of data approximation and error minimisation under over-clocking
- Performed better than a typical implementation, without extra resources
- Demonstrated at 1.85x the maximum clock frequency while providing the best area-error trade-off

RAW 2014 Ongoing Developments  Low-power Designs (voltage variation)  Variation of operating temperatures  Temperature is very expensive to control and its variation changes the error models  DSP-based arithmetic units  Fixed P&R Rui Policarpo Duarte, Christos-Savvas Bouganis, A Unified Framework for Over-Clocking Linear Projections on FPGAs under PVT Variation., pp , 2014, ARC,  Bayesian formulation of other problems (e.g. FIR)  Acceleration of the sampling process 17

Thank you
Questions/Comments?