RAW 2014
Over-Clocking of Linear Projection Designs Through Device Specific Optimisations
Rui Policarpo Duarte (1), Christos-Savvas Bouganis
Department of Electrical and Electronic Engineering, Imperial College London, United Kingdom
(1) Would like to thank Fundação para a Ciência e Tecnologia (Foundation for Science and Technology in Portugal) for its support through PhD grant SFRH/BD/
Reconfigurable Architectures Workshop, May 19-20, 2014, Phoenix, USA
Introduction
  Ever-increasing demand for DSP applications that process more data, faster
  Linear Projection is a widely adopted algorithm in DSP applications
  FPGAs offer high performance, low power, reconfigurability and small implementation size
Introduction
  Linear Projection examples:
    – Data compression, face recognition, synthetic aperture radar (high-performance)
    – EEG, ECG (low-power)
KLT Algorithm
  Karhunen-Loève Transform (KLT)
  Describes data from a higher-dimensional space in a lower-dimensional one using an orthogonal basis matrix Λ
  N data points in the original space are projected onto the basis Λ
  The data are recovered in the original space by back-projection through Λ
  The Λ used is the one that best describes the data, found by minimising an objective function
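The equations of this slide did not survive in the transcript, so the following is only a minimal sketch of the textbook KLT, not the authors' formulation. It assumes zero-mean data, takes the mean squared error of back-projection as the objective, and finds Λ by power iteration with deflation (a choice made here purely to keep the example dependency-free). All function names and the synthetic data set are hypothetical.

```python
import random


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def klt_basis(data, k, iters=500):
    """Return k orthonormal basis vectors (the columns of Lambda).

    Power iteration with deflation on the second-moment matrix; assumes
    `data` is zero-mean and has at least k directions of variation.
    """
    p = len(data[0])
    n = len(data)
    cov = [[sum(x[i] * x[j] for x in data) / n for j in range(p)]
           for i in range(p)]
    basis = []
    for _ in range(k):
        v = [1.0] * p
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            # Deflation: remove components along vectors already found.
            for b in basis:
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            norm = dot(w, w) ** 0.5
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def project(basis, x):
    """Projected point: one dot product per retained dimension."""
    return [dot(b, x) for b in basis]


def reconstruct(basis, f):
    """Back-projection into the original space."""
    p = len(basis[0])
    return [sum(fk * b[i] for fk, b in zip(f, basis)) for i in range(p)]


def reconstruction_mse(basis, data):
    """Mean squared error of back-projection over all data points."""
    total = 0.0
    for x in data:
        x_hat = reconstruct(basis, project(basis, x))
        total += sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    return total / (len(data) * len(data[0]))


def make_data(n=200, seed=1):
    """Hypothetical zero-mean 3-D data dominated by one direction."""
    rng = random.Random(seed)
    raw = []
    for _ in range(n):
        t = rng.gauss(0, 3.0)
        raw.append([t + rng.gauss(0, 0.3),
                    0.5 * t + rng.gauss(0, 0.3),
                    rng.gauss(0, 1.0)])
    means = [sum(col) / n for col in zip(*raw)]
    return [[xi - m for xi, m in zip(x, means)] for x in raw]
```

Calling `reconstruction_mse(klt_basis(data, k), data)` for increasing `k` shows the error shrinking as more dimensions are kept, reaching numerical zero when `k` equals the original dimension.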
KLT Implementation
  Based on the dot-product operator
  Architectures for the projection of 1 dimension:
    – Folded: area savings
    – Unfolded: maximum performance
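The two architectures can be contrasted with a small behavioural model. This is a sketch only, not RTL and not the authors' design: it assumes the folded version reuses a single multiply-accumulate unit for one element per clock cycle, and that the unfolded version uses one multiplier per element followed by a binary adder tree. The function names and the cycle and level counts are illustrative assumptions.

```python
def dot_folded(coeffs, x):
    """Folded: one multiply-accumulate unit reused over len(x) cycles.

    Returns (result, cycles), assuming one MAC operation per clock cycle.
    """
    acc = 0
    cycles = 0
    for c, xi in zip(coeffs, x):
        acc += c * xi
        cycles += 1
    return acc, cycles


def dot_unfolded(coeffs, x):
    """Unfolded: one multiplier per element feeding a binary adder tree.

    Returns (result, adder_levels). Assumes a non-empty input.
    """
    values = [c * xi for c, xi in zip(coeffs, x)]  # all products in parallel
    levels = 0
    while len(values) > 1:
        # Add neighbouring pairs; an odd element passes through unchanged.
        nxt = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            nxt.append(values[-1])
        values = nxt
        levels += 1
    return values[0], levels
```

For a 6-element vector the folded model needs 6 cycles with one multiplier, while the unfolded model needs 6 multipliers and 3 adder levels, which is the area versus performance trade-off named on the slide.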
Extreme Over-Clocking
  Tools are conservative in their estimates
  Go beyond the error-free regime tested on the board
  Targets applications that can tolerate some errors
Low-Power / High-Performance
  Pipelining can’t always be applied
  What are the options for meeting high-throughput constraints in latency-sensitive algorithms?
  Tool Fmax = 160 MHz
  Test Freq = 260 MHz
Optimisation Framework (OF)
  Pre-characterisation of the arithmetic units under over-clocking
  Error and area models
  Problem parameters + input data
  Output: VHDL with values for the coefficients
  Generic RTL
Device Characterisation
  Uses the FPGA reconfiguration capability
  Over-clocked data-path under test via PLL
  Supports other operators
  Many units on the same device simultaneously
  Constant operating conditions
    – Voltage & Temperature
  Limitations:
    – Placement & Routing
    – Cyclone III, IV and V from Altera
  e.g. Characterisation of a generic LUT-based multiplier
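A characterisation run of this kind can be summarised as an error table per constant coefficient. The sketch below is an assumption about how such a table might be built, not the authors' tooling: `capture` is a hypothetical hook standing in for reading the over-clocked multiplier output back from the device, and `fake_capture` returns invented values purely so the example runs; it is not measured data.

```python
def error_stats(expected, observed):
    """Mean and variance of the error (observed - expected)."""
    errors = [o - e for e, o in zip(expected, observed)]
    n = len(errors)
    mean = sum(errors) / n
    variance = sum((d - mean) ** 2 for d in errors) / n
    return mean, variance


def characterise(coefficients, inputs, capture):
    """Build {coefficient: (error mean, error variance)} at one clock frequency.

    `capture(coeff, x)` is a hypothetical hook that returns the output of the
    over-clocked constant-coefficient multiplier for input x.
    """
    table = {}
    for c in coefficients:
        expected = [c * x for x in inputs]           # error-free reference
        observed = [capture(c, x) for x in inputs]   # over-clocked result
        table[c] = error_stats(expected, observed)
    return table


def fake_capture(coeff, x):
    """Invented stand-in for hardware read-back (not measured data):
    coefficient 222 gains an error of +4 on odd inputs, others are exact."""
    if coeff == 222 and x % 2 == 1:
        return coeff * x + 4
    return coeff * x
```

Repeating `characterise` over a sweep of clock frequencies and placements would give one table per operating point, which is the kind of input an error model needs.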
Device Characterisation
  When operating in the error-prone regime, constant coefficients aren’t equally affected
  Gap in performance of more than 60 MHz
Device Characterisation
  Constant coefficient 222
  Differences in the error profiles for both locations are due to variation in placement and routing and to process variation
Design Generation
  The Bayesian Factor Analysis model assumes error terms are independent and multivariate normally distributed with zero mean
  It gives a probability for each observed case
  Each observed case is treated as the result of a linear projection
  The framework iteratively samples projection vectors for different word-lengths from a posterior distribution (Gibbs sampling)
  It selects the ones that minimise the objective function (i.e. MSE of back-projection)
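The selection step can be illustrated with a much simpler stand-in. The sketch below replaces the posterior (Gibbs) sampling with naive random perturbation of a reference vector, so it is not the framework's algorithm. It only shows the idea of scoring candidates of different word-lengths by the expected MSE of back-projection, including an error-variance term. The function names, the fixed-point format and the `error_var_of` lookup are all hypothetical.

```python
import random


def quantise(vec, word_length):
    """Round coefficients in [-1, 1) to signed fixed point of `word_length` bits."""
    scale = 2 ** (word_length - 1)
    return [max(-scale, min(scale - 1, round(v * scale))) / scale for v in vec]


def back_projection_mse(vec, data, error_var=0.0):
    """Expected MSE of reconstructing `data` from a 1-D projection onto `vec`.

    `error_var` is the variance of a zero-mean, independent error added to
    the projected value (an assumed stand-in for the over-clocking error).
    """
    p = len(vec)
    norm2 = sum(v * v for v in vec)
    total = 0.0
    for x in data:
        f = sum(v * xi for v, xi in zip(vec, x))  # projection
        # Least-squares back-projection: x_hat = vec * f / |vec|^2
        total += sum((xi - v * f / norm2) ** 2 for v, xi in zip(vec, x))
    return total / (len(data) * p) + error_var / (norm2 * p)


def select_vector(reference, data, word_lengths, error_var_of,
                  samples=100, seed=0):
    """Return (score, word_length, vector) with the lowest expected MSE.

    Candidates are the quantised reference plus random perturbations of it.
    `error_var_of(vector, word_length)` is a hypothetical error-model lookup.
    """
    rng = random.Random(seed)
    best = None
    for w in word_lengths:
        candidates = [quantise(reference, w)]
        candidates += [quantise([v + rng.gauss(0, 0.05) for v in reference], w)
                       for _ in range(samples)]
        for cand in candidates:
            if not any(cand):
                continue  # skip the all-zero vector
            score = back_projection_mse(cand, data, error_var_of(cand, w))
            if best is None or score < best[0]:
                best = (score, w, cand)
    return best
```

Because the quantised reference is always among the candidates, the selected vector is never worse under this score than the reference at any of the tried word-lengths.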
Test Case
  Linear Projection Z^6 to Z^3
    – Folded dot-product operator
  Data sets:
    – Model: 100 cases
    – Test: 5k cases
  Reference design: KLT
  KLT Fmax: 160 MHz
  Target clock frequency: 310 MHz
    – 1.85x speedup
Optimisation Results
  Model vs Actual
  KLT vs Optimisation Framework
  OF: 10x better reconstruction MSE
  OF is able to model performance under extreme over-clocking
Optimisation Results
  Results:
    – Model: from the framework using the error model
    – Simulation*: characterisation with problem data
    – Actual: on the FPGA
  *used to evaluate the model generated by the optimisation framework
Conclusions
  Novel unified methodology for the implementation of extreme over-clocked Linear Projection designs on FPGAs
  It combines the problems of data approximation and error minimisation under over-clocking
  Performed better than the typical implementation without extra resources
  Demonstrated at 1.85x the maximum clock frequency while providing the best area-error trade-off
Ongoing Developments
  Low-power designs (voltage variation)
  Variation of operating temperatures
    – Temperature is very expensive to control and its variation changes the error models
  DSP-based arithmetic units
  Fixed P&R
  Rui Policarpo Duarte, Christos-Savvas Bouganis, A Unified Framework for Over-Clocking Linear Projections on FPGAs under PVT Variation, pp , 2014, ARC
  Bayesian formulation of other problems (e.g. FIR)
  Acceleration of the sampling process
Thank you
  Questions/Comments?