Session 4: Reconfigurable Computing

Slides:

Advertisements

Similar presentations

The HPEC Challenge Benchmark Suite

Advertisements

Chapter 3 Embedded Computing in the Emerging Smart Grid Arindam Mukherjee, ValentinaCecchi, Rohith Tenneti, and Aravind Kailas Electrical and Computer.

Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System Presenter: Lev Kirischian Department of Electrical.

Yaron Doweck Yael Einziger Supervisor: Mike Sumszyk Spring 2011 Semester Project.

TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.

EEL6686 Guest Lecture February 25, 2014 A Framework to Analyze Processor Architectures for Next-Generation On-Board Space Computing Tyler M. Lovelly Ph.D.

Applications of Systolic Array FTR, IIR filtering, and 1-D convolution. 2-D convolution and correlation. Discrete Furier transform Interpolation 1-D and.

Reconfigurable Application Specific Computers RASCs Advanced Architectures with Multiple Processors and Field Programmable Gate Arrays FPGAs Computational.

Carnegie Mellon Adaptive Mapping of Linear DSP Algorithms to Fixed-Point Arithmetic Lawrence J. Chang Inpyo Hong Yevgen Voronenko Markus Püschel Department.

© 2011 The MITRE Corporation. All rights reserved. Sharon Sacco / The MITRE Corporation HPEC 2011 Focus 3: GPU.

Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.

BENCHMARK SUITE RADAR SIGNAL & DATA PROCESSING CERES EPC WORKSHOP

Real time DSP Professors: Eng. Julian Bruno Eng. Mariano Llamedo Soria.

03/12/20101 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010.

Managing Multi-Configuration Hardware via Dynamic Working Set Analysis By Ashutosh S.Dhodapkar and James E.Smith Presented by Kyriakos Yioutanis.

Computational Technologies for Digital Pulse Compression

Efficient FPGA Implementation of QR

Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.

Programming Examples that Expose Efficiency Issues for the Cell Broadband Engine Architecture William Lundgren Gedae), Rick Pancoast.

An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 24, 2003 Authors Ken Cameron

HPEC-1 MIT Lincoln Laboratory Session 1: GPU: Graphics Processing Units Miriam Leeser / Northeastern University HPEC Conference 15 September 2010.

 Embedded Digital Signal Processing (DSP) systems  Specification with floating-point data types  Implementation in fixed-point architectures  Precision.

Polymorphous Computing Architectures Run-time Environment And Design Application for Polymorphous Technology Verification & Validation (READAPT V&V) Lockheed.

Radix-2 2 Based Low Power Reconfigurable FFT Processor Presented by Cheng-Chien Wu, Master Student of CSIE,CCU 1 Author: Gin-Der Wu and Yi-Ming Liu Department.

Distributed WHT Algorithms Kang Chen Jeremy Johnson Computer Science Drexel University Franz Franchetti Electrical and Computer Engineering.

Cousins HPEC 2002 Session 4: Emerging High Performance Software David Cousins Division Scientist High Performance Computing Dept. Newport,

MIT Lincoln Laboratory PCAKernels-1 JML 24 Sep 2003 Kernel Benchmarks and Metrics for Polymorphous Computer Architectures Hank Hoffmann James Lebak (Presenter)

HPEC2002_Session1 1 DRM 11/11/2015 MIT Lincoln Laboratory Session 1: Novel Hardware Architectures David R. Martinez 24 September 2002 This work is sponsored.

QCAdesigner – CUDA HPPS project

Lecture 6. VFP & NEON in ARM

An Integrated Design Environment to Evaluate Power/Performance Tradeoffs for Sensor Network Applications Amol Bakshi, Jingzhao Ou, and Viktor K. Prasanna.

Hardware Benchmark Results for An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications September 29, 2004.

An Open Architecture for an Embedded Signal Processing Subsystem

Flexible Filters for High Performance Embedded Computing Rebecca Collins and Luca Carloni Department of Computer Science Columbia University.

Low Power IP Design Methodology for Rapid Development of DSP Intensive SOC Platforms T. Arslan A.T. Erdogan S. Masupe C. Chun-Fu D. Thompson.

High Performance Flexible DSP Infrastructure Based on MPI and VSIPL 7th Annual Workshop on High Performance Embedded Computing MIT Lincoln Laboratory

PTCN-HPEC-02-1 AIR 25Sept02 MIT Lincoln Laboratory Resource Management for Digital Signal Processing via Distributed Parallel Computing Albert I. Reuther.

GTRI_B-1 ECRB - HPC - 1 GPU SAS - 1 Using Graphics Processors to Accelerate Synthetic Aperture Sonar Imaging via Backpropagation 2010 High Performance.

Custom Reduction of Arithmetic in Linear DSP Transforms S. Misra, A. Zelinski, J. C. Hoe, and M. Püschel Dept. of Electrical and Computer Engineering Carnegie.

HPEC-1 SMHS 7/7/2016 MIT Lincoln Laboratory Focus 3: Cell Sharon Sacco / MIT Lincoln Laboratory HPEC Workshop 19 September 2007 This work is sponsored.

Optimizing Interconnection Complexity for Realizing Fixed Permutation in Data and Signal Processing Algorithms Ren Chen, Viktor K. Prasanna Ming Hsieh.

Lecture 10 CUDA Instructions

Atindra Mitra Joe Germann John Nehrbass

Fang Fang James C. Hoe Markus Püschel Smarahara Misra

Two-Dimensional Phase Unwrapping On FPGAs And GPUs

Dynamo: A Runtime Codesign Environment

Accelerators to Applications

ELEC 7770 Advanced VLSI Design Spring 2016 Introduction

M. Richards1 ,D. Campbell1 (presenter), R. Judd2, J. Lebak3, and R

SmartCell: A Coarse-Grained Reconfigurable Architecture for High Performance and Low Power Embedded Computing Xinming Huang Depart. Of Electrical and Computer.

A Quantitative Analysis of Stream Algorithms on Raw Fabrics

Introduction to Control Systems Objectives

Chapter 6 Floating Point

Reconfigurable Computing MONARCH/MCHIP

ELEC 7770 Advanced VLSI Design Spring 2014 Introduction

Centar ( Global Signal Processing Expo

Floating Point Representation

Hyunchul Park, Kevin Fan, Manjunath Kudlur,Scott Mahlke

ELEC 7770 Advanced VLSI Design Spring 2012 Introduction

PACE: Power-Aware Computing Engines

CPU Design & Computer Arithmetic

ELEC 7770 Advanced VLSI Design Spring 2010 Introduction

Chapter 1 Introduction.

Mapping DSP algorithms to a general purpose out-of-order processor

ECE Computer Engineering Design Project

Lecture #18 FAST FOURIER TRANSFORM ALTERNATE IMPLEMENTATIONS

Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.

Introduction SYSC5603 (ELG6163) Digital Signal Processing Microprocessors, Software and Applications Miodrag Bolic.

COMPUTER ORGANIZATION AND ARCHITECTURE

Why systolic architectures?

Presentation transcript:

Session 4: Reconfigurable Computing HPEC 2003 Workshop Session 4: Reconfigurable Computing Session Chair: David Cousins Division Scientist High Performance Computing Dept BBN Technologies

It’s always been about Performance “Civilization advances by extending the number of important operations which we can perform without thinking.” Alfred North Whitehead (1861 - 1947) Introduction to Mathematics (1911) “Never promise more than you can perform.” Publilius Syrus (~100 BC), Maxims

Increasing performance is the common thread in this session Increasing Performance with Reconfigurable Computing through: Algorithm decomposition Arithmetic bit-width manipulation A new SIMD on-a-die architecture Morph-able computing architecture Stability across multiple kernels and data sizes

Custom Reduction of Arithmetic in Linear DSP Transforms Smarahara Misra, James C. Hoe, Markus Püschel, Electrical and Computer Engineering, CMU Performance through algorithm decomposition Defines a process for generating cost optimal multiplier-less algorithms with SPIRAL Manipulate SPIRAL output to increase numerical stability Use constrained optimization to reduce the number of operations while still satisfying quality threshold Evolutionary and greedy search algorithms Map to Verilog Presents experimental results: DCT8 and DFT16

Performance through bit-width manipulation Precision Modeling and Bit-width Optimization of Floating-Point Applications Zhihong Zhao, Alternative System Concepts Miriam Leeser, Northeastern University Performance through bit-width manipulation Optimal FP bit-widths are the smallest bit-widths that satisfy accuracy requirements. Apply an FP precision modeling approach Avoids computational intensity of simulation-based approaches Models takes the form: error = f(op, bit-width) Models are built by profiling a Control and Data Flow Graph of the application Application Precision model is then optimized using Grid Steepest Descent

Performance through a new SIMD processing architecture: An Ultra-High Performance Architecture for Embedded Defense Signal and Image Processing Applications Stewart Reddaway, Pete Rogina, WorldScape Defense Co. Ken Cameron, Simon McIntosh-Smith, David Stuttard, ClearSpeed Tech. Michael Koch, Rick Pancoast, Joe Racosky, Lockheed Martin Performance through a new SIMD processing architecture: Multi-Threaded Array Processor Array of processing elements on a single die. Packet switched bus architecture HPEC application performance benchmarks Cycle-accurate simulator

Performance through morphable architectures DARPA PCA for Embedded Defense Signal and Image Processing Applications Michael Koch, Joe Racosky, Mike Iaquinto, Rick Pancoast, Lockheed Martin Steve Crago, Matt French, University of Southern California Performance through morphable architectures Describes an embedded processing application Radar waveform signal processing Architecture morph Non-coherent integration processing Processing functions: Radar pulse compression, magnitude computation, range-walk compensation, and non-coherent integration Benchmark results compare conventional PowerPC, PCA simulation, and actual PCA hardware

Kernel Benchmarks and Metrics for Polymorphous Computer Architectures James Lebak, Hank Hoffmann, Janice McMahon; MIT Lincoln Laboratory Performance measurement across seven kernel benchmarks Considerable variation in throughput Stability  Minimum/maximum throughput A chief goal of PCA is for stable performance across a range of kernels and data sizes. Presents performance results for several kernels on the MIT RAW simulator

Invited Speaker: Robert Graybill Program Manager DARPA IPTO Data Intensive Systems, Power Aware Computing and Communications, Polymorphous Computing Architectures, High Productivity Computing Systems Topic: Are we adrift in the sea of COTS? Review HPEC technology directions from an historical perspective DARPA ITO-IPTO and MTO Future suggestions