DUKEMIT-LLHPEC-2007 FFTs of Arbitrary Dimensions on GPUs Xiaobai Sun and Nikos Pitsianis Duke University September 19, 2007 At High Performance Embedded.

Slides:



Advertisements
Similar presentations
Speed, Accurate and Efficient way to identify the DNA.
Advertisements

Is There a Real Difference between DSPs and GPUs?
Parallel Processing (CS 730) Lecture 7: Shared Memory FFTs*
COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Wavelets Fast Multiresolution Image Querying Jacobs et.al. SIGGRAPH95.
Digital Kommunikationselektronik TNE027 Lecture 5 1 Fourier Transforms Discrete Fourier Transform (DFT) Algorithms Fast Fourier Transform (FFT) Algorithms.
Lecture 1 Computer Graphics Hardware Basic graphics hardware –Display devices –Video controller –Memory –CPU –System bus Graphics Hardware # 1 CG show.
Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Image Compression System Megan Fuller and Ezzeldin Hamed 1.
DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 1 Parallelization of NUFFT with Radial Data on Multicore Processors Nikos Pitsianis.
2009/04/07 Yun-Yang Ma.  Overview  What is CUDA ◦ Architecture ◦ Programming Model ◦ Memory Model  H.264 Motion Estimation on CUDA ◦ Method ◦ Experimental.
Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh,
Spatial Information Systems (SIS) COMP Raster-based structures (2) Data conversion.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.
Introduction to Volume Visualization Mengxia Zhu Fall 2007.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 19, 2011 Emergence of GPU systems and clusters for general purpose High Performance Computing.
The FFT on a GPU Graphics Hardware 2003 July 27, 2003 Kenneth MorelandEdward Angel Sandia National LabsU. of New Mexico Sandia is a multiprogram laboratory.
IAT 3551 Computer Graphics Overview Color Displays Drawing Pipeline.
Lecture 6 Sept 15, 09 Goals: two-dimensional arrays matrix operations circuit analysis using Matlab image processing – simple examples.
A Performance and Energy Comparison of FPGAs, GPUs, and Multicores for Sliding-Window Applications From J. Fowers, G. Brown, P. Cooke, and G. Stitt, University.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
Components Text Text--Processing Software A Word Processor is a software application that provides the user with the tools to create and edit text.
Basic 3D Graphics Chapter 5. Bird’s Eye View  Basic 3D Graphics –Basic concepts of 3D graphics, rendering pipeline, Java 3D programming, scene graph,
Computer Science Department, Duke UniversityPhD Defense TalkMay 4, 2005 Fast Extraction of Feature Salience Maps for Rapid Video Data Analysis Nikos P.
Course Overview, Introduction to CG Glenn G. Chappell U. of Alaska Fairbanks CS 381 Lecture Notes Friday, September 5, 2003.
COMP Bitmapped and Vector Graphics Pages Using Qwizdom.
Enhancing GPU for Scientific Computing Some thoughts.
May 8, 2007Farid Harhad and Alaa Shams CS7080 Over View of the GPU Architecture CS7080 Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad &
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
COMP 261 Lecture 16 3D Rendering. input: set of polygons viewing direction direction of light source(s) size of window. output: an image Actions rotate.
Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware Nolan GoodnightGreg HumphreysCliff WoolleyRui Wang University of Virginia.
CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: MWF.
Geometric Transformations
Fast Memory Addressing Scheme for Radix-4 FFT Implementation Presented by Cheng-Chien Wu, Master Student of CSIE,CCU 1 Author: Xin Xiao, Erdal Oruklu and.
Programming Concepts in GPU Computing Dušan Gajić, University of Niš Programming Concepts in GPU Computing Dušan B. Gajić CIITLab, Dept. of Computer Science.
High Performance Scalable Base-4 Fast Fourier Transform Mapping Greg Nash Centar 2003 High Performance Embedded Computing Workshop
1 Introduction to Computer Graphics with WebGL Ed Angel Professor Emeritus of Computer Science Founding Director, Arts, Research, Technology and Science.
GPU Architecture and Programming
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Fall 2013.
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
Mar. 1, 2001Parallel Processing1 Parallel Processing (CS 730) Lecture 9: Distributed Memory FFTs * Jeremy R. Johnson Wed. Mar. 1, 2001 *Parts of this lecture.
Based on paper by: Rahul Khardekar, Sara McMains Mechanical Engineering University of California, Berkeley ASME 2006 International Design Engineering Technical.
May 8, 2007Farid Harhad and Alaa Shams CS7080 Overview of the GPU Architecture CS7080 Final Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad.
Computer Science Department, Duke UniversityPhD Defense TalkMay 4, 2005 FAST PATTERN MATCHING IN 3D IMAGES ON GPUS Patrick Eibl, Dennis Healy, Nikos P.
Vector and symbolic processors
A Programmable Single Chip Digital Signal Processing Engine MAPLD 2005 Paul Chiang, MathStar Inc. Pius Ng, Apache Design Solutions.
Copyright © 2004, Dillon Engineering Inc. All Rights Reserved. An Efficient Architecture for Ultra Long FFTs in FPGAs and ASICs  Architecture optimized.
MSIM 842 VISUALIZATION II INSTRUCTOR: JESSICA R. CROUCH 1 A Particle System for Interactive Visualization of 3D Flows Jens Krüger Peter Kipfer.
CISC 110 Day 3 Introduction to Computer Graphics.
A Memory-hierarchy Conscious and Self-tunable Sorting Library To appear in 2004 International Symposium on Code Generation and Optimization (CGO ’ 04)
COMPUTER GRAPHICS CS 482 – FALL 2015 SEPTEMBER 29, 2015 RENDERING RASTERIZATION RAY CASTING PROGRAMMABLE SHADERS.
1 GPU programming Dr. Bernhard Kainz. 2 Dr Bernhard Kainz Overview About myself Motivation GPU hardware and system architecture GPU programming languages.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
GPGPU: Parallel Reduction and Scan Joseph Kider University of Pennsylvania CIS Fall 2011 Credit: Patrick Cozzi, Mark Harris Suresh Venkatensuramenan.
FFTC: Fastest Fourier Transform on the IBM Cell Broadband Engine David A. Bader, Virat Agarwal.
3/12/2013Computer Engg, IIT(BHU)1 CUDA-3. GPGPU ● General Purpose computation using GPU in applications other than 3D graphics – GPU accelerates critical.
An FFT for Wireless Protocols Dr. J. Greg Nash Centar ( HAWAI'I INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES Mobile.
An Introduction to the Cg Shading Language Marco Leon Brandeis University Computer Science Department.
COMP 175 | COMPUTER GRAPHICS Remco Chang1/XX13 – GLSL Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 12, 2016.
Chapter 3 Data Representation
COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE
Buffers Ed Angel Professor Emeritus of Computer Science
Yuanrui Zhang, Mahmut Kandemir
Chapter III, Desktop Imaging Systems and Issues: Lesson IV Working With Images
The Graphics Rendering Pipeline
Kenneth Moreland Edward Angel Sandia National Labs U. of New Mexico
Buffers Ed Angel Professor Emeritus of Computer Science
Presentation transcript:

DUKEMIT-LLHPEC-2007 FFTs of Arbitrary Dimensions on GPUs Xiaobai Sun and Nikos Pitsianis Duke University September 19, 2007 At High Performance Embedded Computing 2007 MIT-LL

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 2 Overview Motivation – FFTs of arbitrary dimensions and their applications – Graphics processing units (GPUs) Basic facts on dimensionality FFTs on GPUs 1.2D FFT is chosen as the primitive one at API level 2.2D FFT performance is conveyed to FFTs of other dimensions Experimental results Discussion of related issues and works

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 3 Remove Keystone Remove Keystone Spatially Variant Refocus Spatially Variant Refocus Motion Comp Motion Comp Auto- Focus Auto- Focus Motivation : FFT Applications From S. Bellofiore and H. Schmitt at Polar Format NUFFT Polar Format NUFFT

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 4 Image degradation by traditional interpolation priori to 2-D FFT –Loss of resolution –Loss of data Range Samples  PRFs 2-D FFT Polar Format 1:1 Range 1.25:1 Azimuth From S. Bellofiore and H. Schmitt at

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 5 Motivation : GPU Architecture GPU : Graphics Processing Unit Highly parallel multi-processors Affordable commodity product Initially dedicated to graphics processing and rendering Presently capable of co-processing on Desktop, Laptop Increasing programmability and API support Image processing & rendering GP-GPU

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 6 Basic Facts on Dimensionality 1.In mathematics, FFTs are considered dimensionless in the sense that the factorizations can be described in a unified, recursive representation with provided scaling factors some of which are dimension dependent. In computation, trivial scaling may be skipped. 2.In application, it is often required that phase-frequency information, or spatial and geometric relation, be provided explicitly at input and output. FFT data are not shapeless. 3.In architecture, extra dimensions are induced by the data access patterns most efficiently supported. FFT data are transfigured at different memory level. GPUs support fine-granularity, 2D access to memory frames at the API level

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 7 2D FFT as the Primitive 2D FFT 2D data placement –Two complex numbers per pixel vector (4 floating point numbers) : one at the front, one at the back –Even columns at the front layers, odd columns at the back layers 2D array operations through –utilizing best the architectural support of 2D data access at API level –Radix-2, radix 3 and mixed radices Direct 2D bit-reversal –Up to certain sub-array size –2D data partitioning in large data array Re( X ) Im( X )

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 8 Radix 2, Radix 3 and Mixed Radices

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 9 Direct Two Dimensional Bit Reversal Not one dimension after another Not recursion up to certain frame block size (low bits) Not recursion up to certain frame block size (low bits) For large data size, block swaps (bit reversal in high bits) For large data size, block swaps (bit reversal in high bits) X( i, j ) X( R m (i), R n (j) )

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 10 2D Bit Reversal 2K 30.4 ms 2K 19.6 ms

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 11 2D Bit Reversal 1K 2K 17.5 ms 9.8 ms 14.8 ms

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 12

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 13

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 14 FFTs of Other Dimensions 1D FFT of size n 3D FFT of dimensions Add a scaling stage in 2D FFT Skip a scaling stage in 2D FFT ( In a simple case )

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 15

DUKEMIT-LLHPEC-2007 Sept. 19, 2007FFTs of Arbitrary Dimensions on GPUs 16 Other Issues and Works Twiddle factors : –Pre-calculated, partially calculated, calculate on the fly –Numerical behavior Data loading and unloading –Data placement in main memory –A sequence of successive FFTs Automated tuning Other commodity products –IBM Cell FPGAs