Kenneth Moreland Edward Angel Sandia National Labs U. of New Mexico

Presentation transcript:

The FFT on a GPU. Graphics Hardware 2003, July 27, 2003. Kenneth Moreland (Sandia National Labs) and Edward Angel (U. of New Mexico). Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Overview Introduction: motivation, FFT review. FFT Techniques: exploitable FFT properties. Implementation. Results: performance, applications, conclusions.

Motivation The Fourier transform is a principal tool for digital image processing: filtering, correction, compression, classification, and generation. As such, should not our graphics hardware support such a tool?

The Discrete Fourier Transform Converts data in the spatial or temporal domain into the frequencies that the data comprise.
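As a point of reference for the slide above, here is a minimal sketch of the direct O(N²) transform in plain Python. The function name dft and the choice to apply the 1/N scale on the forward transform are assumptions made for illustration; other conventions place the scale on the inverse.

```python
import cmath

def dft(f, inverse=False):
    """Direct O(N^2) discrete Fourier transform of a sequence.

    Assumed convention: the forward transform is scaled by 1/N and the
    inverse is unscaled. Other texts put the 1/N on the inverse instead.
    """
    n = len(f)
    sign = 1 if inverse else -1
    out = []
    for u in range(n):
        acc = 0j
        for x in range(n):
            # Correlate the input with a complex sinusoid of frequency u.
            acc += f[x] * cmath.exp(sign * 2j * cmath.pi * u * x / n)
        out.append(acc if inverse else acc / n)
    return out

if __name__ == "__main__":
    spectrum = dft([1.0, 2.0, 3.0, 4.0])
    print(spectrum)                      # frequency-domain values
    print(dft(spectrum, inverse=True))   # recovers the input up to rounding
```

With this scaling, the u = 0 term is the mean of the input, and applying the inverse to the forward result returns the original samples.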

The Discrete Fourier Transform The 2D transform can be computed by applying the transform in one direction, then the other.
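The separability described above can be checked with a short sketch: transform every row, then every column, and compare against the direct double sum. The helper names (dft_1d, dft_2d_separable, dft_2d_direct) are hypothetical, and the transforms are left unscaled to keep the comparison simple.

```python
import cmath

def dft_1d(seq):
    """Unscaled direct 1D DFT."""
    n = len(seq)
    return [sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def dft_2d_separable(image):
    """2D DFT computed as row transforms followed by column transforms."""
    rows = [dft_1d(row) for row in image]
    cols = [dft_1d(list(col)) for col in zip(*rows)]  # one list per column
    return [list(r) for r in zip(*cols)]              # transpose back to rows

def dft_2d_direct(image):
    """2D DFT from the double sum, used only as a reference."""
    h, w = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / w + v * y / h))
                 for y in range(h) for x in range(w))
             for u in range(w)]
            for v in range(h)]

if __name__ == "__main__":
    img = [[1, 2, 3, 4], [5, 6, 7, 8]]
    a, b = dft_2d_separable(img), dft_2d_direct(img)
    print(max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)))
```

The row-then-column form replaces one large double sum per output value with two batches of 1D transforms, which is what makes a 1D FFT sufficient for images.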

The Fast Fourier Transform Divide-and-conquer algorithm. The input sequence is divided into two subsequences consisting of the values at even and odd indices, respectively.
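The even/odd split can be sketched as a textbook recursive radix-2 FFT. This is an illustration of the divide-and-conquer idea only, assuming a power-of-two input length and an unscaled transform; the function name fft_recursive is made up for this example.

```python
import cmath

def fft_recursive(f):
    """Unscaled radix-2 FFT; len(f) is assumed to be a power of two."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft_recursive(f[0::2])   # values at even indices
    odd = fft_recursive(f[1::2])    # values at odd indices
    half = n // 2
    out = [0j] * n
    for u in range(half):
        # Combine the two half-size transforms with a twiddle factor.
        t = cmath.exp(-2j * cmath.pi * u / n) * odd[u]
        out[u] = even[u] + t
        out[u + half] = even[u] - t
    return out

if __name__ == "__main__":
    print(fft_recursive([1, 0, 0, 0]))  # an impulse has a flat spectrum
```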

Index Magic Do not use recursion. Use dynamic programming: iterate over the entire array, computing all values for each recursive depth together, like mergesort. Indexing is non-obvious: unlike mergesort, the recursive step does not divide the array into contiguous chunks. At any iteration, what partition does a given index belong to, and where can one find the applicable values of the sub-partitions?

Index Magic Common solution: rearrange the data by reversing the bits of the indices. The FFT can then operate on contiguous partitions, but this requires an extra data copy. Our solution: determine the indexing in place. Note that the paper has a typo.
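For illustration, the "common solution" mentioned above (not the authors' in-place indexing, which the slides do not spell out) might look like the following sketch. It reorders the input by bit-reversed index and then runs the passes iteratively over contiguous partitions. The names bit_reverse and fft_bit_reversed are hypothetical, and a power-of-two length is assumed.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_bit_reversed(f):
    """Unscaled iterative radix-2 FFT; len(f) must be a power of two."""
    n = len(f)
    bits = n.bit_length() - 1
    # The extra data copy: gather the input in bit-reversed order.
    a = [complex(f[bit_reverse(i, bits)]) for i in range(n)]
    size = 2
    while size <= n:                      # one pass per recursive depth
        half = size // 2
        for start in range(0, n, size):   # partitions are now contiguous
            for k in range(half):
                t = cmath.exp(-2j * cmath.pi * k / size) * a[start + k + half]
                a[start + k + half] = a[start + k] - t
                a[start + k] = a[start + k] + t
        size *= 2
    return a

if __name__ == "__main__":
    print([bit_reverse(i, 3) for i in range(8)])
    print(fft_bit_reversed([1, 0, 0, 0, 0, 0, 0, 0]))
```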

Fourier Symmetry of Real Sequences In general, the frequency spectra even of real functions contain imaginary values, which capture the magnitude and phase shift of the sinusoids. A brute-force FFT therefore doubles computation and storage costs. But Fourier transforms of real functions have conjugate symmetry: F(N – u) = F*(u). The values at u = 0 and u = N/2 are real (because they are conjugates with themselves).
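The symmetry can be verified numerically with a small sketch. The helper names dft and is_conjugate_symmetric are assumptions for this example, and the transform is the unscaled direct form.

```python
import cmath

def dft(f):
    """Unscaled direct DFT, used here only to produce a spectrum to inspect."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def is_conjugate_symmetric(spectrum, tol=1e-9):
    """True if F(N - u) equals the complex conjugate of F(u) for every u."""
    n = len(spectrum)
    return all(abs(spectrum[(n - u) % n] - spectrum[u].conjugate()) < tol
               for u in range(n))

if __name__ == "__main__":
    F = dft([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
    print(is_conjugate_symmetric(F))   # real input: symmetry holds
    print(F[0].imag, F[4].imag)        # both vanish up to rounding
```

Because half of the spectrum is determined by the other half, only about N real numbers are needed to describe the transform of N real samples.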

Fourier Transform of Real Functions Pick two real functions, f(x) and g(x), and let h(x) = f(x) + j g(x). Note that there is no loss of information. The FFT of h can be performed in half the time of brute-force FFTs of f and g individually. Simply point to one row of the image as the real components and another as the imaginary components.

Untangling Fourier Transform Pairs The Fourier transform is linear, so H(u) = F(u) + j G(u). We can “untangle” F and G using their symmetry: add and subtract H(u) and H(N – u) to cancel out the conjugate terms of F and G.

Untangling Fourier Transform Pairs
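A sketch of the untangling step, under the assumption that f and g are real: since F(N – u) = F*(u) and likewise for G, it follows that H*(N – u) = F(u) – j G(u), so F(u) = [H(u) + H*(N – u)] / 2 and G(u) = [H(u) – H*(N – u)] / (2j). The function names below are hypothetical and the transform is the unscaled direct form.

```python
import cmath

def dft(seq):
    """Unscaled direct DFT, standing in for any FFT implementation."""
    n = len(seq)
    return [sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
            for u in range(n)]

def untangle(H):
    """Split H = F + jG into F and G, assuming f and g were both real."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        Hc = H[(n - u) % n].conjugate()   # H*(N - u) = F(u) - j G(u)
        F.append((H[u] + Hc) / 2)         # the G terms cancel
        G.append((H[u] - Hc) / 2j)        # the F terms cancel
    return F, G

if __name__ == "__main__":
    f = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 2.0, 0.0]
    h = [complex(a, b) for a, b in zip(f, g)]   # tangle: h = f + j g
    F, G = untangle(dft(h))                     # one transform instead of two
    print(F)
    print(G)
```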

Packing Transforms of Real Functions We can store the Fourier transform in an array the same size as the input. Throw away conjugate duplicates. Throw away imaginary values known to be zero.
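One way to picture the packing is the sketch below. The layout chosen here (real parts for u = 0..N/2 first, then imaginary parts for u = 1..N/2 – 1) is an assumption for illustration and may differ from the layout the authors use; the function names are hypothetical and N is assumed even.

```python
def pack_real_spectrum(F):
    """Pack the N complex values of a real sequence's DFT into N reals."""
    n = len(F)
    half = n // 2
    reals = [F[u].real for u in range(half + 1)]   # u = 0 .. N/2
    imags = [F[u].imag for u in range(1, half)]    # u = 1 .. N/2 - 1
    return reals + imags                           # N values in total

def unpack_real_spectrum(packed):
    """Rebuild the full spectrum using conjugate symmetry."""
    n = len(packed)
    half = n // 2
    reals, imags = packed[:half + 1], packed[half + 1:]
    F = [0j] * n
    F[0] = complex(reals[0], 0.0)         # imaginary part known to be zero
    F[half] = complex(reals[half], 0.0)   # imaginary part known to be zero
    for u in range(1, half):
        F[u] = complex(reals[u], imags[u - 1])
        F[n - u] = F[u].conjugate()       # the discarded duplicate
    return F

if __name__ == "__main__":
    F = [10 + 0j, -2 + 2j, -2 + 0j, -2 - 2j]   # unscaled DFT of [1, 2, 3, 4]
    packed = pack_real_spectrum(F)
    print(packed)
    print(unpack_real_spectrum(packed))
```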

Column-wise FFT We have two columns with real values. Use same “tangled” approach. All other columns are complex numbers. Use regular FFT.

Packing 2D Transforms of Real Functions Rows transformed from complex values are already packed appropriately. The two rows transformed from real values are untangled and packed to follow suit.

Available Resources nVidia GeForce FX 5800 Ultra: full 32-bit floating point pipeline and frame buffers; fully programmable vertex and fragment units. Cg: high-level language for vertex and fragment programs. Traditional CPU: 1.7 GHz Intel Xeon, with freely available high-performance FFT implementations.

Implementation Using a SIMD model for parallel computation. Draw a quadrilateral parallel to the screen. The rasterizer invokes the same fragment program “in parallel” over all pixels covered by the quadrilateral. Inputs and outputs depend on the location of the pixel on which the fragment program is running. We require many rendering passes. Use the “render to texture” extension. Use two frame buffers: one for retrieving the values of the last pass and one for storing the results of the current computation.
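The multi-pass, two-buffer scheme can be simulated on the CPU as a rough sketch. This is plain Python rather than Cg and is not the authors' fragment program: the function fragment_program is a hypothetical stand-in that computes one output "pixel" from the previous pass only, and the initial bit-reversed gather is the common approach rather than the in-place indexing the authors describe. A power-of-two length is assumed.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fragment_program(src, i, size):
    """Value of output 'pixel' i for one FFT pass; reads only from src."""
    half = size // 2
    k = i % size          # position inside the current partition
    base = i - k          # start of the current partition
    j = k % half
    twiddle = cmath.exp(-2j * cmath.pi * j / size)
    even, odd = src[base + j], src[base + j + half]
    return even + twiddle * odd if k < half else even - twiddle * odd

def fft_multipass(f):
    """Unscaled FFT as a series of passes over two ping-pong buffers."""
    n = len(f)
    bits = n.bit_length() - 1
    front = [complex(f[bit_reverse(i, bits)]) for i in range(n)]  # previous pass
    back = [0j] * n                                               # current pass
    size = 2
    while size <= n:
        for i in range(n):    # conceptually, every pixel runs in parallel
            back[i] = fragment_program(front, i, size)
        front, back = back, front   # swap the roles of the two buffers
        size *= 2
    return front

if __name__ == "__main__":
    print(fft_multipass([1, 0, 0, 0, 0, 0, 0, 0]))
```

Writing each pass as a gather, where an output location decides which inputs to read, mirrors the constraint that a fragment program can read arbitrary texels but writes only to its own pixel.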

Implementation [Diagram: two pipelines, one from images to frequency spectra and one from frequency spectra to images. Each alternates FFT and Untangle passes over the tangled real and imaginary components of F and G; the first ends with Scale passes, the second with Pass passes.]

Fragment Programs Written in Cg, compiled for GeForce FX. Program Instructions Arithmetic Texture FFT 27 3 Untangle 4 2 Scale 1 Tangle Pass Multiply 66

Applications Digital image filtering.

Applications Texture generation. Volume rendering.

Performance Computation speed: 2.5 GigaFLOPS. Texture read rate: 3.4 GB/sec.
Image Size   Rendering Rate (Hz)   Arithmetic (sec)   Texture Lookup (sec)
1024²        0.37                  1.9                0.6
512²         1.6                   0.44               0.13
256²         6.7                   0.09               0.03
128²         25                    0.01               0.007

Conclusions The Fourier transform on the GPU has many potential applications. A well-established FFT on the CPU (FFTW) still has an edge over the GPU implementation. Both the software and the hardware of the GPU are first generation, so there is room for improvement.

Get the Cg Code http://www.cgshaders.org ? http://www.cs.unm.edu/~kmorel/documents/fftgpu kmorel@sandia.gov

Questions?