GPU-Based Frequency Domain Volume Rendering Ivan Viola, Armin Kanitsar, and Meister Eduard Gröller Institute of Computer Graphics and Algorithms Vienna.


GPU-Based Frequency Domain Volume Rendering Ivan Viola, Armin Kanitsar, and Meister Eduard Gröller Institute of Computer Graphics and Algorithms Vienna University of Technology

Ivan Viola, Vienna University of Technology
2 / 16 Motivation
- volume rendering is time consuming
- computational complexity is O(N^3)
- our goal: fastest volume rendering
- GPUs
  - very fast fragment processor
  - very fast memory access
- Fourier Volume Rendering (FVR)
  - theoretically fastest volume rendering

3 / 16 Frequency Domain Volume Rendering
[Figure: pipeline diagram with stages labelled GPU and CPU]

4 / 16 FVR Characteristics
Pros
- computational complexity O(N^2 log N)
- renders the whole volume, not iso-surfaces
- very fast rendering stage:
  - slicing in frequency domain
  - inverse 2D Fourier transform
Cons
- rendering results in X-ray-like images
- time-consuming preprocessing
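The rendering stage listed above rests on the Fourier projection-slice theorem: a line (or plane) through the origin of the spectrum is the transform of the projection perpendicular to it. The following is a minimal sketch of the 2D case in plain Python, not the presenters' code; the sample image and the helper names `dft` and `dft2` are made up for illustration, and a naive O(n^2) DFT stands in for the FFT.

```python
import cmath

def dft(signal):
    """Naive 1D discrete Fourier transform, O(n^2); fine for a tiny demo."""
    n = len(signal)
    return [sum(signal[x] * cmath.exp(-2j * cmath.pi * k * x / n) for x in range(n))
            for k in range(n)]

def dft2(image):
    """2D DFT of a square image: transform every row, then every column."""
    n = len(image)
    rows = [dft(row) for row in image]
    cols = [dft([rows[y][x] for y in range(n)]) for x in range(n)]
    # cols[kx][ky] is transposed back so the result is indexed [ky][kx].
    return [[cols[kx][ky] for kx in range(n)] for ky in range(n)]

# Hypothetical 4x4 "volume" (2D stand-in), indexed image[y][x].
image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 3.0, 1.0, 2.0],
         [4.0, 1.0, 0.0, 0.0],
         [2.0, 0.0, 5.0, 1.0]]

# X-ray style projection: sum along y for every x.
projection = [sum(image[y][x] for y in range(4)) for x in range(4)]

spectrum = dft2(image)
central_slice = spectrum[0]             # the ky = 0 line through the origin
projection_spectrum = dft(projection)   # spectrum of the projection

max_error = max(abs(a - b) for a, b in zip(central_slice, projection_spectrum))
```

In 3D the slice is an N x N plane, so the per-frame cost is dominated by the inverse 2D FFT, which is where the O(N^2 log N) figure comes from.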

5 / 16 Rendering Stage 1: Slicing
- stage with the highest speed-up
- nearest neighbor interpolation
  - supported by GPU
- tri-linear interpolation
- tri-cubic interpolation
- windowed sinc of width four
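As a rough illustration of the slicing stage with nearest neighbor interpolation, the sketch below samples a plane through the origin of a frequency volume. It is a CPU stand-in in plain Python, not the GPU implementation; the function name `extract_slice`, the `volume[z][y][x]` layout with the zero frequency at index 0, and the axis vectors are all assumptions made for the example.

```python
def extract_slice(volume, u_axis, v_axis, size):
    """Sample a size x size plane through the origin of a frequency volume.

    volume[z][y][x] is assumed to hold frequency (0, 0, 0) at index [0][0][0],
    with negative frequencies wrapped to the upper indices.
    u_axis and v_axis are orthonormal (x, y, z) vectors spanning the plane.
    The returned slice has the zero frequency at its centre.
    """
    n = len(volume)
    half = size // 2
    result = []
    for j in range(size):
        row = []
        for i in range(size):
            # Signed frequency coordinates of this slice sample.
            s, t = i - half, j - half
            fx = s * u_axis[0] + t * v_axis[0]
            fy = s * u_axis[1] + t * v_axis[1]
            fz = s * u_axis[2] + t * v_axis[2]
            # Nearest neighbor: round to the closest voxel. The modulo plays
            # the role of repeat-mode texture addressing (wrap-around).
            x, y, z = round(fx) % n, round(fy) % n, round(fz) % n
            row.append(volume[z][y][x])
        result.append(row)
    return result

# Hypothetical 4x4x4 volume whose values encode their own index.
volume = [[[100 * z + 10 * y + x for x in range(4)] for y in range(4)] for z in range(4)]
axis_aligned = extract_slice(volume, (1, 0, 0), (0, 1, 0), 4)
```

Swapping the rounding step for a higher-order reconstruction filter changes only the lookup, not the traversal of the plane.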

6 / 16 Tri-Linear Interpolation
- not natively supported by graphics hardware
- can be computed using the LRP instruction
[Figure: texel cell with corners [0,0] and [1,1], sample position [X,Y], interpolation weight frac(8X)]
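Tri-linear interpolation can be assembled from seven linear interpolations, which is the operation an LRP-style instruction provides. The sketch below shows that decomposition in plain Python under stated assumptions: coordinates are in voxel units (so the weight is the fractional part of the coordinate, analogous to frac(8X) for a normalized coordinate on a texture of width 8), borders wrap around, and the names `lerp` and `trilinear` are illustrative only.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t

def trilinear(volume, x, y, z):
    """Tri-linear lookup in volume[z][y][x]; coordinates in voxel units."""
    n = len(volume)
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    tx, ty, tz = x - x0, y - y0, z - z0   # fractional parts are the weights

    def voxel(dx, dy, dz):
        # Wrap-around addressing at the volume borders (an assumption).
        return volume[(z0 + dz) % n][(y0 + dy) % n][(x0 + dx) % n]

    # Four interpolations along x ...
    c00 = lerp(voxel(0, 0, 0), voxel(1, 0, 0), tx)
    c10 = lerp(voxel(0, 1, 0), voxel(1, 1, 0), tx)
    c01 = lerp(voxel(0, 0, 1), voxel(1, 0, 1), tx)
    c11 = lerp(voxel(0, 1, 1), voxel(1, 1, 1), tx)
    # ... two along y ...
    c0 = lerp(c00, c10, ty)
    c1 = lerp(c01, c11, ty)
    # ... and one along z.
    return lerp(c0, c1, tz)

# Hypothetical 2x2x2 volume holding the linear function x + 2y + 4z.
cube = [[[x + 2 * y + 4 * z for x in range(2)] for y in range(2)] for z in range(2)]
centre_value = trilinear(cube, 0.5, 0.5, 0.5)
```

Because the test volume is a linear function, the interpolated value inside the cell matches the function exactly.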

7 / 16 Cubic Interpolation & Windowed sinc
- not natively supported by graphics hardware
- no equivalent to LRP instruction
- filter kernel stored in textures [Hadwiger et al. VMV’01]
  - separability of 3D kernel
  - filters of width four
  - stored in RGBA 1D texture
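A width-four filter needs four weights per sample position, which fits the four channels of an RGBA texel. The sketch below imitates that idea with a lookup table in plain Python. The choice of a Catmull-Rom cubic, the table resolution `TABLE_SIZE`, and the names `KERNEL_TABLE` and `filter_1d` are assumptions for the example, not details taken from the slides or from Hadwiger et al.

```python
import math

def catmull_rom_weights(t):
    """Four weights for the samples at offsets -1, 0, 1, 2; t is in [0, 1)."""
    return (
        0.5 * (-t**3 + 2 * t**2 - t),
        0.5 * (3 * t**3 - 5 * t**2 + 2),
        0.5 * (-3 * t**3 + 4 * t**2 + t),
        0.5 * (t**3 - t**2),
    )

# Stand-in for the RGBA 1D texture: each entry holds four weights.
TABLE_SIZE = 64
KERNEL_TABLE = [catmull_rom_weights(i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def filter_1d(samples, x):
    """Width-four filtered lookup at position x, with wrap-around borders."""
    n = len(samples)
    base = math.floor(x)
    frac = x - base
    # Nearest table entry for this fractional position.
    weights = KERNEL_TABLE[min(int(frac * TABLE_SIZE), TABLE_SIZE - 1)]
    return sum(w * samples[(base + offset) % n]
               for w, offset in zip(weights, (-1, 0, 1, 2)))

ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
constant = [5.0] * 8
```

Since the 3D kernel is separable, the same 1D table can be applied along x, then y, then z.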

8 / 16 Rendering Stage 2: Inverse 2D FFT
- 1D FFT consists of two parts
  - scrambling
  - butterfly operation

9 / 16 Fast Fourier Transform in 1D
[Figure: 8-point FFT flow graph]
input: a0 a1 a2 a3 a4 a5 a6 a7
scramble: a0 a4 a2 a6 a1 a5 a3 a7
butterfly, twiddle factors W_N^k:
  W_8^0 W_8^2 W_8^4 W_8^6 W_8^0 W_8^2 W_8^4 W_8^6
  W_8^0 W_8^1 W_8^2 W_8^3 W_8^4 W_8^5 W_8^6 W_8^7
output: A0 A1 A2 A3 A4 A5 A6 A7
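The two parts of the 1D FFT, scrambling and butterflies, can be sketched as a textbook radix-2 decimation-in-time transform. This is a generic illustration in plain Python, assuming a power-of-two length and the forward sign convention exp(-2*pi*i*k/N) for the twiddle factor W_N^k; the function names are made up for the example.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (the scramble permutation)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def fft(values):
    """Iterative radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    # Scramble: reorder the input into bit-reversed order.
    data = [complex(values[bit_reverse(i, bits)]) for i in range(n)]
    # Butterflies: log2(n) stages, each combining pairs `span` apart.
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for k in range(span):
                twiddle = cmath.exp(-2j * cmath.pi * k / (2 * span))
                upper = data[start + k]
                lower = twiddle * data[start + k + span]
                data[start + k] = upper + lower
                data[start + k + span] = upper - lower
        span *= 2
    return data

scramble_order = [bit_reverse(i, 3) for i in range(8)]
impulse_spectrum = fft([1, 0, 0, 0, 0, 0, 0, 0])
```

For eight inputs the scramble order is 0, 4, 2, 6, 1, 5, 3, 7, which matches the reordering a0 a4 a2 a6 a1 a5 a3 a7 on the slide.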

10 / 16 Fast Fourier Transform on the GPU
- two buffers: ping-pong rendering
- two-channel rendering buffers required
- scramble pass
  - 1D lookup
- butterfly passes
  - log2(N) passes
  - texture encodes
    - W_N^k
    - p and q coordinate
    - butterfly sign
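The ping-pong scheme can be mimicked on the CPU by letting every output element gather its two sources from a per-pass lookup table, then swapping the source and target buffers. The sketch below is a plain Python stand-in under stated assumptions: the tuple layout `(p, q, twiddle, sign)` is one plausible reading of what the lookup texture encodes, complex numbers play the role of the two-channel buffers, and all names are hypothetical.

```python
import cmath

def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def build_butterfly_tables(n):
    """One table per pass; entry i holds (p, q, twiddle, sign) for output i."""
    tables = []
    span = 1
    while span < n:
        table = []
        for i in range(n):
            k = i % span
            p = i - (i % (2 * span)) + k      # upper input of the butterfly
            q = p + span                      # lower input of the butterfly
            twiddle = cmath.exp(-2j * cmath.pi * k / (2 * span))
            sign = 1.0 if i == p else -1.0    # upper output adds, lower subtracts
            table.append((p, q, twiddle, sign))
        tables.append(table)
        span *= 2
    return tables

def fft_ping_pong(values):
    """FFT as a scramble pass plus log2(n) gather passes over two buffers."""
    n = len(values)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    # Scramble pass: a single 1D lookup per element.
    source = [complex(values[bit_reverse(i, bits)]) for i in range(n)]
    target = [0j] * n
    for table in build_butterfly_tables(n):
        for i, (p, q, twiddle, sign) in enumerate(table):
            target[i] = source[p] + sign * twiddle * source[q]
        source, target = target, source       # ping-pong: swap the buffers
    return source

tables_for_eight = build_butterfly_tables(8)
```

Each output depends only on reads from the source buffer, so every element of a pass could be computed independently, which is the property a fragment program needs.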

11 / 16 Hartley Transform - Alternative to FFT
- real input is transformed into real output
  - ½ memory requirements
- scrambling the same as in FFT
- double-butterfly operation
  - three source values, cos and sin
- HT not separable
  - additional correction pass required
  - GPU implementation not faster than FFT
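The non-separability of the Hartley transform can be made concrete: applying the 1D transform along rows and then columns yields a cas-cas product T(u, v), and the true 2D transform follows from the identity H(u, v) = ½ [T(u, v) + T(-u, v) + T(u, -v) - T(-u, -v)], with indices taken modulo N. The sketch below checks this in plain Python on a made-up 4x4 image using naive O(N^2) transforms; it illustrates the kind of correction pass the slide mentions, not the presenters' shader, and all names are hypothetical.

```python
import math

def cas(angle):
    """Hartley kernel: cos + sin."""
    return math.cos(angle) + math.sin(angle)

def dht(signal):
    """Naive 1D discrete Hartley transform (real in, real out)."""
    n = len(signal)
    return [sum(signal[x] * cas(2 * math.pi * k * x / n) for x in range(n))
            for k in range(n)]

def separable_cas_cas(image):
    """Row transforms followed by column transforms; result indexed [v][u]."""
    n = len(image)
    rows = [dht(row) for row in image]
    cols = [dht([rows[y][x] for y in range(n)]) for x in range(n)]
    return [[cols[u][v] for u in range(n)] for v in range(n)]

def correction_pass(t):
    """Turn the separable cas-cas result into the true 2D Hartley transform."""
    n = len(t)
    return [[0.5 * (t[v][u] + t[v][-u % n] + t[-v % n][u] - t[-v % n][-u % n])
             for u in range(n)] for v in range(n)]

def dht2_direct(image):
    """Reference: 2D Hartley transform straight from its definition."""
    n = len(image)
    return [[sum(image[y][x] * cas(2 * math.pi * (u * x + v * y) / n)
                 for y in range(n) for x in range(n))
             for u in range(n)] for v in range(n)]

image = [[1.0, 2.0, 0.0, 1.0],
         [0.0, 3.0, 1.0, 2.0],
         [4.0, 1.0, 0.0, 0.0],
         [2.0, 0.0, 5.0, 1.0]]

uncorrected = separable_cas_cas(image)
corrected = correction_pass(uncorrected)
reference = dht2_direct(image)
```

Every value stays real throughout, which is where the halved memory requirement relative to a complex FFT comes from.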

12 / 16 Fast Hartley Transform on the GPU
- similar to FFT: ping-pong rendering
- only one-channel rendering buffers required
- scrambling the same
- double-butterfly
  - two lookup textures
    - addresses of source values (3 channels)
    - cos and sin terms (2 channels)
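The double butterfly combines three source values with one cos and one sin term: H[k] = E[k] + cos(2πk/N) O[k] + sin(2πk/N) O[N/2 - k], where E and O are the transforms of the even and odd samples and indices are taken modulo N/2. The sketch below uses a recursive formulation for brevity, which is a simplification of the pass-based, lookup-texture version on the slide; it assumes a power-of-two length, and the name `fht` is illustrative.

```python
import math

def fht(signal):
    """Recursive fast Hartley transform; len(signal) must be a power of two."""
    n = len(signal)
    if n == 1:
        return [float(signal[0])]
    assert n % 2 == 0, "length must be a power of two"
    even = fht(signal[0::2])
    odd = fht(signal[1::2])
    half = n // 2
    result = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        j = k % half
        mirror = (half - j) % half
        # Double butterfly: three source values, one cos and one sin term.
        result.append(even[j]
                      + math.cos(angle) * odd[j]
                      + math.sin(angle) * odd[mirror])
    return result

impulse_transform = fht([1, 0, 0, 0])
```

The three indices `j`, `j` and `mirror` together with the cos and sin values are the per-element data that the two lookup textures on the slide would hold.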

13 / 16 Results
Framerates for ATI Radeon 9800 XT
Resolution | NN | TL | TC | IFFT 2D
[table rows lost in extraction; surviving fragments: "256x", "x"]

14 / 16 Demo

15 / 16 Conclusions
- rendering stage of FVR very fast on GPU
  - slicing: high performance gain
  - wrap around is “for free”
  - speed-up also for inverse FFT
- nearest neighbour: very poor quality
- tri-linear interpolation: high performance
- tri-cubic interpolation: high quality

16 / 16 Thank You!