DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 1 Parallelization of NUFFT with Radial Data on Multicore Processors Nikos Pitsianis.

Slides:



Advertisements
Similar presentations
Linear Time Methods for Propagating Beliefs Min Convolution, Distance Transforms and Box Sums Daniel Huttenlocher Computer Science Department December,
Advertisements

2D Geometric Transformations
Lecture 6: Multicore Systems
Kinematic Synthesis of Robotic Manipulators from Task Descriptions June 2003 By: Tarek Sobh, Daniel Toundykov.
N-Body I CS 170: Computing for the Sciences and Mathematics.
Exploiting Graphics Processors for High- performance IP Lookup in Software Routers Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu.
High Performance Computing and the FLAME Framework Prof C Greenough, LS Chin and Dr DJ Worth STFC Rutherford Appleton Laboratory Prof M Holcombe and Dr.
Session: Computational Wave Propagation: Basic Theory Igel H., Fichtner A., Käser M., Virieux J., Seriani G., Capdeville Y., Moczo P.  The finite-difference.
Challenge the future Delft University of Technology Evaluating Multi-Core Processors for Data-Intensive Kernels Alexander van Amesfoort Delft.
Revisiting a slide from the syllabus: CS 525 will cover Parallel and distributed computing architectures – Shared memory processors – Distributed memory.
Project Overview Reconstruction in Diffracted Ultrasound Tomography Tali Meiri & Tali Saul Supervised by: Dr. Michael Zibulevsky Dr. Haim Azhari Alexander.
Reminder Fourier Basis: t  [0,1] nZnZ Fourier Series: Fourier Coefficient:
DCABES 2009 China University Of Geosciences 1 The Parallel Models of Coronal Polarization Brightness Calculation Jiang Wenqian.
Fast Fourier Transform. Agenda Historical Introduction CFT and DFT Derivation of FFT Implementation.
Java for High Performance Computing Jordi Garcia Almiñana 14 de Octubre de 1998 de la era post-internet.
Image Enhancement in the Frequency Domain Part I Image Enhancement in the Frequency Domain Part I Dr. Samir H. Abdul-Jauwad Electrical Engineering Department.
UMass Lowell Computer Science Advanced Algorithms Computational Geometry Prof. Karen Daniels Spring, 2004 Project.
Some Properties of the 2-D Fourier Transform Translation Distributivity and Scaling Rotation Periodicity and Conjugate Symmetry Separability Convolution.
Introduction to Algorithms
Transforms: Basis to Basis Normal Basis Hadamard Basis Basis functions Method to find coefficients (“Transform”) Inverse Transform.
Low power and cost effective VLSI design for an MP3 audio decoder using an optimized synthesis- subband approach T.-H. Tsai and Y.-C. Yang Department of.
HELSINKI UNIVERSITY OF TECHNOLOGY *Laboratory for Theoretical Computer Science Helsinki University of Technology **Department of Computing Science University.
HPEC_GPU_DECODE-1 ADC 8/6/2015 MIT Lincoln Laboratory GPU Accelerated Decoding of High Performance Error Correcting Codes Andrew D. Copeland, Nicholas.
Computer Science Department, Duke UniversityPhD Defense TalkMay 4, 2005 Fast Extraction of Feature Salience Maps for Rapid Video Data Analysis Nikos P.
Topic 7 - Fourier Transforms DIGITAL IMAGE PROCESSING Course 3624 Department of Physics and Astronomy Professor Bob Warwick.
Numerical Analysis – Digital Signal Processing Hanyang University Jong-Il Park.
By Yevgeny Yusepovsky & Diana Tsamalashvili the supervisor: Arie Nakhmani 08/07/2010 1Control and Robotics Labaratory.
Chapter 2 Computer Clusters Lecture 2.3 GPU Clusters for Massive Paralelism.
7 th Annual Workshop on Charm++ and its Applications ParTopS: Compact Topological Framework for Parallel Fragmentation Simulations Rodrigo Espinha 1 Waldemar.
Numerical ElectroMagnetics & Semiconductor Industrial Applications Ke-Ying Su Ph.D. National Central University Department of Mathematics 12 2D-NUFFT &
Solving the Poisson Integral for the gravitational potential using the convolution theorem Eduard Vorobyov Institute for Computational Astrophysics.
CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: MWF.
Utilizing Multi-threading, Parallel Processing, and Memory Management Techniques to Improve Transportation Model Performance Jim Lam Andres Rabinowicz.
Advanced Computer Architecture, CSE 520 Generating FPGA-Accelerated DFT Libraries Chi-Li Yu Nov. 13, 2007.
Radar Pulse Compression Using the NVIDIA CUDA SDK
Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental.
HPEC SMHS 9/24/2008 MIT Lincoln Laboratory Large Multicore FFTs: Approaches to Optimization Sharon Sacco and James Geraci 24 September 2008 This.
DUKEMIT-LLHPEC-2007 FFTs of Arbitrary Dimensions on GPUs Xiaobai Sun and Nikos Pitsianis Duke University September 19, 2007 At High Performance Embedded.
1 Complex Images k’k’ k”k” k0k0 -k0-k0 branch cut   k 0 pole C1C1 C0C0 from the Sommerfeld identity, the complex exponentials must be a function.
2009/4/21 Third French-Japanese PAAP Workshop 1 A Volumetric 3-D FFT on Clusters of Multi-Core Processors Daisuke Takahashi University of Tsukuba, Japan.
2007/11/2 First French-Japanese PAAP Workshop 1 The FFTE Library and the HPC Challenge (HPCC) Benchmark Suite Daisuke Takahashi Center for Computational.
Scalable Multi-core Sonar Beamforming with Computational Process Networks Motivation Sonar beamforming requires significant computation and input/output.
CAS 721 Course Project Implementing Branch and Bound, and Tabu search for combinatorial computing problem By Ho Fai Ko ( )
Introduction to Research 2011 Introduction to Research 2011 Ashok Srinivasan Florida State University Images from ORNL, IBM, NVIDIA.
Reconstruction of Solid Models from Oriented Point Sets Misha Kazhdan Johns Hopkins University.
Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.
Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
Computer Science Department, Duke UniversityPhD Defense TalkMay 4, 2005 FAST PATTERN MATCHING IN 3D IMAGES ON GPUS Patrick Eibl, Dennis Healy, Nikos P.
CS654: Digital Image Analysis
Cone-beam image reconstruction by moving frames Xiaochun Yang, Biovisum, Inc. Berthold K.P. Horn, MIT CSAIL.
Professor A G Constantinides 1 Discrete Fourier Transforms Consider finite duration signal Its z-tranform is Evaluate at points on z-plane as We can evaluate.
A New Class of High Performance FFTs Dr. J. Greg Nash Centar ( High Performance Embedded Computing (HPEC) Workshop.
Locate Potential Support Vectors for Faster
FFTC: Fastest Fourier Transform on the IBM Cell Broadband Engine David A. Bader, Virat Agarwal.
Parallel Computers Today Oak Ridge / Cray Jaguar > 1.75 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating.
Visualization in Scientific Computing (or Scientific Visualization) Multiresolution,...
EE345S Real-Time Digital Signal Processing Lab Fall 2006 Lecture 17 Fast Fourier Transform Prof. Brian L. Evans Dept. of Electrical and Computer Engineering.
GPU Acceleration of Particle-In-Cell Methods B. M. Cowan, J. R. Cary, S. W. Sides Tech-X Corporation.
Application of Design Patterns to Geometric Decompositions V. Balaji, Thomas L. Clune, Robert W. Numrich and Brice T. Womack.
Parallel Computers Today LANL / IBM Roadrunner > 1 PFLOPS Two Nvidia 8800 GPUs > 1 TFLOPS Intel 80- core chip > 1 TFLOPS  TFLOPS = floating point.
George Viamontes, Mohammed Amduka, Jon Russo, Matthew Craven, Thanhvu Nguyen Lockheed Martin Advanced Technology Laboratories 3 Executive Campus, 6th Floor.
Albert I. Reuther & Joel Goodman HPEC Sept 2003
CS 591 S1 – Computational Audio
Yuanrui Zhang, Mahmut Kandemir
Real-Time Ray Tracing Stefan Popov.
Generalized Hough Transform
Clusters of Computational Accelerators
Parallel Computers Today
Coding Approaches for End-to-End 3D TV Systems
CHAPTER 3-2. Planar Cartesian Kinematics
Presentation transcript:

DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 1 Parallelization of NUFFT with Radial Data on Multicore Processors Nikos Pitsianis Xiaobai Sun Computer Science Duke University HPEC 2008 Lincoln Laboratory, MIT

DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 2 NUDFTs & NUFFTs Data samples : on-equally spaced on Cartesian Grid, or with non-uniform density explicitly specified NUDFT not exceptional ; FFTs often follow sample translation, interpolation re-gridding not restricting on sampling distribution or conflicting with acquisition conditions not combinatorial as DFT NUFFT O( N log(N) log(1/ε)) in ε-approximation and in terms of arithmetic complexity Replace heuristic sample translations with unified theory and methods (1995) Great potential to enable many high-precision image/data processing applications However, many obstacles in reaching the potential

DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 3 Sample Translation Sample Translation by the Convolution Theorem While an equally-spaced convolution may be accelerated by FFTs, well-chosen locally- supported convolution transforms can accelerate NUDFTs. Two basic approaches for choosing a translation-scaling pair : analytical, numerical Sample translation is specific to sample distribution and acquisition ordering Sample translation is subject to the constraints in FFT implementation

DUKE CSNUFFT on Multicores HPEC-2008 Sept. 25, 2008NUFFTs on Multicores 4 NUFFT Acceleration History with Radial Data streaming of data reads 1 hr 50 mins 4 dynamic coordinates decoding 2 hrs 50 mins 1 hr 10 mins 3 geometric binning, multithreading & scaling fusion 15 mins 5 scratch memory reuse 3 hrs 45 mins 2 sub-expression extraction 12 hrs 4 hrs 1 original version 27 hrs 0 Program & Algorithm Transformations 8-core 3.0 GHz 8GB RAM 4-core PPC 2.5 GHz 12GB RAM NUFFT Version