Simulation of Microwave Induced Thermoacoustic Imaging Model using GPU
Nilangshu Bidyanta, Ramaprasad Kulkarni
ECE 562 Term Project

Project Overview: Introduction
Figure: Schematic model, (a) side view (the longer transverse dimension of the waveguide lies along the x-axis) and (b) top view. Figure taken from reference [1].

Project Overview
– Motivation
– Hypothesis
– Project goals

Methodology
– Implementation of the pseudospectral time-domain (PSTD) model from the paper [1]
– Starting from the existing Matlab code, we wrote C++ code using CUDA
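PSTD methods compute spatial derivatives in the Fourier domain: transform a field, multiply each coefficient by i·k, and transform back. The sketch below illustrates that step for a 1-D periodic signal under stated assumptions: it uses a naive O(N²) DFT so that it needs no library (a GPU implementation would typically call an FFT library such as cuFFT instead), and the names dft and spectral_derivative are hypothetical, not taken from the project code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) DFT. sign = -1: forward transform; sign = +1: unnormalized inverse.
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        cd sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            double angle = sign * 2.0 * pi * double(k) * double(j) / double(n);
            sum += in[j] * cd(std::cos(angle), std::sin(angle));
        }
        out[k] = sum;
    }
    return out;
}

// Spectral derivative of a periodic signal sampled uniformly on [0, length):
// df/dx = IDFT( i * (2*pi*m/length) * DFT(f) ), with m the signed wavenumber.
std::vector<double> spectral_derivative(const std::vector<double>& f, double length) {
    assert(length > 0.0 && !f.empty());
    const std::size_t n = f.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> spec = dft(std::vector<cd>(f.begin(), f.end()), -1);
    for (std::size_t k = 0; k < n; ++k) {
        // Map index k to signed wavenumber m: 0, 1, ..., then negative frequencies.
        long m = (2 * k < n) ? long(k) : long(k) - long(n);
        if (2 * k == n) m = 0;  // even n: drop the Nyquist mode for the derivative
        spec[k] *= cd(0.0, 2.0 * pi * double(m) / length);
    }
    std::vector<cd> back = dft(spec, +1);
    std::vector<double> df(n);
    for (std::size_t j = 0; j < n; ++j) df[j] = back[j].real() / double(n);
    return df;
}
```

For a band-limited input such as sin(3x) sampled at 16 points on [0, 2π), the result matches 3cos(3x) up to rounding error, which is why PSTD can use coarser grids than finite differences for smooth fields.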

Methodology
Code analysis: a simplified version of the PSTD implementation can be written as follows:

    for (i = 0; i < 2000; i++) {
        Bx[i] = a*(Bx[i-1] + Bx[i-2] + Bx[i-3]) - b*myFunc(Cx[i-1], Cx[i-2], Cx[i-3]);
        Bx[i-3] = Bx[i-2]; Bx[i-2] = Bx[i-1]; Bx[i-1] = Bx[i];
        Cx[i] = a*(Cx[i-1] + Cx[i-2] + Cx[i-3]) - b*myFunc(Bx[i-1], Bx[i-2], Bx[i-3]);
        Cx[i-3] = Cx[i-2]; Cx[i-2] = Cx[i-1]; Cx[i-1] = Cx[i];
    }

In each iteration, the set of four statements for Bx is repeated for By and Bz, and the set for Cx is repeated for Cy and Cz. The function myFunc also performs FFT and IFFT computations.
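The loop above carries a dependency across time steps: step i reads the three previous levels, so the time loop must run sequentially, and parallelism is available only inside each step, across grid points. The following C++ sketch makes that structure concrete for Bx and Cx, keeping a three-level history that is shifted after each update as in the slide. myFunc here is a hypothetical placeholder that simply sums its three arguments (the real function performs FFT and IFFT computations), and the names Field, History, step and run are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Field = std::vector<double>;     // one field component on the grid (flattened)
using History = std::array<Field, 3>;  // [0] = level i-1, [1] = level i-2, [2] = level i-3

// Hypothetical placeholder for myFunc: the real function performs FFT/IFFT work.
// Here it just sums the three history levels element by element.
Field myFunc(const Field& h1, const Field& h2, const Field& h3) {
    assert(h1.size() == h2.size() && h2.size() == h3.size());
    Field out(h1.size());
    for (std::size_t n = 0; n < out.size(); ++n) out[n] = h1[n] + h2[n] + h3[n];
    return out;
}

// One update of `self` driven by `other`, followed by the history shift from the slide.
void step(History& self, const History& other, double a, double b) {
    const Field drive = myFunc(other[0], other[1], other[2]);
    assert(drive.size() == self[0].size());
    Field next(self[0].size());
    // Independent across grid points: this is the loop a GPU kernel would parallelize.
    for (std::size_t n = 0; n < next.size(); ++n)
        next[n] = a * (self[0][n] + self[1][n] + self[2][n]) - b * drive[n];
    self[2] = std::move(self[1]);  // i-3 <- i-2
    self[1] = std::move(self[0]);  // i-2 <- i-1
    self[0] = std::move(next);     // i-1 <- i
}

// Time loop: each step needs the results of the previous steps, so it runs sequentially.
void run(History& Bx, History& Cx, int steps, double a, double b) {
    for (int i = 0; i < steps; ++i) {
        step(Bx, Cx, a, b);
        step(Cx, Bx, a, b);  // reads the Bx values just updated, as in the slide
    }
}
```

In the real model each Field would be a full 3-D grid, and By/Bz and Cy/Cz would get the same treatment; only the inner loop over grid points is a candidate for distribution across GPU threads.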

Results and Discussions
Observed speedup with respect to the Matlab code:
– 11.84x speedup for GPU vs. Matlab (2-core)
– 2.68x speedup for GPU vs. Matlab (6-core)
Because of the large amount of data transfer, the GPU speedup over the multicore (6-core) machine is modest.
Matlab was run on a 6-core Intel i7 machine and on a 2-core Intel Core 2 Duo machine running Windows 7 (64-bit). The GPU code was run on a 4-core Intel Xeon machine running 64-bit Linux.
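Because both speedups are measured against the same GPU run, their ratio gives the implied speed difference between the two Matlab machines. Writing $T_{M2}$, $T_{M6}$ and $T_G$ for the run times on the 2-core machine, the 6-core machine and the GPU (symbols introduced here, assuming all three runs used the same problem size):

```latex
\frac{T_{M2}}{T_{M6}} = \frac{T_{M2}/T_G}{T_{M6}/T_G} = \frac{11.84}{2.68} \approx 4.42
```

A ratio above 6/2 = 3 means the gap between the two Matlab runs is larger than the core count alone would explain, even with perfect scaling across cores.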

Results and Discussions

Limitations to further speedup:
– Loop-carried dependency (each time step depends on the previous ones)
– The workload is data-intensive rather than computation-intensive
– The memory requirement grows rapidly with grid size (cubically in N for an NxNxN grid)
– For a 320x320x320 matrix, around 2000 ms of the 3000 ms per iteration is spent on data transfer
– Data transfer time is therefore significant compared with computation time
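Two of the limitations above can be quantified with simple arithmetic. This C++ sketch assumes double-precision (8-byte) storage, which the slides do not state, and its helper names are hypothetical. grid_bytes shows the cubic growth of memory: one 320x320x320 array of doubles occupies 262,144,000 bytes (250 MiB), and doubling the grid dimension multiplies that by 8. iteration_speedup applies an Amdahl-style bound: with 2000 ms of a 3000 ms iteration spent on transfers, speeding up the computation alone can never make an iteration more than 3000/2000 = 1.5x faster.

```cpp
#include <cassert>
#include <cstdint>

// Bytes for `arrays` arrays of n x n x n elements, each `elem_bytes` wide: cubic in n.
std::uint64_t grid_bytes(std::uint64_t n, std::uint64_t arrays, std::uint64_t elem_bytes) {
    return n * n * n * arrays * elem_bytes;
}

// Per-iteration speedup when only the compute part becomes `compute_speedup` times
// faster and the transfer time stays the same (Amdahl-style bound).
double iteration_speedup(double total_ms, double transfer_ms, double compute_speedup) {
    assert(total_ms > 0.0 && transfer_ms >= 0.0 && transfer_ms <= total_ms);
    assert(compute_speedup > 0.0);
    const double compute_ms = total_ms - transfer_ms;
    return total_ms / (transfer_ms + compute_ms / compute_speedup);
}
```

By the same arithmetic, eliminating the transfers entirely would leave about 1000 ms of computation per iteration, an upper bound of 3x.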

Lessons learnt
– With access to the entire code, there is potential for further speedup.
– The serial computation on the CPU can be improved by parallelizing the code to utilize multiple cores.
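As a minimal sketch of the multicore idea, assuming a C++11 compiler with std::thread support, the per-element update from the simplified PSTD loop can be split into contiguous index ranges, one per CPU core. The function name parallel_update and its parameters are hypothetical; the project's actual serial sections may look different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// next[i] = a*(h1[i] + h2[i] + h3[i]) - b*drive[i] for every grid point, with the
// index range split into contiguous chunks, one per worker thread.
void parallel_update(std::vector<double>& next,
                     const std::vector<double>& h1, const std::vector<double>& h2,
                     const std::vector<double>& h3, const std::vector<double>& drive,
                     double a, double b, unsigned workers) {
    assert(workers > 0);
    const std::size_t n = next.size();
    assert(h1.size() == n && h2.size() == n && h3.size() == n && drive.size() == n);
    const std::size_t chunk = (n + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        // Each thread writes a disjoint range of `next`, so no locking is needed.
        pool.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                next[i] = a * (h1[i] + h2[i] + h3[i]) - b * drive[i];
        });
    }
    for (auto& t : pool) t.join();
}
```

std::thread::hardware_concurrency() can supply the worker count, but it may return 0 when the value is unknown, so a fallback such as 1 is needed. OpenMP's `parallel for` is an alternative that needs only a compiler flag.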

References
[1] X. Wang, D. R. Bauer, R. Witte, and H. Xin, "Microwave-Induced Thermoacoustic Imaging Model for Potential Breast Cancer Detection," IEEE Transactions on Biomedical Engineering, vol. 59, no. 10, pp. …, Oct. …, doi: …/TBME….
[2] NVIDIA Corp., NVIDIA CUDA Compute Unified Device Architecture Programming Guide 1.1, Technical report, NVIDIA, Santa Clara, CA.

THANK YOU QUESTIONS