GPU Accelerated Vessel Segmentation Using Laplacian Eigenmaps Lin Cheng, Hyunsu Cho and Peter A. Yoon Trinity College.

Slides:



Advertisements
Similar presentations
SHREYAS PARNERKAR. Motivation Texture analysis is important in many applications of computer image analysis for classification or segmentation of images.
Advertisements

Copyright 2011, Data Mining Research Laboratory Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining Xintian Yang, Srinivasan.
Shredder GPU-Accelerated Incremental Storage and Computation
Partitional Algorithms to Detect Complex Clusters
Complete Unified Device Architecture A Highly Scalable Parallel Programming Framework Submitted in partial fulfillment of the requirements for the Maryland.
SE263 Video Analytics Course Project Initial Report Presented by M. Aravind Krishnan, SERC, IISc X. Mei and H. Ling, ICCV’09.
A Parallel GPU Version of the Traveling Salesman Problem Molly A. O’Neil, Dan Tamir, and Martin Burtscher* Department of Computer Science.
A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems Hyunsu Cho and Peter A. Yoon Trinity College,
Institute of Medical Engineering 1 20th Annual International Conference on Magnetic Resonance Angiography Graz, Real Time Elimination of.
GPGPU Introduction Alan Gray EPCC The University of Edinburgh.
HPCC Mid-Morning Break High Performance Computing on a GPU cluster Dirk Colbry, Ph.D. Research Specialist Institute for Cyber Enabled Discovery.
Why GPU Computing. GPU CPU Add GPUs: Accelerate Science Applications © NVIDIA 2013.
Eigenvalue and eigenvectors  A x = λ x  Quantum mechanics (Schrödinger equation)  Quantum chemistry  Principal component analysis (in data mining)
FSOSS Dr. Chris Szalwinski Professor School of Information and Communication Technology Seneca College, Toronto, Canada GPU Research Capabilities.
PARALLEL PROCESSING COMPARATIVE STUDY 1. CONTEXT How to finish a work in short time???? Solution To use quicker worker. Inconvenient: The speed of worker.
OpenFOAM on a GPU-based Heterogeneous Cluster
One-Shot Multi-Set Non-rigid Feature-Spatial Matching
Back-Projection on GPU: Improving the Performance Wenlay “Esther” Wei Advisor: Jeff Fessler Mentor: Yong Long April 29, 2010.
High Performance Computing 1 Parallelization Strategies and Load Balancing Some material borrowed from lectures of J. Demmel, UC Berkeley.
Three Algorithms for Nonlinear Dimensionality Reduction Haixuan Yang Group Meeting Jan. 011, 2005.
Kyle Heath, Natasha Gelfand, Maks Ovsjanikov, Mridul Aanjaneya, Leo Guibas Image Webs Computing and Exploiting Connectivity in Image Collections.
Active Set Support Vector Regression
All-Pairs-Shortest-Paths for Large Graphs on the GPU Gary J Katz 1,2, Joe Kider 1 1 University of Pennsylvania 2 Lockheed Martin IS&GS.
Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.
Panda: MapReduce Framework on GPU’s and CPU’s
Jared Barnes Chris Jackson.  Originally created to calculate pixel values  Each core executes the same set of instructions Mario projected onto several.
Accelerating SQL Database Operations on a GPU with CUDA Peter Bakkum & Kevin Skadron The University of Virginia GPGPU-3 Presentation March 14, 2010.
Motivation “Every three minutes a woman is diagnosed with Breast cancer” (American Cancer Society, “Detailed Guide: Breast Cancer,” 2006) Explore the use.
High-Performance Packet Classification on GPU Author: Shijie Zhou, Shreyas G. Singapura and Viktor K. Prasanna Publisher: HPEC 2014 Presenter: Gang Chi.
An approach for solving the Helmholtz Equation on heterogeneous platforms An approach for solving the Helmholtz Equation on heterogeneous platforms G.
Training Program on GPU Programming with CUDA 31 st July, 7 th Aug, 14 th Aug 2011 CUDA Teaching UoM.
A performance analysis of multicore computer architectures Michel Schelske.
Challenges Bit-vector approach Conclusion & Future Work A subsequence of a string of symbols is derived from the original string by deleting some elements.
Structure Preserving Embedding Blake Shaw, Tony Jebara ICML 2009 (Best Student Paper nominee) Presented by Feng Chen.
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
By Arun Bhandari Course: HPC Date: 01/28/12. GPU (Graphics Processing Unit) High performance many core processors Only used to accelerate certain parts.
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
A GPU Implementation of Inclusion-based Points-to Analysis Mario Méndez-Lojo (AMD) Martin Burtscher (Texas State University, USA) Keshav Pingali (U.T.
1 © 2012 The MathWorks, Inc. Parallel computing with MATLAB.
+ CUDA Antonyus Pyetro do Amaral Ferreira. + The problem The advent of multicore CPUs and manycore GPUs means that mainstream processor chips are now.
IEEE TRANSSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
On a Few Ray Tracing like Algorithms and Structures. -Ravi Prakash Kammaje -Swansea University.
Use of GPUs in ALICE (and elsewhere) Thorsten Kollegger TDOC-PG | CERN |
Applying GPU and POSIX Thread Technologies in Massive Remote Sensing Image Data Processing By: Group 17 King Mongkut's Institute of Technology Ladkrabang.
GPU Architecture and Programming
A Parallel Implementation of MSER detection GPGPU Final Project Lin Cao.
JPEG-GPU: A GPGPU IMPLEMENTATION OF JPEG CORE CODING SYSTEMS Ang Li University of Wisconsin-Madison.
Introduction What is GPU? It is a processor optimized for 2D/3D graphics, video, visual computing, and display. It is highly parallel, highly multithreaded.
Jie Chen. 30 Multi-Processors each contains 8 cores at 1.4 GHz 4GB GDDR3 memory offers ~100GB/s memory bandwidth.
Some key aspects of NVIDIA GPUs and CUDA. Silicon Usage.
GPUs: Overview of Architecture and Programming Options Lee Barford firstname dot lastname at gmail dot com.
ARCHES: GPU Ray Tracing I.Motivation – Emergence of Heterogeneous Systems II.Overview and Approach III.Uintah Hybrid CPU/GPU Scheduler IV.Current Uintah.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
 Genetic Algorithms  A class of evolutionary algorithms  Efficiently solves optimization tasks  Potential Applications in many fields  Challenges.
CDVS on mobile GPUs MPEG 112 Warsaw, July Our Challenge CDVS on mobile GPUs  Compute CDVS descriptor from a stream video continuously  Make.
MATE-CG: A MapReduce-Like Framework for Accelerating Data-Intensive Computations on Heterogeneous Clusters Wei Jiang and Gagan Agrawal.
Fateme Hajikarami Spring  What is GPGPU ? ◦ General-Purpose computing on a Graphics Processing Unit ◦ Using graphic hardware for non-graphic computations.
University of Michigan Electrical Engineering and Computer Science Adaptive Input-aware Compilation for Graphics Engines Mehrzad Samadi 1, Amir Hormati.
AUTO-GC: Automatic Translation of Data Mining Applications to GPU Clusters Wenjing Ma Gagan Agrawal The Ohio State University.
An Out-of-core Implementation of Block Cholesky Decomposition on A Multi-GPU System Lin Cheng, Hyunsu Cho, Peter Yoon, Jiajia Zhao Trinity College, Hartford,
Hybrid Parallel Implementation of The DG Method Advanced Computing Department/ CAAM 03/03/2016 N. Chaabane, B. Riviere, H. Calandra, M. Sekachev, S. Hamlaoui.
An Accelerated Procedure for Hypergraph Coarsening on the GPU Lin Cheng, Hyunsu Cho, and Peter Yoon Trinity College Hartford, CT, USA.
Accelerating K-Means Clustering with Parallel Implementations and GPU Computing Janki Bhimani Miriam Leeser Ningfang Mi
Why Parallel/Distributed Computing Sushil K. Prasad
Sobolev(+Node 6, 7) Showcase +K20m GPU Accelerator.
High Performance Computing Seminar
Linchuan Chen, Xin Huo and Gagan Agrawal
Grouping.
Advanced Computing Facility Introduction
About Hardware Optimization in Midas SW
Presentation transcript:

GPU Accelerated Vessel Segmentation Using Laplacian Eigenmaps Lin Cheng, Hyunsu Cho and Peter A. Yoon Trinity College

Problem Image segmentation Partition pictures of vessels into segments

Local info embedded in high dimensional space Project local info onto low-dimensional plane Optimize the projection to preserve essential characteristics Cluster the projected data points into segments [1] Tziakos, Laskaris, and Fotopoulos

Segmentation process Build graph of local info Apply Laplace operator Solve optimization problem

Build graph of local info Store the resulting graph in a weight matrix Edges reflect variations among different regions (global variation)

Apply Laplace operator Form Laplacian matrix L = I – D 1/2 WD 1/2 encoding the Laplace operator. The operator formulates an optimization problem: Projections of well-connected nodes should also be tightly clustered.

Solve optimization problem Solutions to eigenvalue problem Ly = λy are optimal solutions If solutions are good, we can detect clusters

Characteristics of GPUs Massively parallel – lots of small cores (workers) Good for high-throughput, compute-bound tasks Separate memory space from main memory

Strategy: Reduce memory footprint On-GPU memory is limited Reduce memory usage and we can pack in more work into GPU

Strategy: Reduce memory footprint Weight matrix generation: Do not store intermediate results More entries can be calculated in parallel; 10x faster

Worker allocation

Strategy: use Lanczos method We need only a few smallest eigenvalues of L Lanczos method iteratively solve for the eigenvalues needed Takes 1/28 time of conventional method

Performance

Performance: vs. multicore CPUs CPU: two Intel® Xeon® E GPU: one Nvidia Tesla® K20c

Acknowledgement Trinity College, Student Research Program Nvidia Corporation, CUDA Teaching Center Program