

2012/06/22

Contents
• GPU (Graphics Processing Unit)
• CUDA Programming
• Target: Clustering with K-means
• How to use toolkit1.0
• Towards the fastest program

GPU (Graphics Processing Unit)
• Multicore processor
  ○ Several hundred cores
  ○ SP (Streaming Processor): a core in the GPU
  ○ SM (Streaming Multiprocessor): composed of SPs
• High memory bandwidth

[Figure: a GPU containing multiple SMs, each made up of SPs, all connected to Global Memory]

Table: Specification of GeForce280
  SP                 240
  SM                 30 (each of them has 8 SPs)
  Memory bandwidth   GB/s
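The SM count listed in the table can also be read back from the CUDA runtime. The following is a minimal sketch, assuming the GPU of interest is device 0; the helper name `printGpuSummary` is hypothetical and not part of the toolkit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the toolkit): prints the name, SM count and
// global memory size of one CUDA device. Returns the SM count, or -1 on error.
int printGpuSummary(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    // multiProcessorCount is the number of SMs. The number of SPs per SM
    // depends on the architecture and is not reported directly.
    std::printf("%s: %d SMs, %.1f GiB global memory\n",
                prop.name, prop.multiProcessorCount,
                prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    return prop.multiProcessorCount;
}
```

Calling `printGpuSummary(0)` on the GPU described in the table should report 30 SMs; other GPUs will report their own values.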

Flow of CUDA Program
1. Allocate GPU memory: cudaMalloc()
2. Transfer input data: cudaMemcpy()
3. Execute kernel
4. Transfer result data: cudaMemcpy()
5. Free GPU memory: cudaFree()

[Figure: the host (CPU, Main Memory) holds an array input 1, input 2, ..., input N. A data transfer copies it to Global Memory on the device (GPU), where each SP runs the kernel. A second data transfer copies output 1, output 2, ..., output N back to Main Memory.]
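The five steps above can be sketched in one short program. This is an illustration only: the kernel `squareKernel`, the wrapper `squareOnGpu`, and the block size of 256 are assumptions chosen for the example, not names or settings from the toolkit.

```cuda
#include <cuda_runtime.h>

// Kernel: each thread squares one element ("input i" -> "output i").
__global__ void squareKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = in[i] * in[i];
    }
}

// Runs the five steps for host arrays hostIn/hostOut of length n (n > 0).
// Returns true on success. All names here are illustrative.
bool squareOnGpu(const float *hostIn, float *hostOut, int n)
{
    float *devIn = NULL, *devOut = NULL;
    size_t bytes = n * sizeof(float);

    // 1. Allocate GPU memory
    if (cudaMalloc((void **)&devIn, bytes) != cudaSuccess) return false;
    if (cudaMalloc((void **)&devOut, bytes) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }

    // 2. Transfer input data (host -> device)
    cudaError_t err = cudaMemcpy(devIn, hostIn, bytes, cudaMemcpyHostToDevice);

    // 3. Execute kernel: enough 256-thread blocks to cover n elements
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        squareKernel<<<blocks, threads>>>(devIn, devOut, n);
        err = cudaGetLastError();   // reports launch errors
    }

    // 4. Transfer result data (device -> host); this copy waits for the kernel
    if (err == cudaSuccess) {
        err = cudaMemcpy(hostOut, devOut, bytes, cudaMemcpyDeviceToHost);
    }

    // 5. Free GPU memory
    cudaFree(devIn);
    cudaFree(devOut);
    return err == cudaSuccess;
}
```

The host never touches `devIn` or `devOut` directly; they are only passed to CUDA calls and to the kernel.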

Target application: clustering with K-means
• A famous method for clustering
• A K-means program for the host processor is given. Modify it so that it runs on the GPU as fast as possible.
• The GeForce Tesla (GTX280) in Amano Lab. can be used for this contest.

K-means method (1/5)
Initial state: Nodes are each given one of several colors at random. (Here, 100 nodes with 5 colors are shown.)
STEP1: The centre of gravity is computed for each set of nodes with the same color. (Each X in the figure is a centre.)
Reference URL:
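As a reference point, STEP1 can be written as plain host code. This sketch assumes 2D nodes stored in separate arrays `x` and `y` and a color index in the range 0 to k-1 for each node; the actual data layout of the toolkit is not shown in the slides, so the function name and parameters are hypothetical.

```cuda
#include <vector>

// Host-side reference for STEP1 (illustrative names, assumed 2D data).
// For every color c, centreX[c]/centreY[c] receive the mean position of
// all nodes that currently have color c.
void computeCentres(const float *x, const float *y, const int *color,
                    int n, int k, float *centreX, float *centreY)
{
    std::vector<double> sumX(k, 0.0), sumY(k, 0.0);
    std::vector<int> count(k, 0);

    // Accumulate the coordinates of each color.
    for (int i = 0; i < n; ++i) {
        sumX[color[i]] += x[i];
        sumY[color[i]] += y[i];
        ++count[color[i]];
    }

    // The centre of gravity is the mean. A color with no nodes keeps
    // whatever centre it had before (a design choice of this sketch).
    for (int c = 0; c < k; ++c) {
        if (count[c] > 0) {
            centreX[c] = (float)(sumX[c] / count[c]);
            centreY[c] = (float)(sumY[c] / count[c]);
        }
    }
}
```

The loop over nodes is the part that a GPU version would have to parallelize, since many nodes add into the same per-color sum.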

K-means method (2/5)
STEP2: The color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (3/5)
STEP2: Again, the color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (4/5)
STEP2: Again, the color of each node is changed to that of the nearest centre.
STEP1: Again, the centre of gravity is computed for each set of nodes with the same color.

K-means method (5/5)
STEP2: STEP1 and STEP2 are repeated, each time changing the color of each node to that of the nearest centre.
Termination condition: Every node already has the color of its nearest centre, so no color needs to change. → Terminate.
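STEP2 maps naturally onto the GPU because each node can be handled by its own thread, and the termination condition can be checked by counting how many nodes changed color. The sketch below assumes 2D nodes in separate `x` and `y` arrays already resident in GPU memory; `assignColors`, `runStep2`, and the block size are illustrative choices, not the toolkit's interface.

```cuda
#include <cfloat>
#include <cuda_runtime.h>

// STEP2: one thread per node picks the nearest centre. Every thread whose
// node changes color increments *changed so the host can test termination.
__global__ void assignColors(const float *x, const float *y, int n,
                             const float *centreX, const float *centreY, int k,
                             int *color, int *changed)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int best = 0;
    float bestDist = FLT_MAX;
    for (int c = 0; c < k; ++c) {
        float dx = x[i] - centreX[c];
        float dy = y[i] - centreY[c];
        float d = dx * dx + dy * dy;   // squared distance suffices for comparison
        if (d < bestDist) {
            bestDist = d;
            best = c;
        }
    }
    if (color[i] != best) {
        color[i] = best;
        atomicAdd(changed, 1);
    }
}

// Host wrapper (illustrative). All array arguments are device pointers.
// Returns the number of nodes that changed color, or -1 on a CUDA error.
// A return value of 0 means the termination condition holds.
int runStep2(const float *dX, const float *dY, int n,
             const float *dCentreX, const float *dCentreY, int k, int *dColor)
{
    int *dChanged = NULL;
    int changed = -1;
    if (cudaMalloc((void **)&dChanged, sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(dChanged, 0, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    assignColors<<<blocks, threads>>>(dX, dY, n, dCentreX, dCentreY, k,
                                      dColor, dChanged);

    // The blocking copy waits for the kernel to finish.
    if (cudaMemcpy(&changed, dChanged, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        changed = -1;
    }
    cudaFree(dChanged);
    return changed;
}
```

A driver loop would alternate STEP1 with `runStep2` and stop as soon as `runStep2` returns 0.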

How to start
• Log in with ssh. Your account is already available. If you have not received mail about your account, please send mail to:
• Download kmeans.tar.gz and extract it.
• There is useful sample code in kmeans.
• Mission 1: Make a GPU version based on the CPU version.
  ○ Write gpuKMeans in kmeans.cu.
  ○ cpuKMeans in main.cu is a CPU version for reference.
• Mission 2: Optimize the GPU code so that it runs as fast as possible.

Toolkit1.0
• kmeans.cu: the K-means program for the GPU. Please modify this file.
• main.cu: reads the input data and contains the CPU program. Modification forbidden.
• check.c: visualizes the output data with OpenCV.
• gen.c: generates input data.
• Makefile
• data/: input data
• result/: output data

How to use Toolkit1.0
• $ make : Compile
• $ make gpu : Execute GPU program
• $ make cpu : Execute CPU program
• $ ./gen SEED (SEED = 0, 1, 2, …) : Generate input data

Sample Code
• Vector addition program for GPU
  ○ $ make : Compile
  ○ $ ./main : Run the program
• Points
  ○ Memory allocation on GPU: cudaMalloc(), cudaFree()
  ○ Data transfer between CPU and GPU: cudaMemcpy()
  ○ Format of GPU kernel function
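The sample code itself is not reproduced in the slides. For orientation, a vector addition covering the three points above might look like the following sketch; the names `vecAdd` and `vecAddOnGpu` and the block size are assumptions, and the toolkit's sample may be organized differently.

```cuda
#include <cuda_runtime.h>

// Format of a kernel: declared __global__, returns void, and the body
// is executed once by every thread of the launch.
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    // Global thread index built from the block and thread coordinates.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {   // the last block may contain threads past the end
        c[i] = a[i] + b[i];
    }
}

// Host side: a, b and c are host arrays of length n (n > 0).
// Returns cudaSuccess or the first error encountered.
cudaError_t vecAddOnGpu(const float *a, const float *b, float *c, int n)
{
    size_t bytes = n * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;

    // Memory allocation on the GPU
    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);

    // Data transfer from CPU to GPU
    if (err == cudaSuccess) err = cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        // Launch syntax: kernel<<<number of blocks, threads per block>>>(args)
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();
    }

    // Data transfer from GPU to CPU
    if (err == cudaSuccess) err = cudaMemcpy(c, dC, bytes, cudaMemcpyDeviceToHost);

    // cudaFree on a NULL pointer performs no operation, so this is safe
    // even if an allocation above failed.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

The bounds check `i < n` matters whenever `n` is not a multiple of the block size.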

Towards the fastest program
• Minimum requirement
  ○ Implement the K-means program on the GPU
  ○ Parallelize STEP1 or STEP2 of K-means
• How to optimize the program
  ○ Parallelize both STEP1 and STEP2
  ○ Shared memory, constant memory
  ○ Coalesced memory access
  ○ etc.
• Web sites
  ○ NVIDIA GPU Computing Documentation:
  ○ Fixstars CUDA Information Site:
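One possible way to combine two of these hints, parallelizing STEP1 and using shared memory, is sketched below. It is only one option among several and rests on stated assumptions: 2D nodes in separate `x` and `y` arrays, at most `MAX_K` colors, and a GPU of compute capability 2.0 or later, because `atomicAdd` on `float` is not available on earlier devices such as the GTX280 (which would need a different reduction scheme). All names are hypothetical. Constant memory is not shown.

```cuda
#include <cuda_runtime.h>

#define MAX_K 32   // assumed upper bound on the number of colors

// Parallel STEP1, accumulation part. Each block first sums its nodes in
// shared memory, then adds one partial sum per color to global memory.
__global__ void accumulateCentres(const float *x, const float *y,
                                  const int *color, int n, int k,
                                  float *sumX, float *sumY, int *count)
{
    __shared__ float sX[MAX_K];
    __shared__ float sY[MAX_K];
    __shared__ int sCount[MAX_K];

    for (int c = threadIdx.x; c < k; c += blockDim.x) {
        sX[c] = 0.0f;
        sY[c] = 0.0f;
        sCount[c] = 0;
    }
    __syncthreads();

    // Consecutive threads read consecutive elements (coalesced access).
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int c = color[i];
        atomicAdd(&sX[c], x[i]);
        atomicAdd(&sY[c], y[i]);
        atomicAdd(&sCount[c], 1);
    }
    __syncthreads();

    // One global atomic per color per block instead of one per node.
    for (int c = threadIdx.x; c < k; c += blockDim.x) {
        atomicAdd(&sumX[c], sX[c]);
        atomicAdd(&sumY[c], sY[c]);
        atomicAdd(&count[c], sCount[c]);
    }
}

// Division part: one thread per color turns the sums into centres.
// A color with no nodes keeps its previous centre.
__global__ void finishCentres(const float *sumX, const float *sumY,
                              const int *count, int k,
                              float *centreX, float *centreY)
{
    int c = blockIdx.x * blockDim.x + threadIdx.x;
    if (c < k && count[c] > 0) {
        centreX[c] = sumX[c] / count[c];
        centreY[c] = sumY[c] / count[c];
    }
}

// Host wrapper (illustrative). All array arguments are device pointers,
// n > 0 and 1 <= k <= MAX_K.
cudaError_t computeCentresOnGpu(const float *dX, const float *dY,
                                const int *dColor, int n, int k,
                                float *dCentreX, float *dCentreY)
{
    if (k < 1 || k > MAX_K) return cudaErrorInvalidValue;
    float *dSumX = NULL, *dSumY = NULL;
    int *dCount = NULL;
    cudaError_t err = cudaMalloc((void **)&dSumX, k * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dSumY, k * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dCount, k * sizeof(int));
    if (err == cudaSuccess) {
        // All-zero bytes represent 0.0f and 0.
        cudaMemset(dSumX, 0, k * sizeof(float));
        cudaMemset(dSumY, 0, k * sizeof(float));
        cudaMemset(dCount, 0, k * sizeof(int));
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        accumulateCentres<<<blocks, threads>>>(dX, dY, dColor, n, k,
                                               dSumX, dSumY, dCount);
        finishCentres<<<1, MAX_K>>>(dSumX, dSumY, dCount, k, dCentreX, dCentreY);
        err = cudaDeviceSynchronize();
    }
    cudaFree(dSumX);
    cudaFree(dSumY);
    cudaFree(dCount);
    return err;
}
```

Because floating-point additions are performed in a nondeterministic order, results can differ in the last bits from a sequential CPU sum; a comparison against the CPU version should allow a small tolerance.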

Announcement:
• If you do not have an account, send mail to:
  Your name should be included in the mail.
• Deadline: 7/22 (Fri) 24:00
• Copy the following into ~/comparch: source code and a simple report
• Please check the web site. Additional information will be posted there.
• If you have any questions about the contest, please send mail to: