GPU Programming Contest. Contents Target: Clustering with Kmeans How to use toolkit1.0 Towards the fastest program.

Presentation transcript:

GPU Programming Contest

Contents
- Target: Clustering with Kmeans
- How to use Toolkit1.0
- Towards the fastest program

Target application: clustering with Kmeans. Kmeans is a famous clustering method. A program implementing the Kmeans method for a host processor is given. Modify it so that it runs on the GPU as fast as possible. For the final results, only runs on the compute nodes of Fermi or Longhorn will be considered.

Kmeans method (1/5) Initial state: nodes, each with a certain color, are distributed randomly (here, 100 nodes with 5 colors are shown). STEP1: The centre of gravity is computed for each colored node set (each X in the figure is a centre). Reference URL:

Kmeans method (2/5) STEP2: The color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each node set with the same color.

Kmeans method (3/5) STEP2: Again, the color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each node set with the same color.

Kmeans method (4/5) STEP2: Again, the color of each node is changed to that of the nearest centre. STEP1: Again, the centre of gravity is computed for each node set with the same color.

Kmeans method (5/5) STEP2: Again and again, the color of each node is changed to that of the nearest centre. Termination condition: the color of every node is already the same as the color of its nearest centre, so no color changes, and the algorithm terminates.
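The STEP1/STEP2 loop above can be sketched on the host as follows. This is a minimal illustrative sketch; the names (Point, kmeansStep) and the 2-D layout are assumptions, not the toolkit's actual interface.

```cuda
#include <math.h>

/* Hypothetical node type: 2-D position plus current color (cluster id). */
typedef struct { float x, y; int color; } Point;

/* One Kmeans iteration: STEP1 recomputes the centre of gravity of each
 * colored node set; STEP2 recolors every node to its nearest centre.
 * Returns the number of nodes whose color changed (0 means terminate). */
int kmeansStep(Point *nodes, int numNodes, float *cx, float *cy, int k)
{
    /* STEP1: centre of gravity per color */
    for (int c = 0; c < k; ++c) {
        float sx = 0.0f, sy = 0.0f;
        int cnt = 0;
        for (int i = 0; i < numNodes; ++i) {
            if (nodes[i].color == c) { sx += nodes[i].x; sy += nodes[i].y; ++cnt; }
        }
        if (cnt > 0) { cx[c] = sx / cnt; cy[c] = sy / cnt; }
    }
    /* STEP2: recolor each node to the nearest centre */
    int changed = 0;
    for (int i = 0; i < numNodes; ++i) {
        int best = nodes[i].color;
        float bestD = INFINITY;
        for (int c = 0; c < k; ++c) {
            float dx = nodes[i].x - cx[c], dy = nodes[i].y - cy[c];
            float d = dx * dx + dy * dy;   /* squared distance is enough */
            if (d < bestD) { bestD = d; best = c; }
        }
        if (best != nodes[i].color) { nodes[i].color = best; ++changed; }
    }
    return changed;
}
```

The contest task is to move these two loops onto the GPU; both are data-parallel over nodes (STEP2) and over colors (STEP1).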

How to start Download kmeans.tar.gz and unzip it. There are useful sample codes in kmeans.
- Mission 1: Make a GPU version based on the CPU version. Implement gpuKMeans in kmeans.cu; cpuKMeans in main.cu is the CPU version, given for reference.
- Mission 2: Optimize the GPU code so that it runs as fast as possible.

Toolkit1.0
- kmeans.cu: describe the K-means program for the GPU here
- main.cu: reads the input data and holds the CPU program (modification forbidden)
- check.c: visualizes the output data with OpenCV
- gen.c: generates input data
- Makefile
- data/: input data
- result/: output data

How to use Toolkit1.0
- $ make : compile
- $ make gpu : execute the GPU program
- $ make cpu : execute the CPU program
- $ ./gen SEED (SEED = 0,1,2,…) : generate input data

Sample Code A vector addition program for the GPU.
- $ make : compile
- $ ./main : run the program
Points:
- Memory allocation on the GPU: cudaMalloc(), cudaFree()
- Data transfer between CPU and GPU: cudaMemcpy()
- Format of a GPU kernel function
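The points above fit together as in this self-contained sketch of a vector addition program (the toolkit's actual sample may differ in details such as sizes and names):

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

/* GPU kernel function: one thread adds one element. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void)
{
    const int N = 1024;
    const size_t bytes = N * sizeof(float);
    static float hA[N], hB[N], hC[N];
    for (int i = 0; i < N; ++i) { hA[i] = (float)i; hB[i] = 2.0f * i; }

    /* Memory allocation on the GPU */
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);

    /* Data transfer: CPU -> GPU */
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    /* Kernel launch: enough 256-thread blocks to cover N elements */
    vecAdd<<<(N + 255) / 256, 256>>>(dA, dB, dC, N);

    /* Data transfer: GPU -> CPU */
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("hC[10] = %f\n", hC[10]);  /* expected 30.0 */

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```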

Towards the fastest program Minimum requirements:
- Implement the K-means program on the GPU
- Parallelize STEP1 or STEP2 of K-means
How to optimize the program:
- Parallelize both STEP1 and STEP2
- Use shared memory and constant memory
- Use coalesced memory access, etc.
Web sites:
- NVIDIA GPU Computing Documentation
- Fixstars CUDA Information Site
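For example, STEP2 maps naturally to one thread per node, with the centres broadcast from constant memory. This is a hedged sketch under assumed names (cCentreX, assignColors, MAX_K); it is not the toolkit's code.

```cuda
#include <cuda_runtime.h>

#define MAX_K 32   /* illustrative upper bound on the number of colors */

/* Centres placed in constant memory: every thread reads the same values,
 * which the hardware serves as a broadcast. */
__constant__ float cCentreX[MAX_K];
__constant__ float cCentreY[MAX_K];

/* STEP2 with one thread per node. Storing coordinates as separate x[] and
 * y[] arrays (structure-of-arrays) makes the accesses coalesced:
 * consecutive threads touch consecutive addresses. */
__global__ void assignColors(const float *x, const float *y, int *color,
                             int n, int k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float px = x[i], py = y[i];        /* coalesced global loads */
    int best = 0;
    float bestD = 3.4e38f;             /* "infinity" for float */
    for (int c = 0; c < k; ++c) {
        float dx = px - cCentreX[c], dy = py - cCentreY[c];
        float d = dx * dx + dy * dy;
        if (d < bestD) { bestD = d; best = c; }
    }
    color[i] = best;                   /* coalesced global store */
}

/* Host side: refresh the centres before each launch, e.g.
 *   cudaMemcpyToSymbol(cCentreX, hostCX, k * sizeof(float));
 *   cudaMemcpyToSymbol(cCentreY, hostCY, k * sizeof(float));   */
```

STEP1 can be parallelized similarly as a per-color (or shared-memory) reduction over the node coordinates.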

Announcement: Deadline: 8th August, 10:00 PM. If you have any questions about the contest, please use Piazza.