Trip report: GPU UERJ. Felice Pantaleo, SFT Group Meeting, 03/11/2014.

Eplanet visit
- Host institute: Universidade do Estado do Rio de Janeiro (UERJ)
- Duration: two-week visit, from Oct 6th to 17th
- Topic: GPU Programming

First week
- Dedicated to installation of two nodes with two NVIDIA GTX 650
- CUDA 6.5 installed
- IT specialists learning about configuration and capabilities
- Introductory questionnaire sent to students in order to get to know them and tailor the lectures

Questionnaire (four slides; content not captured in the transcript)

Parallel Programming intro
Introduction to parallel programming: motivations, architectures and algorithms. The reasons why computing systems are becoming more and more parallel and heterogeneous were explained.

Introduction to GPU Programming using CUDA
The three abstractions that form the foundation of GPU programming were introduced through CUDA-based examples:
- Thread hierarchy
- Synchronization
- Memory hierarchy/Shared Memory
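The three abstractions listed on this slide can be seen together in a block-level sum. The following is a minimal sketch, not the material used in the lectures: the kernel name `block_sum`, the host helper `sum_on_gpu` and the block size of 256 are assumptions chosen for illustration, and error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative block size (an assumption); the reduction below needs a power of two.
constexpr int kBlockSize = 256;

// Each block sums one tile of the input and writes a single partial sum.
__global__ void block_sum(const float* in, float* partial, int n) {
    // Memory hierarchy: a tile in fast shared memory, visible to one block only.
    __shared__ float tile[kBlockSize];

    // Thread hierarchy: position inside the block and inside the whole grid.
    const int tid = threadIdx.x;
    const int gid = blockIdx.x * blockDim.x + tid;

    tile[tid] = (gid < n) ? in[gid] : 0.0f;
    // Synchronization: wait until every thread of the block has loaded its element.
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = tile[0];
}

// Hypothetical host helper: launches the kernel and adds the partial sums on the CPU.
float sum_on_gpu(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;
    const int blocks = (n + kBlockSize - 1) / kBlockSize;

    float* d_in = nullptr;
    float* d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    block_sum<<<blocks, kBlockSize>>>(d_in, d_partial, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `sum_on_gpu` on a vector of one thousand ones launches four blocks; each block reduces its own tile independently, which is why the barrier only needs to cover the threads of a single block.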

Load Balancing and Partitioning
The aim was to make the students understand the relationship between a domain problem and the computational models available. Techniques that reduce the idle time of the Streaming Multiprocessors through dynamic scheduling and dynamic partitioning were shown.
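One common way to realise dynamic scheduling on a GPU is a shared work counter: instead of assigning a fixed range to each block, every block repeatedly claims the next unprocessed chunk with an atomic increment, so blocks that drew cheap work come back for more. The sketch below assumes this technique; the slides do not say which scheme was taught. The names `process_dynamic` and `run_dynamic`, the launch configuration, and the loop standing in for variable-cost work are all illustrative, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Item i costs cost[i] loop iterations, so a static partition would be unbalanced.
// Each block claims chunks of blockDim.x items from a global counter until none remain.
__global__ void process_dynamic(const int* cost, int* result, int num_items, int* next_chunk) {
    __shared__ int chunk_start;
    while (true) {
        // One thread per block claims the next chunk for the whole block.
        if (threadIdx.x == 0)
            chunk_start = atomicAdd(next_chunk, 1) * blockDim.x;
        __syncthreads();

        const int start = chunk_start;
        if (start >= num_items) return;  // same value for all threads, so the block exits together

        const int i = start + threadIdx.x;
        if (i < num_items) {
            int acc = 0;
            for (int k = 0; k < cost[i]; ++k) acc += k;  // stand-in for variable-cost work
            result[i] = acc;
        }
        // Make sure every thread has read chunk_start before it is overwritten.
        __syncthreads();
    }
}

// Hypothetical host helper; 4 blocks of 128 threads is an arbitrary illustrative choice.
std::vector<int> run_dynamic(const std::vector<int>& cost) {
    const int n = static_cast<int>(cost.size());
    std::vector<int> result(n);
    if (n == 0) return result;

    int* d_cost = nullptr;
    int* d_result = nullptr;
    int* d_next = nullptr;
    cudaMalloc(&d_cost, n * sizeof(int));
    cudaMalloc(&d_result, n * sizeof(int));
    cudaMalloc(&d_next, sizeof(int));
    cudaMemcpy(d_cost, cost.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(d_next, 0, sizeof(int));  // the work counter starts at chunk 0

    process_dynamic<<<4, 128>>>(d_cost, d_result, n, d_next);

    cudaMemcpy(result.data(), d_result, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_cost);
    cudaFree(d_result);
    cudaFree(d_next);
    return result;
}
```

The number of blocks is deliberately decoupled from the problem size: a small, fixed set of blocks stays resident and pulls work until the counter runs past the last item.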

Hands-on
- Duration: 10 hours
- GPU memory management: allocation, data transfer between host and device, synchronization
- Kernel launch: offload of a parallel section to the GPU
- Partitioning of a problem across the GPU threads
- Profiling of a CUDA application
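The first hands-on topics (allocation, host-device transfers, kernel launch, partitioning across threads) fit in one small example. The sketch below is an assumption about what such an exercise might look like, not the actual course material; `vector_add` and `add_on_gpu` are hypothetical names, and the grid-stride loop is one of several possible partitioning schemes.

```cuda
#include <cuda_runtime.h>
#include <stdexcept>
#include <vector>

// Partitioning: each thread handles elements i, i + stride, i + 2*stride, ...
// so the kernel is correct for any grid size.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    const int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        c[i] = a[i] + b[i];
}

// Hypothetical host helper covering the full round trip.
std::vector<float> add_on_gpu(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    const int n = static_cast<int>(a.size());
    std::vector<float> c(n);
    if (n == 0) return c;
    const size_t bytes = n * sizeof(float);

    // Allocation on the device.
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);

    // Data transfer from host to device.
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);

    // Kernel launch: offload the parallel section to the GPU.
    const int threads = 256;  // illustrative block size
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Synchronization: the launch is asynchronous, so wait and collect any error.
    const cudaError_t err = cudaDeviceSynchronize();

    // Data transfer back to the host, then release device memory.
    cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);

    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
    return c;
}
```

For example, `add_on_gpu({1.0f, 2.0f, 3.0f}, {10.0f, 20.0f, 30.0f})` goes through all four stages for a three-element input.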

Hands-on
- Making use of the GPU shared memory
- Making use of asynchronous operations
- Reducing contention by privatization
- Scatter to gather
- Filling histograms on GPUs
The interest was so high that many of the students kept working on the exercises from home!
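Histogram filling is a natural exercise for privatization: if every thread increments the global bins directly, the atomic updates contend heavily, whereas a private copy of the bins per block in shared memory absorbs most of that traffic. The sketch below illustrates the idea under stated assumptions: 64 bins, byte-valued input mapped to bins by a modulo, and the hypothetical names `histogram_privatized` and `histogram_on_gpu`. It is not taken from the exercises themselves, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kNumBins = 64;  // illustrative bin count (an assumption)

__global__ void histogram_privatized(const unsigned char* data, int n, unsigned int* bins) {
    // Private copy of the histogram for this block, held in shared memory.
    __shared__ unsigned int local[kNumBins];
    for (int b = threadIdx.x; b < kNumBins; b += blockDim.x) local[b] = 0;
    __syncthreads();

    // Contention is now limited to the threads of one block.
    const int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[data[i] % kNumBins], 1u);
    __syncthreads();

    // Merge: one global atomic per bin per block instead of one per input element.
    for (int b = threadIdx.x; b < kNumBins; b += blockDim.x)
        atomicAdd(&bins[b], local[b]);
}

// Hypothetical host helper; 8 blocks of 128 threads is an arbitrary illustrative choice.
std::vector<unsigned int> histogram_on_gpu(const std::vector<unsigned char>& data) {
    const int n = static_cast<int>(data.size());
    std::vector<unsigned int> bins(kNumBins, 0u);
    if (n == 0) return bins;

    unsigned char* d_data = nullptr;
    unsigned int* d_bins = nullptr;
    cudaMalloc(&d_data, n * sizeof(unsigned char));
    cudaMalloc(&d_bins, kNumBins * sizeof(unsigned int));
    cudaMemcpy(d_data, data.data(), n * sizeof(unsigned char), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, kNumBins * sizeof(unsigned int));

    histogram_privatized<<<8, 128>>>(d_data, n, d_bins);

    cudaMemcpy(bins.data(), d_bins, kNumBins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    return bins;
}
```

The trade-off is the extra merge step at the end of each block, which pays off whenever the input is much larger than the number of bins.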

Feedback
The students were asked to give feedback through an anonymous questionnaire (see backup). The feedback was very positive.

Conclusion
- All goals set before the visit were achieved
- Language was sometimes a problem, but Italian helped ;-)
- Interest in preparing a degree thesis on parallel computing for high energy physics experiments, in the context of the host group
- Didactic material available: lanetPantaleoUERJ

Didactic Material
I am starting an initiative to collect GPU training material with some people from the HPC community (Cambridge, CINECA, BSC):
- Could be done in the context of the Concurrency Forum
- Common GitHub resource for trainers
- Expertise acquisition from the HPC community (GPUs for linear algebra, OpenACC)

Backup

Anonymous Feedback (six slides; content not captured in the transcript)