COMMUNICATION COMMUNICATE COMMUNITY Henri Bal A PUBLIC-PRIVATE RESEARCH COMMUNITY.

Presentation transcript:

COMMUNICATION COMMUNICATE COMMUNITY Henri Bal A PUBLIC-PRIVATE RESEARCH COMMUNITY

Agenda
General introduction to COMMIT/
  – (slides by Arnold Smeulders)
Example collaboration NLeSC & COMMIT/: global climate modelling
  – NLeSC eSalsa project
  – COMMIT/ IV-e project

COMMIT/ The public-private research community for ICT research, with 60+ (non-)profit and 20 science partners. Our mission is to deliver world-class ICT research that creates opportunities with more and more partners.

DIMENSIONS
ICT-science: use-inspired science
Synergy: between projects
Dissemination: tell the story outside science
Valorization: towards use in industry
Internationalization: NL is not alone in this field

SUCCESSES ICT-science Synergy SWEET Euro-Par 2014 Achievement award

SUCCESSES Dissemination Valorization Internationalization InfoSys and Figshare become partners, EU projects, EIT-ICT, …….

THE BIG FUTURE OF DATA
We have presented 60 demonstrators:
• ICT Science
• Application
• Alternative application
for four audiences:

The Big Future of Data
Big Data are not always Big. Often they are:
• private
• interactive
• uncertain
• unordered
• new media
In short they are overwhelming data.

The Big Future of Data Data-Science will surpass ICT-Science, that is our future.

IV-e: e-Infrastructure Virtualization for e-Science Applications (P20)
e-Infrastructures become very complicated
  – Distributed & heterogeneous
P20 tries to simplify the e-Infrastructure and to ease programming and integration
WP5 studies how to use GPUs effectively for data processing (imaging) and computing
Vrije Universiteit

Global Climate Modeling COMMIT/

GPU Computing
Offload expensive kernels for Parallel Ocean Program (POP) from CPU to GPU
  – Many kernels, fairly easy to port to GPUs
  – Execution time becomes virtually 0
New bottleneck: I/O between CPU & GPU
[Figure: host (CPU, host memory) and device (GPU, device memory) connected by a PCI Express link]
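A rough back-of-the-envelope sketch of why the link between host and device becomes the bottleneck once the kernels themselves are fast. The helper `transfer_time` and all numbers (data size, bandwidth, kernel time) are hypothetical and chosen only for illustration; they are not measurements from POP.

```python
def transfer_time(num_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Time to move num_bytes across the link: fixed latency plus size / bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


# Hypothetical values, for illustration only.
INPUT_BYTES = 512 * 1024 * 1024   # 512 MiB of input per kernel call
LINK_BANDWIDTH = 8e9              # assumed effective host-device bandwidth, bytes/s
KERNEL_TIME = 0.005               # assumed GPU compute time, seconds

copy_in = transfer_time(INPUT_BYTES, LINK_BANDWIDTH)
share = copy_in / (copy_in + KERNEL_TIME)

print(f"copy to device: {copy_in * 1e3:.1f} ms, kernel: {KERNEL_TIME * 1e3:.1f} ms")
print(f"fraction of the total spent on the transfer: {share:.0%}")
```

Under these assumed numbers the copy dominates the total, which is the situation the slide calls the new bottleneck.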

Different methods for CPU-GPU communication
Memory copies (explicit)
  – No overlap with GPU computation
Device-mapped host memory (implicit)
  – Allows fine-grained overlap between computation and communication in either direction
CUDA Streams or OpenCL command-queues
  – Allows overlap between computation and communication in different streams
Any combination of the above
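The difference between these methods can be sketched with a toy timing model. This is a simplification under stated assumptions, not a measured result: the three function names are hypothetical, the work is split into equal chunks, copies in each direction and kernels run on independent engines, there is no per-call overhead, and the mapped-memory figure is an idealised bound that real hardware may not reach.

```python
def explicit_copies(t_in, t_kernel, t_out):
    """Explicit memory copies: copy in, compute, copy out, strictly in sequence."""
    return t_in + t_kernel + t_out


def mapped_memory(t_in, t_kernel, t_out):
    """Idealised device-mapped host memory: perfect fine-grained overlap (assumption)."""
    return max(t_in, t_kernel, t_out)


def streams(t_in, t_kernel, t_out, n_streams):
    """Pipeline n_streams equal chunks through copy-in, kernel and copy-out engines."""
    in_done = kernel_done = out_done = 0.0
    for _ in range(n_streams):
        in_done += t_in / n_streams                                   # chunk is on the device
        kernel_done = max(kernel_done, in_done) + t_kernel / n_streams
        out_done = max(out_done, kernel_done) + t_out / n_streams
    return out_done


# Hypothetical stage times in milliseconds.
t_in, t_kernel, t_out = 4.0, 4.0, 4.0
print("explicit copies:", explicit_copies(t_in, t_kernel, t_out))
print("4 streams:      ", streams(t_in, t_kernel, t_out, 4))
print("mapped (ideal): ", mapped_memory(t_in, t_kernel, t_out))
```

With a single stream the pipeline degenerates to the explicit-copy case, which is a quick sanity check on the sketch.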

Problem
ICT problem:
  – Which method will be most efficient for a given GPU kernel? Implementing all can be a large effort
Solution:
  – Create a performance model that identifies the best implementation: Which strategy for overlapping computation and communication is best for the given program?
Ben van Werkhoven, Jason Maassen, Frank Seinstra & Henri Bal: Performance models for CPU-GPU data transfers, CCGrid2014 (nominated for best-paper-award, top 1%)
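The idea of choosing an implementation from a model, rather than building every variant, can be illustrated with a minimal sketch. It is not the model from the CCGrid2014 paper: the function `predict`, the per-stream overhead parameter, and the closed-form pipeline estimate (equal chunks, independent copy and compute engines) are all assumptions made for illustration.

```python
def predict(t_in, t_kernel, t_out, max_streams=8, stream_overhead=0.0):
    """Return (strategy, predicted_time) for the candidate with the lowest estimate."""
    total = t_in + t_kernel + t_out
    slowest = max(t_in, t_kernel, t_out)

    # Baseline: explicit copies, no overlap at all.
    candidates = {"explicit copies": total}

    for n in range(2, max_streams + 1):
        # n equal chunks in a three-stage pipeline: one chunk passes through
        # all stages, then the slowest stage paces the remaining n - 1 chunks.
        pipelined = total / n + (n - 1) * slowest / n
        candidates[f"{n} streams"] = pipelined + n * stream_overhead

    best = min(candidates, key=candidates.get)
    return best, candidates[best]


# Hypothetical stage times in milliseconds.
print(predict(4.0, 4.0, 4.0, max_streams=4))
print(predict(4.0, 4.0, 4.0, max_streams=4, stream_overhead=10.0))
```

In this sketch more streams always help when the overhead is zero, while a large assumed per-stream overhead makes plain explicit copies the predicted winner; a real model would calibrate such parameters against the hardware.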

MOVIE

Example result [Figure: measured vs. model]

Discussion
Good example of use-inspired Computer Science research
Many other application domains that use accelerators face the same problem
  – Digital forensics, water management, astronomy …
All details:
27 October 2014

Different GPUs (state kernel)

Different GPUs (buoydiff)

Global Climate Modeling
Understand future local sea level changes
Needs high-resolution simulations
Combine two approaches:
  – Distributed computing (multiple resources)
    Ibis couples models for land, ice, ocean, atmosphere
  – GPUs (Graphics Processing Units)
COMMIT/

Comes with spreadsheet