This deck has 1-, 2-, and 3-slide variants for C++ AMP. If your own deck uses 4:3, get with the 21st century and switch to 16:9 (Design tab, Page Setup button).

Presentation transcript:

This deck has 1-, 2-, and 3-slide variants for C++ AMP. If your own deck uses 4:3, get with the 21st century and switch to 16:9 (Design tab, Page Setup button).

C++ AMP in 1 slide (for notes, see the comments section of the slides from the 2- and 3-slide variants)

C++ Accelerated Massive Parallelism
What
– Heterogeneous platform support
– Part of C++ & Visual Studio
– STL-like library for parallel patterns on large arrays
– Builds on DirectX
Why
– Performance
– Productivity
– Portability
How
    #include <amp.h>
    using namespace concurrency;
    void AddArrays(int n, int * pA, int * pB, int * pC)
    {
        array_view<int, 1> a(n, pA);
        array_view<int, 1> b(n, pB);
        array_view<int, 1> sum(n, pC);
        parallel_for_each(
            sum.extent,
            [=](index<1> idx) restrict(amp)
            {
                sum[idx] = a[idx] + b[idx];
            }
        );
    }

C++ AMP in 2 or 3 slides (for 2 slides, just drop the 3rd one) (see the comments section of each slide for notes)

C++ AMP
– Heterogeneous platform support
– Part of Visual C++
– Visual Studio integration
– STL-like library for multidimensional data
– Builds on DirectX
– Is an open spec
Goals: performance, portability, productivity

Basic Elements of C++ AMP coding
    void AddArrays(int n, int * pA, int * pB, int * pC)
    {
        array_view<int, 1> a(n, pA);
        array_view<int, 1> b(n, pB);
        array_view<int, 1> sum(n, pC);
        parallel_for_each(
            sum.extent,
            [=](index<1> idx) restrict(amp)
            {
                sum[idx] = a[idx] + b[idx];
            }
        );
    }
– array_view: wraps the data to operate on the accelerator
– array_view variables are captured by the lambda, and the associated data is copied to the accelerator (on demand)
– parallel_for_each: executes the lambda on the accelerator once per thread
– extent: the number and shape of threads to execute the lambda
– index: the thread ID that is running the lambda, used to index into data
– restrict(amp): tells the compiler to check that this code can execute on Direct3D hardware (aka accelerator)
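To make the roles of extent, index, and the lambda concrete, here is a minimal sketch in standard C++ of what the slide's AddArrays computes. It is a sequential CPU analogy, not C++ AMP: `for_each_index` and `AddArraysReference` are hypothetical names introduced only for illustration, there is no accelerator and no data copying, and `const` has been added to the input pointers. It can also serve as a reference result when checking an accelerator version.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for parallel_for_each (not part of C++ AMP):
// calls `body` exactly once for every index in [0, extent).
// C++ AMP would run these calls as accelerator threads; this sketch
// runs them one after another on the CPU.
template <typename Body>
void for_each_index(std::size_t extent, Body body) {
    for (std::size_t idx = 0; idx < extent; ++idx) {
        body(idx);  // idx plays the role of the thread ID
    }
}

// Sequential reference for the slide's AddArrays: pC[i] = pA[i] + pB[i].
void AddArraysReference(int n, const int* pA, const int* pB, int* pC) {
    assert(n >= 0);  // the extent cannot be negative
    for_each_index(static_cast<std::size_t>(n), [=](std::size_t idx) {
        // Each invocation touches only its own element, which is what
        // makes the real version safe to run in parallel.
        pC[idx] = pA[idx] + pB[idx];
    });
}
```

Calling `AddArraysReference(4, a, b, sum)` with `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}` fills `sum` with `{11, 22, 33, 44}`.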

C++ AMP at a Glance
– restrict(amp, cpu)
– parallel_for_each
– class array
– class array_view
– class index
– class extent
– class accelerator
– class accelerator_view
– class tiled_extent
– class tiled_index
– class tile_barrier
– tile_static storage class

C++ AMP resources
– Native parallelism blog (team blog)
– MSDN Forums to ask questions
– Daniel Moth's blog (PM of C++ AMP)