GPUs: Not Just for Graphics Anymore

GPUs: Not Just for Graphics Anymore David Ostrovsky | Couchbase

GPGPU refers to using a Graphics Processing Unit (GPU) to perform computation that applications traditionally run on the CPU. It is particularly effective for stream processing: performing the same operation on many records in a stream in parallel.

CPU vs. GPU Architecture

Embarrassingly Parallel Problems
Workloads that can be easily separated into parallel tasks. This is often the case when there is no dependency between the work units.
Examples: image processing and graphics rendering; fractal images (e.g. the Mandelbrot set); string matching; distributed queries and MapReduce; brute-force cryptographic attacks; Bitcoin mining.
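To make "no dependency between the work units" concrete, here is a minimal C++ sketch built on the Mandelbrot example from the list. It is not from the talk: the names mandelbrot_iterations and render, and the sampled region, are illustrative assumptions. The loop runs serially on the CPU, but each pixel's value depends only on its own coordinates, so the inner call is what a GPU framework could run once per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Escape-time iteration count for one point c = (cr, ci) of the complex plane.
// The result depends only on this point's own inputs, never on a neighbour.
int mandelbrot_iterations(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int i = 0;
    while (i < max_iter && zr * zr + zi * zi <= 4.0) {
        const double next_zr = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = next_zr;
        ++i;
    }
    return i;
}

// Serial reference: samples [-2, 1] x [-1, 1] at pixel centres, row-major.
// (Hypothetical helper; the region and layout are assumptions.)
std::vector<int> render(int width, int height, int max_iter) {
    assert(width > 0 && height > 0);
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const double cr = -2.0 + 3.0 * (x + 0.5) / width;
            const double ci = -1.0 + 2.0 * (y + 0.5) / height;
            // Independent work unit: could run in any order, or all at once.
            out[static_cast<std::size_t>(y) * width + x] =
                mandelbrot_iterations(cr, ci, max_iter);
        }
    }
    return out;
}
```

Because no iteration of the two loops reads what another iteration wrote, the pixels can be split across any number of workers without synchronization.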

Amdahl’s Law
The speedup of a program running on multiple processors is limited by the sequential fraction of the program.
Gene Myron Amdahl (November 16, 1922 – November 10, 2015) was an American computer architect and high-tech entrepreneur, chiefly known for his work on mainframe computers at IBM and later at his own companies, especially Amdahl Corporation. He formulated Amdahl's law, which states a fundamental limitation of parallel computing.
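The slide states the law in words only. Its usual formulation is given below; the symbols are not from the slides and are chosen here for illustration: p is the fraction of the run time that can be parallelized and N is the number of processors.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}

% Worked example with p = 0.95:
% S(8) = \frac{1}{0.05 + 0.95/8} = \frac{1}{0.16875} \approx 5.93,
% \lim_{N \to \infty} S(N) = \frac{1}{0.05} = 20
```

So even with 95% of the work parallelized, no number of GPU cores can deliver more than a 20x speedup.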

GPGPU Concepts
Texture: A common way to provide the read-only input data stream as a 2D grid.
Frame Buffer: A write-only memory interface for output.
Kernel: The operation to perform on each unit of data. Roughly similar to the body of a loop.

Parallelizing Your Code
    void compute(float in[10000], float out[10000])
    {
        for (int i = 0; i < 10000; i++)
            out[i] = func(in[i]);
    }
Texture: the input array in. Frame Buffer: the output array out. Kernel: the loop body, func(in[i]).

GPGPU Frameworks
C++ AMP: Subset of C++. Microsoft implementation based on DirectX, integrated into Visual Studio. Supports most modern GPUs.
OpenCL: Subset of C99. Implementations for Intel, AMD, and NVIDIA GPUs.
CUDA: C++ SDK, wrappers for other languages. Only supported on NVIDIA GPUs.

Client Integration
C++ AMP: Native C++ projects, P/Invoke from .NET, WinRT component, any language that can interoperate with native libraries. Supports GPU debugging and profiling.
OpenCL: Vendor-specific SDKs, available from Intel, AMD, IBM, and NVIDIA. Wrappers for popular languages, including C#, Python, and Java. Supports multiple vendor-specific debuggers.

Using C++ AMP: Native DLL
    #include <amp.h>
    using namespace concurrency;
    // Squares each element of arr in place on the accelerator.
    extern "C" __declspec(dllexport) void __stdcall square_array(float* arr, int n)
    {
        array_view<float, 1> dataView(n, arr);
        parallel_for_each(dataView.extent, [=](index<1> idx) restrict(amp)
        {
            dataView[idx] = dataView[idx] * dataView[idx];
        });
        dataView.synchronize(); // copy the results back into arr
    }

Using C++ AMP: Managed Code
    // Declaration (class member); needs using System.Runtime.InteropServices.
    [DllImport("NativeAmpLibrary", CallingConvention = CallingConvention.StdCall)]
    extern unsafe static void square_array(float* array, int length);
    // Call site (inside an unsafe method):
    float[] arr = new[] { 1.0f, 2.0f, 3.0f, 4.0f };
    fixed (float* arrPt = &arr[0])
    {
        square_array(arrPt, arr.Length);
    }

Using OpenCL C# Project NuGet Package

Using OpenCL OpenCL Code

Using Aparapi (OpenCL): Java Code
Converts Java bytecode to OpenCL at runtime. Syntax somewhat similar to C++ AMP.
    final float[] data = new float[size];
    Kernel kernel = new Kernel() {
        @Override
        public void run() {
            int gid = getGlobalId();            // index of this work item
            data[gid] = data[gid] * data[gid];
        }
    };
    kernel.execute(Range.create(size));         // one work item per element

Demo Time! Simple GPGPU Applications

Case Study 1: Edge Detection
Sobel operator: find all the points in the image where the brightness changes sharply. Pixels can be checked in parallel.
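As a rough illustration of why the pixels can be checked in parallel, the sketch below applies the standard 3x3 Sobel kernels to a row-major 8-bit grayscale image on the CPU. It is not the code from the demo: the names sobel_at and sobel, the image layout, and leaving border pixels at zero are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Gradient magnitude at interior pixel (x, y).
// Reads only the 3x3 neighbourhood of the input; writes nothing shared.
float sobel_at(const std::vector<std::uint8_t>& img, int width, int x, int y) {
    auto p = [&](int dx, int dy) {
        return static_cast<float>(
            img[static_cast<std::size_t>((y + dy) * width + (x + dx))]);
    };
    // Horizontal and vertical Sobel kernels.
    const float gx = -p(-1, -1) + p(1, -1)
                     - 2.0f * p(-1, 0) + 2.0f * p(1, 0)
                     - p(-1, 1) + p(1, 1);
    const float gy = -p(-1, -1) - 2.0f * p(0, -1) - p(1, -1)
                     + p(-1, 1) + 2.0f * p(0, 1) + p(1, 1);
    return std::sqrt(gx * gx + gy * gy);
}

// Serial reference; border pixels are left at 0 (an assumption of this sketch).
std::vector<float> sobel(const std::vector<std::uint8_t>& img, int width, int height) {
    assert(img.size() == static_cast<std::size_t>(width) * height);
    std::vector<float> out(img.size(), 0.0f);
    for (int y = 1; y < height - 1; ++y)
        for (int x = 1; x < width - 1; ++x)
            out[static_cast<std::size_t>(y * width + x)] = sobel_at(img, width, x, y);
    return out;
}
```

Each output pixel is computed from the read-only input alone, so sobel_at plays the role of the kernel, the input image the texture, and out the frame buffer.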

More Demo Time! Processing a Video Stream

Case Study 2: Password Cracking
Passwords are commonly stored as hashes of the original plain text:
"12345" = "5994471abb01112afcc18159f6cc74b4f511b99806da59b3caf5a9c173cacfc5"
Cracking a password by brute force requires repeatedly hashing guesses until a match is found, which can be parallelized effectively.
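The structure of such an attack can be sketched in a few lines of C++. This is an illustration under stated assumptions, not the demo's code: toy_hash is FNV-1a, a fast non-cryptographic hash standing in for a real password hash, and search_slice is a hypothetical helper that checks one slice of a word list so that different slices could be handed to different workers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// FNV-1a: a NON-cryptographic stand-in for the real password hash.
std::uint64_t toy_hash(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hashes every guess in dictionary[begin, end) and compares it to target.
// Returns the index of the matching guess, or -1 if the slice has none.
// Guesses never depend on each other, so slices can be searched in parallel.
int search_slice(std::uint64_t target,
                 const std::vector<std::string>& dictionary,
                 std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= dictionary.size());
    for (std::size_t i = begin; i < end; ++i) {
        if (toy_hash(dictionary[i]) == target) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

On a GPU the slice would shrink to a single guess per thread; the only shared state is the read-only target hash.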

Even More Demos! Cracking a Single Password Hash with a Dictionary Attack

Fast hash algorithms like MD5, SHA-1 and SHA-2 are terrible for storing passwords. Use CPU-intensive algorithms like PBKDF2, bcrypt, or scrypt instead. They are expensive to calculate and have an adjustable work factor, so every guess in a brute-force attack costs the attacker more.
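The idea of an adjustable work factor can be illustrated with a toy example. The C++ sketch below is only a demonstration of how the cost per guess scales; it is not PBKDF2, bcrypt, or scrypt and must not be used to store passwords. The names mix_round and stretched_toy_hash, the FNV-1a mixing step, and the choice of doubling the rounds per work-factor step are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// One FNV-1a pass over the 8 bytes of a 64-bit state (toy mixing step).
std::uint64_t mix_round(std::uint64_t state) {
    std::uint64_t h = 14695981039346656037ULL;
    for (int i = 0; i < 8; ++i) {
        h ^= (state >> (8 * i)) & 0xFFu;
        h *= 1099511628211ULL;
    }
    return h;
}

// Toy key stretching: hashes the password once, then applies
// 2^work_factor further rounds. Raising work_factor by one doubles
// the time needed for every single guess.
std::uint64_t stretched_toy_hash(const std::string& password, unsigned work_factor) {
    assert(work_factor < 64);
    std::uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : password) {
        h ^= c;
        h *= 1099511628211ULL;
    }
    const std::uint64_t rounds = 1ULL << work_factor;
    for (std::uint64_t r = 0; r < rounds; ++r) {
        h = mix_round(h);
    }
    return h;
}
```

A legitimate login pays this cost once, while an attacker pays it for every candidate password, which is what makes the work factor an effective defence against the parallel attack shown in the case study.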

Thank you!
David Ostrovsky | Couchbase
@DavidOstrovsky
CodeHardBlog.azurewebsites.net
linkedin.com/in/davidostrovsky
davido@couchbase.com