Thrust & cuRAND, Dr. Bo Yuan

Jared Hoberock and Nathan Bell

Outline
–Thrust Basics
–Working with Thrust
–General Types in Thrust
–Fancy Iterators in Thrust
–cuRAND Library

Thrust Basics
–What is Thrust?
–Containers
–Iterators
–Interaction with raw pointers
–Namespaces

What is Thrust?
C++ template library for CUDA
–Mimics the Standard Template Library (STL)
Containers
–thrust::host_vector<T>
–thrust::device_vector<T>
Algorithms
–thrust::sort()
–thrust::reduce()
–thrust::inclusive_scan()
–…

Containers
Make common operations concise and readable
–Hides cudaMalloc, cudaMemcpy and cudaFree
    // allocate host vector with two elements
    thrust::host_vector<int> h_vec(2);
    // copy host vector to device
    thrust::device_vector<int> d_vec = h_vec;
    // manipulate device values from the host
    d_vec[0] = 13;
    d_vec[1] = 27;
    std::cout << "sum: " << d_vec[0] + d_vec[1] << std::endl;
    // vector memory automatically released w/ free() or cudaFree()

Iterators
Sequences are defined by a pair of iterators
    // allocate device vector
    thrust::device_vector<int> d_vec(4);
    // returns iterator at first element of d_vec
    d_vec.begin();
    // returns iterator one past the last element of d_vec
    d_vec.end();
    // [begin, end) pair defines a sequence of 4 elements
[Figure: d_vec.begin() marks the first element, d_vec.end() marks one past the last]

Iterators
    // initialize random values on host
    thrust::host_vector<int> h_vec(1000);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);
    // copy values to device
    thrust::device_vector<int> d_vec = h_vec;
    // compute sum on host
    int h_sum = thrust::reduce(h_vec.begin(), h_vec.end());
    // compute sum on device
    int d_sum = thrust::reduce(d_vec.begin(), d_vec.end());

Convertible to Raw Pointers
    // allocate device vector
    thrust::device_vector<int> d_vec(4);
    // obtain raw pointer to the device vector's memory
    int * ptr = thrust::raw_pointer_cast(&d_vec[0]);
    // use ptr in a CUDA C kernel
    my_kernel<<<N / 256, 256>>>(N, ptr);
    // Note: ptr cannot be dereferenced on the host!

Wrap Raw Pointers with device_ptr
    // raw pointer to device memory
    int * raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));
    // wrap raw pointer with a device_ptr
    thrust::device_ptr<int> dev_ptr(raw_ptr);
    // use device_ptr in thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);
    // access device memory through device_ptr
    dev_ptr[0] = 1;

Namespaces
C++ supports namespaces
–Thrust uses the thrust namespace
  thrust::device_vector
  thrust::copy
–STL uses the std namespace
  std::vector
  std::list
Avoids collisions
–thrust::sort()
–std::sort()
For brevity
–using namespace thrust;

Usage of Thrust
Thrust provides many standard algorithms
–Transformations
–Reductions
–Prefix Sums
–Sorting
Generic definitions
–General Types
  Built-in types (int, float, …)
  User-defined structures
–General Operators
  reduce with plus operator
  scan with maximum operator

Transformations
    // allocate three device_vectors with 10 elements
    thrust::device_vector<int> X(10);
    thrust::device_vector<int> Y(10);
    thrust::device_vector<int> Z(10);
    // initialize X to 0,1,2,3,....
    thrust::sequence(X.begin(), X.end());
    // compute Y = -X
    thrust::transform(X.begin(), X.end(), Y.begin(), thrust::negate<int>());
    // fill Z with twos
    thrust::fill(Z.begin(), Z.end(), 2);
    // compute Y = X mod 2
    thrust::transform(X.begin(), X.end(), Z.begin(), Y.begin(), thrust::modulus<int>());
    // replace all the ones in Y with tens
    thrust::replace(Y.begin(), Y.end(), 1, 10);
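Because Thrust mimics the STL, the pipeline above can be checked on the host with the standard library alone. The following is a minimal host-only sketch, not Thrust code: std::iota stands in for thrust::sequence, the std:: algorithms stand in for their thrust:: namesakes, and the function name parity_with_tens is a hypothetical helper introduced only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Host-only analogue of the slide's pipeline (hypothetical helper name).
std::vector<int> parity_with_tens(int n) {
    assert(n >= 0);
    std::vector<int> X(n), Y(n), Z(n);
    // initialize X to 0,1,2,... (plays the role of thrust::sequence)
    std::iota(X.begin(), X.end(), 0);
    // fill Z with twos
    std::fill(Z.begin(), Z.end(), 2);
    // binary transform: Y[i] = X[i] mod Z[i]
    std::transform(X.begin(), X.end(), Z.begin(), Y.begin(),
                   std::modulus<int>());
    // replace all the ones in Y with tens
    std::replace(Y.begin(), Y.end(), 1, 10);
    return Y;
}
```

For n = 10 the result alternates 0, 10, 0, 10, …, which is what the device version should also leave in Y.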

Reductions
    // equivalent ways to sum the elements of D
    int sum = thrust::reduce(D.begin(), D.end(), (int) 0, thrust::plus<int>());
    int sum = thrust::reduce(D.begin(), D.end(), (int) 0);
    int sum = thrust::reduce(D.begin(), D.end());
Other reductions
–thrust::count_if
–thrust::min_element
–thrust::max_element
–thrust::is_sorted
–thrust::inner_product

Sorting
–thrust::sort
–thrust::stable_sort
–thrust::sort_by_key
–thrust::stable_sort_by_key

General Types and Operators
    #include <thrust/reduce.h>
    // declare storage
    device_vector<int> i_vec = ...
    device_vector<float> f_vec = ...
    // sum of integers (equivalent calls)
    reduce(i_vec.begin(), i_vec.end());
    reduce(i_vec.begin(), i_vec.end(), 0, plus<int>());
    // sum of floats (equivalent calls)
    reduce(f_vec.begin(), f_vec.end());
    reduce(f_vec.begin(), f_vec.end(), 0.0f, plus<float>());
    // maximum of integers
    reduce(i_vec.begin(), i_vec.end(), 0, maximum<int>());

General Types and Operators
    struct negate_float2
    {
      __host__ __device__
      float2 operator()(float2 a)
      {
        return make_float2(-a.x, -a.y);
      }
    };
    // declare storage
    device_vector<float2> input = ...
    device_vector<float2> output = ...
    // create functor
    negate_float2 func;
    // negate vectors
    transform(input.begin(), input.end(), output.begin(), func);

General Types and Operators
    // compare x component of two float2 structures
    struct compare_float2
    {
      __host__ __device__
      bool operator()(float2 a, float2 b)
      {
        return a.x < b.x;
      }
    };
    // declare storage
    device_vector<float2> vec = ...
    // create comparison functor
    compare_float2 comp;
    // sort elements by x component
    sort(vec.begin(), vec.end(), comp);

Advanced Thrust
Fancy iterators behave like normal iterators
–Algorithms don't know the difference
Examples
–constant_iterator
–counting_iterator
–transform_iterator
–zip_iterator

Constant
    // create iterators
    constant_iterator<int> begin(10);
    constant_iterator<int> end = begin + 3;
    begin[0]   // returns 10
    begin[1]   // returns 10
    begin[100] // returns 10
    // sum of [begin, end)
    reduce(begin, end); // returns 30 (i.e. 3 * 10)

Counting
    // create iterators
    counting_iterator<int> begin(10);
    counting_iterator<int> end = begin + 3;
    begin[0]   // returns 10
    begin[1]   // returns 11
    begin[100] // returns 110
    // sum of [begin, end)
    reduce(begin, end); // returns 33 (i.e. 10 + 11 + 12)
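To see why a counting iterator needs no backing array, it can help to write a toy one by hand. The sketch below is host-only standard C++ and is not how Thrust implements counting_iterator. The names CountingIter and sum_counting are hypothetical, introduced for illustration, and the iterator supports only what std::accumulate needs.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// Toy counting iterator: the only state is the current value.
struct CountingIter {
    typedef std::ptrdiff_t difference_type;
    typedef int value_type;
    typedef const int* pointer;
    typedef int reference;
    typedef std::input_iterator_tag iterator_category;

    int value;
    explicit CountingIter(int v) : value(v) {}

    // "dereferencing" computes the element instead of loading it
    int operator*() const { return value; }
    int operator[](std::ptrdiff_t i) const { return value + static_cast<int>(i); }

    CountingIter& operator++() { ++value; return *this; }
    CountingIter operator++(int) { CountingIter old(*this); ++value; return old; }
    CountingIter operator+(std::ptrdiff_t n) const {
        return CountingIter(value + static_cast<int>(n));
    }
    bool operator==(const CountingIter& o) const { return value == o.value; }
    bool operator!=(const CountingIter& o) const { return value != o.value; }
};

// Sum of the sequence first, first+1, ..., first+count-1.
int sum_counting(int first, int count) {
    assert(count >= 0);
    CountingIter begin(first);
    CountingIter end = begin + count;
    return std::accumulate(begin, end, 0);
}
```

Here sum_counting(10, 3) gives 33, matching the slide, and the iterator object is the size of a single int however long the sequence is.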

Transform
–Yields a transformed sequence
–Facilitates kernel fusion
–Conserves memory capacity and bandwidth
[Figure: input sequence X, Y, Z and the transformed sequence F(x), F(y), F(z)]

Transform
    // initialize vector
    device_vector<int> vec(3);
    vec[0] = 10; vec[1] = 20; vec[2] = 30;
    // create iterator (type omitted)
    begin = make_transform_iterator(vec.begin(), negate<int>());
    end = make_transform_iterator(vec.end(), negate<int>());
    begin[0] // returns -10
    begin[1] // returns -20
    begin[2] // returns -30
    // sum of [begin, end)
    reduce(begin, end); // returns -60 (i.e. -10 + -20 + -30)
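The fusion idea can be sketched on the host: the functor is applied at the moment an element is read, so the negated sequence is never written to memory. This is a simplified, host-only illustration in standard C++, not Thrust's transform_iterator. TransformIter and negated_sum are hypothetical names, and the element type is fixed to int to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <numeric>

// Toy transform iterator over an int array: applies f on every read.
template <typename F>
struct TransformIter {
    typedef std::ptrdiff_t difference_type;
    typedef int value_type;
    typedef const int* pointer;
    typedef int reference;
    typedef std::input_iterator_tag iterator_category;

    const int* p;
    F f;
    TransformIter(const int* ptr, F fn) : p(ptr), f(fn) {}

    int operator*() const { return f(*p); }  // transform happens here
    TransformIter& operator++() { ++p; return *this; }
    TransformIter operator++(int) { TransformIter old(*this); ++p; return old; }
    bool operator==(const TransformIter& o) const { return p == o.p; }
    bool operator!=(const TransformIter& o) const { return p != o.p; }
};

// Sum of the negated elements, computed in one pass with no temporary array.
int negated_sum(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    TransformIter<std::negate<int> > begin(data, std::negate<int>());
    TransformIter<std::negate<int> > end(data + n, std::negate<int>());
    return std::accumulate(begin, end, 0);
}
```

For the array {10, 20, 30}, negated_sum returns -60, the same value the slide reports for reduce over the transform iterators.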

Zip
–Looks like an array of structs (AoS)
–Stored in structure of arrays (SoA)

Zip
    // initialize vectors
    device_vector<int> A(3);
    device_vector<char> B(3);
    A[0] = 10; A[1] = 20; A[2] = 30;
    B[0] = 'x'; B[1] = 'y'; B[2] = 'z';
    // create iterator (type omitted)
    begin = make_zip_iterator(make_tuple(A.begin(), B.begin()));
    end = make_zip_iterator(make_tuple(A.end(), B.end()));
    begin[0] // returns tuple(10, 'x')
    begin[1] // returns tuple(20, 'y')
    begin[2] // returns tuple(30, 'z')
    // maximum of [begin, end)
    maximum<tuple<int, char> > binary_op;
    reduce(begin, end, begin[0], binary_op); // returns tuple(30, 'z')
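The "looks like AoS, stored as SoA" point can be illustrated on the host: the two arrays stay separate, and a tuple is assembled only at the moment an element is examined. The sketch below uses std::tuple from the standard library rather than Thrust's tuple, and the function name zipped_max is a hypothetical helper. It assumes the maximum is taken with the tuple's lexicographic operator<.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Maximum (int, char) pair over two separately stored arrays.
std::tuple<int, char> zipped_max(const std::vector<int>& A,
                                 const std::vector<char>& B) {
    assert(!A.empty() && A.size() == B.size());
    // start from the first zipped element, as the slide does with begin[0]
    std::tuple<int, char> best = std::make_tuple(A[0], B[0]);
    for (std::size_t i = 1; i < A.size(); ++i) {
        // the "struct" exists only here, built from the two arrays
        std::tuple<int, char> cur = std::make_tuple(A[i], B[i]);
        if (best < cur) best = cur;  // lexicographic: int first, then char
    }
    return best;
}
```

With A = {10, 20, 30} and B = {'x', 'y', 'z'} the result is the pair (30, 'z').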

cuRAND Library
    #include <curand.h>
    curandStatus_t curandGenerate(curandGenerator_t generator, unsigned int *outputPtr, size_t num)
–A given state may not be used by more than one block.
–A given block may generate random numbers using multiple states.

Distributions
    __device__ float curand_uniform (curandState_t *state)
    __device__ float curand_normal (curandState_t *state)
    __device__ unsigned int curand_poisson (curandState_t *state, double lambda)
    __device__ double curand_uniform_double (curandState_t *state)
    __device__ float2 curand_normal2 (curandState_t *state)

Estimating π
    #include <curand_kernel.h>
    #include <thrust/functional.h>
    #include <thrust/iterator/counting_iterator.h>
    #include <thrust/transform_reduce.h>
    #include <iomanip>
    #include <iostream>

    struct estimate_pi : public thrust::unary_function<unsigned int, float>
    {
      __device__
      float operator()(unsigned int thread_id)
      {
        float sum = 0;
        unsigned int N = 10000; // samples per thread
        unsigned int seed = thread_id;
        curandState s;
        // seed a random number generator
        curand_init(seed, 0, 0, &s);
        // take N samples in a quarter circle
        for(unsigned int i = 0; i < N; ++i)
        {
          // draw a sample from the unit square
          float x = curand_uniform(&s);
          float y = curand_uniform(&s);
          // measure distance from the origin
          float dist = sqrtf(x*x + y*y);
          // add 1.0f if (x,y) is inside the quarter circle
          if(dist <= 1.0f)
            sum += 1.0f;
        }
        // multiply by 4 to get the area of the whole circle
        sum *= 4.0f;
        // divide by N
        return sum / N;
      }
    };

    int main(void)
    {
      // use 30K independent seeds
      int M = 30000;
      float estimate = thrust::transform_reduce(
          thrust::counting_iterator<int>(0),
          thrust::counting_iterator<int>(M),
          estimate_pi(), 0.0f, thrust::plus<float>());
      estimate /= M;
      std::cout << std::setprecision(3);
      std::cout << "pi is approximately ";
      std::cout << estimate << std::endl;
      return 0;
    }
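A CPU reference can be handy for sanity-checking the device result. The following is a host-only sketch of the same Monte Carlo scheme, with assumptions stated: std::mt19937 and std::uniform_real_distribution replace curand_init and curand_uniform (so the random streams differ from the GPU run), a plain loop over seeds replaces thrust::transform_reduce, and the names estimate_pi_one_seed and estimate_pi_host are hypothetical.

```cpp
#include <cassert>
#include <random>

// One "thread": n samples from the unit square, generator seeded with the id.
float estimate_pi_one_seed(unsigned int seed, unsigned int n) {
    assert(n > 0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> uniform(0.0f, 1.0f);
    unsigned int inside = 0;
    for (unsigned int i = 0; i < n; ++i) {
        float x = uniform(rng);
        float y = uniform(rng);
        // compare squared distance, which avoids the sqrt
        if (x * x + y * y <= 1.0f) ++inside;
    }
    // fraction inside the quarter circle, times 4
    return 4.0f * static_cast<float>(inside) / static_cast<float>(n);
}

// Average of the per-seed estimates (the reduce-then-divide step).
float estimate_pi_host(unsigned int num_seeds, unsigned int samples_per_seed) {
    assert(num_seeds > 0);
    double total = 0.0;
    for (unsigned int s = 0; s < num_seeds; ++s)
        total += estimate_pi_one_seed(s, samples_per_seed);
    return static_cast<float>(total / num_seeds);
}
```

Calling estimate_pi_host(100, 10000) uses one million samples in total, which is enough for the estimate to land close to 3.14; the exact digits depend on the standard library's distribution implementation.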

Review
–What are containers and iterators?
–How do you transfer data between host and device with Thrust?
–How do you use the thrust namespace?
–How can containers be associated with raw pointers?
–How do you use Thrust to perform transformations, reductions and sorting?
–How do you use general types in Thrust?

Review
–Why can constant iterators save storage?
–When do we need counting iterators, and how much memory do they require?
–How do you use transform iterators?
–Why do we need zip iterators?
–How do you generate different distributions with cuRAND?
–How do you use cuRAND with Thrust?
–How do you use cuRAND in a kernel function?
