Published by Cale Peacher. Modified over 9 years ago.
1
Adding GPU Computing to Computer Organization Courses
Karen L. Karavanic, Portland State University
with David Bunde, Knox College, and Jens Mache, Lewis & Clark College
2
Our Backgrounds in CUDA Education
Karavanic (PSU)
– New course “Multicore Computing” in 2008
– “General Purpose GPU Computing” in 2010
– Mixed graduate/undergraduate
Mache (Lewis & Clark)
– Special topics course in CUDA
– Project with students: “Game of Life” module
Bunde (Knox)
– Modules for teaching CUDA within existing courses
SC12 HPC Educators [Full-Day] Session:
– An Educators Toolbox for CUDA
Adding GPU Computing to Computing Organization Courses
3
Why Teach Parallel Computing with GPUs?
It is here
– Students have GPUs (on desk / on lap / in pocket)
– Inexpensive (no need to pay $$$ or to build)
We see the future
– Massively parallel: 100s of cores
– Ahead of the curve (how many cores in your CPU?)
We see pay-off
– Performance improvements
– Knowledge of computer architecture helps
4
Example CUDA program
Adding two vectors, A and B
N elements in A and B, and N threads
(without code to load arrays with data)

#include <stdlib.h>

#define N 256

/* Kernel: each of the N threads adds one pair of elements. */
__global__ void vecAdd(int *A, int *B, int *C) {
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main(int argc, char **argv) {
    int size = N * sizeof(int);
    int *a, *b, *c, *devA, *devB, *devC;

    /* Allocate host arrays (code to load a and b with data is omitted). */
    a = (int *)malloc(size);
    b = (int *)malloc(size);
    c = (int *)malloc(size);

    /* Allocate device arrays. */
    cudaMalloc((void **)&devA, size);
    cudaMalloc((void **)&devB, size);
    cudaMalloc((void **)&devC, size);

    /* Copy the inputs from the CPU to the GPU. */
    cudaMemcpy(devA, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(devB, b, size, cudaMemcpyHostToDevice);

    /* Launch the kernel: 1 block of N threads. */
    vecAdd<<<1, N>>>(devA, devB, devC);

    /* Copy the result back from the GPU to the CPU. */
    cudaMemcpy(c, devC, size, cudaMemcpyDeviceToHost);

    /* Release device and host memory. */
    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devC);
    free(a);
    free(b);
    free(c);
    return 0;
}
5
Why teach GPUs in Computer Organization?
“Feed me”
– Thread “execution” configuration (threads, blocks)
– CPU-GPU data transfer
– Explicit cache management
“Conflict”
– Architecture leads to large penalties for naïve code
– Synchronization
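
The “execution configuration” point can be made concrete by extending the vecAdd example, which indexes with threadIdx.x alone, to arrays that need more than one block. The following is a minimal sketch, not code from the slides: the names vecAddBlocks and addOnGpu and the block size of 256 are assumptions chosen for illustration.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// One thread per element. The global index combines the block index
// with the thread's index inside its block.
__global__ void vecAddBlocks(const int *A, const int *B, int *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // threads past the end of the arrays do nothing
        C[i] = A[i] + B[i];
    }
}

// Hypothetical host helper (not from the slides): copies a and b to the GPU,
// launches the kernel, and copies the sum back into c.
// Returns 0 on success and -1 if any CUDA call reports an error.
int addOnGpu(const int *a, const int *b, int *c, int n) {
    assert(n > 0);
    size_t size = (size_t)n * sizeof(int);
    int *devA = NULL, *devB = NULL, *devC = NULL;

    bool ok = cudaMalloc((void **)&devA, size) == cudaSuccess
           && cudaMalloc((void **)&devB, size) == cudaSuccess
           && cudaMalloc((void **)&devC, size) == cudaSuccess
           && cudaMemcpy(devA, a, size, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(devB, b, size, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threadsPerBlock = 256;  // assumed block size
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up
        vecAddBlocks<<<blocks, threadsPerBlock>>>(devA, devB, devC, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(c, devC, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts a null pointer, so this is safe after a failed allocation.
    cudaFree(devA);
    cudaFree(devB);
    cudaFree(devC);
    return ok ? 0 : -1;
}
```

Calling addOnGpu(a, b, c, 1000) from main launches 4 blocks of 256 threads; the 24 surplus threads in the last block are filtered out by the bounds check.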
6
Mache - Unit goals
Idea of parallelism
Benefits and costs of system heterogeneity
Data movement and NUMA
Generally, the effect of architecture on program performance
7
Bunde – Module Design
Brief time: course has lots of other goals
– One 70-minute lab and parts of 2 lectures
Relatively inexperienced students
– Some just out of CS 2
– Many didn’t know C or Unix programming
8
Bunde: Approach taken
Introductory lecture
– GPUs: massively parallel, outside CPU, kernels, SIMD
Lab illustrating features of CUDA architecture
– Data transfer time
– Thread divergence
– Memory types (next time)
“Lessons learned” lecture
– Reiterate architecture
– Demonstrate speedup with Game of Life
– Talk about use in Top 500 systems
9
Bunde: Survey results: Good news
Asked to describe CPU/GPU interaction:
– 9 of 11 mention both data movement and invoking kernel
– Another just mentions invoking the kernel
Asked to explain experiment illustrating data movement cost:
– 9 of 12 say comparing computation and communication cost
– 2 more talk about comparing different operations
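
An experiment comparing computation and communication cost might look like the sketch below. It is not the lab's actual code: the kernel addOne, the struct PhaseTimes and the helper timePhases are hypothetical names, and the block size of 256 is an assumption. CUDA events time the copy to the GPU, the kernel, and the copy back separately.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Deliberately cheap kernel: one addition per element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1;
    }
}

// Elapsed time of each phase in milliseconds (hypothetical struct).
struct PhaseTimes {
    float copyInMs;
    float kernelMs;
    float copyOutMs;
};

// Hypothetical helper: adds 1 to every element of host[0..n-1] on the GPU
// and records how long each phase took. Returns 0 on success, -1 on error.
int timePhases(int *host, int n, PhaseTimes *t) {
    assert(host != NULL && t != NULL && n > 0);
    size_t size = (size_t)n * sizeof(int);
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, size) != cudaSuccess) {
        return -1;
    }

    cudaEvent_t ev[4];
    for (int k = 0; k < 4; k++) {
        cudaEventCreate(&ev[k]);
    }

    int threads = 256;  // assumed block size
    int blocks = (n + threads - 1) / threads;

    // Events are recorded in the default stream, between the phases.
    cudaEventRecord(ev[0]);
    cudaMemcpy(dev, host, size, cudaMemcpyHostToDevice);
    cudaEventRecord(ev[1]);
    addOne<<<blocks, threads>>>(dev, n);
    cudaEventRecord(ev[2]);
    cudaMemcpy(host, dev, size, cudaMemcpyDeviceToHost);
    cudaEventRecord(ev[3]);
    cudaEventSynchronize(ev[3]);  // wait until the last event has completed

    bool ok = cudaGetLastError() == cudaSuccess
           && cudaEventElapsedTime(&t->copyInMs, ev[0], ev[1]) == cudaSuccess
           && cudaEventElapsedTime(&t->kernelMs, ev[1], ev[2]) == cudaSuccess
           && cudaEventElapsedTime(&t->copyOutMs, ev[2], ev[3]) == cudaSuccess;

    for (int k = 0; k < 4; k++) {
        cudaEventDestroy(ev[k]);
    }
    cudaFree(dev);
    return ok ? 0 : -1;
}
```

With so little work per element, the two copies will often take longer than the kernel, but the actual ratio depends on the GPU, the bus, and the array size, so students should measure rather than assume.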
10
Bunde: Survey results: Not so good news
Asked to explain experiment illustrating thread divergence:
– 2 of 9 were correct
– 2 more seemed to understand, but misused terminology
– 3 more remembered performance effect, but said nothing about the cause
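
Since the cause of thread divergence was the hard part, a pair of contrasting kernels may help. This is a sketch under stated assumptions, not the lab's code: the kernel names, the helper runBranchKernel and the loop bodies are invented for illustration. Both kernels do the same amount of work per thread; they differ only in whether threads of the same warp disagree on the branch.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Branch on thread parity: neighbouring threads of one warp want different
// paths, so the warp typically has to run both paths one after the other.
__global__ void branchByThread(int *out, int iters) {
    int v = 0;
    if (threadIdx.x % 2 == 0) {
        for (int k = 0; k < iters; k++) v += k;
    } else {
        for (int k = 0; k < iters; k++) v -= k;
    }
    out[threadIdx.x] = v;
}

// Branch on warp parity: every thread of a warp takes the same path,
// so no warp has to execute both sides of the branch.
__global__ void branchByWarp(int *out, int iters) {
    int v = 0;
    if ((threadIdx.x / warpSize) % 2 == 0) {
        for (int k = 0; k < iters; k++) v += k;
    } else {
        for (int k = 0; k < iters; k++) v -= k;
    }
    out[threadIdx.x] = v;
}

// Hypothetical host helper: runs one of the kernels in a single block of
// nThreads threads and copies the results into out.
// Returns 0 on success, -1 on error.
int runBranchKernel(bool byWarp, int *out, int nThreads, int iters) {
    assert(out != NULL && nThreads > 0 && nThreads <= 1024);
    size_t size = (size_t)nThreads * sizeof(int);
    int *dev = NULL;
    if (cudaMalloc((void **)&dev, size) != cudaSuccess) {
        return -1;
    }
    if (byWarp) {
        branchByWarp<<<1, nThreads>>>(dev, iters);
    } else {
        branchByThread<<<1, nThreads>>>(dev, iters);
    }
    bool ok = cudaGetLastError() == cudaSuccess
           && cudaMemcpy(out, dev, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok ? 0 : -1;
}
```

Timing the two kernels with a large iters value is one way to expose the penalty. An optimizing compiler may simplify loops this small, so a real lab would likely need heavier branch bodies to show a clear difference.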
11
Conway’s Game of Life
Rules
Visual Demo
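
The standard rules of Conway's Game of Life (a live cell survives with 2 or 3 live neighbours, a dead cell becomes live with exactly 3) map naturally onto one GPU thread per cell. The kernel below is a minimal sketch and not the module's actual exercise code: the names lifeStep and lifeStepOnGpu, the 16x16 block shape, and the choice to treat cells outside the grid as dead are all assumptions.

```cuda
#include <cassert>
#include <cstddef>
#include <cuda_runtime.h>

// Computes one generation. Cells are stored row by row, 1 = live, 0 = dead.
// Assumption: cells outside the w-by-h grid count as dead.
__global__ void lifeStep(const unsigned char *cur, unsigned char *next,
                         int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) {
        return;
    }
    int live = 0;  // number of live neighbours among the 8 surrounding cells
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            int nx = x + dx;
            int ny = y + dy;
            bool self = (dx == 0 && dy == 0);
            if (!self && nx >= 0 && nx < w && ny >= 0 && ny < h) {
                live += cur[ny * w + nx];
            }
        }
    }
    bool alive = cur[y * w + x] != 0;
    next[y * w + x] = (live == 3 || (alive && live == 2)) ? 1 : 0;
}

// Hypothetical host helper: advances the grid in cur by one generation and
// writes the result to next. Returns 0 on success, -1 on error.
int lifeStepOnGpu(const unsigned char *cur, unsigned char *next, int w, int h) {
    assert(cur != NULL && next != NULL && w > 0 && h > 0);
    size_t size = (size_t)w * (size_t)h;
    unsigned char *devCur = NULL, *devNext = NULL;
    bool ok = cudaMalloc((void **)&devCur, size) == cudaSuccess
           && cudaMalloc((void **)&devNext, size) == cudaSuccess
           && cudaMemcpy(devCur, cur, size, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);  // assumed block shape
        dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
        lifeStep<<<grid, block>>>(devCur, devNext, w, h);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(next, devNext, size, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(devCur);
    cudaFree(devNext);
    return ok ? 0 : -1;
}
```

Reading from cur and writing to a separate next array keeps every thread's neighbour counts based on the same generation, which is why the kernel needs no synchronization between cells.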
12
Game of Life Module - Results
[Figure: survey results; scale 1 = strongly disagree, 7 = strongly agree]
13
Game of Life Module - Results
[Figure: survey results; scale 1 = strongly disagree, 7 = strongly agree]
14
Game of Life Module - Results
[Figure: survey results; scale 1 = strongly disagree, 7 = strongly agree]
15
Conclusions
Bunde:
– Unit was mostly successful, but thread divergence is a harder concept
– Students interested in CUDA and about half the class requested more of it
Mache:
– What students say: it’s not easy, it’s worthwhile, more please
– What instructors think: we’ll do it again, focus, use new resources
Bottom line: A brief introduction is possible even to students with limited background
16
Future Work
Bunde
– Will add constant memory and a small assignment to next offering
Mache and Karavanic
– Continuing collaboration for summer 2013 course at PSU
Versions of CUDA & Hardware
17
Thank You
We thank Barry Wilkinson for helpful input throughout our collaboration, and Julian Dale for his help in creating the GoL exercise and website.
This material is based upon work supported by the National Science Foundation under grants 1044932, 1044299 and 1044973; by Intel; and by a PSU Miller Foundation Sustainability Grant.
More information
– Game of Life Exercise: lclark.edu/~jmache/parallel
– Authors
Karen L. Karavanic, karavan at cs.pdx.edu
David Bunde, dbunde at knox.edu
Jens Mache, jmache at lclark.edu