© NVIDIA Corporation 2009

Background
  - Founded 2006 by NVIDIA Chief Scientist David Kirk
  - Mission: long-term strategic research
      - Discover & invent new markets
      - Influence product roadmaps
      - Follow, support, and focus academic research
      - Improve parallel computing education
Topics
  - Visual computing: real-time rendering, cinematic rendering, animation, modeling, visualization, computational photography
  - Parallel computing: programming languages, compilers, numerics, HPC applications, architecture, circuit design, interconnects
  - Mobile computing: low-power computing, networks, HCI
Personnel
  - Currently 25 full-time researchers in CA, NC, MI, MN, VA, UT, Berlin, Helsinki
  - 2 National Academy members
  - 1 Academy Award
  - 5 recent former faculty
External Research Collaborations
  - UC Berkeley – parallel programming
  - UC Davis – parallel algorithms
  - U British Columbia – imaging, architecture
  - U North Carolina – ray tracing, hybrid rendering
  - U Virginia – architecture, perceptual psychology
  - UCLA – oceanography
  - U Massachusetts – real-time rendering
  - Chalmers University – real-time rendering
  - U Utah – HPC, ray tracing
  - NC State – rendering algorithms
  - Johns Hopkins – data-intensive computing
  - Brown – computer vision
  - Saarland U – ray tracing
  - U Illinois – parallel programming
  - Weta – cinematic rendering
  - Williams College – real-time rendering
Example: Skin Rendering
  - Real-time subsurface scattering
  - Multilayer translucent materials
  - Render time: ~5 minutes vs. ~11 ms
  - No precomputation
  - Key insight: project diffusion profiles onto a sum-of-Gaussians basis
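One reason a sum-of-Gaussians basis is convenient: convolution is linear, so blurring with a profile written as a weighted sum of Gaussians gives the same result as the weighted sum of separate Gaussian blurs. The sketch below checks that identity in 1D. It is only an illustration under stated assumptions, not the renderer's implementation: the weights and variances in TERMS are hypothetical (not fitted to any skin data), and all function names are made up for this example.

```python
import math

# Hypothetical (weight, variance) pairs; the weights sum to 1.
TERMS = [(0.5, 1.0), (0.3, 4.0), (0.2, 16.0)]


def gaussian_kernel(variance, radius):
    """Normalized 1D Gaussian kernel with 2*radius + 1 taps."""
    taps = [math.exp(-(i * i) / (2.0 * variance)) for i in range(-radius, radius + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def convolve(signal, kernel):
    """Direct 1D convolution with clamp-to-edge boundary handling."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), last)
            acc += w * signal[idx]
        out.append(acc)
    return out


def blur_combined(signal, terms, radius=12):
    """One convolution with the kernel formed by summing the weighted Gaussians."""
    kernels = [gaussian_kernel(v, radius) for _, v in terms]
    combined = [
        sum(w * k[i] for (w, _), k in zip(terms, kernels))
        for i in range(2 * radius + 1)
    ]
    return convolve(signal, combined)


def blur_as_sum(signal, terms, radius=12):
    """Weighted sum of separate Gaussian blurs, one per basis term."""
    result = [0.0] * len(signal)
    for w, v in terms:
        blurred = convolve(signal, gaussian_kernel(v, radius))
        result = [r + w * b for r, b in zip(result, blurred)]
    return result


if __name__ == "__main__":
    step = [0.0] * 16 + [1.0] * 16
    a = blur_combined(step, TERMS)
    b = blur_as_sum(step, TERMS)
    print(max(abs(p - q) for p, q in zip(a, b)))  # tiny rounding difference
```

Each Gaussian blur is also separable in 2D, which is part of what makes this decomposition attractive on a GPU.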
Raytracing
NVIRT: CUDA Ray Tracing API
Example: Programming Languages
Copperhead: Cu + Python
  - Copperhead is a subset of Python, designed for data parallelism
  - Python: extant, well-accepted high-level scripting language
      - Already understands things like map and reduce
      - Comes with a parser & lexer
  - The current Copperhead compiler takes a subset of Python and produces CUDA code
Copperhead is not Pure Python
  - Copperhead is not for arbitrary Python code
      - Most features of Python are unsupported
  - Connecting Python & Copperhead code will require binding similar to Python-C interaction
      - Copperhead is compiled, not interpreted
      - Statically typed
  - Figure labels: Python, Copperhead
Saxpy: Hello world

    def saxpy(a, x, y):
        return map(lambda xi, yi: a*xi + yi, x, y)

Some things to notice:
  - Types are implicit
      - The Copperhead compiler uses a Hindley-Milner type system with typeclasses similar to Haskell
      - Typeclasses are fully resolved in CUDA via C++ templates
  - Functional programming: map, lambda (or equivalent in list comprehensions)
      - You can pass functions around to other functions
  - Closure: the variable 'a' is free in the lambda function, but bound to the 'a' in its enclosing scope
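Because Copperhead is a subset of Python, the same saxpy also runs under the ordinary Python interpreter, which makes the closure easy to try out. This is a plain-Python sketch only, with no Copperhead compiler or GPU involved. The `list(...)` wrapper is an addition for Python 3, where `map` returns a lazy iterator, and the input values are arbitrary.

```python
def saxpy(a, x, y):
    # 'a' is free inside the lambda and bound by saxpy's parameter (a closure).
    return list(map(lambda xi, yi: a * xi + yi, x, y))


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))  # [12.0, 24.0, 36.0]
```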
Example: Parallel Programming
  - Thrust is a library of data-parallel algorithms & data structures for CUDA, with an interface similar to the C++ Standard Template Library
  - C++ template metaprogramming automatically chooses the fastest code path at compile time
  - Data structures: thrust::device_vector, thrust::host_vector, thrust::device_ptr, etc.
  - Algorithms: thrust::sort, thrust::reduce, thrust::exclusive_scan, etc.
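For readers unfamiliar with the three algorithms named above, here is what each one computes, written as sequential Python analogs. This is not the Thrust API and says nothing about how Thrust runs them in parallel on the GPU; `exclusive_scan` is a hypothetical helper written for this sketch.

```python
from functools import reduce
import operator


def exclusive_scan(values, init=0):
    """Running sum where output[i] covers values[0..i-1] (element i is excluded)."""
    out = []
    running = init
    for v in values:
        out.append(running)
        running += v
    return out


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(sorted(data))                    # sort:           [1, 1, 3, 4, 5]
    print(reduce(operator.add, data, 0))   # reduce (sum):   14
    print(exclusive_scan(data))            # exclusive scan: [0, 3, 4, 8, 9]
```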
thrust::sort (sort.cu)

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <cstdlib>  // rand

    int main(void)
    {
        // The element count is blank in the source; N is an illustrative value.
        const int N = 1 << 20;

        // generate random data on the host
        // (element type int is assumed: it matches rand() and 32-bit keys)
        thrust::host_vector<int> h_vec(N);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and sort
        thrust::device_vector<int> d_vec = h_vec;

        // sort 140M 32b keys/sec on GT200
        thrust::sort(d_vec.begin(), d_vec.end());

        return 0;
    }
thrust::reduce (reduce.cu)

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/reduce.h>
    #include <thrust/functional.h>
    #include <cstdlib>  // rand

    int main(void)
    {
        // The element count is blank in the source; N is an illustrative value.
        const int N = 1 << 20;

        // generate random data on the host (element type int is assumed)
        thrust::host_vector<int> h_vec(N);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // compute sum; the overload that takes an operator also takes an
        // initial value, so 0 is passed here (the int sum can overflow for
        // large inputs)
        thrust::device_vector<int> d_vec = h_vec;
        int x = thrust::reduce(d_vec.begin(), d_vec.end(), 0, thrust::plus<int>());

        return 0;
    }
Thrust
  - thrust.googlecode.com
  - Open source (Apache 2 license)
Example: Sparse Matrix-Vector
  - CPU results from "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Williams et al., Supercomputing 2007
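For reference, the operation being benchmarked is y = A·x with A stored sparsely. A minimal sequential sketch follows, assuming the compressed sparse row (CSR) layout; the slide does not say which storage format was used, and `spmv_csr` and the sample matrix are hypothetical names and data for illustration.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x for a matrix in CSR form.

    row_ptr[r]..row_ptr[r+1] delimits the nonzeros of row r;
    col_idx and vals hold each nonzero's column and value.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # Dense equivalent: [[4, 0, 1],
    #                    [0, 2, 0],
    #                    [3, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    vals = [4.0, 1.0, 2.0, 3.0, 5.0]
    print(spmv_csr(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

Each row's dot product is independent of the others, which is what makes the operation a natural fit for data-parallel hardware.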
Example: Sort
  - Figure: radix sorting rate
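As a reminder of what a radix sort does, here is a minimal sequential least-significant-digit sketch for unsigned 32-bit keys. It only illustrates the general technique, not the GPU implementation whose rate is plotted; `radix_sort_u32` and the 8-bit digit width are choices made for this example.

```python
def radix_sort_u32(keys, bits_per_pass=8):
    """Stable LSD radix sort for integers in the range [0, 2**32)."""
    mask = (1 << bits_per_pass) - 1
    keys = list(keys)
    for shift in range(0, 32, bits_per_pass):
        # Distribute keys into buckets by the current digit, preserving order.
        buckets = [[] for _ in range(mask + 1)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        # Concatenate buckets; stability makes earlier passes carry through.
        keys = [k for bucket in buckets for k in bucket]
    return keys


if __name__ == "__main__":
    print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
```

Every pass does a fixed amount of work per key and never compares two keys, so the total work grows linearly with the number of keys for a fixed key width.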
Example: Fluid Dynamics
  - Rayleigh-Bénard Convection
  - Figure labels: hot, cold, circulating cells, initial temperature
Rayleigh-Bénard Results
  - Double precision
  - 384 x 384 x 192 grid (max that fits in 4GB)
  - Vertical slice of temperature at y=0
  - Transition from stratified (left) to turbulent (right)
  - Regime depends on Rayleigh number: Ra = gαΔT/(κν) for unit layer height; in general Ra = gαΔTH³/(κν)
  - 8.5x speedup versus Fortran code running on 8-core 2.5 GHz Xeon
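The grid size and the memory limit can be related with a little arithmetic. The sketch below computes the footprint of one double-precision field on the 384 x 384 x 192 grid and how many such fields fit in the budget. It assumes "4GB" means 4 GiB, and the slide does not say how many fields the solver stores. `rayleigh_number` is a hypothetical helper using the standard definition with an explicit layer height, which reduces to the slide's expression when the height is 1.

```python
NX, NY, NZ = 384, 384, 192
BYTES_PER_DOUBLE = 8

cells = NX * NY * NZ                          # 28,311,552 grid cells
bytes_per_field = cells * BYTES_PER_DOUBLE    # 226,492,416 bytes (216 MiB)
budget = 4 * 1024 ** 3                        # assumption: 4 GB means 4 GiB
fields_that_fit = budget // bytes_per_field   # 18 full-grid double arrays


def rayleigh_number(g, alpha, delta_t, kappa, nu, height=1.0):
    """Ra = g * alpha * dT * H^3 / (kappa * nu)."""
    return g * alpha * delta_t * height ** 3 / (kappa * nu)


if __name__ == "__main__":
    print(cells, bytes_per_field, fields_that_fit)
```

Under these assumptions, a solver holding velocity, pressure, temperature, and a few temporaries at this resolution has room for only about eighteen full-grid arrays.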
Mission: Support Academic Research
  - Serve as academic liaison
  - Follow, inform, and influence external research
  - Direct support – funding and equipment
Sponsored Research
  - Donate and discount equipment
  - Professor Partnerships
  - Ph.D. Fellowships
  - CUDA Centers of Excellence
  - New programs:
      - CUDA Fellows
      - CUDA Research Awards
Mission: Support Parallel Computing Education
  - Supporting courses & curricular efforts
  - Creating & gathering online training materials
  - Teaching courses (and putting them online)
  - Writing textbooks
Final Thoughts – Education
  - We should teach parallel computing in CS 1 or CS 2
      - Computers don't get faster, just wider
      - Manycore is the future of computing
  - Insertion Sort, Heap Sort, Merge Sort: which goes faster on large data?
      - ALL students need to understand this now! Early!
NVIDIA Research Summit
Sept 30 – Oct 2, 2009 – The Fairmont San Jose, California
A cross-disciplinary forum for researchers using GPUs across science and engineering
  - Join your colleagues, researchers in other fields, and the NVIDIA Research team for this valuable opportunity to gather, learn, and collaborate.
  - Share your work with peers from many disciplines; learn from experts at NVIDIA and elsewhere.
  - In-depth sessions on numeric computing, computational science, visual computing trends, and advanced CUDA programming & optimization
  - Opportunities:
      - Call for Posters open: showcase your work, learn from your peers.
      - Research Roundtables: moderated discussions led by your peers. Submit a roundtable to shape the hot topics in GPU computing!
  - Co-located with the GPU Technology Conference, a technical event focused on developers, engineers, researchers, senior executives, venture capitalists, press and analysts