Past Practices of Conventional Core Microarchitecture Are Dead
Steve Keckler, Architecture Research Group

ROADMAP
- Backdrop
  - Brief history of graphics hardware
  - Why GPU Computing?
- Progression
  - GPU Computing 1.0 – compute pretending to be graphics
  - GPU Computing 2.0 – direct computing, CUDA
  - GPU Computing 3.0 – an emerging ecosystem
- Future
  - Driving workloads
  - GPU Computing 4.0?
Old Equations
New Equation
Where are we now?
- Today’s high-end CPUs: 1-2 nJ/Flop
- Today’s high-end GPUs: ~200 pJ/Flop
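One way to read these figures (my gloss, not on the original slide): at a fixed power budget, sustained flops per watt is simply the reciprocal of energy per flop, so the numbers above translate roughly as follows.

```latex
% Performance per watt is the reciprocal of energy per operation:
\[
  \frac{\text{flop/s}}{\text{W}} = \frac{\text{flop/s}}{\text{J/s}}
  = \frac{\text{flop}}{\text{J}} = \frac{1}{\text{energy per flop}}
\]
% Plugging in the figures above:
\[
  \text{CPU: } \frac{1}{1\text{--}2\ \text{nJ/flop}} \approx 0.5\text{--}1\ \text{GFLOPS/W}
  \qquad
  \text{GPU: } \frac{1}{200\ \text{pJ/flop}} \approx 5\ \text{GFLOPS/W}
\]
```

By this measure, today's GPU figure is roughly 5-10x better per watt than the CPU figure.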
What do things cost?

Operation                            Energy
64-bit FP operation                  10.5 pJ
Regfile access (2 read / 1 write)    5.5 pJ
Instruction RAM access               3.6 pJ
Data RAM access
On-chip wire                         18-110 fJ/bit-mm
64-bit on-chip bus                   1.2-7 pJ/mm
Standard off-chip link               30 pJ/bit
TSV (not including wire)             1-11 fJ/bit

(30nm with aggressive voltage scaling; from the DARPA Exascale Report, 2008)
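To make the table concrete, here is a back-of-envelope sketch (mine, not the deck's) that sums these costs for a single 64-bit FP operation under three assumed operand placements. The two-operand fetch, the 10 mm on-chip distance, and the 64-bit off-chip transfer are illustrative assumptions, not measurements; the point is only that moving operands quickly dwarfs the cost of the arithmetic itself.

```cuda
// Back-of-envelope energy model using the per-operation costs in the table
// above (30nm, aggressive voltage scaling, DARPA Exascale Report 2008).
// Host-side code only; compiles with nvcc or any C++ compiler.
#include <cstdio>

int main() {
    // Costs from the table, in picojoules (pJ)
    const double fp64_op     = 10.5;   // 64-bit FP operation
    const double regfile     = 5.5;    // regfile access (2 read / 1 write)
    const double iram        = 3.6;    // instruction RAM access
    const double bus_per_mm  = 7.0;    // 64-bit on-chip bus, upper end, pJ/mm
    const double offchip_bit = 30.0;   // standard off-chip link, pJ/bit

    // Illustrative scenario: one FP op plus instruction fetch and regfile
    // traffic, with its two source operands supplied from...
    const double local    = fp64_op + regfile + iram;          // ...registers
    const double on_chip  = local + 2 * bus_per_mm * 10.0;     // ...10 mm away on chip (assumed)
    const double off_chip = local + 2 * offchip_bit * 64.0;    // ...across the off-chip link

    printf("operands in regfile:    %7.1f pJ\n", local);
    printf("operands 10 mm on chip: %7.1f pJ\n", on_chip);
    printf("operands off chip:      %7.1f pJ\n", off_chip);
    return 0;
}
```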
Core microarchitecture is not dead
- But we now need to focus on core perf/W
- Limit communication and storage-access overheads
- All multicore proposals must focus on perf/W
- Drive down the overheads of data movement and tracking
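As a concrete illustration of limiting data movement (my sketch, not an example from the talk): the CUDA kernel below stages a tile of input in on-chip shared memory so that each element is read from DRAM once and then reused by neighboring threads, instead of being re-fetched from global memory for every use. The kernel name, tile size, and 3-point stencil are all assumed for illustration.

```cuda
// Minimal sketch: a 3-point averaging stencil in which each input element
// is loaded from global memory once and reused from shared memory.
// A naive version would read every element from global memory three times.
#include <cstdio>
#include <cuda_runtime.h>

constexpr int TILE = 256;

__global__ void stencil3(const float* in, float* out, int n) {
    __shared__ float tile[TILE + 2];               // +2 halo elements
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int lid = threadIdx.x + 1;                     // offset past the left halo

    if (gid < n) tile[lid] = in[gid];              // one DRAM read per element
    if (threadIdx.x == 0)                          // left halo
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1)             // right halo
        tile[lid + 1] = (gid + 1 < n) ? in[gid + 1] : 0.0f;
    __syncthreads();

    if (gid < n)                                   // reuse comes from shared memory
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}

int main() {
    const int n = 1 << 20;                         // multiple of TILE
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = float(i);

    stencil3<<<(n + TILE - 1) / TILE, TILE>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[1] = %f (expect 1.0)\n", out[1]);
    cudaFree(in); cudaFree(out);
    return 0;
}
```

Keeping reuse in registers and shared memory is exactly the "limit storage-access overheads" point: the 30 pJ/bit off-chip cost in the table above is paid only for the first read of each element.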
Research poster submission deadline: August 15
www.nvidia.com/gtc