Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 July 12, 2012 © Barry Wilkinson CUDAIntro.ppt.


Fastest computer systems in the world
In the last few years, GPUs have moved from simply supporting graphics to being designed and used for high performance computing. Many, if not most, high performance clusters use GPUs. The list of the fastest systems includes Japanese and Chinese machines, and several entries use NVIDIA GPUs; for example, the Chinese Tianhe-1A (was #1, now #2) has 7168 NVIDIA M2050 GPUs.

coit-grid01.uncc.edu – coit-grid07.uncc.edu
[Figure: cluster diagram: coit-grid01, coit-grid02, coit-grid03, coit-grid04, coit-grid05, coit-grid06 and coit-grid07 connected through a switch; coit-grid06 and coit-grid07 each have an NVIDIA C2050 GPU (448 cores).]
Login from on-campus or off-campus: use coit-grid01.uncc.edu. Other nodes: direct login from within the UNC-C campus only.
coit-grid07: GPU server, X GHz quad-core Xeon processor with an NVIDIA C2050 GPU and 12 GB main memory (can hold four C2050 GPUs, 1792 cores!).
All users' home directories are on coit-grid05 (NFS).
coit-grid06 is currently kept turned off as a back-up system.
GPU servers coit-grid06 and coit-grid07 are for HPC GPU programming.
Windows lab machines can also be used if they have NVIDIA cards and software.
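To check which GPUs a server such as coit-grid07 actually exposes, a small CUDA program can query the runtime. This is a sketch, not course-provided code: `cores_per_sm()` is a hypothetical helper that only knows the architectures mentioned in these notes (Fermi, compute capability 2.x, and Kepler, 3.x), because the runtime reports multiprocessors rather than cores. For a C2050 (compute capability 2.0, 14 multiprocessors) it gives 14 × 32 = 448 cores, and four such cards would give the 1792 cores quoted above.

```cuda
// Sketch: list the CUDA devices on a GPU server and estimate the total core count.
// cores_per_sm() is a hypothetical helper covering only Fermi (2.x) and Kepler (3.x);
// it returns 0 for any other architecture.
#include <cstdio>
#include <cuda_runtime.h>

int cores_per_sm(int major, int minor) {
    if (major == 2) return minor == 0 ? 32 : 48;   // Fermi: 2.0 (e.g. C2050) or 2.1
    if (major == 3) return 192;                      // Kepler 3.x (e.g. GK104)
    return 0;                                        // unknown to this sketch
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    int total = 0;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // The runtime reports multiprocessors; multiply by cores per multiprocessor.
        int cores = prop.multiProcessorCount * cores_per_sm(prop.major, prop.minor);
        std::printf("GPU %d: %s, compute capability %d.%d, %d SMs, ~%d cores, %zu MB\n",
                    dev, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount, cores, prop.totalGlobalMem >> 20);
        total += cores;
    }
    std::printf("%d GPU(s), ~%d cores in total\n", count, total);
    return 0;
}
```

Compile it with `nvcc` and run it on the GPU server; on a machine without a usable CUDA device, `cudaGetDeviceCount` returns an error and the program exits with status 1.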

CPU-GPU architecture evolution
Co-processors: a very old idea that appeared in the 1970s and 1980s, with floating point co-processors attached to microprocessors that did not then have floating point capability. These coprocessors simply executed floating point instructions that were fetched from memory. Around the same time, there was interest in providing hardware support for displays, especially with the increasing use of graphics and PC games. This led to graphics processing units (GPUs) attached to the CPU to create the video display.
[Figure: early design, with CPU, memory, and a graphics card driving the display.]

Pipelined programmable GPU
Dedicated pipeline (late 1990s to early 2000s). By the late 1990s, graphics chips needed to support 3-D graphics, especially for games and graphics APIs such as DirectX and OpenGL. Graphics chips generally had a pipeline structure, with individual stages performing specialized operations, finally loading the frame buffer for display. Individual stages may have access to graphics memory for storing intermediate computed data.
[Figure: input stage → vertex shader stage → geometry shader stage → rasterizer stage → pixel shading stage → frame buffer; stages can access graphics memory.]

Graphics Processing Units (GPUs) Brief History
[Figure: timeline, 1970 to 2010: Atari 8-bit computer text/graphics chip; IBM PC Professional Graphics Controller card; S3 graphics cards (single-chip 2D accelerator); PlayStation; hardware-accelerated 3D graphics; OpenGL graphics API; DirectX graphics API; GPUs with programmable shading (NVIDIA GeForce 3, 2001); general-purpose computing on graphics processing units (GPGPU); GPU computing.]

Established by Jen-Hsun Huang, Chris Malachowsky, Curtis Priem
NVIDIA products. NVIDIA Corp. is the leader in GPUs for high performance computing.
[Figure: timeline of NVIDIA products, 1993 to 2010: NV1; GeForce 1; GeForce 2 series; GeForce FX series; GeForce 8 series; GeForce 8800 and G80 (NVIDIA's first GPU with general purpose processors); GeForce 200 series (GTX260/275/280/285/295); GeForce 400 series (GTX460/465/470/475/480/485); Quadro; Tesla C870, S870, C1060, S1070, C2050, …; Fermi; followed by Kepler (2011) and Maxwell (2013).]
The Tesla C2050 GPU has 448 thread processors.

GeForce 6 Series Architecture (2004-5)
From GPU Gems 2, Copyright 2005 by NVIDIA Corporation

General-Purpose GPU designs
High performance pipelines call for high-speed (IEEE) floating point operations. People tried to use GPU cards to speed up scientific computations, an approach known as GPGPU (general-purpose computing on graphics processing units). This was difficult to do with specialized graphics pipelines, but possible. By the mid 2000s, it was recognized that individual stages of the graphics pipeline could be implemented by a more general purpose processor core (although with a data-parallel paradigm).

NVIDIA G80 chip/GeForce 8800 card (2006)
First GPU for high performance computing as well as graphics. Unified processors that could perform vertex, geometry, pixel, and general computing operations. Programs could now be written in C rather than through graphics APIs. Single-instruction multiple-thread (SIMT) programming model.
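With unified processors, a per-pixel operation that once needed a fixed pixel-shading stage can be written as an ordinary C-like function executed by many threads. The sketch below brightens an 8-bit grayscale image; the image size, the offset, and the names `brighten` and `brighten_pixel` are illustrative assumptions. Every thread runs the same kernel code (SIMT), but `blockIdx`, `blockDim` and `threadIdx` give each thread a different pixel.

```cuda
// Sketch: a data-parallel, pixel-shading-style operation written as a CUDA kernel.
// One GPU thread handles one pixel; all threads run the same code on different data.
// Image size and brightness offset are arbitrary illustrative values.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Per-pixel operation, compiled for both host (for checking) and device.
__host__ __device__ unsigned char brighten_pixel(unsigned char p, int delta) {
    int v = p + delta;
    if (v < 0) v = 0;
    if (v > 255) v = 255;                      // clamp to the 8-bit range
    return static_cast<unsigned char>(v);
}

__global__ void brighten(unsigned char *img, int n, int delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // this thread's pixel
    if (i < n)                                         // grid may be larger than n
        img[i] = brighten_pixel(img[i], delta);
}

int main() {
    const int width = 640, height = 480, n = width * height;
    const int delta = 40;
    std::vector<unsigned char> h(n);
    for (int i = 0; i < n; ++i) h[i] = static_cast<unsigned char>(i % 256);

    unsigned char *d = nullptr;
    if (cudaMalloc(&d, n) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d, h.data(), n, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;   // round up to cover every pixel
    brighten<<<blocks, threads>>>(d, n, delta);
    cudaMemcpy(h.data(), d, n, cudaMemcpyDeviceToHost);
    cudaFree(d);

    // Compare the GPU result with the host version of the same per-pixel function.
    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h[i] != brighten_pixel(static_cast<unsigned char>(i % 256), delta)) ++errors;
    std::printf("%s\n", errors == 0 ? "ok" : "mismatch");
    return errors == 0 ? 0 : 1;
}
```

The `if (i < n)` guard is needed because the grid is rounded up to a whole number of 256-thread blocks; declaring `brighten_pixel` as `__host__ __device__` lets the host recompute the expected values to check the GPU result.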

GPU performance gains over CPUs
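[Figure: chart of GPU performance gains over CPUs, comparing NVIDIA GPUs (NV30, NV40, G70, G80, GT200, T12) with Intel CPUs (3GHz Dual Core P4, 3GHz Core2 Duo, 3GHz Xeon Quad, Westmere).]
Source © David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL Spring 2010, University of Illinois, Urbana-Champaign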

NVIDIA Fermi architecture
Evolving GPU design: NVIDIA Fermi architecture (announced Sept 2009).
Data-parallel, single instruction multiple data operation ("stream" processing).
Up to 512 cores ("stream processing engines", SPEs), organized as 16 streaming multiprocessors (SMs), each having 32 SPEs.
3 GB or 6 GB GDDR5 memory.
Many innovations, including L1/L2 caches, unified device memory addressing, ECC memory, …
First implementation: Tesla 20 series (single-chip C2050/2070, 4-chip S2050/2070).
About 3 billion transistors. The number of cores is limited by power considerations: the C2050 has 448 cores.
* Whitepaper: NVIDIA's Next Generation CUDA Compute Architecture: Fermi, NVIDIA, 2008
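The core counts follow from the multiprocessor organization. As a worked check, with Fermi's 16 streaming multiprocessors (SMs) of 32 cores each, and the C2050 having 14 of the 16 SMs enabled (consistent with its 448-core figure):

```latex
\begin{aligned}
\text{Fermi, full design:}\quad & 16 \text{ SMs} \times 32 \text{ cores/SM} = 512 \text{ cores} \\
\text{Tesla C2050:}\quad & 14 \text{ SMs} \times 32 \text{ cores/SM} = 448 \text{ cores}
\end{aligned}
```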

Most recent NVIDIA architecture and GPUs (2012)
Called the "Kepler" architecture. GeForce 600 series cards introduced early 2012.
GTX 680 has 1536 cores, 195 watts. Introduced March 2012.
GTX 690 has two dies, 3072 cores (2 x 1536 cores), 300 watts. Introduced April 2012.
CUDA compute capability 3.0.
[Figure: GK104 chip with 1536 cores]

CUDA (Compute Unified Device Architecture)
An architecture and programming model introduced by NVIDIA in 2007. Enables GPUs to execute programs written in C. Within C programs, call SIMT "kernel" routines that are executed on the GPU. CUDA syntax extensions to C identify a routine as a kernel. Very easy to learn, although getting the highest possible execution performance requires an understanding of the hardware architecture. Version 3, introduced in 2009, is the one we have been using. The current version 4, introduced in 2011, has significant additions, including "unified virtual addressing": a single address space across the GPU and host (see later). We will go into CUDA in detail later and gain programming experience.
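The kernel syntax can be seen in a small SAXPY (y = a·x + y) program; this is a sketch with illustrative names and sizes, not course code. `__global__` marks a routine that host code launches on the GPU with the `<<<blocks, threads>>>` syntax, and `__host__ __device__` marks a helper compiled for both sides. The copies use `cudaMemcpyDefault`, which lets the runtime infer the direction from the pointer values; this depends on unified virtual addressing (CUDA 4.0 or later, a 64-bit host program, and a device of compute capability 2.0 or later). With CUDA 3, use `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost` instead.

```cuda
// Sketch of CUDA's C extensions: __global__ marks a kernel, <<<...>>> launches it.
// cudaMemcpyDefault relies on unified virtual addressing (CUDA 4.0+, 64-bit host,
// compute capability 2.0+ device); older setups need explicit copy directions.
#include <cstdio>
#include <cuda_runtime.h>

// Helper compiled for host and device, so host code can reuse it for checking.
__host__ __device__ float saxpy_op(float a, float x, float y) {
    return a * x + y;
}

// Kernel: launched from host code, executed on the GPU by many threads (SIMT).
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = saxpy_op(a, x[i], y[i]);
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx = nullptr, *dy = nullptr;
    if (cudaMalloc(&dx, bytes) != cudaSuccess || cudaMalloc(&dy, bytes) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    // With UVA, the runtime infers host-to-device from the pointer values.
    cudaMemcpy(dx, hx, bytes, cudaMemcpyDefault);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyDefault);

    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);
    // This copy waits for the kernel and also reports any earlier launch error.
    cudaError_t err = cudaMemcpy(hy, dy, bytes, cudaMemcpyDefault);

    std::printf("%s: y[0] = %f\n", err == cudaSuccess ? "ok" : cudaGetErrorString(err), hy[0]);
    cudaFree(dx);
    cudaFree(dy);
    delete[] hx;
    delete[] hy;
    return err == cudaSuccess ? 0 : 1;
}
```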

UNC-C CUDA Teaching Center
2010: NVIDIA Corp. selected the UNC-Charlotte Department of Computer Science to be a CUDA Teaching Center, kindly providing GPU equipment and TA support.
2011: NVIDIA kindly provided 50 GTX 480 GPU cards, valued at $15,000, as continuing support for the CUDA Teaching Center.
Our course materials are posted on NVIDIA's corporate site next to those from Stanford and other top schools.

Questions
