A many-core GPU architecture. Price, performance, and evolution.

• CPU (Central Processing Unit): a general-purpose processor able to execute computer programs.
• GPU (Graphics Processing Unit): a dedicated graphics rendering device.

• The NVIDIA GeForce 6800 Ultra can reach 40 GFLOPS, whereas a 3 GHz Intel Pentium 4 reaches only 6 GFLOPS. [1]
• More impressive still, current cards such as the ATI HD5870, AMD FireStream 9250, and NVIDIA GeForce 9800 run between 1 and 3 TFLOPS.
• Reasons for this include highly parallel vector processing, fast onboard memory, and a constrained pipeline that streams data without stalls.

• GPU performance has approximately doubled every 6 months since the mid-1990s.
• CPU performance doubles every 18 months on average (the rate popularly attributed to Moore's law).
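To make the gap between the two doubling rates concrete, here is a minimal Python sketch. It assumes idealized exponential growth at exactly the doubling periods quoted above; the function name `growth_factor` is made up for this illustration.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Performance multiplier after `years`, assuming performance doubles
    every `doubling_months` months (idealized exponential growth)."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


if __name__ == "__main__":
    for years in (1.5, 3, 6):
        gpu = growth_factor(years, 6)    # 6-month doubling quoted for GPUs
        cpu = growth_factor(years, 18)   # 18-month doubling quoted for CPUs
        print(f"{years} years: GPU x{gpu:.0f}, CPU x{cpu:.0f}, gap x{gpu / cpu:.0f}")
```

Under these assumptions, three years correspond to six GPU doublings (64x) against two CPU doublings (4x).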

How we use GPUs.

• GPUs are increasingly used in scientific computing for data-parallel algorithms. Examples include:

Clustering: GPU clusters used to simulate the dispersion of airborne contaminants in New York City.

Image Stitching: fast seamless stitching and tone-mapping of gigapixel images (about 1 hour on a notebook PC).

Molecular Dynamics: evaluating the forces between atoms that do not share bonds.
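The molecular dynamics example shows what "data-parallel" means in practice: the force on each atom can be computed independently from the same read-only inputs. The sketch below is an assumption-laden illustration, not the method of any cited work. It uses a Lennard-Jones force law (the slides do not name a potential), and the names `lj_force_on` and `nonbonded_forces` are hypothetical.

```python
def lj_force_on(i, positions, bonded, epsilon=1.0, sigma=1.0):
    """Net force on atom i from every atom it does not share a bond with.

    Assumes a Lennard-Jones potential; `bonded` is a set of (low, high)
    index pairs that are excluded from the non-bonded sum.
    """
    xi, yi, zi = positions[i]
    fx = fy = fz = 0.0
    for j, (xj, yj, zj) in enumerate(positions):
        if j == i or (min(i, j), max(i, j)) in bonded:
            continue
        dx, dy, dz = xi - xj, yi - yj, zi - zj
        r2 = dx * dx + dy * dy + dz * dz
        s6 = (sigma * sigma / r2) ** 3
        # Force vector = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * (r_i - r_j)
        scale = 24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2
        fx += scale * dx
        fy += scale * dy
        fz += scale * dz
    return (fx, fy, fz)


def nonbonded_forces(positions, bonded):
    """One independent computation per atom: the part a GPU would run
    as one thread per atom, since no result depends on another."""
    return [lj_force_on(i, positions, bonded) for i in range(len(positions))]


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0)]
    print(nonbonded_forces(atoms, bonded={(0, 1)}))
```

On a CPU the list comprehension runs serially; the point of the sketch is only that each element could be evaluated at the same time.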

How it is built.

TYPICAL GPU
• Ordered sequence of rendering steps.
• Fixed hardware dedicated to each step.

LARRABEE
• Runs most of its pipeline in software on multiple general-purpose x86 cores.
• This allows the rendering pipeline to be reconfigured dynamically, so steps can be skipped or extra resources allocated when required.
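The contrast between a fixed pipeline and a software pipeline can be sketched in a few lines of Python. This is only a toy model: the stage names in `FIXED_ORDER` and the helpers `make_stage`, `run_pipeline`, and `software_pipeline` are hypothetical and do not describe Larrabee's actual renderer.

```python
# Hypothetical stage names, chosen only for illustration.
FIXED_ORDER = ["vertex", "geometry", "rasterize", "pixel", "blend"]


def make_stage(name):
    """Build a stand-in stage that records that it ran."""
    def stage(frame):
        return frame + [name]
    return stage


STAGES = {name: make_stage(name) for name in FIXED_ORDER}


def run_pipeline(stage_names):
    """Run the named stages in order and return the log of stages that ran."""
    frame = []
    for name in stage_names:
        frame = STAGES[name](frame)
    return frame


def fixed_function():
    # Fixed hardware: every stage runs, always in the same order.
    return run_pipeline(FIXED_ORDER)


def software_pipeline(skip=()):
    # Software pipeline: the stage list is ordinary data, so a stage
    # can be dropped for a given workload.
    return run_pipeline([n for n in FIXED_ORDER if n not in skip])


if __name__ == "__main__":
    print(fixed_function())
    print(software_pipeline(skip={"geometry"}))
```

Because the stage list is plain data in the software version, skipping or reordering a step is a change to a list rather than to hardware.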

• The Larrabee core is “derived” from the Pentium processor.
• 1 scalar unit for single operations and 1 vector unit for multiple operations.
• 32 KB L1 instruction cache and 32 KB L1 data cache.
• 256 KB L2 cache per core; the L2 caches share a ring network.

• The 32 KB L1 cache is 4 times larger than the original Pentium's 8 KB cache.
• The larger cache supports four-way multithreading on each core, which reduces thread-switching overhead. (Not to be confused with simultaneous multithreading.)
• The 256 KB L2 caches share a ring network. If a core cannot find data in its own L2 cache, it places a request on the ring network, and the data is eventually brought into its L2.
• Uses a rendering technique called binning, which divides the screen into regions and renders polygons region by region.
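Binning can be illustrated with a short Python sketch. It assumes square tiles and assigns each triangle to every tile its bounding box overlaps, which is a conservative simplification chosen for this example rather than the scheme described in [2]. The function name `bin_triangles` and the tile size are hypothetical.

```python
from collections import defaultdict


def bin_triangles(triangles, screen_w, screen_h, tile):
    """Map (tile_x, tile_y) -> indices of triangles whose bounding box
    overlaps that tile. Each triangle is three (x, y) screen points."""
    bins = defaultdict(list)
    max_tx = (screen_w - 1) // tile
    max_ty = (screen_h - 1) // tile
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        tx0 = max(0, int(min(xs) // tile))
        tx1 = min(max_tx, int(max(xs) // tile))
        ty0 = max(0, int(min(ys) // tile))
        ty1 = min(max_ty, int(max(ys) // tile))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(idx)
    return dict(bins)


if __name__ == "__main__":
    tris = [
        [(10, 10), (20, 10), (10, 20)],   # inside a single tile
        [(60, 60), (70, 60), (60, 70)],   # straddles four tiles
    ]
    for tile_xy, members in sorted(bin_triangles(tris, 128, 128, 64).items()):
        print(tile_xy, members)
```

Once triangles are binned, each region can be rendered independently, which is what lets the work be spread across cores.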

Benefits of Larrabee
• Game physics
• Real-time ray tracing
• Image and video processing
• Physical simulation
• Extended rendering capabilities

[1] Zhe Fan, Feng Qiu, A. Kaufman, S. Yoakum-Stover. 2004. GPU Cluster for High Performance Computing. ACM/IEEE Supercomputing Conference 2004, November 06-12, Pittsburgh, PA.
[2] L. Seiler et al. 2008. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Transactions on Graphics, vol. 27, no. 3, Article 18, August 2008.
