
NVIDIA Tesla GPU Zhuting Xue EE126

GPU (Graphics Processing Unit) The "brain" of graphics processing, which determines the quality and performance of the graphics.

NVIDIA Tesla NVIDIA is a company that designs graphics chips and chipset-based semiconductors, and it has achieved many breakthroughs in parallel processing. Tesla is a line of GPUs produced by NVIDIA that is designed for large-scale parallel computing; it is mostly used for scientific research in labs.

GPU & CPU In NVIDIA’s design, the GPU cooperates with the CPU to accelerate scientific and other applications. The compute-intensive functions (roughly 5% of the application code) run on the GPU, while the rest of the sequential code (about 95%) continues to run on the CPU.
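This division of labor can be sketched in CUDA. The example below is illustrative and not from the slides: the sequential setup runs on the CPU, and only the compute-intensive per-element loop is offloaded to the GPU as a kernel.

```cuda
// Minimal sketch (assumed, not the slides' code) of CPU/GPU cooperation:
// sequential setup on the CPU, compute-intensive work on the GPU.
#include <cstdio>
#include <cuda_runtime.h>

// Compute-intensive function: one GPU thread per element.
__global__ void scaleAdd(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential CPU code: allocate and initialize the inputs.
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Copy to GPU memory, launch the kernel, copy the result back.
    float *dx, *dy;
    cudaMalloc(&dx, bytes); cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    scaleAdd<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);  // 2.0 * 1.0 + 2.0 = 4.0
    cudaFree(dx); cudaFree(dy); free(hx); free(hy);
    return 0;
}
```

The host launches many lightweight GPU threads for the parallel part, which is the pattern the 5%/95% split on the slide describes.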

GPU Architecture: Dynamic Parallelism All child launches must complete before the parent kernel is considered complete. With dynamic parallelism, recursion, irregular loop structures, and other patterns that do not fit a flat, single-level parallelism can be expressed more transparently.
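A hedged sketch of what this looks like in CUDA. Dynamic parallelism is a Kepler-and-later feature (compute capability 3.5+, compiled with `nvcc -rdc=true -arch=sm_35`), so it is newer than the original Tesla architecture; the function names below are illustrative.

```cuda
// Sketch of dynamic parallelism: a kernel launching child kernels on-device.
#include <cuda_runtime.h>

__global__ void child(int *counter) {
    if (threadIdx.x == 0) atomicAdd(counter, 1);
}

__global__ void parent(int *counter) {
    // Thread 0 of each block launches a child grid directly from the device,
    // so irregular or recursive structure needs no round trip to the CPU.
    if (threadIdx.x == 0) child<<<1, 32>>>(counter);
    // The parent grid is not reported complete until every child grid
    // it launched has also completed.
}

int main() {
    int *counter;
    cudaMalloc(&counter, sizeof(int));
    cudaMemset(counter, 0, sizeof(int));
    parent<<<4, 64>>>(counter);   // 4 parent blocks -> 4 child launches
    cudaDeviceSynchronize();      // host waits for parents AND their children
    cudaFree(counter);
    return 0;
}
```

The completion rule on the slide is what makes this safe: the host-side synchronize cannot return until the child grids have finished.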

GPU Architecture
- Host interface and compute work distribution: get instructions and data from the host CPU and its main memory, and manage the threads of execution by assigning groups of threads to processor clusters and performing context switching.
- TPC (texture/processor cluster): the basic building block of Tesla-architecture GPUs; a single GPU can have from one to eight TPCs.
- SM (streaming multiprocessor): part of a TPC, consisting of eight streaming processor (SP) cores, two special function units (for performing interpolation and approximate evaluation of trigonometric functions, logarithms, etc.), a multithreaded instruction fetch and issue unit, two caches, and a pool of shared memory.
- Texture unit: each unit is shared by two SMs; it takes part in graphics calculations and is equipped with an L1 cache accessible from the SPs.
- Level 2 cache: L2 cache memory units are connected to the TPCs through a fast interconnection network.
[Block diagram: the host CPU and system memory connect through the host interface to the input assembler and the vertex/pixel/compute work distribution units, which feed an array of TPCs (each containing two SMs, with shared memory and SPs, plus a texture unit); an interconnection network links the TPCs to ROP/L2/DRAM partitions.]
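The per-SM shared memory pool listed above is directly visible to CUDA programmers. The following is a minimal, assumed sketch (not from the slides) of a block of 256 threads cooperating through that fast on-chip memory instead of DRAM:

```cuda
// Sketch: a thread block sums its slice of the input in the SM's shared memory.
__global__ void blockSum(const float *in, float *out, int n) {
    __shared__ float buf[256];        // allocated in the SM's shared-memory pool
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    buf[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                  // all 256 values visible to the whole block

    // Tree reduction carried out entirely in fast on-chip memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            buf[threadIdx.x] += buf[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = buf[0];  // one partial sum per block
}
// Launch with a block size of 256, e.g. blockSum<<<numBlocks, 256>>>(in, out, n);
```

Because the buffer lives in the SM rather than in DRAM, the repeated reads and writes of the reduction stay on chip; only one value per block goes back to global memory.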

GPU Challenges The GPU remains a specialized processor. Its strong performance on graphics computation belies a host of difficulties in performing true general-purpose computing: software must be recompiled for the processors, the programming tools are rudimentary, and the available programming languages and features are limited.

GPU vs. CPU [comparison figure]

Conclusion The GPU still has to cooperate with the CPU; a future breakthrough would allow the GPU to complete all of the work without the CPU's help. The competition between CPU and GPU is positive, driving the development of both.

References
[1] Maciol, Pawel, and Krzysztof Banas. "Testing Tesla architecture for scientific computing: The performance of matrix-vector product." International Multiconference on Computer Science and Information Technology (IMCSIT). IEEE.
[2] Andreyev, A., A. Sitek, and A. Celler. "Acceleration of blob-based iterative reconstruction algorithm using Tesla GPU." Nuclear Science Symposium Conference Record (NSS/MIC). IEEE, 2009.
[3] Lindholm, Erik, et al. "NVIDIA Tesla: A unified graphics and computing architecture." IEEE Micro 28.2 (2008).
[4] Heinecke, Alexander. "Accelerators in scientific computing: Is it worth the effort?" International Conference on High Performance Computing and Simulation (HPCS). IEEE, 2013.