CMSC 611: Advanced Computer Architecture

Presentation transcript:

CMSC 611: Advanced Computer Architecture Complex Parallel Systems

Computational Examples: Connection Machine 5, SGI Origin, Intel Nehalem

Thinking Machines CM-5 (1993): MIMD; SPARC processors; Fat Tree communication network. D. Hillis and L. Tucker, “The CM-5 Connection Machine: A Scalable Supercomputer,” Communications of the ACM, v36n11, November 1993

SGI Origin (1998): MIPS R10000 processor; hypercube connected; ccNUMA / directory protocol
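As a generic illustration of hypercube connectivity (not a model of Origin's actual router layout): in a d-dimensional hypercube, two nodes are linked when their binary IDs differ in exactly one bit, so the minimum hop count between two nodes is the number of differing bits. The function names below are illustrative only.

```python
def hypercube_neighbors(node: int, dim: int) -> list[int]:
    """Return the IDs directly linked to `node` in a `dim`-dimensional hypercube."""
    # Flipping one bit of the ID at a time yields each neighbor.
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_hops(src: int, dst: int) -> int:
    """Minimum hop count: the number of bit positions where the IDs differ."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(0b101, 3))  # neighbors of node 5 in a 3-cube
    print(hypercube_hops(0b000, 0b111))   # opposite corners of a 3-cube
```

Each node has only d links while the network holds 2^d nodes, and the worst-case distance is d hops.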

SGI Origin Node Ammon, "Hypercube Connectivity within ccNUMA Architecture", Silicon Graphics, 1998.

Origin Communication
Level                      Latency (ns)
L1 cache                   5.1
L2 cache                   56.4
local memory               310
4P remote memory           540
8P avg. remote memory      707
16P avg. remote memory     726
32P avg. remote memory     773
64P avg. remote memory     867
128P avg. remote memory    945
Laudon and Lenoski, "SGI Origin: A ccNUMA Highly Scalable Server", Proceedings of Computer Architecture 1997

Intel Nehalem Design Appaloosa, “Intel Nehalem Microarchitecture”, Wikimedia project, November 2008

Communication Performance Michael Thomadakis: The Architecture of the Nehalem Processor and Nehalem-EP SMP Platforms

Graphics Hardware: problem domain, Pixel-Planes 4, Pixel-Planes 5, SGI Reality Engine, Pixel Flow, NVIDIA GeForce 6, NVIDIA Maxwell

Graphics Rendering: Just model the surfaces (that’s all you can see). Approximate them with a mesh of triangles. Get really good at rendering triangles.

Graphics Pipeline
Stages: Transform, Clip, Rasterize, Shade, Visibility/Blend, Display
• Transform: find where each vertex goes on the screen
• Clip: get rid of off-screen parts (especially behind the viewer)
• Rasterize: find which pixels are inside the triangle
• Shade: compute the color for each pixel
• Visibility: throw out pixels covered by opaque stuff that’s already rendered
• Blend: combine colors for partially transparent objects
• Display: show results to user
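The rasterize, shade, and visibility stages can be sketched in software. This is a minimal illustration under simplifying assumptions, not how any of the hardware in these slides works: triangles are assumed to be already transformed to screen space, each triangle has one constant depth and one flat color, the bounding-box clamp is only a crude stand-in for clipping, and blending is omitted. The names `edge` and `render` are hypothetical.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: the sign tells which side of edge a->b the point is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def render(triangles, width, height, background=(0, 0, 0)):
    """Rasterize screen-space triangles into a color buffer using a z-buffer.

    Each triangle is (vertices, depth, color): three (x, y) pairs, one
    constant depth (smaller = closer), and an RGB tuple.
    """
    color = [[background] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]
    for (a, b, c), z, rgb in triangles:
        # Crude clip: only scan the on-screen part of the bounding box.
        x0 = max(0, int(min(a[0], b[0], c[0])))
        x1 = min(width - 1, int(max(a[0], b[0], c[0])))
        y0 = max(0, int(min(a[1], b[1], c[1])))
        y1 = min(height - 1, int(max(a[1], b[1], c[1])))
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                px, py = x + 0.5, y + 0.5  # sample at the pixel center
                w0 = edge(*b, *c, px, py)
                w1 = edge(*c, *a, px, py)
                w2 = edge(*a, *b, px, py)
                # Rasterize: inside when all three edge tests agree in sign
                # (accepts either vertex winding order).
                inside = (w0 >= 0 and w1 >= 0 and w2 >= 0) or (
                    w0 <= 0 and w1 <= 0 and w2 <= 0
                )
                # Visibility: keep the fragment only if it is the closest so far.
                if inside and z < depth[y][x]:
                    depth[y][x] = z
                    color[y][x] = rgb  # Shade: flat color per triangle
    return color


if __name__ == "__main__":
    tri = ((0, 0), (4, 0), (0, 4))
    red, blue = (255, 0, 0), (0, 0, 255)
    image = render([(tri, 2.0, blue), (tri, 1.0, red)], 4, 4)
    print(image[0][0], image[3][3])
```

Every pixel test inside the two inner loops is independent of the others, which is the property the SIMD pixel arrays in the later slides exploit.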

Graphics Pipeline (figure): stages Transform, Clip, Rasterize, Shade, Visibility/Blend, Display, annotated with the data units vertex, triangle, pixel, frame

Computation and Bandwidth
Based on:
• 100 Mtri/sec (1.6M/frame @ 60Hz)
• 256 Bytes vertex data
• 128 Bytes interpolated
• 68 Bytes pixel output
• 5x depth complexity
• 16 4-Byte textures
• 223 ops/vertex
• 1664 ops/pixel
• No caching
• No compression
Figure labels (in slide order): Vertex, 75 GB/s, 67 GFLOPS, Triangle, 13 GB/s, 335 GB/s, Texture, 45 GB/s, Fragment, Pixel, 1.1 TFLOPS
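The vertex and triangle figures can be roughly reproduced from the listed assumptions. The sketch below is a back-of-the-envelope check, not the slide's own derivation: it assumes three unshared vertices per triangle and that the 128 interpolated bytes apply once per triangle, neither of which the slide states. The pixel-side numbers depend on a screen resolution the slide does not give, so they are left out.

```python
TRI_PER_SEC = 100_000_000   # 100 Mtri/sec, from the slide
VERTS_PER_TRI = 3           # assumption: no vertex sharing between triangles
BYTES_PER_VERTEX = 256      # from the slide
OPS_PER_VERTEX = 223        # from the slide
BYTES_INTERPOLATED = 128    # from the slide; assumed to be per triangle


def vertex_and_triangle_rates():
    """Return (vertex GB/s, vertex GFLOPS, triangle GB/s) under the assumptions above."""
    verts_per_sec = TRI_PER_SEC * VERTS_PER_TRI
    vertex_gb_s = verts_per_sec * BYTES_PER_VERTEX / 1e9
    vertex_gflops = verts_per_sec * OPS_PER_VERTEX / 1e9
    triangle_gb_s = TRI_PER_SEC * BYTES_INTERPOLATED / 1e9
    return vertex_gb_s, vertex_gflops, triangle_gb_s


if __name__ == "__main__":
    bw, flops, tri_bw = vertex_and_triangle_rates()
    print(f"vertex: {bw:.1f} GB/s, {flops:.1f} GFLOPS; triangle: {tri_bw:.1f} GB/s")
```

Under these assumptions the results come out near 76.8 GB/s, 66.9 GFLOPS, and 12.8 GB/s, close to the slide's 75 GB/s, 67 GFLOPS, and 13 GB/s.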

UNC Pixel-Planes 4 (1985): DSP vertex processor; custom rasterizer; 512x512 SIMD array; full screen. Fuchs et al., "Fast spheres, shadows, textures, transparencies, and image enhancements in pixel-planes", SIGGRAPH 1985

UNC Pixel-Planes 5 (1989): ~40 i860 CPUs for vertex processing; ~20 128x128 SIMD arrays for pixel processing. Fuchs et al., "Pixel-Planes 5: a heterogeneous multiprocessor graphics system using processor enhanced memory", SIGGRAPH 1989

SGI Reality Engine (1993). Akeley, "Reality Engine Graphics", SIGGRAPH 1993

Pixel-Flow (1992-1997): ~35 nodes, each with 2 HP-PA 8000 CPUs; 128x64 SIMD array (~160 tiles/screen). Eyles, et al., "PixelFlow: The Realization", Graphics Hardware 1997
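The "~160 tiles/screen" figure is consistent with covering a 1280x1024 display with 128x64-pixel tiles. The display size is an assumption chosen because it reproduces the number; the slide does not state it. The helper name is illustrative.

```python
def tiles_per_screen(screen_w, screen_h, tile_w=128, tile_h=64):
    """Number of tile-sized regions needed to cover the screen."""
    tiles_x = -(-screen_w // tile_w)  # ceiling division
    tiles_y = -(-screen_h // tile_h)
    return tiles_x * tiles_y


if __name__ == "__main__":
    # 1280x1024 is an assumed resolution, not taken from the slide.
    print(tiles_per_screen(1280, 1024))
```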

Pixel-Flow Eyles, et al., "PixelFlow: The Realization", Graphics Hardware 1997

NVIDIA GeForce 6 (2004). Kilgariff and Fernando, "The GeForce 6 GPU Architecture", GPU Gems 2, 2005

GeForce 6 Parallelism (figure labels): More Parallel, Data Parallel, …, Vertex, Triangle, Pixel, Triangle, Pipeline, More Parallel, More Pipeline

NVIDIA G80/Tesla (2006) NVIDIA, “NVIDIA GeForce 8800 GPU Architecture Overview”, TB-02787-001_v01, November 2006

NVIDIA Maxwell (2014) NVIDIA, NVIDIA GeForce GTX 980 Whitepaper, 2014

Maxwell SIMD Processing Block
• 32 cores
• 8 special function units
• NVIDIA terminology:
  • Warp = interleaved threads; hide memory latency; want at least 4-8
  • Thread Block = Warps*Cores
• Flexible registers: trade registers for warps
NVIDIA, NVIDIA GeForce GTX 980 Whitepaper, 2014
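The "trade registers for warps" point can be illustrated with a simplified occupancy estimate: the more registers each thread uses, the fewer warps fit in the register file, and so the fewer warps are available to interleave while others wait on memory. The register file size (16384 32-bit registers per processing block) and the cap of 16 resident warps are assumed illustrative values, not figures from the slide. The sketch also ignores register allocation granularity and other limits such as shared memory.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def resident_warps(regs_per_thread, regfile_size=16384, max_warps=16):
    """Estimate how many warps fit in one processing block's register file.

    regfile_size and max_warps are assumed values for illustration.
    """
    by_registers = regfile_size // (regs_per_thread * WARP_SIZE)
    return min(max_warps, by_registers)


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(f"{regs:3d} registers/thread -> {resident_warps(regs)} warps")
```

With these assumed sizes, a kernel using 128 registers per thread would sit at the low end of the slide's "at least 4-8" guideline.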

Maxwell Streaming Multiprocessor (SMM): 4 SIMD blocks; share L1 caches; share memory; share tessellation HW. NVIDIA, NVIDIA GeForce GTX 980 Whitepaper, 2014

Maxwell Graphics Processing Cluster: 4 SMM; share rasterizer. NVIDIA, NVIDIA GeForce GTX 980 Whitepaper, 2014
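Multiplying out the hierarchy from the last three slides gives the core count at each level. The per-level counts come from the slides; the number of Graphics Processing Clusters per chip (4, as on the GeForce GTX 980) is an assumption, since the slides do not state it.

```python
CORES_PER_BLOCK = 32  # from the slide: 32 cores per SIMD processing block
BLOCKS_PER_SMM = 4    # from the slide: 4 SIMD blocks per SMM
SMM_PER_GPC = 4       # from the slide: 4 SMM per Graphics Processing Cluster
GPC_PER_GPU = 4       # assumption: GeForce GTX 980 configuration


def core_counts():
    """Return the number of cores at each level of the hierarchy."""
    per_smm = CORES_PER_BLOCK * BLOCKS_PER_SMM
    per_gpc = per_smm * SMM_PER_GPC
    per_gpu = per_gpc * GPC_PER_GPU
    return {"smm": per_smm, "gpc": per_gpc, "gpu": per_gpu}


if __name__ == "__main__":
    print(core_counts())
```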

Full Maxwell (again) NVIDIA, NVIDIA GeForce GTX 980 Whitepaper, 2014