Graphical Processing Units


Graphical Processing Units
Supervised by: Dr. Hadi Adineh
By: Azhar Albakry & Abbas Alkhafaji
Department: Software

INTRODUCTION
The graphics processing unit (GPU) has become an integral part of today’s mainstream computing systems. Over the past several years, it has grown into not only a powerful graphics engine but also a highly parallel programmable processor. The GPU’s rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU.

INTRODUCTION
The GPU is designed for a particular class of applications with the following characteristics:
Computational requirements are large. Real-time rendering requires billions of pixels per second, and each pixel requires hundreds or more operations. GPUs must deliver an enormous amount of compute performance to satisfy the demands of complex real-time applications.
Parallelism is substantial. Fortunately, the graphics pipeline is well suited to parallelism, and the parallel hardware built for it is in turn applicable to many other computational domains.
Throughput is more important than latency. GPU implementations of the graphics pipeline prioritize throughput over latency.

GPU ARCHITECTURE: A. The Graphics Pipeline
The input to the GPU is a list of geometric primitives, typically triangles, in a 3-D world coordinate system. Through many steps, those primitives are shaded and mapped onto the screen, where they are assembled to create a final picture. Specific steps in the canonical pipeline:
Vertex Operations: The input primitives are formed from individual vertices. Each vertex must be transformed into screen space and shaded, typically by computing its interaction with the lights in the scene.
Primitive Assembly: The vertices are assembled into triangles, the fundamental hardware-supported primitive in today’s GPUs.
Rasterization: Rasterization is the process of determining which screen-space pixel locations are covered by each triangle. Each triangle generates a fragment at every pixel location it covers.

GPU ARCHITECTURE: A. The Graphics Pipeline
Specific steps in the canonical pipeline (continued):
Fragment Operations: Using color information from the vertices and possibly fetching additional data from global memory in the form of textures (images that are mapped onto surfaces), each fragment is shaded to determine its final color.
Composition: Fragments are assembled into a final image with one color per pixel, usually by keeping the fragment closest to the camera for each pixel location.
Historically, the operations available at the vertex and fragment stages were configurable but not programmable.
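To make the stages concrete, here is a minimal software sketch of the canonical pipeline in Python. It is a toy model under stated assumptions, not how any GPU implements these stages: the 8x8 screen, the `[-1, 1]` input coordinate range, flat per-triangle color, and all function names (`vertex_stage`, `rasterize`, `render`) are illustrative choices that do not come from the slides.

```python
# Toy software model of the canonical pipeline stages (illustrative only).
# A vertex is (x, y, z, color) with x and y in [-1, 1]; smaller z is closer.

def vertex_stage(vertex, width, height):
    """Vertex operations: transform a vertex into screen space."""
    x, y, z, color = vertex
    sx = (x + 1.0) * 0.5 * (width - 1)
    sy = (y + 1.0) * 0.5 * (height - 1)
    return (sx, sy, z, color)

def edge(a, b, px, py):
    """Signed area term: which side of edge a->b the point (px, py) lies on."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])

def rasterize(tri, width, height):
    """Rasterization: yield one fragment per pixel covered by the triangle."""
    a, b, c = tri
    area = edge(a, b, c[0], c[1])
    if area == 0:
        return  # degenerate triangle covers no pixels
    for py in range(height):
        for px in range(width):
            # Barycentric weights of vertices a, b, c at this pixel.
            w0 = edge(b, c, px, py) / area
            w1 = edge(c, a, px, py) / area
            w2 = edge(a, b, px, py) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                depth = w0 * a[2] + w1 * b[2] + w2 * c[2]
                yield (px, py, depth, a[3])  # flat color from the first vertex

def render(triangles, width=8, height=8):
    """Run every triangle through the stages and composite by depth."""
    color_buf = [[None] * width for _ in range(height)]
    depth_buf = [[float("inf")] * width for _ in range(height)]
    for tri in triangles:  # primitive assembly: vertices arrive grouped in threes
        screen_tri = [vertex_stage(v, width, height) for v in tri]
        for px, py, depth, color in rasterize(screen_tri, width, height):
            shaded = color  # fragment operations: trivial pass-through shading
            if depth < depth_buf[py][px]:  # composition: keep closest fragment
                depth_buf[py][px] = depth
                color_buf[py][px] = shaded
    return color_buf
```

Because composition keeps the closest fragment per pixel, the result does not depend on the order in which the triangles are submitted. Every pixel and every triangle is processed independently of the others, which is the data parallelism that GPU hardware exploits.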

GPU ARCHITECTURE: B. Evolution of GPU Architecture
The fixed-function pipeline lacked the generality to efficiently express the more complicated shading and lighting operations that are essential for complex effects. The key step was replacing the fixed-function per-vertex and per-fragment operations with user-specified programs run on each vertex and fragment. Over the past years, these vertex programs and fragment programs have become increasingly capable, with larger limits on their size and resource consumption, more fully featured instruction sets, and more flexible control-flow operations.

GPU ARCHITECTURE: C. Architecture of a Modern GPU
The GPU is built for different application demands than the CPU: large, parallel computation requirements with an emphasis on throughput rather than latency. Consequently, the architecture of the GPU has progressed in a different direction than that of the CPU. In a pipeline, the output of each task is fed into the input of the next task. The pipeline exposes the task parallelism of the application, as data in multiple pipeline stages can be computed at the same time; within each stage, computing more than one element at the same time is data parallelism. The GPU divides the resources of the processor among the different stages, such that the pipeline is divided in space, not time.

GPU ARCHITECTURE: C. Architecture of a Modern GPU
This machine organization was highly successful in fixed-function GPUs for two reasons. First, the hardware in any given stage could exploit data parallelism within that stage, processing multiple elements at the same time. Second, each stage’s hardware could be customized with special-purpose hardware for its given task, allowing substantially greater compute and area efficiency than a general-purpose solution. For instance, the rasterization stage, which computes pixel coverage information for each input triangle, is more efficient when implemented in special-purpose hardware.

GPU ARCHITECTURE: C. Architecture of a Modern GPU
In a CPU, any given operation may take on the order of 20 cycles between entering and leaving the CPU pipeline. On a GPU, a graphics operation may take thousands of cycles from start to finish. The latency of any given operation is long. However, the task and data parallelism across and within stages delivers high throughput.
The major disadvantage of the GPU task-parallel pipeline is load balancing. Like any pipeline, the performance of the GPU pipeline is dependent on its slowest stage. If the vertex program is complex and the fragment program is simple, overall throughput is dependent on the performance of the vertex program.

GPU ARCHITECTURE: C. Architecture of a Modern GPU
AMD introduced the first unified shader architecture for modern GPUs in its Xenos GPU in the Xbox 360 (2005). Today, both AMD’s and NVIDIA’s flagship GPUs feature unified shaders (Fig. 1). The benefit for GPU users is better load balancing, at the cost of more complex hardware. The benefit for GPGPU users is clear: with all the programmable power in a single hardware unit, GPGPU programmers can now target that programmable unit directly, rather than dividing work across multiple hardware units as before.
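The load-balancing argument can be illustrated with a deliberately idealized model. The sketch below assumes that work divides perfectly across units and that a split pipeline is limited by its slowest stage; the unit counts and workload numbers are hypothetical and chosen only to show the effect, not taken from any real GPU.

```python
def split_pipeline_time(vertex_work, fragment_work, vertex_units, fragment_units):
    """Time per frame when each stage has its own fixed set of units.

    The stages run concurrently, so the slowest stage sets the pace.
    """
    return max(vertex_work / vertex_units, fragment_work / fragment_units)

def unified_time(vertex_work, fragment_work, total_units):
    """Time per frame when any unit can run either kind of program."""
    return (vertex_work + fragment_work) / total_units

# Hypothetical vertex-heavy frame on 24 units (8 vertex + 16 fragment when split).
vertex_work, fragment_work = 2400.0, 240.0
split = split_pipeline_time(vertex_work, fragment_work, 8, 16)
unified = unified_time(vertex_work, fragment_work, 24)
print(split, unified)
```

In this model the split design leaves the 16 fragment units mostly idle while the 8 vertex units become the bottleneck, whereas the unified design spreads the same total work over all 24 units.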

GPU ARCHITECTURE: C. Architecture of a Modern GPU

CASE STUDY: GAME PHYSICS
Physics simulation occupies an increasingly important role in modern video games. Game players and developers seek environments that move and react in a physically plausible fashion, requiring immense computational resources. This case study focuses on Havok FX (Fig. 2), a GPU-accelerated game physics package and one of the first successful consumer applications of GPU computing.

CASE STUDY: GAME PHYSICS

CASE STUDY: GAME PHYSICS
Game physics takes many forms and increasingly includes articulated characters (“rag doll” physics), vehicle simulation, cloth, deformable bodies, and fluid simulation. We concentrate here on rigid body dynamics, which simulates solid objects moving under gravity and obeying Newton’s laws of motion, and is probably the most important form of game physics today. Rigid body simulation typically incorporates three steps: integration, collision detection, and collision resolution.

CASE STUDY: GAME PHYSICS
Integration: The integration step updates the objects’ velocities based on the applied forces (e.g., gravity, wind, player interactions) and updates the objects’ positions based on the velocities.
Collision detection: This step determines which objects are colliding after integration, and their contact points. Collision detection must in principle compare each object with every other object, a very expensive (O(n^2)) proposition. In practice, most systems mitigate this cost by splitting collision detection into a broad phase and a narrow phase. The broad phase compares a simplified representation of the objects (typically their bounding boxes) to quickly determine potentially colliding pairs of objects. The narrow phase then accurately determines the pairs of objects that are actually colliding.
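The integration step and the two-phase collision test can be sketched as follows. This is a minimal 2-D illustration under stated assumptions: bodies are circles stored as plain dictionaries, and the names `integrate`, `aabb`, `broad_phase`, and `narrow_phase` are hypothetical. For simplicity the broad phase here still visits every pair, but it does only a cheap box comparison per pair, leaving the exact test for the few candidates that survive.

```python
from itertools import combinations

def integrate(body, force, dt):
    """Fixed-step Euler update: forces change velocity, velocity changes position."""
    ax = force[0] / body["mass"]
    ay = force[1] / body["mass"]
    vx = body["vel"][0] + ax * dt
    vy = body["vel"][1] + ay * dt
    body["vel"] = (vx, vy)
    body["pos"] = (body["pos"][0] + vx * dt, body["pos"][1] + vy * dt)

def aabb(body):
    """Axis-aligned bounding box of a circle: (min_x, min_y, max_x, max_y)."""
    x, y = body["pos"]
    r = body["radius"]
    return (x - r, y - r, x + r, y + r)

def boxes_overlap(a, b):
    """True when two boxes overlap on both the x and y axes."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def broad_phase(bodies):
    """Cheap test on bounding boxes: returns potentially colliding index pairs."""
    boxes = [aabb(b) for b in bodies]
    return [(i, j) for i, j in combinations(range(len(bodies)), 2)
            if boxes_overlap(boxes[i], boxes[j])]

def narrow_phase(bodies, pairs):
    """Exact circle-circle test on the candidate pairs only."""
    hits = []
    for i, j in pairs:
        dx = bodies[i]["pos"][0] - bodies[j]["pos"][0]
        dy = bodies[i]["pos"][1] - bodies[j]["pos"][1]
        reach = bodies[i]["radius"] + bodies[j]["radius"]
        if dx * dx + dy * dy <= reach * reach:
            hits.append((i, j))
    return hits
```

Two circles placed diagonally can have overlapping bounding boxes without touching, which is why the broad phase only produces candidates and the narrow phase makes the final decision.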

CASE STUDY: GAME PHYSICS
Collision resolution: Once collisions are detected, collision resolution applies impulses (instantaneous, transitory forces) to the colliding objects so that they move apart.
In 2005, Havok, the leading game physics middleware supplier, began researching new algorithms targeted at simulating tens of thousands of rigid bodies on parallel processors. Havok FX was the result, and both NVIDIA and ATI have worked with Havok to implement and optimize the system on the GPU. Several reasons argue for moving some physics simulation to the GPU. For instance, many games today are CPU-limited, and physics can easily consume 10% or more of CPU time.
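Returning to the collision resolution step, a common textbook way to apply impulses to a colliding pair is sketched below. It is not Havok's method, which the slides do not describe. The dictionary layout, the function name `resolve_collision`, and the `restitution` parameter (1.0 for a perfectly elastic bounce) are assumptions made for illustration.

```python
def resolve_collision(a, b, restitution=1.0):
    """Apply equal and opposite impulses along the contact normal (from a to b)."""
    nx = b["pos"][0] - a["pos"][0]
    ny = b["pos"][1] - a["pos"][1]
    dist = (nx * nx + ny * ny) ** 0.5
    if dist == 0.0:
        return  # coincident centers: no well-defined normal in this sketch
    nx, ny = nx / dist, ny / dist

    # Relative velocity of b with respect to a, projected on the normal.
    rvx = b["vel"][0] - a["vel"][0]
    rvy = b["vel"][1] - a["vel"][1]
    vn = rvx * nx + rvy * ny
    if vn >= 0.0:
        return  # bodies are already moving apart

    # Impulse magnitude that reverses the approach speed, scaled by restitution.
    j = -(1.0 + restitution) * vn / (1.0 / a["mass"] + 1.0 / b["mass"])
    a["vel"] = (a["vel"][0] - j * nx / a["mass"], a["vel"][1] - j * ny / a["mass"])
    b["vel"] = (b["vel"][0] + j * nx / b["mass"], b["vel"][1] + j * ny / b["mass"])
```

For two equal masses meeting head-on with `restitution=1.0`, the impulses simply exchange the two velocities, and total momentum is unchanged because the impulses are equal and opposite.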

CASE STUDY: GAME PHYSICS
Performing physics on the GPU also enables direct rendering of simulation results from GPU memory, avoiding the need to transfer the positions and orientations of thousands or millions of objects from CPU to GPU each frame. Havok FX is a hybrid system, leveraging the strengths of the CPU and GPU. It stores the complete object state (position, orientation, linear and angular velocities) on the GPU, as well as a proprietary texture-based representation for the shapes of the objects.

CASE STUDY: GAME PHYSICS
The CPU performs broad phase collision detection using a highly optimized sort and sweep algorithm, after reading the axis-aligned bounding boxes of each object back from the GPU each frame in a compressed format. The list of potentially colliding pairs is then sent back to the GPU for the narrow phase. Both transfers consist of a relatively small amount of data, which transfers quickly over the PCIe bus. The GPU performs all narrow phase collision detection and integration. Havok FX uses a simple Euler integrator with a fixed time step.
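A generic sort and sweep broad phase over axis-aligned bounding boxes might look like the sketch below. Havok's optimized implementation is not described in the slides, so this is only the basic textbook idea: sort boxes by their minimum x coordinate, sweep along x while keeping a list of boxes that are still open, and check y overlap only against those. The function name and box layout `(min_x, min_y, max_x, max_y)` are assumptions.

```python
def sort_and_sweep(boxes):
    """Return sorted candidate pairs (i, j), i < j, whose boxes overlap.

    boxes: list of (min_x, min_y, max_x, max_y) tuples.
    """
    # Sort box indices by where each box starts on the x axis.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0])
    active = []  # boxes whose x interval may still reach the sweep position
    pairs = []
    for i in order:
        min_x = boxes[i][0]
        # Drop boxes that ended before this one starts on the x axis.
        active = [j for j in active if boxes[j][2] >= min_x]
        for j in active:
            # x intervals overlap by construction; check the y intervals.
            if boxes[i][1] <= boxes[j][3] and boxes[j][1] <= boxes[i][3]:
                pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return sorted(pairs)
```

When objects are spread out along the sweep axis, the active list stays short, so far fewer than n(n-1)/2 comparisons are made. In the worst case, with every box overlapping on x, the cost is still quadratic.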

CASE STUDY: GAME PHYSICS
The end result is an order-of-magnitude performance boost over Havok’s reference single-core CPU implementation. Simulating a scene of 15,000 boulders rolling down a terrain, the CPU implementation (on a single core of an Intel 2.9 GHz Core 2 Duo) achieved 6.2 frames per second, whereas the initial GPU implementation on an NVIDIA GeForce 8800 GTX reached 64.5 frames per second, a speedup of roughly 10x. Havok FX demonstrates the feasibility of building a hybrid system in which the CPU executes serial portions of the algorithm and the GPU executes data-parallel portions. The overall performance of this hybrid system far exceeds that of a CPU-only system despite the frequent transfers between CPU and GPU.

REFERENCES
[1] John D. Owens, Mike Houston, David Luebke, Simon Green, John E. Stone, and James C. Phillips, “Graphics Processing Units, powerful, programmable, and highly parallel, are increasingly targeting general-purpose computing applications,” Proceedings of the IEEE, vol. 96, no. 5, May 2008.
[2] M. Harris, “Mapping computational concepts to GPUs,” in GPU Gems 2, M. Pharr, Ed. Reading, MA: Addison-Wesley, Mar. 2005, pp. 493–508.
[3] M. McCool, “Data-parallel programming on the Cell BE and the GPU using the RapidMind development platform,” in Proc. GSPx Multicore Applicat. Conf., Oct.–Nov. 2006.
[4] P. M. Hubbard, “Collision detection for interactive graphics applications,” IEEE Trans. Vis. Comput. Graphics, vol. 1, no. 3, pp. 218–230, 1995.
[5] B. Bustos, O. Deussen, S. Hiller, and D. Keim, “A graphics hardware accelerated algorithm for nearest neighbor search,” in Proc. 6th Int. Conf. Comput. Sci., May 2006, vol. 3994, pp. 196–199, Lecture Notes in Computer Science.