Xbox 360 512 MB system memory IBM 3-way symmetric core processor ATI GPU with embedded EDRAM 12x DVD Optional Hard disk.

Slides:



Advertisements
Similar presentations
4/6/ :35 AM V ©2004 Microsoft Corporation. All rights reserved.
Advertisements

COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Normal Map Compression with ATI 3Dc™ Jonathan Zarge ATI Research Inc.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
Status – Week 257 Victor Moya. Summary GPU interface. GPU interface. GPU state. GPU state. API/Driver State. API/Driver State. Driver/CPU Proxy. Driver/CPU.
G30™ A 3D graphics accelerator for mobile devices Petri Nordlund CTO, Bitboys Oy.
Graphics Hardware CMSC 435/634. Transform Shade Clip Project Rasterize Texture Z-buffer Interpolate Vertex Fragment Triangle A Graphics Pipeline.
Computer Organization and Architecture
GRAPHICS AND COMPUTING GPUS Jehan-François Pâris
OS Case Study: The Xbox 360  Instructor: Rob Nash  Readings: See citations in the slides.
1 Shader Performance Analysis on a Modern GPU Architecture Victor Moya, Carlos González, Jordi Roca, Agustín Fernández Jordi Roca, Agustín Fernández Department.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.
Status – Week 243 Victor Moya. Summary Current status. Current status. Tests. Tests. XBox documentation. XBox documentation. Post Vertex Shader geometry.
3D Graphics Processor Architecture Victor Moya. PhD Project Research on architecture improvements for future Graphic Processor Units (GPUs). Research.
Status – Week 277 Victor Moya.
Chapter 7 Interupts DMA Channels Context Switching.
GPU Simulator Victor Moya. Summary Rendering pipeline for 3D graphics. Rendering pipeline for 3D graphics. Graphic Processors. Graphic Processors. GPU.
1 A Single (Unified) Shader GPU Microarchitecture for Embedded Systems Victor Moya, Carlos González, Jordi Roca, Agustín Fernández Department of Computer.
The programmable pipeline Lecture 10 Slide Courtesy to Dr. Suresh Venkatasubramanian.
Status – Week 260 Victor Moya. Summary shSim. shSim. GPU design. GPU design. Future Work. Future Work. Rumors and News. Rumors and News. Imagine. Imagine.
Modern GPU Architecture
Graphics processors Norm Rubin – compiler architect –
AMD’s ATI GPU Radeon R700 (HD 4xxx) series Elizabeth Soechting David Chang Jessica Vasconcellos 1 CS 433 Advanced Computer Architecture May 7, 2008.
Comparison of Next Generation Gaming Architectures Presented By Dela Tsiagbe Presented By Dela Tsiagbe.
Emotion Engine A look at the microprocessor at the center of the PlayStation2 gaming console Charles Aldrich.
High Performance in Broad Reach Games Chas. Boyd
Background image by chromosphere.deviantart.com Fella in following slides by devart.deviantart.com DM2336 Programming hardware shaders Dioselin Gonzalez.
© Copyright Khronos Group, Page 1 Harnessing the Horsepower of OpenGL ES Hardware Acceleration Rob Simpson, Bitboys Oy.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
Enhancing GPU for Scientific Computing Some thoughts.
Next-Generation Consoles Brenden Schubert & Nathaniel Williams With a Special talk by John F. Rhoads on Video Game Ethics.
Computer Graphics Graphics Hardware
Geometric Objects and Transformations. Coordinate systems rial.html.
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
Interactive Time-Dependent Tone Mapping Using Programmable Graphics Hardware Nolan GoodnightGreg HumphreysCliff WoolleyRui Wang University of Virginia.
4/23/2017 4:23 AM © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered.
The Graphics Rendering Pipeline 3D SCENE Collection of 3D primitives IMAGE Array of pixels Primitives: Basic geometric structures (points, lines, triangles,
HW-Accelerated HD video playback under Linux Zou Nan hai Open Source Technology Center.
The programmable pipeline Lecture 3.
Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department.
Tone Mapping on GPUs Cliff Woolley University of Virginia Slides courtesy Nolan Goodnight.
Chapter 8 CPU and Memory: Design, Implementation, and Enhancement The Architecture of Computer Hardware and Systems Software: An Information Technology.
Chapter 16 Micro-programmed Control
GPU Computation Strategies & Tricks Ian Buck NVIDIA.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Sony PlayStation 3 Sony also laid out the technical specs of the device. The PlayStation 3 will feature the much-vaunted Cell processor, which will run.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
UW EXTENSION CERTIFICATE PROGRAM IN GAME DEVELOPMENT 2 ND QUARTER: ADVANCED GRAPHICS The GPU.
Computer Graphics Graphics Hardware
COMPUTER GRAPHICS CHAPTER 38 CS 482 – Fall 2017 GRAPHICS HARDWARE
Petri Nordlund Chief Architect Bitboys Oy
Petri Nordlund Chief Architect Bitboys Oy
Graphics on GPU © David Kirk/NVIDIA and Wen-mei W. Hwu,
Graphics Processing Unit
Normal Map Compression with ATI 3Dc™
Chapter 6 GPU, Shaders, and Shading Languages
CSC 2231: Parallel Computer Architecture and Programming GPUs
Graphics Hardware CMSC 491/691.
NVIDIA Fermi Architecture
Graphics Processing Unit
Computer Graphics Graphics Hardware
UMBC Graphics for Games
RADEON™ 9700 Architecture and 3D Performance
Chapter 11 Processor Structure and function
CIS 6930: Chip Multiprocessor: GPU Architecture and Programming
Presentation transcript:

Xbox MB system memory IBM 3-way symmetric core processor ATI GPU with embedded EDRAM 12x DVD Optional Hard disk

The Xbox 360 GPU Custom silicon designed by ATi Technologies Inc. 500 MHz, 338 million transistors, 90nm process Supports vertex and pixel shader version 3.0+ Includes some Xbox 360 extensions

The Xbox 360 GPU 10 MB embedded DRAM (EDRAM) for extremely high-bandwidth render targets Alpha blending, Z testing, multisample antialiasing are all free (even when combined) Hierarchical Z logic and dedicated memory for early Z/stencil rejection GPU is also the memory hub for the whole system 22.4 GB/sec to/from system memory

More About the Xbox 360 GPU 48 shader ALUs shared between pixel and vertex shading (unified shaders) Each ALU can co-issue one float4 op and one scalar op each cycle Non-traditional architecture 16 texture samplers Dedicated Branch instruction execution

More About the Xbox 360 GPU 2x and 4x hardware multi-sample anti- aliasing (MSAA) Hardware tessellator N-patches, triangular patches, and rectangular patches Can render to 4 render targets and a depth/stencil buffer simultaneously

GPU: Work Flow Consumes instructions and data from a command buffer Ring buffer in system memory Managed by Direct3D, user configurable size (default 2 MB) Supports indirection for vertex data, index data, shaders, textures, render state, and command buffers Up to 8 simultaneous contexts in-flight at once Changing shaders or render state is inexpensive, since a new context can be started up easily

GPU: Work Flow Threads work on units of 64 vertices or pixels at once Dedicated triangle setup, clipping, etc. Pixels processed in 2x2 quads Back buffers/render targets stored in EDRAM Alpha, Z, stencil test, and MSAA expansion done in EDRAM module EDRAM contents copied to system memory by “resolve” hardware

GPU: Operations Per Clock Write 8 pixels or 16 Z-only pixels to EDRAM With MSAA, up to 32 samples or 64 Z-only samples Reject up to 64 pixels that fail Hierarchical Z testing Vertex fetch sixteen 32-bit words from up to two different vertex streams

GPU: Operations Per Clock 16 bilinear texture fetches 48 vector and scalar ALU operations Interpolate 16 float4 shader interpolants 32 control flow operations Process one vertex, one triangle Resolve 8 pixels to system memory from EDRAM

GPU: Hierarchical Z Rough, low-resolution representation of Z/stencil buffer contents Provides early Z/stencil rejection for pixel quads 11 bits of Z and 1 bit of stencil per block

GPU: Hierarchical Z NOT tied to compression EDRAM BW advantage Separate memory buffer on GPU Enough memory for 1280x720 2x MSAA Provides a big performance boost when drawing complex scenes Draw opaque objects front to back

GPU: Textures 16 bilinear texture samples per clock 64bpp runs at half rate, 128bpp at quarter rate Trilinear at half rate Unlimited dependent texture fetching DXT decompression has 32 bit precision Better than Xbox (16-bit precision)

GPU: Resolve Copies surface data from EDRAM to a texture in system memory Required for render-to-texture and presentation to the screen Can perform MSAA sample averaging or resolve individual samples Can perform format conversions and biasing

Direct3D 9+ on Xbox 360 Similar API to PC Direct3D 9.0 Optimized for Xbox 360 hardware No abstraction layers or drivers—it’s direct to the metal Exposes all Xbox 360 custom hardware features New state enums New APIs for finer-grained control and completely new features

Direct3D 9+ on Xbox 360 Communicates with GPU via a command buffer Ring buffer in system memory Direct Command Buffer Playback support

Direct3D: Command Buffer Ring buffer that allows the CPU to safely send commands to the GPU Buffer is filled by CPU, and the GPU consumes the data CPU Write Pointer GPU Read Pointer Code Execution Draw Rendering

Shaders Two options for writing shaders HLSL (with Xbox 360 extensions) GPU microcode (specific to the Xbox 360 GPU, similar to assembly but direct to hardware) Recommendation: Use HLSL Easy to write and maintain Replace individual shaders with microcode if performance analysis warrants it