Status – Week 260
Victor Moya

Summary:
- shSim
- GPU design
- Future Work
- Rumors and News
- Imagine



shSim
Currently working:
- Command Processor: reads a text-based trace file (programs, parameters, vertices, commands to the rasterizer).
- Shader: simulates an N-way multithreaded, VS1-capable 'vertex' shader with variable latency support.
- Rasterizer: OpenGL 'emulator'. Accepts resolution and clip plane changes, receives 'shaded' vertices from the shader (only 2 QuadFloats: vertex position + color), and displays the triangles in a GL window.

shSim
Tests:
- Single shader with 2/4 threads (and another 2/4 input buffers).
- Fixed latency of 3 cycles. Shader to Rasterizer latency of 4. CommandProcessor to Rasterizer latency of 6.
- Simple coordinate change traces (shader.input, shader.input.2).
- Ripple vertex shader example from the DX8 & DX9 SDK (ripple.input):
  - Around 300 triangles (1100 vertices).
  - Color is calculated from vertex position.
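The fixed latencies in the test configuration above (4 cycles from Shader to Rasterizer, 6 from CommandProcessor to Rasterizer) can be pictured as delay queues between simulator units. The following is a minimal sketch under that assumption, not shSim's actual code: the `LatencyLink` class and its `write`/`read` methods are hypothetical names introduced only for illustration.

```python
from collections import deque


class LatencyLink:
    """Hypothetical fixed-latency connection between two simulator units.

    An item written at cycle c becomes readable at cycle c + latency.
    """

    def __init__(self, latency):
        if latency < 1:
            raise ValueError("latency must be at least 1 cycle")
        self.latency = latency
        self._in_flight = deque()  # (ready_cycle, item), kept in write order

    def write(self, cycle, item):
        """Producer side: send an item during the given cycle."""
        self._in_flight.append((cycle + self.latency, item))

    def read(self, cycle):
        """Consumer side: return every item that has arrived by this cycle."""
        arrived = []
        while self._in_flight and self._in_flight[0][0] <= cycle:
            arrived.append(self._in_flight.popleft()[1])
        return arrived


# Latency values taken from the test configuration above.
shader_to_rasterizer = LatencyLink(latency=4)
command_to_rasterizer = LatencyLink(latency=6)

shader_to_rasterizer.write(0, "vertex 0")
command_to_rasterizer.write(0, "set resolution")

for cycle in range(8):
    for item in shader_to_rasterizer.read(cycle):
        print(f"cycle {cycle}: rasterizer receives {item}")
    for item in command_to_rasterizer.read(cycle):
        print(f"cycle {cycle}: rasterizer receives {item}")
```

With both items written at cycle 0, the rasterizer sees the vertex at cycle 4 and the command at cycle 6. Keeping the delay in the link rather than in the units makes it easy to change one latency without touching the units themselves.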

shSim
Ripple.vsh.

shSim
Screenshots from frames rendered by shSim.

GPU Architecture
Based on current GPUs:
- NV30
- R300
Based on other graphics processors:
- PS3
- Imagine

GPU Architecture
Based on an API:
- DX8
- DX9
- DX10
- OpenGL 1.4 and extensions
- OpenGL 2.0
Based on an architecture model:
- Vector
- Scalar
- Multithreaded

GPU Specification
Shader Model:
- Language:
  - DX9:
    - VS2.0/PS2.0.
    - VS3.0/PS3.0.
  - OpenGL:
    - NV_vertex_program_2/NV_fragment_program.
    - ARB_vertex_program/ARB_fragment_program.
  - Our own language.

GPU Specification
Shader Architecture:
- Architectural model:
  - Scalar.
  - SIMD.
  - Multithreaded.
  - Vector.
  - Out-of-order.

GPU Specification
Configuration:
- Integer Unit:
  - Number.
  - Precision.
  - SIMD or scalar?
- Floating Point Unit:
  - Number.
  - Precision.
  - SIMD or scalar?

GPU Specification
- Memory Unit:
  - Number.
  - Texture modes.
  - Filtering modes.
- Register Banks:
  - Number.
  - Ports.
  - Size.
  - Scalar or SIMD?

XBOX (NV2A) Vertex Shader

Future Work
Shader:
- Add branch/call/ret instructions.
- Add texture instructions (Pixel Shader).
Command Processor:
- Define a trace specification: binary, gzipped?
- Define an interface with OpenGL (Mesa?) or DX8/DX9 (driver?).
Primitive Assembly:
- Implement vertex cache and primitive assembly (only triangles?).
- Implement culling and clipping?

Future Work
Deferred rendering?
- Transformed geometry must be stored in video memory.
- Geometry must be sorted:
  - Tiles.
  - Front to back.
Rasterization:
- Triangle Setup and Fragment Generation.
- Any suitable method: Olano & Greer, DDA?
- MSAA support?
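The sorting requirement for deferred rendering (by tiles, then front to back) can be sketched as a binning pass over the transformed triangles. This is only an illustration under stated assumptions: the 32-pixel tile size, the `bin_triangles` function, and the convention that smaller z is closer to the viewer are all hypothetical and do not come from the slides.

```python
import math
from collections import defaultdict

TILE_SIZE = 32  # assumed tile size in pixels; not specified in the slides


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Assign each triangle to every tile its screen bounding box overlaps.

    Each triangle is three (x, y, z) vertices in screen space, with smaller
    z meaning closer to the viewer (an assumption of this sketch).
    Returns {(tile_x, tile_y): [triangle indices sorted front to back]}.
    """
    tiles = defaultdict(list)
    max_tx = (width - 1) // tile_size
    max_ty = (height - 1) // tile_size
    for index, tri in enumerate(triangles):
        xs = [v[0] for v in tri]
        ys = [v[1] for v in tri]
        # Skip triangles whose bounding box lies completely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        tx0 = max(math.floor(min(xs)) // tile_size, 0)
        ty0 = max(math.floor(min(ys)) // tile_size, 0)
        tx1 = min(math.floor(max(xs)) // tile_size, max_tx)
        ty1 = min(math.floor(max(ys)) // tile_size, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tiles[(tx, ty)].append(index)
    # Within each tile, order triangles by their nearest vertex.
    for indices in tiles.values():
        indices.sort(key=lambda i: min(v[2] for v in triangles[i]))
    return dict(tiles)


triangles = [
    [(2, 2, 0.9), (20, 4, 0.9), (10, 25, 0.9)],  # far, inside tile (0, 0)
    [(5, 5, 0.1), (40, 8, 0.1), (12, 28, 0.1)],  # near, spans two tiles
]
for tile, indices in sorted(bin_triangles(triangles, 64, 64).items()):
    print(tile, indices)
```

Bounding-box binning is conservative: a tile can list a triangle that only its bounding box touches. The per-tile lists are the transformed geometry that would have to stay in video memory until the tile is rendered.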

Future Work
Early Z and Hierarchical Z?
Pixel Shader:
- Unified shader architecture, shared with the vertex shaders?
- Queue/buffering mechanism? Pixels need a lot of buffering because memory/texture latency is very large.
- Implement a TMU simulator (filter algorithms, memory access, texture compression, cache).
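One way to picture Hierarchical Z is as a coarse buffer that keeps, for each screen tile, the farthest depth currently stored in that tile. A block of fragments whose nearest depth is not closer than that value can be discarded without reading the per-pixel Z buffer. The sketch below assumes a "less than" depth test and an 8x8 tile; the `HierarchicalZ` class is hypothetical and recomputes the tile maximum on every write, which is simpler than what real hardware would do.

```python
class HierarchicalZ:
    """Hypothetical two-level depth buffer: per-pixel Z plus per-tile max Z."""

    def __init__(self, width, height, tile=8, far=1.0):
        self.width = width
        self.height = height
        self.tile = tile
        self.z = [[far] * width for _ in range(height)]
        tiles_x = (width + tile - 1) // tile
        tiles_y = (height + tile - 1) // tile
        self.tile_max = [[far] * tiles_x for _ in range(tiles_y)]

    def coarse_reject(self, tx, ty, block_min_z):
        """True if no fragment of the block can pass a 'less than' Z test.

        Every fragment has z >= block_min_z and every stored pixel has
        z <= tile_max, so block_min_z >= tile_max means all fragments fail.
        """
        return block_min_z >= self.tile_max[ty][tx]

    def write(self, x, y, z):
        """Per-pixel Z test; on pass, store z and refresh the tile maximum."""
        if z >= self.z[y][x]:
            return False
        self.z[y][x] = z
        self._refresh_tile(x // self.tile, y // self.tile)
        return True

    def _refresh_tile(self, tx, ty):
        x0, y0 = tx * self.tile, ty * self.tile
        x1 = min(x0 + self.tile, self.width)
        y1 = min(y0 + self.tile, self.height)
        self.tile_max[ty][tx] = max(
            self.z[y][x] for y in range(y0, y1) for x in range(x0, x1)
        )


hz = HierarchicalZ(16, 16, tile=8)
for y in range(8):
    for x in range(8):
        hz.write(x, y, 0.3)  # cover tile (0, 0) with depth 0.3

print(hz.coarse_reject(0, 0, 0.5))  # True: block is behind the whole tile
print(hz.coarse_reject(0, 0, 0.2))  # False: block may be visible
print(hz.coarse_reject(1, 0, 0.5))  # False: tile (1, 0) is still empty
```

The coarse test only pays off once a tile is fully covered, because a single untouched pixel keeps the tile maximum at the far value.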

Future Work
Fixed fragment operations:
- Implement using the shader?
- Fog: remove?
- Pixel Ownership: remove?
- Scissor Test: implement (needed if clipping is not implemented).
- Alpha test: same as Z Test.
- Z Test and Stencil Test: must be implemented, but could be added to a generic shader unit?
- Blending: add to shader?
- Dithering: remove.
- Logical Op: remove or add to shader.
- MSAA Operations: ?
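To make the list above concrete, here is a small sketch of a subset of the fixed fragment operations applied to one fragment in the OpenGL order: scissor test, alpha test, Z test, then blending. It is an illustration only. The `process_fragment` function, the dictionary layout of a fragment, the "greater than" alpha comparison, the "less than" depth comparison, and the source-alpha blend are all assumptions, and the stencil test is left out for brevity.

```python
def process_fragment(frag, framebuffer, zbuffer, scissor, alpha_ref=0.0):
    """Run one fragment through scissor, alpha, Z test and blending.

    frag: {"x": int, "y": int, "z": float, "color": (r, g, b, a)}
    scissor: (x0, y0, x1, y1), with x1 and y1 exclusive.
    Returns the name of the stage that discarded the fragment, or "written".
    """
    x, y, z = frag["x"], frag["y"], frag["z"]
    r, g, b, a = frag["color"]

    # Scissor test: discard fragments outside the rectangle.
    x0, y0, x1, y1 = scissor
    if not (x0 <= x < x1 and y0 <= y < y1):
        return "scissor"

    # Alpha test (assumed 'greater than' comparison against alpha_ref).
    if a <= alpha_ref:
        return "alpha"

    # Z test (assumed 'less than' comparison); update Z on pass.
    if z >= zbuffer[y][x]:
        return "depth"
    zbuffer[y][x] = z

    # Blending (assumed source-alpha / one-minus-source-alpha factors).
    dr, dg, db = framebuffer[y][x]
    framebuffer[y][x] = (
        r * a + dr * (1.0 - a),
        g * a + dg * (1.0 - a),
        b * a + db * (1.0 - a),
    )
    return "written"


framebuffer = [[(0.0, 0.0, 0.0)] * 4 for _ in range(4)]
zbuffer = [[1.0] * 4 for _ in range(4)]
fragment = {"x": 1, "y": 1, "z": 0.5, "color": (1.0, 0.0, 0.0, 0.5)}

print(process_fragment(fragment, framebuffer, zbuffer, (0, 0, 4, 4)))
print(process_fragment(fragment, framebuffer, zbuffer, (0, 0, 4, 4)))
print(framebuffer[1][1])
```

The first call writes the fragment and the second is rejected by the Z test, since the same depth is already stored. Each stage is a comparison or a multiply-add on per-fragment data, which is the kind of work a generic shader unit could take over.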

Future Work
Framebuffer:
- Z compression.
- Color compression.
- SSAA or MSAA support?

News and Rumors
NV30 architecture:
- 4x2 pixel pipes?
- 8x zixel pipes (Z Test & Stencil only).
ATI ready to release R350 and RV350 in a couple of weeks:
- R350: updated R300 core with additional features (?) and increased clock frequency (375–400 MHz).
- RV350: value chip based on the R300 core. Maybe an 8x1 core with a 128-bit bus. Clock frequency 300–400 MHz. 75 million transistors.

Imagine
- 'Computer Graphics on a Stream Architecture', John Douglas Owens, PhD dissertation.
- Not read yet either.