
Status – Week 272 Victor Moya

Vertex Shader
  - VS 2.0+ (NV30) based Vertex Shader model.
  - Multithreaded?? Implemented with a FP array (3DLabs P10).
  - Dynamic branching.
  - No texture/vertex buffer load.
  - No vertex kill.

Vertex Shader

Shader Model
  - Mono/Multithreaded Shader based on the NV30 instruction set.
  - A Shader is a stream processor:
      - Input Stream => Input Register Bank
          - 16 registers in a Vertex Shader
          - 12 registers in a Pixel Shader
      - Output Stream => Output Register Bank
          - ~16 registers in a Vertex Shader
          - ~4 registers in a Pixel Shader
      - Constant Memory/Register Bank
          - Up to 256 in a Vertex Shader

Shader Model
  - Instruction Cache/Memory
      - Up to 256 in Vertex Shader
      - 1024 in Pixel Shader
      - Shared between different processors (?)
  - Temporary and Auxiliary Registers
      - 16 (Vertex Shader), 32/64 (Pixel Shader)
      - Address Registers
      - Condition Code Register
      - Boolean Register
      - Loop counters
      - etc.

Shader Model
  - Multithreaded:
      - numThreads: number of streams that the shader can store. Includes idle and loading/unloading threads. Structures affected: Input and Output register banks.
      - numActiveThreads: number of active (in execution) threads. Structures affected: temporary and auxiliary registers, PC table (in the Simulator Box).
      - Constant/Parameter Memory and Instruction Cache/Memory are shared between all the threads. They are also shared between different Shaders (but the current model does not provide this).
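The sizing rule above can be sketched as follows. This is only an illustration, not the simulator's actual code: the struct name `ShaderState`, the helper `inputReg`, and the flat-vector layout are assumptions, and the register counts are the Vertex Shader figures quoted on the earlier slides (with "~16" output registers taken as 16).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// A 4-component floating-point register (x, y, z, w).
using Vec4 = std::array<float, 4>;

// Vertex Shader register counts from the slides ("~16" outputs taken as 16).
constexpr std::size_t kInputRegs  = 16;
constexpr std::size_t kOutputRegs = 16;
constexpr std::size_t kTempRegs   = 16;
constexpr std::size_t kConstRegs  = 256;

// Hypothetical container showing which structures scale with which parameter.
struct ShaderState {
    std::vector<Vec4> input;       // one bank per stored thread (numThreads)
    std::vector<Vec4> output;      // one bank per stored thread (numThreads)
    std::vector<Vec4> temp;        // one bank per active thread (numActiveThreads)
    std::vector<std::size_t> pc;   // PC table: one entry per active thread
    std::vector<Vec4> constants;   // a single bank shared by all threads

    ShaderState(std::size_t numThreads, std::size_t numActiveThreads)
        : input(numThreads * kInputRegs),
          output(numThreads * kOutputRegs),
          temp(numActiveThreads * kTempRegs),
          pc(numActiveThreads),
          constants(kConstRegs) {}

    // Input register `reg` of stored thread `thread`.
    Vec4& inputReg(std::size_t thread, std::size_t reg) {
        assert(reg < kInputRegs);
        return input.at(thread * kInputRegs + reg);
    }
};
```

With `ShaderState s(8, 2)`, the input and output banks hold 8 x 16 registers each, while the temporary bank and PC table only cover the 2 active threads.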

Test Model
  - Four boxes:
      - Loader: gets commands (input stream, new programs and parameters) from a file.
      - Fetch: fetches instructions from a Shader program memory.
      - Decode/Execute: decodes and executes instructions, taking dependencies into account.
      - Writer: receives the output stream and writes it to a file.

Test Model
  - Wires:
      - Command: sends commands read from the input file to the Fetch box. Latency varies with the kind of command and the data size.
          - New Shader Program: loads new instructions.
          - New Shader Parameters: loads new parameters in constant memory.
          - New Input: sends a new input (Vertex Input, 16 4D registers).
      - Sync: for synchronization between Loader and Fetch (with the dynamic branch model, execution of a Shader Program depends on the Shader Output). Latency 1.

Test Model
  - Wires:
      - Instruction: Fetch sends new instructions to Decode/Execute. Instruction EXIT marks the end of the Shader Program (Decode/Execute sends Output to Writer). Latency 1.
      - NewPC: Fetch receives control flow changes from Decode/Execute. Latency 1.
      - Execute: drives execution latency for each instruction. Variable latency (1 to 5?).
      - Output: Decode/Execute sends the Shader Program result for the current output to the logger box (Writer). Latency constant but greater than 1 (4 or 5?).
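A fixed-latency wire such as Instruction, NewPC or Sync could be modelled as a small delay queue. The sketch below is an assumption about how such a wire might look, not the simulator's real signal class: the name `Wire` and its `write`/`read` methods are hypothetical, and it requires C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>
#include <utility>

// Hypothetical fixed-latency wire between two boxes.
// Data written at cycle c becomes readable at cycle c + latency.
template <typename T>
class Wire {
public:
    explicit Wire(std::uint64_t latency) : latency_(latency) {}

    // Producer side. Writes are expected in non-decreasing cycle order,
    // so arrival times in the queue stay sorted.
    void write(std::uint64_t cycle, T value) {
        inFlight_.emplace_back(cycle + latency_, std::move(value));
    }

    // Consumer side: returns the oldest item that has arrived, if any.
    std::optional<T> read(std::uint64_t cycle) {
        if (inFlight_.empty() || inFlight_.front().first > cycle) {
            return std::nullopt;
        }
        T value = std::move(inFlight_.front().second);
        inFlight_.pop_front();
        return value;
    }

private:
    std::uint64_t latency_;
    std::deque<std::pair<std::uint64_t, T>> inFlight_;
};
```

For example, a `Wire<int> newPC(1)` written at cycle 10 returns nothing when read at cycle 10 and delivers the value at cycle 11. A variable-latency wire like Execute would need the latency passed per write and a queue ordered by arrival cycle, which this sketch does not cover.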

Test Model
  - Instruction Set:
      - Encoding in 128 bits. See file.
  - Emulation:
      - Separate library: ShaderEmulator.

ShaderEmulator
  - Performs the functional emulation of the shader:
      - Instruction (static) management and execution.
      - Keeps the shader state.
  - Implementation:
      - Support for different MODELS?: VS1, VS2, PS1, PS2.
      - How to implement models? Different classes? Switch/case?
      - Where to keep structures related to control flow? Ex: stack, PC table.

ShaderEmulator
  - Interface:
      - ShaderEmulator(numThreads, numActiveThreads, shaderModel)
      - LoadShaderProgram(code)
      - ResetShaderState(numThread)
      - ReadShaderState(numThread, data)
      - LoadShaderState(numThread, data)
      - ExecuteShaderInstruction(numThread, PC)
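As a rough sketch, the interface listed above might translate into a C++ class like the one below. Only the method names and parameter names come from the slide. Everything else is assumed for illustration: the parameter and return types, the `ShaderModel` enum, the `model()` accessor, the per-thread state layout of 16 registers, and the stub body of `ExecuteShaderInstruction`, which only checks that the PC is inside the loaded program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ShaderModel { VS1, VS2, PS1, PS2 };

using InstructionWord = std::array<std::uint32_t, 4>;  // 128-bit encoding
using Vec4 = std::array<float, 4>;

constexpr std::size_t kRegsPerThread = 16;  // assumed state size per thread

class ShaderEmulator {
public:
    ShaderEmulator(std::size_t numThreads, std::size_t numActiveThreads,
                   ShaderModel shaderModel)
        : model_(shaderModel),
          numActiveThreads_(numActiveThreads),
          state_(numThreads, std::vector<Vec4>(kRegsPerThread)) {}

    void LoadShaderProgram(const std::vector<InstructionWord>& code) {
        program_ = code;
    }
    void ResetShaderState(std::size_t numThread) {
        state_.at(numThread).assign(kRegsPerThread, Vec4{});
    }
    void ReadShaderState(std::size_t numThread, std::vector<Vec4>& data) const {
        data = state_.at(numThread);
    }
    void LoadShaderState(std::size_t numThread, const std::vector<Vec4>& data) {
        assert(data.size() == kRegsPerThread);
        state_.at(numThread) = data;
    }
    // Stub: decoding and execution are omitted. Returns false when the PC
    // falls outside the loaded program (a hypothetical convention).
    bool ExecuteShaderInstruction(std::size_t numThread, std::size_t PC) {
        (void)state_.at(numThread);  // bounds-check the thread id
        return PC < program_.size();
    }

    ShaderModel model() const { return model_; }
    std::size_t numActiveThreads() const { return numActiveThreads_; }

private:
    ShaderModel model_;
    std::size_t numActiveThreads_;
    std::vector<std::vector<Vec4>> state_;
    std::vector<InstructionWord> program_;
};
```

Keeping the model as a constructor argument leaves the open question from the previous slide (different classes or switch/case) undecided; either approach could sit behind this interface.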

ShaderInstruction
  - Decoded shader instruction.
  - What to do with shader models? Invalid instructions in different models.
  - Interface:
      - ShaderInstruction(code)
      - Different functions/attributes to get decoded information from the instruction (input registers, output registers, mask, swizzle, condition codes, etc.).
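To make the decoded swizzle and mask fields concrete, here is a minimal sketch of how they are typically applied to 4D registers. The function names and the representation are assumptions (a swizzle as four component indices, a write mask as four bits with bit 0 meaning x); the slides do not specify the actual encoding.

```cpp
#include <array>
#include <cstdint>

using Vec4 = std::array<float, 4>;

// swizzle[i] names the source component (0 = x ... 3 = w) that feeds
// component i of the operand. Assumed representation.
using Swizzle = std::array<std::uint8_t, 4>;

// Reorders/replicates the components of a source operand.
Vec4 applySwizzle(const Vec4& src, const Swizzle& swizzle) {
    return Vec4{src[swizzle[0]], src[swizzle[1]],
                src[swizzle[2]], src[swizzle[3]]};
}

// Writes only the components enabled in `mask` (bit i set => component i
// of dst is overwritten). Assumed bit order: bit 0 = x.
void writeMasked(Vec4& dst, const Vec4& value, std::uint8_t mask) {
    for (int i = 0; i < 4; ++i) {
        if (mask & (1u << i)) {
            dst[i] = value[i];
        }
    }
}
```

For instance, swizzling (1, 2, 3, 4) with indices {3, 2, 1, 0} yields (4, 3, 2, 1), and writing that result with mask 0x5 updates only the x and z components of the destination.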

ShaderExecInstruction
  - Stores an instance of an instruction that is being executed.
  - Carries information about the execution:
      - ShaderInstruction: decoded instruction.
      - PC: instruction memory address.
      - state: decode/execution/writeback/locked/…
      - result: result of the instruction.
      - startCycle: cycle in which the instruction was fetched.
      - Other statistics?

ShaderExecInstruction
  - Implementation:
      - Avoid dynamic creation of objects.
      - Static pool.
      - Created at fetch, destroyed at decode/execute (writeback).
      - Can be managed by the ShaderExecInstruction class itself? (static).
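One way the class could manage its own static pool is sketched below. This is an assumption about a possible design, not the implemented one: the pool size, the `create`/`destroy` names, the `inUse` flag and the reduced set of fields (the decoded instruction and result are left out) are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: in-flight instruction objects live in a fixed,
// statically allocated pool owned by the class, so fetch never calls new.
struct ShaderExecInstruction {
    enum class State { Decode, Execution, Writeback, Locked };

    static constexpr std::size_t kPoolSize = 64;  // assumed capacity

    std::size_t pc = 0;            // instruction memory address
    std::uint64_t startCycle = 0;  // cycle in which it was fetched
    State state = State::Decode;
    bool inUse = false;            // pool bookkeeping

    // Called at fetch: claims the first free slot, or returns nullptr when
    // the pool is exhausted (fetch would then have to stall).
    static ShaderExecInstruction* create(std::size_t pc, std::uint64_t cycle) {
        for (auto& slot : pool()) {
            if (!slot.inUse) {
                slot.pc = pc;
                slot.startCycle = cycle;
                slot.state = State::Decode;
                slot.inUse = true;
                return &slot;
            }
        }
        return nullptr;
    }

    // Called at writeback: returns the slot to the pool.
    static void destroy(ShaderExecInstruction* inst) {
        assert(inst != nullptr && inst->inUse);
        inst->inUse = false;
    }

private:
    static std::array<ShaderExecInstruction, kPoolSize>& pool() {
        static std::array<ShaderExecInstruction, kPoolSize> slots{};
        return slots;
    }
};
```

The linear scan keeps the sketch short; a free list would give constant-time allocation if the pool grows large.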

Test Model

Code Management
  - Directory structure:
      - /emu (or /emulator): functional emulation classes and functions.
      - /sim (or /simulator): simulation classes and functions.
      - /support: support functions (IO, Types, etc.).