Status – Week 266, Victor Moya

Summary
- ShaderEmulator
- ShaderFetch
- ShaderDecodeExecute
- Communication storage classes
- GPU design
- PS2
- PS3
- Imagine

ShaderEmulator
- Decoded information for emulation moved from ShaderEmulator to ShaderInstruction.
- Used macros for shader instructions.
- Functions renamed.
- Not tested yet.
- Shader assembler?

ShaderFetch
- Parameters:
  - numThreads
  - numActiveThreads (runnable?)
  - issueRate (fetch width?)
  - retireRate (unblock rate?)
- Issues N instructions per cycle from successive ‘threads’ (inputs?).
- Threads can be ready (a new instruction can be fetched) or blocked (waiting for a ‘retire’ command from decode).
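
The issue policy above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FetchSketch, the round-robin visiting order, and the rule that a thread blocks after every issued instruction are guesses, not the simulator's actual code. Only numThreads, issueRate, and the ready/blocked states come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The slide names only two states: ready and blocked.
enum class ThreadState { Ready, Blocked };

// Hypothetical sketch of the fetch stage (not the real ShaderFetch box).
class FetchSketch {
public:
    FetchSketch(std::size_t numThreads, std::size_t issueRate)
        : state_(numThreads, ThreadState::Ready), issueRate_(issueRate) {
        assert(numThreads > 0);
    }

    // One simulated cycle: visit threads in round-robin order and issue at
    // most issueRate instructions, one per ready thread. Assumption: a
    // thread that issues becomes blocked until decode retires it.
    std::vector<std::size_t> cycle() {
        std::vector<std::size_t> issued;
        const std::size_t n = state_.size();
        for (std::size_t i = 0; i < n && issued.size() < issueRate_; ++i) {
            const std::size_t t = (next_ + i) % n;
            if (state_[t] == ThreadState::Ready) {
                state_[t] = ThreadState::Blocked;
                issued.push_back(t);
            }
        }
        // Resume after the last thread that issued, so threads take turns.
        if (!issued.empty()) next_ = (issued.back() + 1) % n;
        return issued;
    }

    // 'Retire' command from decode: the thread may fetch again.
    void retire(std::size_t thread) {
        assert(thread < state_.size());
        state_[thread] = ThreadState::Ready;
    }

private:
    std::vector<ThreadState> state_;
    std::size_t issueRate_;
    std::size_t next_ = 0;  // thread where the next cycle starts looking
};
```

With 4 threads and an issue rate of 2, two cycles drain all ready threads, and nothing more is issued until a retire arrives.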

ShaderFetch
- Non-runnable threads act as a buffer for input refill.
- Finished threads act as buffers for output (this consumes runnable threads).
- Free threads are taken first from the runnable threads, then from the input buffer.

ShaderFetch
- Receives control from Decode/Execute:
  - PC updates.
  - Retire/Unblock instruction/thread.
  - End of thread: ready for output dump.
- Output dump to next unit?
  - ShaderFetch sends it and the next unit acks?
  - Next unit asks for data (poll), ShaderFetch sends it?
  - Handle in decode/execute?

ShaderFetch
- Ignores commands from the Command Processor if they cannot be executed:
  - Wait for a free input thread.
  - Wait for all threads to end.
- Acks to the Command Processor when a command is executed.
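
One possible reading of the ignore/ack rule, as a small sketch. The slide does not name the commands, so CommandKind, FetchStatus, and tryExecute are all hypothetical. The sketch assumes that a new-input command needs a free input thread and that a program-load command needs every thread to have ended.

```cpp
#include <cstddef>

// Hypothetical command kinds (not named in the slides).
enum class CommandKind { NewInput, LoadProgram };

// Hypothetical summary of the fetch unit's thread bookkeeping.
struct FetchStatus {
    std::size_t freeInputThreads;  // threads available to hold a new input
    std::size_t activeThreads;     // threads that have not finished yet
};

// Returns true when the command executes this cycle, which is when an ack
// would be sent. Returns false when the command is ignored, so the Command
// Processor has to present it again later.
inline bool tryExecute(CommandKind cmd, FetchStatus& st) {
    switch (cmd) {
    case CommandKind::NewInput:
        if (st.freeInputThreads == 0) return false;  // wait for a free input thread
        --st.freeInputThreads;
        ++st.activeThreads;
        return true;
    case CommandKind::LoadProgram:
        return st.activeThreads == 0;  // wait for all threads to end
    }
    return false;
}
```

Returning a flag instead of queueing the command keeps the retry burden on the Command Processor, which matches the "ignore" wording on the slide.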

ShaderDecodeExecute
- Parameters:
  - numThreads: runnable threads only!!!
  - issueRate: number of instructions received/launched per cycle?
  - retireRate: number of instructions ‘finished’ per cycle?
- Receives new instructions from ShaderFetch.
- Checks dependences:
  - Address (QI) bank.
  - Temp. register bank (QF).
  - Flags.
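
The dependence check over the address bank, the temporary register bank, and the flags might look like the scoreboard sketch below. Everything here is an assumption for illustration: the bank sizes, the RegUse and Scoreboard names, and the choice to stall on both read-after-write and write-after-write hazards are not taken from the slides.

```cpp
#include <bitset>
#include <cstddef>

// Placeholder bank sizes (assumptions, not the simulator's values).
constexpr std::size_t kAddrRegs = 4;
constexpr std::size_t kTempRegs = 16;

// Registers an instruction reads and writes, one bit per register.
struct RegUse {
    std::bitset<kAddrRegs> addrRead, addrWrite;
    std::bitset<kTempRegs> tempRead, tempWrite;
    bool readsFlags = false;
    bool writesFlags = false;
};

// Tracks the pending writes of in-flight instructions for one thread.
class Scoreboard {
public:
    // An instruction may start only if none of its sources or destinations
    // is still waiting for an earlier write (RAW and WAW hazards).
    bool canIssue(const RegUse& u) const {
        const bool addrBusy = ((u.addrRead | u.addrWrite) & addrPending_).any();
        const bool tempBusy = ((u.tempRead | u.tempWrite) & tempPending_).any();
        const bool flagsBusy = (u.readsFlags || u.writesFlags) && flagsPending_;
        return !(addrBusy || tempBusy || flagsBusy);
    }

    // Mark the destinations of a launched instruction as pending.
    void issue(const RegUse& u) {
        addrPending_ |= u.addrWrite;
        tempPending_ |= u.tempWrite;
        flagsPending_ = flagsPending_ || u.writesFlags;
    }

    // Clear the destinations once the instruction has finished.
    void retire(const RegUse& u) {
        addrPending_ &= ~u.addrWrite;
        tempPending_ &= ~u.tempWrite;
        if (u.writesFlags) flagsPending_ = false;
    }

private:
    std::bitset<kAddrRegs> addrPending_;
    std::bitset<kTempRegs> tempPending_;
    bool flagsPending_ = false;
};
```

Because write-after-write hazards also stall, each register has at most one pending writer, so clearing a bit on retire is safe.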

ShaderDecodeExecute
- Instruction queues supported?
- Out-of-order supported?
- Store information about in-flight instructions?
- Latency is variable?
- Load/Store supported?

Communication storage
- Communication between boxes:
  - ShaderExecInstruction
  - ShaderCommand
  - ShaderDecodeCommand
- Dynamic: creation/destruction.
- Class model or struct model?
- Inherit from a ‘dynamic data’ class.
- Modified new/delete implementation.
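
A ‘dynamic data’ base class with a modified new/delete could be sketched as below. This is an assumption about what such a class might look like, not the simulator's implementation: the name DynamicData, the 64-byte block size, the free-list recycling scheme, and the opcode field are all hypothetical. The sketch is single-threaded and never returns recycled blocks to the system.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed block size; objects up to this size are recycled.
constexpr std::size_t kBlockSize = 64;

// Hypothetical base class for objects exchanged between boxes.
class DynamicData {
public:
    virtual ~DynamicData() = default;

    // Reuse a previously freed block when one is available.
    static void* operator new(std::size_t size) {
        if (size <= kBlockSize && !freeList().empty()) {
            void* p = freeList().back();
            freeList().pop_back();
            ++reuseCount();
            return p;
        }
        // Small objects always get a full block so any of them can reuse it.
        return ::operator new(size < kBlockSize ? kBlockSize : size);
    }

    // Keep small blocks for reuse instead of releasing them.
    static void operator delete(void* p, std::size_t size) noexcept {
        if (p == nullptr) return;
        if (size <= kBlockSize) {
            try {
                freeList().push_back(p);
                return;
            } catch (...) {
                // Could not grow the free list: release the block below.
            }
        }
        ::operator delete(p);
    }

    // Number of allocations served from the free list (for inspection).
    static std::size_t& reuseCount() {
        static std::size_t count = 0;
        return count;
    }

private:
    static std::vector<void*>& freeList() {
        static std::vector<void*> list;
        return list;
    }
};

// Example message type; the class name is from the slide, the field is not.
struct ShaderCommand : DynamicData {
    int opcode = 0;
};
```

Objects created and destroyed every cycle then hit the free list rather than the general-purpose allocator, which is the usual motivation for overriding new/delete in this way.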

GPU design
- Target architecture?
  - NV30
  - DX9
  - DX10
  - OpenGL2
  - PS3
  - Imagine
- Are we really going for it?
- Do we really know what we are doing?

PS2
- I got the EE, VU and GS programming manuals :).

PS3
- Sony patent.
- I haven’t read it yet.

Imagine
- ‘Computer Graphics on a Stream Architecture’, John Douglas Owens, PhD dissertation.
- Not read yet either.