Out-of-order Execution Divider Sanmukh Kuppannagari.

Slides:



Advertisements
Similar presentations
1/1/ / faculty of Electrical Engineering eindhoven university of technology Speeding it up Part 3: Out-Of-Order and SuperScalar execution dr.ir. A.C. Verschueren.
Advertisements

Lec18.1 Step by step for Dynamic Scheduling by reorder buffer Copyright by John Kubiatowicz (http.cs.berkeley.edu/~kubitron)
Dynamic Memory Allocation (also see pointers lectures) -L. Grewe.
1 Lecture: Out-of-order Processors Topics: out-of-order implementations with issue queue, register renaming, and reorder buffer, timing, LSQ.
Dyn. Sched. CSE 471 Autumn 0219 Tomasulo’s algorithm “Weaknesses” in scoreboard: –Centralized control –No forwarding (more RAW than needed) Tomasulo’s.
Computer Organization and Architecture (AT70.01) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor: Dr. Sumanta Guha Slide Sources: Based.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Spring 2003CSE P5481 Reorder Buffer Implementation (Pentium Pro) Hardware data structures retirement register file (RRF) (~ IBM 360/91 physical registers)
Microarchitecture of Superscalars (7) Preserving sequential consistency Dezső Sima Fall 2007 (Ver. 2.0)  Dezső Sima, 2007.
1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Oct. 14, 2002 Topic: Instruction-Level Parallelism (Multiple-Issue, Speculation)
CS 211: Computer Architecture Lecture 5 Instruction Level Parallelism and Its Dynamic Exploitation Instructor: M. Lancaster Corresponding to Hennessey.
CPE 731 Advanced Computer Architecture ILP: Part IV – Speculative Execution Dr. Gheith Abandah Adapted from the slides of Prof. David Patterson, University.
DATAFLOW ARHITEKTURE. Dataflow Processors - Motivation In basic processor pipelining hazards limit performance –Structural hazards –Data hazards due to.
EECS 470 Lecture 7 Branches: Address prediction and recovery (And interrupt recovery too.)
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 7: Out-of-Order Processors Today: out-of-order pipeline, memory disambiguation, basic branch prediction (Sections 3.4, 3.5, 3.7)
DMA Direct memory access  Problems with programmed I/O  Processor wastes time polling  In our example I.Waiting for a key to be pressed, II.Waiting.
1 Lecture 8: Branch Prediction, Dynamic ILP Topics: branch prediction, out-of-order processors (Sections )
1 Computer System Overview OS-1 Course AA
Review of CS 203A Laxmi Narayan Bhuyan Lecture2.
March 9, 2011CS152, Spring 2011 CS 152 Computer Architecture and Engineering Lecture 12 - Advanced Out-of-Order Superscalars Krste Asanovic Electrical.
1 IBM System 360. Common architecture for a set of machines. Robert Tomasulo worked on a high-end machine, the Model 91 (1967), on which they implemented.
1 Lecture 18: Pipelining Today’s topics:  Hazards and instruction scheduling  Branch prediction  Out-of-order execution Reminder:  Assignment 7 will.
CS533 - Concepts of Operating Systems
Computer Organization and Architecture
1 Lecture 4: Advanced Pipelines Data hazards, control hazards, multi-cycle in-order pipelines (Appendix A.4-A.10)
1 Lecture 9: Dynamic ILP Topics: out-of-order processors (Sections )
Process by Dr. Amin Danial Asham. References Operating System Concepts ABRAHAM SILBERSCHATZ, PETER BAER GALVIN, and GREG GAGNE.
OOO execution © Avi Mendelson, 4/ MAMAS – Computer Architecture Lecture 7 – Out Of Order (OOO) Avi Mendelson Some of the slides were taken.
Pipelining By Toan Nguyen.
Recovery Basics. Types of Recovery Catastrophic – disk crash –Backup from tape; redo from log Non-catastrophic: inconsistent state –Undo some operations.
1 Computer System Overview Chapter 1. 2 n An Operating System makes the computing power available to users by controlling the hardware n Let us review.
Object Oriented Analysis & Design SDL Threads. Contents 2  Processes  Thread Concepts  Creating threads  Critical sections  Synchronizing threads.
Khaled A. Al-Utaibi  Interrupt-Driven I/O  Hardware Interrupts  Responding to Hardware Interrupts  INTR and NMI  Computing the.
PROCESSES TANNENBAUM, SECTION 2-1 OPERATING SYSTEMS.
Nicolas Tjioe CSE 520 Wednesday 11/12/2008 Hyper-Threading in NetBurst Microarchitecture David Koufaty Deborah T. Marr Intel Published by the IEEE Computer.
CSCE 614 Fall Hardware-Based Speculation As more instruction-level parallelism is exploited, maintaining control dependences becomes an increasing.
Computer Systems - Registers. Starter… Discuss in pairs the definition of the following Control Unit Arithmetic and Logic Unit Registers Internal clock.
© Wen-mei Hwu and S. J. Patel, 2005 ECE 412, University of Illinois Lecture Instruction Execution: Dynamic Scheduling.
VHDL – Behavioral Modeling and Registered Elements ENGIN 341 – Advanced Digital Design University of Massachusetts Boston Department of Engineering Dr.
Professor Nigel Topham Director, Institute for Computing Systems Architecture School of Informatics Edinburgh University Informatics 3 Computer Architecture.
Implementing Precise Interrupts in Pipelined Processors James E. Smith Andrew R.Pleszkun Presented By: Shrikant G.
OOO Pipelines - II Smruti R. Sarangi IIT Delhi 1.
1 Lecture: Out-of-order Processors Topics: branch predictor wrap-up, a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
1 Lecture: Out-of-order Processors Topics: a basic out-of-order processor with issue queue, register renaming, and reorder buffer.
Samira Khan University of Virginia Feb 9, 2016 COMPUTER ARCHITECTURE CS 6354 Precise Exception The content and concept of this course are adapted from.
1 Lecture 10: Memory Dependence Detection and Speculation Memory correctness, dynamic memory disambiguation, speculative disambiguation, Alpha Example.
CS203 – Advanced Computer Architecture ILP and Speculation.
Ch2. Instruction-Level Parallelism & Its Exploitation 2. Dynamic Scheduling ECE562/468 Advanced Computer Architecture Prof. Honggang Wang ECE Department.
Lecture: Out-of-order Processors
Address – 32 bits WRITE Write Cache Write Main Byte Offset Tag Index Valid Tag Data 16K entries 16.
RTL Design Methodology Transition from Pseudocode & Interface
Lecture 13 - Introduction to the Central Processing Unit (CPU)
/ Computer Architecture and Design
Smruti R. Sarangi IIT Delhi
Chapter 7.2 Computer Architecture
Assembly Language for Intel-Based Computers, 5th Edition
Lectures Queues Chapter 8 of textbook 1. Concepts of queue
Operating Systems (CS 340 D)
Lecture 6: Advanced Pipelines
Computer System Overview
Smruti R. Sarangi IIT Delhi
CS 704 Advanced Computer Architecture
Lecture: Out-of-order Processors
Lecture 8: Dynamic ILP Topics: out-of-order processors
Checking for issue/dispatch
Process.
Lecture 9: Dynamic ILP Topics: out-of-order processors
Conceptual execution on a processor which exploits ILP
LINEAR DATA STRUCTURES
Presentation transcript:

Out-of-order Execution Divider Sanmukh Kuppannagari

Overview Concept of “In-Order” dispatch, “Out-of-Order” execution and “In- Order” Commit Role of Re-Order Buffer (ROB) in facilitating OoE-IoC. Performance improvement by using OoE technique

Problem Statement Given dividend-divisor pairs as input, using 4 dividers in parallel, produce the quotient-remainder pairs under the following constraint: “The order in which quotient-remainder pairs are deposited should be the order in which the dividend-divisor pairs are dispatched”

Part 1 Design

Clock: 0 Dd = x Dr = x Q = x R = x Dd = 24 Dr = 4 Q = 0 R = 24 Dd = x Dr = x Q = x R = x Dd = 15 Dr = 7 Q = 0 R = 15 Dd = 20 Dr = 4 Q = 1 R = 20 Dd = 8 Dr = 7 Q = 1 R = 0 Dd = 16 Dr = 4 Q = 2 R = 16 Dd = 12 Dr = 4 Q = 4 R = 12 Dd = 1 Dr = 7 Q = 2 R = 1 Dd = 8 Dr = 4 Q = 4 R = 8 Dd = 1 Dr = 7 Q = 2 R = 1 Dd = 4 Dr = 4 Q = 5 R = 4 Dd = 1 Dr = 7 Q = 2 R = 1 Dd = 0 Dr = 4 Q = 6 R = 0 Dd = x Dr = x Q = x R = x Dd = 15 Dr = 5 Q = 0 R = 15 Dd = 5 Dr = 4 Q = 0 R = 5 Dd = 10 Dr = 5 Q = 1 R = 10 Dd = 5 Dr = 5 Q = 2 R = 5 Dd = 1 Dr = 4 Q = 1 R = 1 Dd = 0 Dr = 5 Q = 3 R = 0 Dd = x Dr = x Q = x R = x Dd = x Dr = x Q = x R = x Clock: 1Clock: 2Clock: 3Clock: 5Clock: 4Clock: 6Clock: 7Clock: 8Clock: 9Clock: 10Clock: 11Clock: 13Clock: 12Clock: Dd = x Dr = x Q = x R = x Mem_dividend_divisor Mem_quotient_remainder Assign Pointer Collection Pointer Assign Pointer Collection Pointer Ready Busy Ready Busy Done

Part 2 Design

Introduce ROB to store the “out-of-order ”results produced by the dividers Dispatch Unit Wait until atleast one single divider is available and then read the dividend-divisor pair from the input memory if all of the memory is not read yet Assign the dividend-divisor pair to the available divider in priority order i.e. if a divider with smaller index is available, the input is dispatched to it Provide the write pointer of the ROB as a tag to the divider and increment the write pointer and send a start signal to the divider The ROB tag will be used to store the result in the corresponding ROB location on collection Issue Unit If any divisor has completed, store the result into the ROB location corresponding to the ROB tag in the single divider

Part 2 Design ROB Write Pointer: Next free location, used by dispatch unit to allocate a ROB entry Read Pointer: The top of ROB, entry corresponding to the most senior dispatched dividend-divisor pair Issue unit can write at any location pointed by ROB tag Wait until the top of ROB is not finished, once it is done store it into the output memory and increment rp

Clock: 0 Dd = x Dr = x Q = x R = x Dd = 24 Dr = 4 Q = 0 R = 24 Dd = x Dr = x Q = x R = x Dd = 15 Dr = 7 Q = 0 R = 15 Dd = 20 Dr = 4 Q = 1 R = 20 Dd = 8 Dr = 7 Q = 1 R = 0 Dd = 16 Dr = 4 Q = 2 R = 16 Dd = 12 Dr = 4 Q = 4 R = 12 Dd = 1 Dr = 7 Q = 2 R = 1 Dd = 8 Dr = 4 Q = 4 R = 8 Dd = 4 Dr = 4 Q = 5 R = 4 Dd = 0 Dr = 4 Q = 6 R = 0 Dd = x Dr = x Q = x R = x Dd = 15 Dr = 5 Q = 0 R = 15 Dd = 5 Dr = 4 Q = 0 R = 5 Dd = 10 Dr = 5 Q = 1 R = 10 Dd = 5 Dr = 5 Q = 2 R = 5 Dd = 1 Dr = 4 Q = 1 R = 1 Dd = 0 Dr = 5 Q = 3 R = 0 Dd = x Dr = x Q = x R = x Clock: 1Clock: 2Clock: 3Clock: 5Clock: 4Clock: 6Clock: 7Clock: 8Clock: 9Clock: 10Clock: 11 Clock: Dd = x Dr = x Q = x R = x Mem_dividend_divisor Mem_quotient_remainder RP WP RP WP RP Ready Busy Ready Busy Done ROB