A Configurable Simulator for OOO Speculative Execution

A Configurable Simulator for OOO Speculative Execution: Design & Implementation
Presented by Mustafa Imran Ali, ID# 230203
Fall 2004, COE 501

Architecture Modeled
- Fetch logic
  - Trace-driven execution
  - Branch outcomes explicitly specified
- Issue logic
  - Issue width configurable
- Functional units' reservation stations (RS)
  - RS count configurable
- Execution units modeled after the MIPS R4000 pipeline (Hennessy & Patterson, Computer Architecture, 3rd ed.)
  - Number of pipeline stages configurable
- Common data buses (CDBs)
  - Number of CDBs configurable
- ROB and commit logic
  - ROB size and commit capacity configurable

Simulation Methodology
- A program trace file written in comma-separated values (CSV) format
- A configuration file specifying the values of the configurable parameters
- The trace file and the configuration file are the inputs to the simulator
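A minimal sketch of reading one line of a CSV trace. The slides do not give the trace layout, so the field order used here (opcode first, then operands) and the names `TraceOp`, `splitCsvLine`, and `toTraceOp` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical trace record: the real simulator's fields are not documented.
struct TraceOp {
    std::string opcode;
    std::vector<std::string> operands;
};

// Split one comma-separated line into its fields.
std::vector<std::string> splitCsvLine(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Interpret the first field as the opcode and the rest as operands
// (assumed layout, e.g. "ADD,R0,R1,R2").
TraceOp toTraceOp(const std::vector<std::string>& fields) {
    assert(!fields.empty());  // caller must skip blank lines
    TraceOp op;
    op.opcode = fields[0];
    op.operands.assign(fields.begin() + 1, fields.end());
    return op;
}
```

A caller would read the trace file line by line, skip blank lines, and pass each remaining line through `splitCsvLine` and `toTraceOp`.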

Architectural Assumptions
- Only load misses are supported
- Stores are committed in a single cycle
- Stores use a direct bus to transfer the calculated effective address (EA) into the ROB
- Branch outcomes are written to the ROB over the CDB
- A branch mispredict is handled when the branch instruction reaches the head of the ROB
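The commit behaviour described above can be sketched as follows. This is not the presenter's code: `RobEntry` and `commit` are hypothetical names, and the choice to retire the mispredicted branch itself and then squash every younger entry is an assumption about what "handled at the head of the ROB" means.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical ROB entry; only the fields needed for commit are modeled.
struct RobEntry {
    bool done;          // result has been written back
    bool isBranch;
    bool mispredicted;  // meaningful only when isBranch is true
};

// Retire up to commitSize completed instructions from the head of the ROB.
// Returns the number of instructions committed this cycle.
std::size_t commit(std::deque<RobEntry>& rob, std::size_t commitSize) {
    assert(commitSize > 0);
    std::size_t committed = 0;
    while (committed < commitSize && !rob.empty() && rob.front().done) {
        RobEntry head = rob.front();
        rob.pop_front();
        ++committed;
        if (head.isBranch && head.mispredicted) {
            rob.clear();  // assumption: all younger instructions are squashed
            break;
        }
    }
    return committed;
}
```

Commit stops at the first entry that is not done, so instructions retire in program order even though they may complete out of order.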

Architectural Assumptions (cont.)
- Dynamic memory disambiguation is implemented with a store EA cache
  - A load is allowed to proceed only if there are no pending stores with the same effective address
- Reservation stations issue the first ready instruction detected
  - Not necessarily the oldest instruction
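One way to picture the store EA check is a table of pending store addresses that loads consult before proceeding. The sketch below is an illustration under assumptions: the class name `StoreEaCache` and its methods are hypothetical, exact address matching is assumed, and neither the STORE CACHE SIZE limit nor stores whose address is still unknown are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical store EA lookup table: counts pending stores per address.
class StoreEaCache {
public:
    // Called when a store's effective address has been calculated.
    void addPendingStore(std::uint64_t ea) { ++pending_[ea]; }

    // Called when a store to this address commits.
    void retireStore(std::uint64_t ea) {
        auto it = pending_.find(ea);
        assert(it != pending_.end());  // must match an earlier addPendingStore
        if (--it->second == 0) {
            pending_.erase(it);
        }
    }

    // A load may proceed only if no pending store has the same address.
    bool loadMayProceed(std::uint64_t ea) const {
        return pending_.find(ea) == pending_.end();
    }

private:
    std::unordered_map<std::uint64_t, unsigned> pending_;
};
```

A counter per address, rather than a plain set, keeps the entry alive when two pending stores target the same address and only one of them has committed.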

Architectural Assumptions (cont.)
- Access to the available CDBs is arbitrated
- When requests for a CDB arrive, they are granted in the following priority order:
  1. Branch FU
  2. Div FU
  3. LD/ST
  4. MULT FU
  5. FPADD FU
  6. INT ALU FU
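The priority order above can be sketched as a fixed-priority arbiter. The enum `Fu` and the function `arbitrate` are hypothetical names; how the actual simulator breaks ties between two requests from the same FU type is not stated, so this sketch simply keeps their arrival order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Declaration order encodes the priority from the slide (highest first).
enum class Fu { Branch = 0, Div, LdSt, Mult, FpAdd, IntAlu };

// Grant at most numCdb requests this cycle, highest priority first.
// Requests that are not granted would have to retry in a later cycle.
std::vector<Fu> arbitrate(std::vector<Fu> requests, std::size_t numCdb) {
    assert(numCdb > 0);
    // stable_sort keeps arrival order among requests of equal priority.
    std::stable_sort(requests.begin(), requests.end(), [](Fu a, Fu b) {
        return static_cast<int>(a) < static_cast<int>(b);
    });
    if (requests.size() > numCdb) {
        requests.resize(numCdb);
    }
    return requests;
}
```

With two CDBs and simultaneous requests from the INT ALU, the multiplier, and the branch unit, the branch unit and the multiplier are granted and the INT ALU waits.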

List of Configurable Parameters
- ISSUE SIZE: the maximum number of instructions examined for parallel issue
- COMMIT SIZE: the maximum number of instructions examined in the ROB for commit
- ROB SIZE: the number of entries in the reorder buffer
- NUM CDB: the number of common data buses
- LSQ SIZE: the number of entries in the load/store buffer
- STORE CACHE SIZE: the number of entries in the store EA lookup table

List of Configurable Parameters
- NUMRSBU: number of reservation stations in the branch prediction unit
- NUMRSINTALU: number of reservation stations in the integer ALU
- NUMRSMULT: number of reservation stations in the integer multiplier
- MULTSTAGES: number of pipeline stages in the integer multiplier
- NUMRSDIV: number of reservation stations in the integer division unit

List of Configurable Parameters
- DIVCYCLES: number of stages in integer division
- NUMRSFPADD: number of reservation stations in the floating-point adder
- FPADDSTAGES: number of pipeline stages in the floating-point adder
- MISSPROB: the load miss probability
- MPPROB: the branch mispredict probability
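The parameters listed on these slides could be gathered into one structure, as in the sketch below. The names `SimConfig`, `isValid`, and `drawEvent` are hypothetical, the validity rules are an assumption, and using a Bernoulli draw for MISSPROB and MPPROB is only one plausible reading of "probability"; the slides do not say how the simulator applies them.

```cpp
#include <cassert>
#include <random>

// Hypothetical container for the configurable parameters named on the slides.
struct SimConfig {
    unsigned issueSize, commitSize, robSize, numCdb, lsqSize, storeCacheSize;
    unsigned numRsBu, numRsIntAlu, numRsMult, multStages;
    unsigned numRsDiv, divCycles, numRsFpAdd, fpAddStages;
    double missProb;  // MISSPROB: load miss probability
    double mpProb;    // MPPROB: branch mispredict probability
};

// Assumed sanity rules: the widths and sizes checked here must be nonzero
// and both probabilities must lie in [0, 1].
bool isValid(const SimConfig& c) {
    bool sizesOk = c.issueSize > 0 && c.commitSize > 0 && c.robSize > 0 &&
                   c.numCdb > 0 && c.lsqSize > 0 && c.storeCacheSize > 0;
    bool probsOk = c.missProb >= 0.0 && c.missProb <= 1.0 &&
                   c.mpProb >= 0.0 && c.mpProb <= 1.0;
    return sizesOk && probsOk;
}

// Decide one random event (a load miss or a mispredict) with the given
// probability. Seeding the generator makes a simulation run repeatable.
bool drawEvent(double probability, std::mt19937& rng) {
    assert(probability >= 0.0 && probability <= 1.0);
    std::bernoulli_distribution dist(probability);
    return dist(rng);
}
```

Because a trace-driven run has no real cache or predictor, drawing each load miss and branch mispredict from a seeded generator is a simple way to make experiments reproducible.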

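Simulator Structure

    main() {
        readtracefile();
        readconfigfile();
        // One loop iteration models one clock cycle.
        // Stages are called from the back of the pipeline to the front.
        while (NOT EXIT) {
            commit();
            ROB_update();
            RS_update();
            CDB_Arbiter();
            writeback();
            execute();
            issue();
            fetch();
        }
        printStatistics();
    }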

Block Diagram
Components shown in the figure: instruction trace, issue unit, INT ALU RS, BR UNIT RS, LSQ, DIV UNIT RS, MULT UNIT RS, functional units, arbiter, CDB, ROB, RF.

Metrics Measured
- Cycles to complete
- Issue stall cycles: cycles in which no instruction can be issued to the RS
- FU utilization (for each FU): number of instructions of that FU type / total cycles
- CDB utilization (for each CDB): number of broadcasts / total cycles
- Cycles per instruction (CPI)
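The ratios above are simple to compute once the counters are collected. A minimal sketch, with hypothetical function names, assuming the counters are gathered elsewhere in the simulator:

```cpp
#include <cassert>

// Utilization as defined on the slide: events / total cycles.
// For an FU, events = instructions executed on that FU type;
// for a CDB, events = broadcasts on that bus.
double perCycle(unsigned long long events, unsigned long long totalCycles) {
    assert(totalCycles > 0);
    return static_cast<double>(events) / static_cast<double>(totalCycles);
}

// CPI = total cycles / instructions completed.
double cyclesPerInstruction(unsigned long long totalCycles,
                            unsigned long long instructions) {
    assert(instructions > 0);
    return static_cast<double>(totalCycles) / static_cast<double>(instructions);
}
```

For example, 8 instructions completing in 20 cycles give a CPI of 2.5, and a CDB that broadcast in 5 of those 20 cycles has a utilization of 0.25.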

Metrics Measured (cont.)
- Frequency of each issue count over all execution cycles
- Frequency of each commit count over all execution cycles
- RS occupancy: frequency over all cycles
- ROB occupancy: frequency over all cycles
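Each of these frequency metrics is a histogram with one bin per possible count, updated once per cycle. The class below is a hypothetical sketch of that bookkeeping, not the simulator's actual data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Histogram of a per-cycle count (instructions issued, instructions
// committed, RS entries occupied, or ROB entries occupied).
class CountHistogram {
public:
    // maxCount is the largest value the count can take,
    // e.g. ISSUE SIZE for issue counts or ROB SIZE for ROB occupancy.
    explicit CountHistogram(std::size_t maxCount)
        : bins_(maxCount + 1, 0), total_(0) {}

    // Call once per simulated cycle with that cycle's count.
    void record(std::size_t count) {
        assert(count < bins_.size());
        ++bins_[count];
        ++total_;
    }

    // Number of cycles in which the count had this value.
    unsigned long long cyclesWith(std::size_t count) const {
        return bins_.at(count);
    }

    // Fraction of all recorded cycles in which the count had this value.
    double fraction(std::size_t count) const {
        assert(total_ > 0);
        return static_cast<double>(bins_.at(count)) /
               static_cast<double>(total_);
    }

private:
    std::vector<unsigned long long> bins_;
    unsigned long long total_;
};
```

One instance per metric is enough: the issue stage records how many instructions it issued, the commit stage how many it retired, and the ROB and each RS their occupancy at the end of the cycle.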

Simulator Design
- Coded in C++
- Compiled using MS VC++ 6.0

Execution Demonstration

Register state initialization:
    REGS[1].valid=1   REGS[2].valid=1   REGS[3].valid=1   REGS[8].valid=1   REGS[9].valid=1
    REGS[11].valid=1  REGS[12].valid=1  REGS[15].valid=1  REGS[16].valid=1  REGS[17].valid=1

Sample program:
    ADD R0,R1,R2;
    ADD R4,R0,R3;
    ADD R7,R4,R0;
    ADD R10,R11,R12;
    ADD R13,R10,R15;
    ADD R13,R16,R17;
    ADD R15,R11,R12;
    ADD R17,R15,R12;
    EXIT

The slide brackets pairs of instructions in this program and labels them as RAW, WAR, and WAW hazards.
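The dependences in the sample program can be checked mechanically. The sketch below compares every older/younger pair of three-operand instructions and reports RAW, WAR, and WAW relations. `Instr`, `Hazard`, `findHazards`, and `countKind` are hypothetical names, and the check is deliberately naive: it reports every matching pair and does not filter out pairs separated by an intervening write to the same register.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One "ADD dst,src1,src2" instruction, identified by register numbers.
struct Instr { int dst, src1, src2; };

struct Hazard {
    std::size_t older, younger;  // indices into the program
    std::string kind;            // "RAW", "WAR", or "WAW"
};

std::vector<Hazard> findHazards(const std::vector<Instr>& prog) {
    std::vector<Hazard> out;
    for (std::size_t i = 0; i < prog.size(); ++i) {
        const Instr& a = prog[i];
        assert(a.dst >= 0 && a.src1 >= 0 && a.src2 >= 0);
        for (std::size_t j = i + 1; j < prog.size(); ++j) {
            const Instr& b = prog[j];
            if (b.src1 == a.dst || b.src2 == a.dst)  // younger reads older's result
                out.push_back(Hazard{i, j, "RAW"});
            if (b.dst == a.src1 || b.dst == a.src2)  // younger overwrites older's source
                out.push_back(Hazard{i, j, "WAR"});
            if (b.dst == a.dst)                      // both write the same register
                out.push_back(Hazard{i, j, "WAW"});
        }
    }
    return out;
}

// Count the hazards of one kind.
std::size_t countKind(const std::vector<Hazard>& hz, const std::string& kind) {
    std::size_t n = 0;
    for (const Hazard& h : hz) {
        if (h.kind == kind) ++n;
    }
    return n;
}
```

Applied to the eight ADD instructions above, this pairwise check finds five RAW pairs (on R0 twice, R4, R10, and R15), two WAR pairs (on R15 and R17), and one WAW pair (on R13).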

Results: Cycles

Present Implementation
- Completely configurable simulator

Immediate Extensions