A Configurable Simulator for OOO Speculative Execution Design & Implementation By Mustafa Imran Ali ID#230203.


Architecture Modeled
– Fetch logic: trace-driven execution; branch outcomes are explicitly specified in the trace
– Issue logic: configurable issue width
– Functional units' reservation stations (RS): configurable RS count
– Execution units modeled after the MIPS R4000 pipeline (Hennessy & Patterson, Computer Architecture, 3rd Ed.): configurable number of pipeline stages
– Common data buses (CDBs): configurable number of CDBs
– ROB and commit logic: configurable ROB size and commit capacity
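
An execution unit with a configurable number of pipeline stages can be pictured as a shift register of in-flight instruction tags. This is a minimal sketch under stated assumptions, not the simulator's code: `PipelinedUnit` and `tick` are hypothetical names, the unit is assumed fully pipelined (one new operation accepted per cycle), and a finished result is assumed to leave the last stage immediately instead of waiting for a CDB grant.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical fully pipelined execution unit. Its depth would come from a
// stage-count parameter such as MULTSTAGES or FPADDSTAGES. Each slot holds the
// tag of the instruction in that stage; std::nullopt marks a bubble.
class PipelinedUnit {
public:
    explicit PipelinedUnit(std::size_t stages) : slots_(stages) {
        assert(stages > 0 && "a unit needs at least one stage");
    }

    // Advance one cycle: accept an optional new op into the first stage and
    // return the op leaving the last stage (it would then request a CDB).
    std::optional<int> tick(std::optional<int> incoming) {
        std::optional<int> done = slots_.back();
        slots_.pop_back();
        slots_.push_front(incoming);
        return done;
    }

private:
    std::deque<std::optional<int>> slots_;  // front = first stage, back = last stage
};
```

With three stages, an operation accepted on one tick is returned on the fourth tick, and back-to-back operations leave on consecutive ticks, which is the throughput benefit of pipelining the unit.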

Simulation Methodology
– A program trace file written in comma-separated values (CSV) format
– A configuration file specifying values for the configurable parameters
– Both the trace file and the configuration file are input to the simulator
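
Reading a CSV trace needs only the standard library. The column layout of the actual trace file is not given, so this sketch assumes a hypothetical layout with the opcode in the first field and the operands (and, for branches, the explicitly specified outcome) in the remaining fields; `TraceRecord` and `parseTraceLine` are illustrative names.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical trace record; the real simulator's CSV columns are not specified.
struct TraceRecord {
    std::string opcode;                 // e.g. "ADD", "EXIT"
    std::vector<std::string> operands;  // remaining comma-separated fields
};

// Split one trace line into an opcode and its operands, trimming spaces
// around each field. An empty line yields std::nullopt.
std::optional<TraceRecord> parseTraceLine(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        const auto first = field.find_first_not_of(" \t\r");
        const auto last = field.find_last_not_of(" \t\r");
        fields.push_back(first == std::string::npos
                             ? std::string()
                             : field.substr(first, last - first + 1));
    }
    if (fields.empty() || fields[0].empty()) return std::nullopt;
    TraceRecord rec;
    rec.opcode = fields[0];
    rec.operands.assign(fields.begin() + 1, fields.end());
    return rec;
}
```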

Architectural Assumptions
– Only load misses are modeled; stores are committed in a single cycle
– Stores use a direct bus to transfer the calculated effective address (EA) into the ROB
– Branch outcomes are written to the ROB using the CDB
– A branch mispredict is handled when the branch instruction reaches the head of the ROB
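
Handling a mispredict only at the ROB head means the commit stage is where recovery happens. A minimal sketch, assuming hypothetical names (`RobEntry`, `commitCycle`) and assuming that any entries behind a mispredicted branch are wrong-path work to be squashed; how the trace-driven simulator actually models the wrong path is not stated.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical ROB entry; field names are illustrative only.
struct RobEntry {
    int id;
    bool ready = false;         // result written back (CDB or store EA bus)
    bool isBranch = false;
    bool mispredicted = false;  // outcome on the CDB disagreed with the prediction
};

struct CommitResult {
    std::size_t committed = 0;
    bool flushed = false;
};

// Retire up to commitSize ready entries in program order from the ROB head.
// A mispredicted branch is acted on only when it reaches the head: it retires,
// and every younger entry behind it is squashed (assumed wrong-path).
CommitResult commitCycle(std::deque<RobEntry>& rob, std::size_t commitSize) {
    CommitResult res;
    while (res.committed < commitSize && !rob.empty() && rob.front().ready) {
        const RobEntry head = rob.front();
        rob.pop_front();
        ++res.committed;
        if (head.isBranch && head.mispredicted) {
            rob.clear();  // squash all younger speculative entries
            res.flushed = true;
            break;
        }
    }
    return res;
}
```

Commit stops at the first entry that is not ready, which keeps retirement in program order even though execution is out of order.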

Architectural Assumptions (cont.)
– Dynamic memory disambiguation is implemented using a store EA cache
  – A load is allowed to proceed only if there are no pending stores with the same effective address
– Reservation stations issue the first ready instruction detected
  – Not necessarily the oldest instruction
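
The store EA cache can be modeled as a bounded table of pending store addresses with a per-address count, since several in-flight stores may target the same address. A minimal sketch; `StoreEACache` and its methods are hypothetical names, the capacity stands in for STORE CACHE SIZE, and, like the rule above, it checks addresses only (store/load age and stores whose EA is not yet computed are not considered).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical store effective-address table sized by STORE CACHE SIZE.
class StoreEACache {
public:
    explicit StoreEACache(std::size_t capacity) : capacity_(capacity) {}

    // Record a store's EA once it is computed; false means the table is full.
    bool insert(std::uint32_t ea) {
        if (size_ == capacity_) return false;
        ++pending_[ea];
        ++size_;
        return true;
    }

    // Remove one pending store to `ea` when that store commits.
    void retire(std::uint32_t ea) {
        auto it = pending_.find(ea);
        if (it == pending_.end()) return;
        if (--it->second == 0) pending_.erase(it);
        --size_;
    }

    // A load may proceed only if no pending store has the same EA.
    bool loadMayProceed(std::uint32_t ea) const { return pending_.count(ea) == 0; }

private:
    std::size_t capacity_;
    std::size_t size_ = 0;
    std::unordered_map<std::uint32_t, std::size_t> pending_;  // EA -> pending store count
};
```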

Architectural Assumptions (cont.)
– Access to the available CDBs is arbitrated. When requests for a CDB arrive, they are granted in the following priority order:
  – Branch FU
  – DIV FU
  – LD/ST
  – MULT FU
  – FPADD FU
  – INT ALU FU
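
A fixed-priority arbiter walks the request list in the order above and stops once every bus has been granted. A minimal sketch, assuming at most one pending result per FU class per cycle; `FU`, `kCdbPriority`, and `arbitrateCdb` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// FU classes declared in the slide's CDB priority order (highest first),
// so an enum value doubles as its index in the request array.
enum class FU { Branch, Div, LdSt, Mult, FPAdd, IntAlu };

constexpr std::array<FU, 6> kCdbPriority = {
    FU::Branch, FU::Div, FU::LdSt, FU::Mult, FU::FPAdd, FU::IntAlu};

// Grant at most numCdb buses this cycle. requests[i] is true when FU class i
// has a finished result waiting; losers simply retry on the next cycle.
std::vector<FU> arbitrateCdb(const std::array<bool, 6>& requests, std::size_t numCdb) {
    std::vector<FU> granted;
    for (FU fu : kCdbPriority) {
        if (granted.size() == numCdb) break;
        if (requests[static_cast<std::size_t>(fu)]) granted.push_back(fu);
    }
    return granted;
}
```

With two CDBs and pending results from the INT ALU, DIV, and MULT units, DIV and MULT win and the INT ALU result waits a cycle.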

List of Configurable Parameters
– ISSUE SIZE: the maximum number of instructions examined for parallel issue
– COMMIT SIZE: the maximum number of instructions examined in the ROB for commit
– ROB SIZE: the number of entries in the reorder buffer
– NUM CDB: the number of common data buses
– LSQ SIZE: the number of entries in the load/store buffer
– STORE CACHE SIZE: the number of entries in the store EA lookup table

List of Configurable Parameters (cont.)
– NUMRSBU
– NUMRSINTALU
– NUMRSMULT
– MULTSTAGES
– NUMRSDIV

List of Configurable Parameters (cont.)
– DIVCYCLES
– NUMRSFPADD
– FPADDSTAGES
– MISSPROB
– MPPROB
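
The configuration file format is not described, so this sketch assumes a hypothetical "NAME = value" line format with '#' comments, using the parameter names listed above as keys. Values are read as doubles because the list mixes counts, stage/cycle counts, and probabilities; `readConfig` is an illustrative name, and `std::stod` throws on a non-numeric value.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "NAME = value" lines into a name -> value map.
// Blank lines, lines without '=', and text after '#' are ignored.
std::map<std::string, double> readConfig(std::istream& in) {
    auto trim = [](const std::string& s) {
        const auto b = s.find_first_not_of(" \t\r");
        if (b == std::string::npos) return std::string();
        const auto e = s.find_last_not_of(" \t\r");
        return s.substr(b, e - b + 1);
    };
    std::map<std::string, double> params;
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // drop any comment
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // blank or malformed line
        const std::string key = trim(line.substr(0, eq));
        const std::string value = trim(line.substr(eq + 1));
        if (!key.empty() && !value.empty()) params[key] = std::stod(value);
    }
    return params;
}
```

Keeping names as free-form strings lets keys with spaces such as "ROB SIZE" be used unchanged.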

Simulator Structure
    main() {
        readtracefile();
        readconfigfile();
        while (NOT EXIT) {
            commit();
            ROB_update();
            RS_update();
            CDB_Arbiter();
            writeback();
            execute();
            issue();
            fetch();
        }
        printStatistics();
    }
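
The loop calls the stages from commit back to fetch. Evaluating them in reverse pipeline order means each stage consumes what its predecessor produced in the previous cycle, so one instruction cannot pass through several stages in a single iteration. The toy pipeline below only illustrates that ordering; `ToyPipeline` and `cyclesToComplete` are hypothetical, and it has just four stages with one latch each and no dependences.

```cpp
#include <cassert>
#include <optional>

// Toy pipeline with one latch per stage, each holding at most one instruction id.
struct ToyPipeline {
    std::optional<int> fetched, issued, executed;
    int nextId = 0;
    int committed = 0;

    // Stages run back to front, mirroring commit() ... fetch() in the loop.
    // Running them front to back would move a new instruction through every
    // stage in one call.
    void cycle(int traceLength) {
        if (executed) { ++committed; executed.reset(); }                // commit
        if (issued && !executed) { executed = issued; issued.reset(); }  // execute
        if (fetched && !issued) { issued = fetched; fetched.reset(); }   // issue
        if (!fetched && nextId < traceLength) fetched = nextId++;        // fetch
    }
};

// Cycles needed to retire traceLength independent instructions.
int cyclesToComplete(int traceLength) {
    ToyPipeline p;
    int cycles = 0;
    while (p.committed < traceLength) {
        p.cycle(traceLength);
        ++cycles;
    }
    return cycles;
}
```

In this toy model N independent instructions take N + 3 cycles: one cycle per stage to fill the pipe, then one retirement per cycle.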

Block Diagram (figure): trace, issue unit, reservation stations for the INT ALU, branch, DIV, and MULT units, LSQ, CDB arbiter, ROB, CDB, and register file (RF)

Metrics Measured
– Cycles to complete
– Issue stall cycles: cycles in which no instruction can be issued to an RS
– FU utilization (for each FU): number of instructions of that FU type / total cycles
– CDB utilization (for each CDB): number of broadcasts / total cycles
– Cycles per instruction (CPI)
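
The utilization and CPI metrics are simple ratios of counters collected during the run. A minimal sketch; `Counters`, `utilization`, and `cpi` are hypothetical names, and a zero denominator is reported as 0 rather than treated as an error.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters accumulated during simulation.
struct Counters {
    std::uint64_t totalCycles = 0;
    std::uint64_t committedInsts = 0;
    std::uint64_t issueStallCycles = 0;  // cycles with nothing issued to any RS
};

// FU utilization: instructions of that FU type / total cycles.
// Passing a CDB's broadcast count gives that CDB's utilization.
double utilization(std::uint64_t events, std::uint64_t totalCycles) {
    return totalCycles == 0 ? 0.0
                            : static_cast<double>(events) / static_cast<double>(totalCycles);
}

// Cycles per instruction over the whole run.
double cpi(const Counters& c) {
    return c.committedInsts == 0
               ? 0.0
               : static_cast<double>(c.totalCycles) / static_cast<double>(c.committedInsts);
}
```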

Metrics Measured (cont.)
– Frequency of each issue count over all execution cycles
– Frequency of each commit count over all execution cycles
– RS occupancy frequency over all cycles
– ROB occupancy frequency over all cycles
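
Each of these frequency metrics is a histogram over cycles: bucket k counts the cycles in which the observed value (instructions issued, instructions committed, RS or ROB entries occupied) was exactly k. A minimal sketch with a hypothetical `CycleHistogram` class; the maximum value would be, for example, ISSUE SIZE or ROB SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-cycle frequency histogram; record() is called once per simulated cycle.
class CycleHistogram {
public:
    explicit CycleHistogram(std::size_t maxValue) : buckets_(maxValue + 1, 0) {}

    // at() throws std::out_of_range if value exceeds maxValue.
    void record(std::size_t value) {
        ++buckets_.at(value);
        ++cycles_;
    }

    // Fraction of recorded cycles in which the value was exactly k.
    double frequency(std::size_t k) const {
        return cycles_ == 0 ? 0.0
                            : static_cast<double>(buckets_.at(k)) / static_cast<double>(cycles_);
    }

private:
    std::vector<std::size_t> buckets_;
    std::size_t cycles_ = 0;
};
```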

Simulator Design
– Coded in C++
– Compiled using MS VC++ 6.0

Execution Demonstration
Sample program (dependences marked on the later instruction):
    ADD R0,R1,R2;
    ADD R4,R0,R3;      RAW on R0
    ADD R7,R4,R0;      RAW on R4
    ADD R10,R11,R12;
    ADD R13,R10,R15;   RAW on R10
    ADD R13,R16,R17;   WAW on R13
    ADD R15,R11,R12;   WAR on R15
    ADD R17,R15,R12;   RAW on R15
    EXIT
Register state initialization:
    REGS[1].valid=1   REGS[2].valid=1   REGS[3].valid=1   REGS[8].valid=1   REGS[9].valid=1
    REGS[11].valid=1  REGS[12].valid=1  REGS[15].valid=1  REGS[16].valid=1  REGS[17].valid=1
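
The dependences in the sample program can be checked mechanically by comparing each instruction's destination against the sources and destination of every later instruction. A minimal sketch with hypothetical names (`Inst`, `Hazard`, `findHazards`, `kSample`); the all-pairs check ignores intervening redefinitions, which does not matter for this program because no register involved in a flagged pair is rewritten in between.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Register numbers of "ADD Rd,Rs,Rt".
struct Inst { int dest, src1, src2; };

// Instruction numbers are 1-based, matching the program listing.
struct Hazard { int earlier, later; std::string kind; int reg; };

// Classify every ordered pair (i < j) as RAW, WAR, and/or WAW.
std::vector<Hazard> findHazards(const std::vector<Inst>& prog) {
    std::vector<Hazard> out;
    const int n = static_cast<int>(prog.size());
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            const Inst& a = prog[i];
            const Inst& b = prog[j];
            if (b.src1 == a.dest || b.src2 == a.dest) out.push_back({i + 1, j + 1, "RAW", a.dest});
            if (a.src1 == b.dest || a.src2 == b.dest) out.push_back({i + 1, j + 1, "WAR", b.dest});
            if (a.dest == b.dest) out.push_back({i + 1, j + 1, "WAW", a.dest});
        }
    }
    return out;
}

// The sample program as (dest, src1, src2); EXIT omitted.
const std::vector<Inst> kSample = {
    {0, 1, 2},    {4, 0, 3},    {7, 4, 0},   {10, 11, 12},
    {13, 10, 15}, {13, 16, 17}, {15, 11, 12}, {17, 15, 12}};
```

Besides the six marked dependences, the check also reports RAW on R0 from the first to the third instruction and WAR on R17 between the sixth and eighth instructions. The WAR and WAW cases are name dependences that register renaming through the ROB removes, so only the RAW chains constrain execution order.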

Results: Cycles

Present Implementation
– Completely configurable simulator
– INT ALU in working state

Immediate Extensions
– Completion of the branch unit
– Completion of the pipelined multiplier
– Completion of the LD/ST unit