Tolerating Long Latency Instructions

Tolerating Long Latency Instructions. PhD student: Adrián Cristal. Advisors: Mateo Valero, Antonio González and Josep Llosa

Motivation: 4-way superscalar processor with a 32 KB L1 (1 port), a 256 KB L2 and a 2048-entry bimodal branch predictor.

Tolerating Long Latency Instructions – Outline: Resources Study; Large ROBs; Early Release + Virtual Registers.

Resources Study I – FP Programs. Figure annotation: 2.2X. For a 2K-entry ROB, the processor needs 1K-entry queues and more than 1K registers. Configuration: 4-way superscalar processor, 32 KB L1, 256 KB L2, 16K-entry gshare branch predictor, 500-cycle memory latency.

Resources Study IV – Integer Programs. Figure annotations: 0.98X and 1.44X.

Resources Study II – Methodology. There are two basic strategies. Early Recycle frees resources as soon as possible. Late Assignment assigns resources as late as possible. Instructions fall into two classes. Short-term instructions depend only on short-latency instructions or on other short-term instructions. Long-term instructions depend on a long-latency instruction (e.g. a load miss) or on another long-term instruction.
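
The short-term/long-term definitions are recursive: "long-term" propagates along data dependences starting from a long-latency instruction. The sketch below applies those definitions to a small instruction list. It assumes that instructions arrive in program order and that sources are named by their producer instruction. The `Instr` type and the "long-latency" label for the missing load itself are illustrative choices, not part of the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instr:
    name: str
    srcs: List[str] = field(default_factory=list)  # names of producer instructions
    long_latency: bool = False                     # e.g. a load that misses in the cache


def classify(program: List[Instr]) -> Dict[str, str]:
    """Label each instruction with the slide's definitions (program order assumed)."""
    is_long_latency: Dict[str, bool] = {}
    is_long_term: Dict[str, bool] = {}
    labels: Dict[str, str] = {}
    for ins in program:
        # Long-term: depends on a long-latency instruction or on a long-term one.
        # Sources not produced inside `program` are treated as already available.
        long_term = any(is_long_latency.get(s, False) or is_long_term.get(s, False)
                        for s in ins.srcs)
        is_long_term[ins.name] = long_term
        is_long_latency[ins.name] = ins.long_latency
        if long_term:
            labels[ins.name] = "long-term"
        elif ins.long_latency:
            labels[ins.name] = "long-latency"  # the miss itself, with ready operands
        else:
            labels[ins.name] = "short-term"
    return labels


program = [
    Instr("ld1", long_latency=True),  # load miss
    Instr("add1", ["ld1"]),           # consumes the miss
    Instr("mul1", ["add1"]),          # consumes a long-term instruction
    Instr("sub1"),                    # independent
    Instr("and1", ["sub1"]),          # depends only on short-term work
]
print(classify(program))
```

Here `add1` and `mul1` come out long-term, while `sub1` and `and1` stay short-term. Under Early Recycle and Late Assignment, only the long-term chain would need to hold resources for the full memory latency.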

Resources Study III – Methodology. Processor: 4-way superscalar; 2048-entry ROB; 2080 physical registers; 2048-entry queues (integer, FP, load, store); 16K-entry gshare branch predictor; 32 KB L1 (2 ports, 1 cycle); 256 KB L2 (2 ports, 10 cycles); 500-cycle main memory.
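
The configuration above can be written down as a record, together with a back-of-envelope check of why the window is sized near 2K entries. Two points are assumptions, not statements from the slides. First, the window estimate uses a Little's-law style product of issue width and memory latency. Second, the 32 extra physical registers (2080 − 2048) are read as holding the committed state of 32 architectural registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    issue_width: int = 4
    rob_entries: int = 2048
    phys_regs: int = 2080
    queue_entries: int = 2048        # each of the integer, FP, load and store queues
    gshare_entries: int = 16 * 1024
    l1_kb: int = 32
    l1_ports: int = 2
    l1_latency: int = 1              # cycles
    l2_kb: int = 256
    l2_ports: int = 2
    l2_latency: int = 10             # cycles
    mem_latency: int = 500           # cycles


# Assumption: 32 architectural registers per register file (Alpha-style ISA).
LOGICAL_REGS = 32


def window_to_hide_miss(cfg: SimConfig) -> int:
    # To keep issuing at full width while one miss is outstanding, roughly
    # issue_width * mem_latency instructions must fit in the window.
    return cfg.issue_width * cfg.mem_latency


cfg = SimConfig()
print(window_to_hide_miss(cfg), cfg.phys_regs - cfg.rob_entries)
```

The estimate gives 4 × 500 = 2000 instructions, which is consistent with the 2048-entry ROB used in the study.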

Resources Study V – FP: Instructions in Flight

Resources Study VI – Integer: Instructions in Flight

Resources Study VII – FP Register File (Early Release, Virtual Registers)

Resources Study VIII – Integer Register File

Resources Study IX – FP Queue (Late Assignment, Remove/Reinsert, Dependence Chain)

Resources Study X – Integer Queue

Resources Study XI – FP Load Queue (Early Release, Checkpointing)

Resources Study XII – Integer Load Queue

Resources Study XIII – FP Store Queue

Resources Study XIV – Integer Store Queue

Virtual ROBs (commit out of order). Target: build a large reorder buffer out of a smaller one. Characteristics: based on checkpoints; uses the CAM map table scheme; needs less than 1 KB for 8 checkpoints (256 registers); allows early release of registers; allows early release of loads; allows out-of-order commit.
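
Checkpoint-based retirement can replace a per-instruction ROB. The sketch below shows the idea; the `CheckpointCommit` class and its method names are hypothetical. Each instruction is counted against the youngest checkpoint, and instructions complete in any order. A checkpoint is released, in order, once its group has no pending instructions and a younger checkpoint exists to recover to. The storage helper assumes that each CAM map-table checkpoint is one valid bit per physical register. That is an assumption about the layout, used only to check the "less than 1 KB" figure.

```python
from collections import deque


class CheckpointCommit:
    """Hypothetical model of checkpoint-based retirement (no per-instruction ROB)."""

    def __init__(self, max_checkpoints: int = 8):
        self.max_checkpoints = max_checkpoints
        self.groups = deque()   # oldest first; entries are [checkpoint_id, pending]
        self.next_id = 0
        self.released = []
        self.take_checkpoint()  # the machine always holds at least one checkpoint

    def take_checkpoint(self):
        if len(self.groups) == self.max_checkpoints:
            return None  # no free checkpoint: dispatch must stall
        cp = self.next_id
        self.next_id += 1
        self.groups.append([cp, 0])
        self._release()
        return cp

    def dispatch(self) -> int:
        # A new instruction belongs to the youngest checkpoint.
        self.groups[-1][1] += 1
        return self.groups[-1][0]

    def complete(self, cp: int) -> None:
        for group in self.groups:
            if group[0] == cp:
                group[1] -= 1
                break
        self._release()

    def _release(self) -> None:
        # Completion is out of order; checkpoint release is in order.
        while len(self.groups) > 1 and self.groups[0][1] == 0:
            self.released.append(self.groups.popleft()[0])


def checkpoint_storage_bytes(checkpoints: int = 8, phys_regs: int = 256) -> int:
    # Assumption: one valid bit per physical register per checkpoint.
    return checkpoints * phys_regs // 8


c = CheckpointCommit()
first, second = c.dispatch(), c.dispatch()  # two instructions behind checkpoint 0
c.take_checkpoint()                         # checkpoint 1
third = c.dispatch()                        # one instruction behind checkpoint 1
c.complete(third)                           # group 1 finishes first
c.complete(first)
c.complete(second)                          # now checkpoint 0 is released
print(c.released, checkpoint_storage_bytes())
```

Group 1 finishes before group 0 but is not released ahead of it. It stays as the recovery point until a younger checkpoint is taken. Under the stated layout, 8 checkpoints over 256 registers take 256 bytes.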

Early Release + Virtual Registers I. Early release of registers needs a mechanism to recover from exceptions: checkpointing. It also needs a pending counter for each register. When an instruction is decoded, the pending counter of each of its source registers is incremented; when the instruction is issued, the counter is decremented. Instructions on a wrong path are "nullified" but still issued, so that the pending counters stay consistent. The scheme is tightly coupled with the renaming logic (CAM map table scheme). A register can be freed if it is not referenced in any map table and its pending counter is zero.
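
The release rule above combines two conditions: a pending-reader count and the absence of map-table references. The sketch below models both; the class and method names are hypothetical. Map-table references are modelled as a simple reference count that also covers checkpointed map tables. That way, a register still needed for recovery is never freed.

```python
class EarlyRelease:
    """Hypothetical model: a physical register is freed when no map table
    (current or checkpointed) references it and its pending counter is zero."""

    def __init__(self, num_phys: int):
        self.pending = [0] * num_phys   # decoded but not yet issued readers
        self.map_refs = [0] * num_phys  # map tables (incl. checkpoints) naming the reg
        self.free = set()

    def map_ref(self, reg: int) -> None:
        self.map_refs[reg] += 1
        self.free.discard(reg)

    def unmap(self, reg: int) -> None:
        # The logical register was redefined, or a checkpoint holding it was released.
        self.map_refs[reg] -= 1
        self._try_free(reg)

    def decode(self, srcs) -> None:
        for reg in srcs:
            self.pending[reg] += 1

    def issue(self, srcs) -> None:
        # Also called for nullified wrong-path instructions so counters stay balanced.
        for reg in srcs:
            self.pending[reg] -= 1
            self._try_free(reg)

    def _try_free(self, reg: int) -> None:
        if self.pending[reg] == 0 and self.map_refs[reg] == 0:
            self.free.add(reg)


er = EarlyRelease(num_phys=4)
er.map_ref(2)               # p2 holds the current value of a logical register
er.decode([2])
er.decode([2])              # two consumers of p2 are decoded
er.unmap(2)                 # the logical register is redefined
still_busy = 2 in er.free   # False: two readers are still pending
er.issue([2])
er.issue([2])               # the second reader may be a nullified wrong-path instruction
print(still_busy, 2 in er.free)
```

The register is freed when its last pending reader issues, not when the redefining instruction commits. That earlier release is what lets a smaller register file cover a large window.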

Early Release + Virtual Registers II. Renaming is decoupled from physical register allocation. This needs a new map table from virtual registers to physical registers. The CAM map tables map logical registers to virtual registers, and each entry is extended with a field for the physical register.
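
The two-level mapping can be illustrated with plain dictionaries standing in for the CAM map tables. The sketch assumes that a physical register is bound at writeback, which follows the general "virtual-physical register" idea. When exactly the slides bind physical registers is not stated, so treat that choice as an assumption. `VirtualRename` and its methods are hypothetical names.

```python
class VirtualRename:
    """Hypothetical sketch: rename maps logical -> virtual registers; a physical
    register is bound to a virtual one only when the value is produced."""

    def __init__(self, num_virtual: int, num_physical: int):
        self.logical_to_virtual = {}
        self.virtual_to_physical = {}
        self.free_virtual = list(range(num_virtual))
        self.free_physical = list(range(num_physical))

    def rename(self, dest: str, srcs):
        src_tags = [self.logical_to_virtual[s] for s in srcs]  # sources already mapped
        tag = self.free_virtual.pop(0)
        self.logical_to_virtual[dest] = tag
        return tag, src_tags

    def writeback(self, tag: int):
        if not self.free_physical:
            return None  # no physical register yet: the producer has to wait or retry
        phys = self.free_physical.pop(0)
        self.virtual_to_physical[tag] = phys
        return phys

    def physical_of(self, tag: int):
        return self.virtual_to_physical.get(tag)


vr = VirtualRename(num_virtual=8, num_physical=2)
miss, _ = vr.rename("r1", [])            # load miss: virtual tag 0, no storage yet
use, use_srcs = vr.rename("r2", ["r1"])  # its consumer reads virtual tag 0
before = vr.physical_of(miss)            # None while the miss is outstanding
after = vr.writeback(miss)               # physical register bound at completion
print(use_srcs, before, after)
```

A load that waits 500 cycles for memory then occupies only a virtual tag, not a physical register. Virtual tags are cheap, so they can be far more numerous than physical registers.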

Early Release + Virtual Registers III Evaluation

Early Release + Virtual Registers IV Evaluation FP

Early Release + Virtual Registers V Evaluation Integer

Future work: finish the evaluation of Early Register Release + Virtual Registers; work on a model to remove and reinsert instructions in the queues; combine Virtual ROBs with Early Release + Virtual Registers + Remove and Reinsert.