Commit out of order. PhD student: Adrián Cristal.

Commit out of order
PhD student: Adrián Cristal
Advisors: Josep Llosa, Antonio González and Mateo Valero

Commit out of order: Why?
Tolerate long-latency instructions with the following features (compared with a large-ROB design):
- A reduced ROB
- A reduced physical register file

Commit out of order: How?
Checkpoints:
- The processor creates a checkpoint at a conflictive long-latency instruction and retires it (virtually) from the ROB, but not from the issue queues.
- The processor also retires (virtually) all dependent instructions.
- The processor retires the remaining instructions in the normal way.
- If a virtually retired branch turns out to be mispredicted, or a virtually retired instruction raises an exception, the processor recovers its state from the checkpoint.

Example
Instruction      Action
ld r1,@r2        Create checkpoint
r3:=r2+r3        Commit
r4:=r1+4         Commit virtually
br r4==0,L1
ld r5,@r3
ld r3,@r2+128
ld r6,@r3+4
r3:=r3+1
br r3==0,L2
r6:=r6+r5
r6:=r6%7

Some Definitions
When an instruction reaches retirement, the processor must do one of the following:
- Retire (commit): if the instruction is completed.
- Retire (commit) virtually: if the instruction is not ready.
- Create a checkpoint and retire virtually: if the instruction has a long latency and is ready but not completed.
- Wait for completion: if the instruction has a short latency and is ready but not completed.
A physical register is free only if:
- its busy flag is clear,
- its reference counter in the commit state is zero, and
- its blocking counter in the commit state is zero.
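
The retirement rules and the free-register condition above can be sketched as two small Python functions. This is only an illustrative sketch: the names `RetireAction`, `retire_action` and `register_is_free`, and the three boolean flags, are hypothetical and do not come from the slides.

```python
from enum import Enum


class RetireAction(Enum):
    COMMIT = "commit"
    COMMIT_VIRTUALLY = "commit virtually"
    CHECKPOINT_AND_COMMIT_VIRTUALLY = "create checkpoint and commit virtually"
    WAIT = "wait to complete"


def retire_action(completed: bool, ready: bool, long_latency: bool) -> RetireAction:
    """Pick the action for the instruction at the retirement point."""
    if completed:
        return RetireAction.COMMIT
    if not ready:
        return RetireAction.COMMIT_VIRTUALLY
    # Ready but not completed: the latency class decides.
    if long_latency:
        return RetireAction.CHECKPOINT_AND_COMMIT_VIRTUALLY
    return RetireAction.WAIT


def register_is_free(busy: bool, ref_count: int, block_count: int) -> bool:
    """A physical register is free only if all three conditions hold."""
    return (not busy) and ref_count == 0 and block_count == 0
```

For instance, a ready but uncompleted long-latency load maps to `CHECKPOINT_AND_COMMIT_VIRTUALLY`, while a register with a non-zero blocking counter is never reported as free.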

Commit State
- It is the committed (virtually or not) state of the processor.
- When the processor creates a checkpoint, it copies this state to the checkpoint entry.
- Its information is used to determine which physical registers are free.

Commit State
- Map commit table: the processor saves the committed (virtually or not) map table here.
- Reference counters: for each physical register, the number of pending operations (reads or frees) on the register.
- Blocking counters: for each physical register, the number of blockings on the register. When the processor creates a checkpoint, it blocks all physical registers included in the map commit table, plus the destination register of the instruction. Some stores block registers too.

Checkpoint Table
It is a set of checkpoints, where each entry contains:
- A map table
- A set of reference counters
- A virtually retired instruction counter
- Information on the first virtually retired instruction
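
As a rough picture of the two structures described above, the commit state and a checkpoint table entry could be laid out as follows. The class and field names, the string register names and the register-file size of 8 are all assumptions made for illustration; the slides only list which pieces of information each structure holds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CommitState:
    map_table: Dict[str, int]   # map commit table: logical -> physical register
    ref_counters: List[int]     # pending reads/frees per physical register
    block_counters: List[int]   # number of blockings per physical register


@dataclass
class CheckpointEntry:
    map_table: Dict[str, int]   # copy of the map commit table
    ref_counters: List[int]     # copy of the reference counters
    virtually_retired: int      # virtually retired instruction counter
    first_instruction: str      # info on the first virtually retired instruction


def new_commit_state(map_table: Dict[str, int], num_phys_regs: int) -> CommitState:
    """Build a commit state with all counters cleared (hypothetical helper)."""
    return CommitState(dict(map_table), [0] * num_phys_regs, [0] * num_phys_regs)


# Hypothetical sizes and mappings, for illustration only.
state = new_commit_state({"r1": 1, "r2": 2, "r3": 3}, num_phys_regs=8)
checkpoint_table: List[CheckpointEntry] = []
```

Note that the blocking counters live only in the commit state, matching the entry contents listed on the Checkpoint Table slide.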

Example: Create Checkpoint I
Instruction: ld r1,@r2 (action: create checkpoint)
- Copy the map commit table to the new entry in the checkpoint table.

Example: Create Checkpoint II
Instruction: ld r1,@r2 (action: create checkpoint)
- Update the reference counters: add 1 to the entries corresponding to the destination physical register and the source registers.
- Copy the reference counters from the commit state to the new entry in the checkpoint table.

Example: Create Checkpoint III
Instruction: ld r1,@r2 (action: create checkpoint)
- Copy the instruction information.
- Set the virtually retired counter to 1.

Example: Create Checkpoint IV
Instruction: ld r1,@r2 (action: create checkpoint)
- Update the blocking counters: add 1 to the corresponding entry for each physical register in the map commit table.
- Add 1 to the entry corresponding to the destination register of the instruction.

Example: Create Checkpoint V
Instruction: ld r1,@r2 (action: create checkpoint)
- Send a signal to the store buffer to block future stores until the checkpoint is removed.

Example: Create Checkpoint VI
Instruction: ld r1,@r2 (action: create checkpoint)
- Mark the instruction in the LSQ as virtually retired.
- Free the ROB entry, but not the LSQ entry.
- Update the map commit table and the busy flag.
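
Steps I to VI can be put together in one function, sketched below under several assumptions. The dictionary layout, the key names, the `holding_checkpoints` counter standing in for the store-buffer signal, the physical register numbers, and the initial reference count of 1 for each mapped register are all hypothetical. The ROB, the LSQ and the busy flag are not modelled.

```python
def create_checkpoint(state, checkpoints, store_buffer, instr):
    """Create a checkpoint for a long-latency instruction (illustrative only)."""
    # Step I: copy the map commit table into the new entry.
    entry = {"map": dict(state["map"])}
    # Step II: add 1 for the destination and source physical registers,
    # then copy the reference counters into the entry.
    for reg in [instr["dest_phys"], *instr["src_phys"]]:
        state["ref"][reg] += 1
    entry["ref"] = list(state["ref"])
    # Step III: save the instruction information; counter starts at 1.
    entry["first_instr"] = instr
    entry["virtually_retired"] = 1
    # Step IV: block every register in the map commit table,
    # plus the destination register of the instruction.
    for reg in state["map"].values():
        state["block"][reg] += 1
    state["block"][instr["dest_phys"]] += 1
    # Step V: tell the store buffer to hold future stores (assumed counter).
    store_buffer["holding_checkpoints"] += 1
    # Step VI: update the map commit table (ROB/LSQ/busy flag not modelled).
    state["map"][instr["dest_logical"]] = instr["dest_phys"]
    checkpoints.append(entry)
    return entry


# Hypothetical register numbers for "ld r1,@r2".
state = {
    "map": {"r1": 1, "r2": 2, "r3": 3, "r4": 4},
    "ref": [0, 1, 1, 1, 1, 0, 0, 0],
    "block": [0] * 8,
}
checkpoints = []
store_buffer = {"holding_checkpoints": 0}
load = {"text": "ld r1,@r2", "dest_logical": "r1", "dest_phys": 5, "src_phys": [2]}
entry = create_checkpoint(state, checkpoints, store_buffer, load)
```

In this sketch the saved map table still maps r1 to its old register, because the copy in step I happens before the update in step VI.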

Example: Commit
Instruction: r3:=r2+r3 (action: commit)
- Update the busy flag.
- Update the map commit table.
- References Counters[current]++
- References Counters[old]--

Example: Commit Virtually
Instruction: r4:=r1+4 (action: commit virtually)
- Update the busy flag.
- Update the map commit table.
- References Counters[current]++
- References Counters[sources]++
- Virtually retired counter++ in the last checkpoint entry
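
The counter updates for a normal commit and for a virtual commit can be compared side by side in a short sketch. It assumes that "current" is the newly mapped destination register and "old" is the register previously mapped to the same logical register; that reading, the dictionary layout and the register numbers are assumptions. The busy flag is not modelled.

```python
def commit(state, dest_logical, new_phys):
    """Normal commit: remap the destination and release the old mapping."""
    old_phys = state["map"][dest_logical]
    state["map"][dest_logical] = new_phys
    state["ref"][new_phys] += 1   # References Counters[current]++
    state["ref"][old_phys] -= 1   # References Counters[old]--


def commit_virtually(state, checkpoints, dest_logical, new_phys, src_phys):
    """Virtual commit: sources stay referenced until the writeback."""
    state["map"][dest_logical] = new_phys
    state["ref"][new_phys] += 1   # References Counters[current]++
    for reg in src_phys:
        state["ref"][reg] += 1    # References Counters[sources]++
    checkpoints[-1]["virtually_retired"] += 1


# Hypothetical state after a checkpoint was created for "ld r1,@r2".
state = {
    "map": {"r1": 5, "r2": 2, "r3": 3, "r4": 4},
    "ref": [0, 1, 2, 1, 1, 1, 0, 0],
}
checkpoints = [{"virtually_retired": 1}]
commit(state, "r3", new_phys=6)                                       # r3:=r2+r3
commit_virtually(state, checkpoints, "r4", new_phys=7, src_phys=[5])  # r4:=r1+4
```

The visible difference is that the virtual commit does not decrement the old register's counter; on the slides that decrement appears at writeback.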

Example: Writeback I
In all entries of the checkpoint table created after or with the instruction:
- References Counters[old]--
- References Counters[sources]--
Virtually retired counter-- in the instruction's checkpoint entry.

Example: Writeback II
- Virtually retired counter-- in the instruction's checkpoint entry.
- If it reaches 0: unblock the registers and clear the entry in the checkpoint table.
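
One possible reading of the two writeback slides is sketched below. It is an interpretation, not the authors' implementation: it assumes the decrements also apply to the commit-state counters, that "created after or with the instruction" can be tested with a hypothetical sequence number `seq`, and that unblocking reverses the blocking done when the checkpoint was created. All key names and register numbers are made up for the example.

```python
def writeback(state, checkpoints, instr):
    """Writeback of a virtually retired instruction (one possible reading)."""
    # Commit state (assumed) plus every checkpoint entry created
    # with or after the instruction.
    targets = [state["ref"]]
    targets += [c["ref"] for c in checkpoints if c["seq"] >= instr["seq"]]
    for ref in targets:
        ref[instr["old_phys"]] -= 1       # References Counters[old]--
        for reg in instr["src_phys"]:
            ref[reg] -= 1                 # References Counters[sources]--
    entry = instr["checkpoint"]
    entry["virtually_retired"] -= 1
    if entry["virtually_retired"] == 0:
        # Unblock what the checkpoint blocked, then clear the entry.
        for reg in entry["map"].values():
            state["block"][reg] -= 1
        state["block"][entry["first_dest_phys"]] -= 1
        checkpoints[:] = [c for c in checkpoints if c is not entry]


# Hypothetical state: one checkpoint holding two virtually retired instructions.
cp = {"seq": 1, "map": {"r1": 1, "r2": 2, "r3": 3, "r4": 4},
      "ref": [0, 1, 2, 1, 1, 1, 0, 0], "virtually_retired": 2,
      "first_dest_phys": 5}
state = {"ref": [0, 1, 2, 0, 1, 2, 1, 1], "block": [0, 1, 1, 1, 1, 1, 0, 0]}
checkpoints = [cp]
load = {"seq": 1, "old_phys": 1, "src_phys": [2], "checkpoint": cp}  # ld r1,@r2
add = {"seq": 3, "old_phys": 4, "src_phys": [5], "checkpoint": cp}   # r4:=r1+4
writeback(state, checkpoints, load)
writeback(state, checkpoints, add)
```

After both writebacks in this example the checkpoint is gone and every blocking counter is back to zero, so the old registers of r1 and r4 satisfy the counter part of the free condition.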

Example: Mispredicted Branch I
- Copy the reference counters and the map commit table from the checkpoint entry.

Example: Mispredicted Branch II
- Unblock the registers of all checkpoint entries that will be freed.
- Unblock the registers corresponding to aborted stores.

Example: Mispredicted Branch III
- Purge the IQ, FPQ, LSQ and SB, and erase all entries in the ROB.
- Set the PC to the next PC of the instruction saved in the checkpoint entry.
- Purge the entries in the checkpoint table.
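
The three recovery steps can be followed literally in a small sketch. Several points are assumptions rather than facts from the slides: the recovery entry and all younger entries are taken to be the ones freed, `held_stores` is taken to contain only aborted stores, the "next PC" is stored in a hypothetical `resume_pc` field, and the queues are plain lists. The sketch does not try to settle which in-flight instructions survive the purge.

```python
def recover(state, checkpoints, queues, held_stores, entry):
    """Recover from a mispredicted, virtually retired branch (literal sketch)."""
    # Step I: restore the reference counters and the map commit table.
    state["map"] = dict(entry["map"])
    state["ref"] = list(entry["ref"])
    # Step II: unblock registers of the entries that will be freed
    # (assumed: the recovery entry and all younger ones) ...
    idx = next(i for i, c in enumerate(checkpoints) if c is entry)
    for cp in checkpoints[idx:]:
        for reg in cp["map"].values():
            state["block"][reg] -= 1
        state["block"][cp["first_dest_phys"]] -= 1
    # ... and registers blocked by aborted stores.
    for store in held_stores:
        if store["blocked_reg"] is not None:
            state["block"][store["blocked_reg"]] -= 1
    # Step III: purge the queues, redirect the PC, purge the entries.
    for queue in queues.values():
        queue.clear()
    held_stores.clear()
    state["pc"] = entry["resume_pc"]
    del checkpoints[idx:]


# Hypothetical machine state with one checkpoint and one held store.
cp = {"map": {"r1": 1, "r2": 2, "r3": 3, "r4": 4},
      "ref": [0, 1, 2, 1, 1, 1, 0, 0],
      "first_dest_phys": 5, "resume_pc": 0x104}
state = {"map": {"r1": 5, "r2": 2, "r3": 6, "r4": 7},
         "ref": [0, 1, 2, 0, 1, 2, 1, 1],
         "block": [0, 1, 1, 1, 1, 1, 1, 0], "pc": 0x140}
checkpoints = [cp]
queues = {"IQ": ["br r4==0,L1"], "FPQ": [], "LSQ": ["ld r5,@r3"], "ROB": ["ld r5,@r3"]}
held_stores = [{"blocked_reg": 6}]
recover(state, checkpoints, queues, held_stores, cp)
```

The blocking counters are not part of a checkpoint entry, which is why the sketch repairs them by explicit unblocking instead of restoring a saved copy.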

Exception I (virtually committed)
If the instruction is the same one that generated the checkpoint entry:
- The processor waits until this entry is the only entry in the checkpoint table.
- It then acts as for a mispredicted branch, but sets the PC to the exception handler PC.

Exception II (virtually committed)
If the instruction is not the one that generated the checkpoint entry:
- The processor acts as for a mispredicted branch until the instruction that generates the exception is no longer virtually committed, and then handles it as a normal exception.
This is a precise exception model, but a relaxed, more efficient model is also possible.

Load/Store
- Loads can advance past stores.
- LSQ entries are freed at commit for completed loads, or at writeback for virtually committed loads.
- Stores cannot be virtually retired. To retire a store, the processor needs to know its address; if the value has not yet been calculated, the store is retired and the value register is blocked.
- A store is always retired to the store buffer, where it remains until it is safe to send it to memory.
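
The store retirement rule above can be sketched as a single function. The dictionary keys, the list used as store buffer and the register number are hypothetical; the sketch only covers the retirement decision, not when a store is finally sent to memory.

```python
def retire_store(store, block_counters, store_buffer):
    """Try to retire a store; returns False if it has to wait."""
    if not store["address_known"]:
        return False                       # the address is required to retire
    if store["value_ready"]:
        store["blocked_reg"] = None
    else:
        # Value not computed yet: retire anyway, but block the value register.
        block_counters[store["value_reg"]] += 1
        store["blocked_reg"] = store["value_reg"]
    store_buffer.append(store)             # stores always retire to the store buffer
    return True


# Hypothetical example: address known, value in physical register 6 not ready.
block_counters = [0] * 8
store_buffer = []
pending = {"address_known": True, "value_ready": False, "value_reg": 6}
retired = retire_store(pending, block_counters, store_buffer)
```

Blocking the value register keeps it from being freed while the store waits in the store buffer for its data.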

Simulations
- Heavily modified SimpleScalar 3.0 simulator
- 10 SPEC2000 benchmarks
- First 500 million instructions of the test set
- The branch predictor is updated at writeback
- Speedup = (IPC / IPC_base) - 1
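
The speedup metric above is a relative IPC gain, as the short calculation below illustrates. The IPC values are made-up numbers chosen for easy arithmetic, not results from the simulations.

```python
def speedup(ipc: float, ipc_base: float) -> float:
    """Speedup as defined on the slide: (IPC / IPC_base) - 1."""
    return ipc / ipc_base - 1


# Hypothetical values: an IPC of 3.0 against a baseline IPC of 2.0.
print(speedup(3.0, 2.0))  # 0.5, i.e. a 50% improvement
```

With this definition a speedup of 0 means no change and a negative value means a slowdown relative to the baseline.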

Simulations

Simulations

Simulations: Swim

Simulations: Swim

Detected problems
- Branch predictor: sometimes the processor mispredicts the same branch several times. This can be solved fairly easily.

Space design
- In the commit state, the reference counters and the blocking counters can be stored in a single array.
- A checkpoint entry can store only the instruction information and the map table. The reference counters and the blocking counters can be calculated from information stored in the checkpoint table entries, the store buffer and the issue queues.
- To allow fast unblocking, it may be a better design to add a new array of 1 bit per physical register to the commit state and to each checkpoint entry. A bit is set if the physical register is included in the map table.

Future work
- Add virtual registers.
- Add a “waiting” queue associated with each issue queue. Virtually retired instructions are moved to it when they are virtually committed. When the first instruction of a checkpoint completes, these instructions are moved back to the issue queue.
- Study better checkpoint creation policies, perhaps based on branch confidence and the number of virtually committed instructions.
- Study branch predictors suited to this processor.
- Develop a model without a ROB, based only on checkpoints.