Commit out of order
PhD student: Adrián Cristal. Advisors: Josep Llosa, Antonio González and Mateo Valero
Commit out of order: Why?
Tolerate long-latency instructions with the following features (compared with a large-ROB design):
- A reduced ROB
- A reduced physical register file
Commit out of order: How?
Checkpoints:
- The processor creates a checkpoint at a problematic long-latency instruction and retires it (virtually) from the ROB, but not from the issue queues.
- The processor also retires (virtually) all dependent instructions.
- The processor retires the remaining instructions in the normal way.
- On a misprediction of a virtually retired branch, or on an exception, the processor recovers its state from the checkpoint.
Example
Instruction        Action
ld r1,@r2          Create checkpoint
r3:=r2+r3          Commit
r4:=r1+4           Commit virtually
br r4==0,L1
ld r5,@r3
ld r3,@r2+128
ld r6,@r3+4
r3:=r3+1
br r3==0,L2
r6:=r6+r5
r6:=r6%7
Some Definitions
When an instruction reaches the point of retirement, the processor must:
- Retire (commit) it, if the instruction is completed
- Retire (commit) it virtually, if the instruction is not ready
- Create a checkpoint and retire it virtually, if the instruction has a long latency and is ready but not completed
- Wait for it to complete, if the instruction has a short latency and is ready but not completed
A physical register is free only if:
- Its busy flag is clear
- Its reference counter in the commit state is zero
- Its blocking counter in the commit state is zero
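The retire decision and the free-register test above can be sketched in Python as follows. This is a minimal model, not the simulator's code: the names `RetireAction`, `retire_action` and `is_free` are hypothetical, and whether an instruction counts as "long latency" is assumed to be decided elsewhere and passed in as a flag.

```python
from enum import Enum, auto


class RetireAction(Enum):
    COMMIT = auto()                           # instruction completed
    COMMIT_VIRTUALLY = auto()                 # instruction not ready yet
    CHECKPOINT_AND_COMMIT_VIRTUALLY = auto()  # long latency, ready, not completed
    WAIT = auto()                             # short latency, ready, not completed


def retire_action(completed: bool, ready: bool, long_latency: bool) -> RetireAction:
    """Decide what to do with the instruction at the head of the ROB."""
    if completed:
        return RetireAction.COMMIT
    if not ready:
        return RetireAction.COMMIT_VIRTUALLY
    if long_latency:
        return RetireAction.CHECKPOINT_AND_COMMIT_VIRTUALLY
    return RetireAction.WAIT


def is_free(busy: bool, ref_count: int, block_count: int) -> bool:
    """A physical register is free only if all three conditions hold."""
    return not busy and ref_count == 0 and block_count == 0
```

The order of the tests mirrors the slide: completion is checked first, so a completed long-latency instruction simply commits.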
Commit State
The commit state is the processor's committed state, including virtually committed instructions.
When the processor creates a checkpoint, it copies this state to the new checkpoint entry.
Its information is used to determine which physical registers are free.
Commit State
- Map commit table: the committed (virtually or not) map table.
- Reference counters: for each physical register, the number of pending operations (reads or frees) on that register.
- Blocking counters: for each physical register, the number of blocks on that register. When the processor creates a checkpoint, it blocks every physical register in the map commit table plus the destination register of the instruction. Some stores also block registers.
Checkpoint Table
A set of checkpoints, where each entry contains:
- A map table
- A set of reference counters
- A virtually retired instruction counter
- Information about the first virtually retired instruction
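A minimal sketch of the commit state and a checkpoint entry, assuming registers are plain integer indices and that the "instruction information" can be represented by a PC (the field `first_instr_pc` and the method `snapshot` are hypothetical). It shows the copy semantics of checkpoint creation: the entry holds its own copies, so later updates to the commit state do not leak into it. The counter updates and blocking performed at creation time are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckpointEntry:
    """One entry of the checkpoint table."""
    map_table: List[int]     # copy of the map commit table
    ref_counts: List[int]    # copy of the reference counters
    virtually_retired: int   # virtually retired instruction counter
    first_instr_pc: int      # hypothetical stand-in for the first instruction's information


@dataclass
class CommitState:
    """Committed (virtually or not) state; counters are indexed by physical register."""
    map_commit: List[int]    # logical register -> physical register
    ref_counts: List[int]    # pending operations (reads or frees) per physical register
    block_counts: List[int]  # number of blocks per physical register

    def snapshot(self, first_instr_pc: int) -> CheckpointEntry:
        # Lists are copied, so later updates to the commit state leave the entry unchanged.
        return CheckpointEntry(
            map_table=list(self.map_commit),
            ref_counts=list(self.ref_counts),
            virtually_retired=1,  # the checkpointing instruction itself
            first_instr_pc=first_instr_pc,
        )


# The checkpoint table: entries in creation order (oldest first).
CheckpointTable = List[CheckpointEntry]
```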
Example: Create Checkpoint I (ld r1,@r2)
Copy the map commit table to the new entry in the checkpoint table.
Example: Create Checkpoint II (ld r1,@r2)
Update the reference counters in the commit state (add 1 to the entries of the destination physical register and the source registers), then copy them to the new entry in the checkpoint table.
Example: Create Checkpoint III (ld r1,@r2)
Copy the instruction information and set the virtually retired counter to 1.
Example: Create Checkpoint IV (ld r1,@r2)
Update the blocking counters: add 1 to the entry of each physical register in the map commit table, and add 1 to the entry of the instruction's destination register.
Example: Create Checkpoint V (ld r1,@r2)
Signal the store buffer to block future stores until the checkpoint is removed.
Example: Create Checkpoint VI (ld r1,@r2)
Mark the instruction in the LSQ as virtually retired. Free the ROB entry, but not the LSQ entry. Update the map commit table and the busy flag.
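Steps I to VI can be put together in a small Python sketch. It assumes physical registers are integer indices, represents the instruction information by its PC, and models the store-buffer signal as a boolean; `Core`, `create_checkpoint` and `stores_blocked` are hypothetical names. LSQ marking, ROB release and the busy flag are left out. Note the ordering: the map table is copied before the instruction's own mapping is installed (step I vs. step VI), while the reference counters are copied after being incremented (step II).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckpointEntry:
    map_table: List[int]     # map commit table as it was before this instruction
    dest: int                # destination register, blocked together with map_table
    ref_counts: List[int]
    virtually_retired: int
    pc: int                  # hypothetical stand-in for the instruction information


@dataclass
class Core:
    map_commit: List[int]    # logical -> physical register
    ref_counts: List[int]    # per physical register
    block_counts: List[int]  # per physical register
    checkpoints: List[CheckpointEntry] = field(default_factory=list)
    stores_blocked: bool = False  # hypothetical flag standing in for the store-buffer signal


def create_checkpoint(core: Core, pc: int, logical_dest: int,
                      dest: int, sources: List[int]) -> CheckpointEntry:
    # Step I: copy the map commit table (before this instruction updates it).
    map_copy = list(core.map_commit)
    # Step II: add 1 for the destination and source registers, then copy the counters.
    core.ref_counts[dest] += 1
    for src in sources:
        core.ref_counts[src] += 1
    # Step III: instruction information, virtually retired counter set to 1.
    entry = CheckpointEntry(map_copy, dest, list(core.ref_counts), 1, pc)
    core.checkpoints.append(entry)
    # Step IV: block every register in the map commit table plus the destination.
    for preg in map_copy:
        core.block_counts[preg] += 1
    core.block_counts[dest] += 1
    # Step V: block future stores until this checkpoint is removed.
    core.stores_blocked = True
    # Step VI: LSQ marking and ROB release are not modeled; update the map commit table.
    # (The busy-flag update is omitted because its exact rule is not given.)
    core.map_commit[logical_dest] = dest
    return entry
```

For `ld r1,@r2`, assuming r1 and r2 currently map to physical registers 1 and 2 and the load is assigned physical register 4 (hypothetical numbers), the call is `create_checkpoint(core, pc=0x100, logical_dest=1, dest=4, sources=[2])`.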
Example: Commit (r3:=r2+r3)
Update the busy flag and the map commit table.
Reference counters[current]++
Reference counters[old]--
Here current is the instruction's destination physical register and old is the physical register it replaces in the map commit table.
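A sketch of the commit bookkeeping, assuming the map commit table is a list indexed by logical register and the reference counters a list indexed by physical register (`commit` is a hypothetical name). The busy-flag update is omitted because the slide does not spell out its rule.

```python
from typing import List


def commit(map_commit: List[int], ref_counts: List[int],
           logical_dest: int, current: int) -> int:
    """Commit a completed instruction; returns the old physical register."""
    old = map_commit[logical_dest]
    map_commit[logical_dest] = current   # update the map commit table
    ref_counts[current] += 1             # Reference counters[current]++
    ref_counts[old] -= 1                 # Reference counters[old]--
    # The busy-flag update is not modeled (its exact rule is not given).
    return old
```

For `r3:=r2+r3`, assuming r3 was mapped to physical register 3 and the instruction was assigned physical register 5 (hypothetical numbers), `commit(map_commit, ref_counts, 3, 5)` moves one reference from register 3 to register 5.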
Example: Commit Virtually (r4:=r1+4)
Update the busy flag and the map commit table.
Reference counters[current]++
Reference counters[sources]++
Virtually retired counter++ in the last checkpoint entry
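Virtual commit differs from a normal commit in two ways: the source registers are also counted, and the old mapping is not released, since its decrement is deferred to writeback. The sketch below uses the same list-based assumptions as a normal commit and omits the busy flag; returning the checkpoint index so that writeback can find "the instruction checkpoint entry" is a design choice of this sketch (`commit_virtually` is hypothetical), not something the slide specifies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CheckpointEntry:
    virtually_retired: int   # only the field this step touches


def commit_virtually(map_commit: List[int], ref_counts: List[int],
                     checkpoints: List[CheckpointEntry], logical_dest: int,
                     current: int, sources: List[int]) -> Tuple[int, int]:
    """Virtually commit a not-ready instruction.

    Returns (old physical register, index of its checkpoint entry); both are
    needed later at writeback.
    """
    if not checkpoints:
        raise RuntimeError("no open checkpoint")
    old = map_commit[logical_dest]
    map_commit[logical_dest] = current
    ref_counts[current] += 1              # Reference counters[current]++
    for src in sources:
        ref_counts[src] += 1              # Reference counters[sources]++
    # Unlike a normal commit, old is not decremented here; that happens at writeback.
    checkpoints[-1].virtually_retired += 1
    return old, len(checkpoints) - 1
```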
Example: Writeback I
When a virtually committed instruction writes back, in every checkpoint table entry created with or after the instruction:
Reference counters[old]--
Reference counters[sources]--
Then: Virtually retired counter-- in the instruction's checkpoint entry
Example: Writeback II
Virtually retired counter-- in the instruction's checkpoint entry. If it reaches 0:
- Unblock the registers
- Clear the entry in the checkpoint table
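Writeback I and II can be sketched together. Assumptions of this sketch: checkpoints live in a dict keyed by increasing ids (creation order); each virtually committed instruction records its own entry id and the id of the first entry created with or after it (`first_affected_id`, hypothetical); and the commit-state counters are decremented too, since the free test reads them. Releasing the store-buffer block when an entry is cleared is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckpointEntry:
    map_table: List[int]     # registers blocked when the entry was created
    dest: int                # destination of the checkpointing instruction (also blocked)
    ref_counts: List[int]
    virtually_retired: int


@dataclass
class VirtualInstr:
    old: int                 # physical register previously mapped to its destination
    sources: List[int]       # physical source registers
    checkpoint_id: int       # the instruction's checkpoint entry
    first_affected_id: int   # hypothetical: first entry created with or after it


def writeback_virtual(instr: VirtualInstr, checkpoints: Dict[int, CheckpointEntry],
                      commit_refs: List[int], block_counts: List[int]) -> bool:
    """Writeback of a virtually committed instruction; True if its entry was cleared."""
    # Writeback I: entries created with or after the instruction.
    for cid, entry in checkpoints.items():
        if cid >= instr.first_affected_id:
            entry.ref_counts[instr.old] -= 1
            for src in instr.sources:
                entry.ref_counts[src] -= 1
    # Assumption: the commit-state counters are decremented as well.
    commit_refs[instr.old] -= 1
    for src in instr.sources:
        commit_refs[src] -= 1
    # Writeback II: decrement the counter; at zero, unblock and clear the entry.
    entry = checkpoints[instr.checkpoint_id]
    entry.virtually_retired -= 1
    if entry.virtually_retired > 0:
        return False
    for preg in entry.map_table:
        block_counts[preg] -= 1
    block_counts[entry.dest] -= 1
    del checkpoints[instr.checkpoint_id]
    return True
```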
Example: Mispredicted Branch I
Copy the reference counters and the map commit table from the checkpoint entry into the commit state.
Example: Mispredicted Branch II
Unblock the registers of every checkpoint entry that will be freed.
Unblock the registers corresponding to aborted stores.
Example: Mispredicted Branch III
Purge the IQ, FPQ, LSQ and SB, and erase all entries in the ROB.
Set the PC to the next PC of the instruction saved in the checkpoint entry.
Purge the entries in the checkpoint table.
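A sketch of the three recovery steps. It assumes that the entries "that will be freed" are the target checkpoint and every entry created after it, that checkpoints are kept in a dict keyed by increasing ids, and that the saved instruction information includes its next PC (`next_pc`, `recover` and `aborted_store_regs` are hypothetical names). Purging the IQ, FPQ, LSQ, SB and ROB is outside this model; the function only restores the commit state, adjusts the blocking counters and returns the PC to fetch from.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CheckpointEntry:
    map_table: List[int]
    dest: int
    ref_counts: List[int]
    next_pc: int             # hypothetical: next PC of the instruction saved in the entry


@dataclass
class CommitState:
    map_commit: List[int]
    ref_counts: List[int]
    block_counts: List[int]


def recover(state: CommitState, checkpoints: Dict[int, CheckpointEntry],
            cid: int, aborted_store_regs: List[int]) -> int:
    """Roll back to checkpoint `cid`; returns the PC to fetch from next."""
    entry = checkpoints[cid]
    # Step I: restore the map commit table and the reference counters.
    state.map_commit = list(entry.map_table)
    state.ref_counts = list(entry.ref_counts)
    # Step II: unblock the registers of every entry being freed
    # (assumed here: `cid` and every entry created after it) and of aborted stores.
    doomed = [k for k in checkpoints if k >= cid]
    for k in doomed:
        for preg in checkpoints[k].map_table:
            state.block_counts[preg] -= 1
        state.block_counts[checkpoints[k].dest] -= 1
    for preg in aborted_store_regs:
        state.block_counts[preg] -= 1
    # Step III: queue, store-buffer and ROB purges are not modeled; purge the entries.
    for k in doomed:
        del checkpoints[k]
    return entry.next_pc
```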
Exception I (virtually committed instruction)
If the faulting instruction is the one that created the checkpoint entry, the processor waits until that entry is the only entry in the checkpoint table. It then acts as for a mispredicted branch, but sets the PC to the exception handler PC.
Exception II (virtually committed instruction)
If the faulting instruction is not the one that created the checkpoint entry, the processor acts as for a mispredicted branch until the faulting instruction is no longer virtually committed, and then handles it as a normal exception.
This is a precise exception model; a relaxed, more efficient model is also possible.
Load/Store
- Loads can execute ahead of earlier stores.
- LSQ entries are freed at commit for completed loads, or at writeback for virtually committed loads.
- Stores cannot be virtually retired. To retire a store, the processor needs to know its address; if the value has not been calculated yet, the store is retired and the value register is blocked.
- A store is always retired to the store buffer, where it remains until it is safe to send it to memory.
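The store retirement rule can be sketched as follows. The `Store` record and `try_retire_store` are hypothetical, and an unknown address or value is modeled as `None`. Unblocking the value register once the value arrives is not specified on the slide and is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Store:
    address: Optional[int]   # None while the address is still unknown
    value: Optional[int]     # None while the value is still being computed
    value_reg: int           # physical register that will supply the value


def try_retire_store(store: Store, store_buffer: List[Store],
                     block_counts: List[int]) -> bool:
    """Retire a store into the store buffer; False means it must stay in the ROB."""
    if store.address is None:
        return False                          # stores are never virtually retired
    if store.value is None:
        block_counts[store.value_reg] += 1    # keep the value register from being freed
    store_buffer.append(store)                # stays here until it is safe to write memory
    return True
```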
Simulations
- Heavily modified SimpleScalar 3.0 simulator
- 10 SPEC2000 benchmarks
- First 500 million instructions of the test input set
- The branch predictor is updated at writeback
- Speedup = (IPC / IPC_base) - 1
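The speedup metric written as a function (`speedup` is a hypothetical name). The numbers in the usage line are purely illustrative, not simulation results.

```python
def speedup(ipc: float, ipc_base: float) -> float:
    """Speedup = (IPC / IPC_base) - 1; 0.0 means no change versus the baseline."""
    if ipc_base <= 0:
        raise ValueError("baseline IPC must be positive")
    return ipc / ipc_base - 1.0
```

For example, `speedup(1.5, 1.2)` gives about 0.25, i.e. a 25% improvement over the baseline.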
[Figures: Simulations]
[Figures: Simulations: Swim]
Detected problems
Branch predictor: sometimes the processor mispredicts the same branch several times. This should be fairly easy to solve.
Space design
- In the commit state, the reference counters and the blocking counters can be stored in a single array.
- The checkpoint entry need store only the instruction information and the map table. The reference and blocking counters can be computed from the information stored in the checkpoint table entries, the store buffer and the issue queues.
- To allow fast unblocking, it may be a better design to add a 1-bit-per-physical-register array to the commit state and to each checkpoint entry. A bit is set if the corresponding physical register is included in the map table.
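A sketch of the two space-saving ideas. Because reference and blocking counters are never negative, a single combined counter is zero exactly when both are, so the free test still works. The per-checkpoint bit vector (modeled here as a Python int) lets unblocking subtract one from every marked register; hardware could do this in parallel, whereas the loop below is sequential. The function names are hypothetical, and the destination register, which the slides block separately, is not included in the mask.

```python
from typing import List


def is_free(busy: bool, combined: int) -> bool:
    # Both counters are non-negative, so their sum is zero exactly when both are zero.
    return not busy and combined == 0


def map_mask(map_table: List[int]) -> int:
    """One bit per physical register, set if it appears in the map table."""
    mask = 0
    for preg in map_table:
        mask |= 1 << preg
    return mask


def unblock(counters: List[int], mask: int) -> None:
    """Decrement the counter of every physical register whose bit is set."""
    preg = 0
    while mask:
        if mask & 1:
            counters[preg] -= 1
        mask >>= 1
        preg += 1
```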
Future work
- Add virtual registers.
- Add a “waiting” queue associated with each issue queue. Virtually retired instructions are moved to it when they are virtually committed; when the first instruction of a checkpoint completes, these instructions are moved back to the issue queue.
- Study better checkpoint-creation policies, perhaps based on branch confidence and the number of virtually committed instructions.
- Study branch predictors suited to this processor.
- Develop a model without a ROB, based only on checkpoints.