Tolerating Long Latency Instructions. PhD student: Adrián Cristal. Advisors: Mateo Valero, Antonio González and Josep Llosa
Motivation – 4-way superscalar processor, 32 KB L1 (1 port), 256 KB L2, 2048-entry bimodal branch predictor
Tolerating Long Latency Instructions: Resources Study; Large ROBs; Early Release + Virtual Registers
Resources Study I – FP Programs. Figure annotation: 2.2X. For a 2K-entry ROB, the processor needs 1K-entry queues and more than 1K registers. Configuration: 4-way superscalar processor, 32 KB L1, 256 KB L2, 16K-entry gshare branch predictor, 500-cycle memory latency.
Resources Study IV – Integer Programs. Figure annotations: 0.98X, 1.44X.
Resources Study II – Methodology. Two basic strategies: Early Recycle: free resources as soon as possible. Late Assignment: assign resources as late as possible. Instruction classes: Short Term: instructions that depend only on short-latency instructions or on short-term instructions. Long Term: instructions that depend on a long-latency instruction (e.g., a load miss) or on a long-term instruction.
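A minimal sketch of the short-term/long-term classification above. It assumes instructions arrive in program order with explicit producer names, and it uses a hypothetical `long_latency` flag to mark load misses; the names `Instr` and `classify` are illustrative, not from the slides. Following the definitions literally, a load miss is labeled by its own sources, but every instruction that consumes it becomes long term.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    srcs: list = field(default_factory=list)  # names of producer instructions
    long_latency: bool = False                # hypothetical flag, e.g. a load miss


def classify(program):
    """Label each instruction 'short' or 'long' term (program order assumed)."""
    labels = {}
    taints = set()  # producers whose consumers become long term
    for ins in program:
        label = "long" if any(s in taints for s in ins.srcs) else "short"
        labels[ins.name] = label
        # A long-latency instruction, or any long-term one, taints its consumers.
        if ins.long_latency or label == "long":
            taints.add(ins.name)
    return labels


program = [
    Instr("i1"),
    Instr("ld", ["i1"], long_latency=True),  # load that misses
    Instr("add", ["ld"]),
    Instr("mul", ["add"]),
    Instr("sub", ["i1"]),
]
labels = classify(program)
print(labels)
# {'i1': 'short', 'ld': 'short', 'add': 'long', 'mul': 'long', 'sub': 'short'}
```

In this example `sub` stays short term because its only producer is independent of the miss.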
Resources Study III – Methodology. Processor: 4-way superscalar; 2048-entry ROB; 2080 physical registers; 2048-entry queues (integer, FP, load, store); 16K-entry gshare branch predictor; 32 KB L1, 2 ports, 1 cycle; 256 KB L2, 2 ports, 10 cycles; 500-cycle main memory latency.
Resources Study V – FP Instructions in-flight
Resources Study VI – Int Instructions in-flight
Resources Study VII – FP Register File (techniques: Early Release, Virtual Registers)
Resources Study VIII – Integer Register File
Resources Study IX – FP Queue (techniques: Late Assignment; Remove and Reinsert of the dependence chain)
Resources Study X – Integer Queue
Resources Study XI – FP Load Queue (techniques: Early Release, Checkpointing)
Resources Study XII – Integer Load Queue
Resources Study XIII – FP Store Queue
Resources Study XIV – Integer Store Queue
Virtual ROBs (out-of-order commit). Target: to build large reorder buffers using a smaller one. Characteristics: based on checkpoints; uses a CAM map table scheme; less than 1 KB for 8 checkpoints (256 registers); allows early release of registers; allows early release of loads; allows out-of-order commit.
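A sketch of tracking commit per checkpoint instead of per ROB entry, which is one way to let instructions complete and leave the window out of order. It assumes a hypothetical `CheckpointCommit` class that keeps a counter of unfinished instructions for each checkpoint, with at most 8 checkpoints (the number quoted on the slide). The slides do not specify this bookkeeping, so treat it as an illustration rather than the Virtual ROB design itself; restoring register state on rollback is not modeled.

```python
from collections import deque


class CheckpointCommit:
    """Hypothetical sketch: commit tracked per checkpoint, not per ROB entry."""

    def __init__(self, max_checkpoints=8):
        self.max_checkpoints = max_checkpoints
        self.pending = deque()  # [checkpoint_id, unfinished_count], oldest first
        self.next_id = 0
        self.retired = []       # checkpoints whose instructions all completed

    def take_checkpoint(self):
        if len(self.pending) == self.max_checkpoints:
            raise RuntimeError("no free checkpoint: dispatch must stall")
        cp_id = self.next_id
        self.pending.append([cp_id, 0])
        self.next_id += 1
        return cp_id

    def dispatch(self):
        # Each new instruction is counted against the youngest checkpoint.
        self.pending[-1][1] += 1
        return self.pending[-1][0]

    def complete(self, cp_id):
        # Instructions may complete in any order; they hold no ROB slot.
        for entry in self.pending:
            if entry[0] == cp_id:
                entry[1] -= 1
                break
        self._release()

    def _release(self):
        # The oldest checkpoint is freed once its region is done and a younger
        # checkpoint exists to serve as the new recovery point.
        while len(self.pending) > 1 and self.pending[0][1] == 0:
            self.retired.append(self.pending.popleft()[0])

    def rollback(self, cp_id):
        # On a misprediction or exception, squash everything younger than
        # cp_id and restart from its saved state (state restore not modeled).
        while self.pending[-1][0] != cp_id:
            self.pending.pop()
        self.pending[-1][1] = 0


cpu = CheckpointCommit()
cpu.take_checkpoint()                   # checkpoint 0
a, b = cpu.dispatch(), cpu.dispatch()   # two instructions in region 0
cpu.take_checkpoint()                   # checkpoint 1
c = cpu.dispatch()                      # one instruction in region 1
cpu.complete(c)                         # younger finishes first: nothing retires
cpu.complete(b)
cpu.complete(a)                         # region 0 done -> checkpoint 0 retires
print(cpu.retired)                      # [0]
```

The window size is then bounded by the number of checkpoints and the instructions between them, not by a per-instruction buffer.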
Early Release + Virtual Registers I – Early release of registers. It needs a mechanism to recover from exceptions: checkpointing. It needs a pending counter for each register: when an instruction is decoded, the pending counter of each of its source registers is incremented, and when the instruction is issued, those counters are decremented. Instructions on a wrong path are “nullified” but still issued, so that the pending counters stay consistent. The mechanism is highly coupled with the renaming logic (CAM map table scheme). A register can be freed if it is not referenced in any map table and its pending counter is zero.
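The pending-counter rule above can be sketched as follows. The `EarlyReleaseRF` class, its method names, and the modeling of each map table (current and checkpointed) as a Python dict from logical to physical register are assumptions for illustration; the slides describe a CAM map table scheme, not dict copies.

```python
from collections import deque


class EarlyReleaseRF:
    """Hypothetical sketch of early register release with pending counters."""

    def __init__(self, num_logical, num_phys):
        self.pending = [0] * num_phys                      # un-issued readers per physical register
        self.current = {r: r for r in range(num_logical)}  # logical -> physical map table
        self.checkpoints = []                              # saved copies of the map table
        self.free = deque(range(num_logical, num_phys))    # free physical registers

    def _referenced(self, p):
        maps = [self.current] + self.checkpoints
        return any(p in m.values() for m in maps)

    def _try_free(self, p):
        # Free only if no map table references p and no decoded reader is waiting.
        if self.pending[p] == 0 and not self._referenced(p) and p not in self.free:
            self.free.append(p)

    def decode(self, srcs, dst):
        src_phys = [self.current[s] for s in srcs]
        for p in src_phys:
            self.pending[p] += 1
        old = self.current[dst]
        self.current[dst] = self.free.popleft()  # real hardware would stall if empty
        self._try_free(old)
        return src_phys, self.current[dst]

    def issue(self, src_phys):
        # Also called for nullified wrong-path instructions, keeping counters balanced.
        for p in src_phys:
            self.pending[p] -= 1
            self._try_free(p)

    def take_checkpoint(self):
        self.checkpoints.append(dict(self.current))

    def release_oldest_checkpoint(self):
        oldest = self.checkpoints.pop(0)
        for p in set(oldest.values()):
            self._try_free(p)


rf = EarlyReleaseRF(num_logical=2, num_phys=4)
rf.take_checkpoint()              # checkpoint still maps r0 -> p0
srcs, new = rf.decode([0], 0)     # r0 = f(r0): reads p0, writes p2
rf.issue(srcs)                    # p0 has no waiting readers...
print(0 in rf.free)               # False: ...but the checkpoint still references it
rf.release_oldest_checkpoint()
print(0 in rf.free)               # True
```

Without a checkpoint covering `p0`, the same sequence would free it at issue time rather than waiting for the redefining instruction to commit.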
Early Release + Virtual Registers II – Decouple renaming from physical register allocation. This needs a new map table from virtual registers to physical registers. The CAM map tables map logical registers to virtual registers, and each entry is extended with a field for the physical register.
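A minimal sketch of the decoupling described above: renaming hands out only virtual tags, and a physical register is bound to a tag when the value is written back. `VirtualRename`, its methods, and the FIFO free lists are hypothetical; the separate `v2p` dict stands in for the virtual-to-physical mapping that the slide stores as an extra field in each CAM map table entry.

```python
from collections import deque


class VirtualRename:
    """Hypothetical sketch: rename to virtual tags, bind physical registers at writeback."""

    def __init__(self, num_logical, num_virtual, num_phys):
        self.l2v = {r: r for r in range(num_logical)}  # logical -> virtual (rename map)
        self.v2p = {r: r for r in range(num_logical)}  # virtual -> physical (produced values only)
        self.free_virtual = deque(range(num_logical, num_virtual))
        self.free_phys = deque(range(num_logical, num_phys))

    def rename(self, srcs, dst):
        # Decode consumes only a virtual tag; no physical register yet.
        src_tags = [self.l2v[s] for s in srcs]
        tag = self.free_virtual.popleft()
        self.l2v[dst] = tag
        return src_tags, tag

    def writeback(self, tag):
        # The physical register is allocated only when the value is produced.
        if not self.free_phys:
            raise RuntimeError("no free physical register at writeback")
        self.v2p[tag] = self.free_phys.popleft()
        return self.v2p[tag]

    def read(self, tag):
        # KeyError if the value has not been produced yet.
        return self.v2p[tag]


vr = VirtualRename(num_logical=2, num_virtual=16, num_phys=3)
_, t1 = vr.rename([0], 1)   # e.g. a load that misses: tag 2, no physical register
_, t2 = vr.rename([1], 0)   # its consumer: tag 3, still no physical register
print(len(vr.free_phys))    # 1: both instructions are in flight using only tags
print(vr.writeback(t1))     # 2: physical register bound when the load returns
```

What the hardware does when no physical register is free at writeback is not stated on the slides; this sketch simply raises an error.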
Early Release + Virtual Registers III Evaluation
Early Release + Virtual Registers IV Evaluation FP
Early Release + Virtual Registers V Evaluation Integer
Future work: finish the evaluation of Early Register Release + Virtual Registers; work on a model to remove and reinsert instructions in the queue; combine Virtual ROBs with Early Release + Virtual Registers + Remove and Reinsert.