Published by José Francisco Paz Toledo. Modified over 6 years ago.
1
Tolerating Long Latency Instructions
PhD student: Adrián Cristal. Advisors: Mateo Valero, Antonio González and Josep Llosa
2
Motivation
4-way superscalar
32 KB L1 – 1 port
256 KB L2
2048-entry bimodal branch predictor
3
Tolerating Long Latency Instructions
Resources Study
Large ROBs
Early Release + Virtual Registers
4
Resources Study I – FP Programs
Figure annotation: 2.2X
For a 2K-entry ROB, the processor needs 1K-entry queues and more than 1K registers.
Motivation: 4-way superscalar processor, 32 KB L1, 256 KB L2, 16K-entry gshare branch predictor, 500-cycle memory latency.
5
Resources Study IV – Integer
Figure annotations: 0.98X, 1.44X
6
Resources Study II – Methodology
Two basic strategies:
Early Recycle: free resources as soon as possible.
Late Assignment: assign resources as late as possible.
Instructions:
Short Term: instructions that depend only on short-latency instructions or on other short-term instructions.
Long Term: instructions that depend on a long-latency instruction (e.g. a load miss) or on another long-term instruction.
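The short-term / long-term split above is a transitive property of the dependence graph, so it can be illustrated with a single pass over the instructions in program order. The sketch below is only an illustration: the `Instr` record, the `long_latency` flag and the `classify` function are hypothetical names, not part of the presentation, and the long-latency instruction itself is labeled according to its own inputs, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str
    sources: list = field(default_factory=list)  # names of producer instructions
    long_latency: bool = False                   # e.g. a load that misses in cache


def classify(program):
    """Label each instruction (given in program order) as 'short' or 'long' term."""
    term = {}
    tainted = set()  # long-latency instructions plus all long-term instructions
    for ins in program:
        depends_on_long = any(src in tainted for src in ins.sources)
        term[ins.name] = "long" if depends_on_long else "short"
        # Anything that consumes this result later inherits the long latency.
        if ins.long_latency or depends_on_long:
            tainted.add(ins.name)
    return term


program = [
    Instr("ld1", long_latency=True),  # load miss
    Instr("add1", ["ld1"]),           # depends on the miss
    Instr("mul1", ["add1"]),          # depends on a long-term instruction
    Instr("add2"),                    # independent
    Instr("sub1", ["add2"]),          # depends only on short-term work
]
labels = classify(program)
```

Here `add1` and `mul1` come out as long term, while `add2` and `sub1` stay short term and can release their resources early.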
7
Resources Study III – Methodology
Processor:
4-way superscalar
2048 ROB entries
2080 physical registers
2048 queue entries (integer, FP, load, store)
16K-entry gshare branch predictor
32 KB L1 – 2 ports – 1 cycle
256 KB L2 – 2 ports – 10 cycles
500-cycle main memory
8
Resources Study V – FP Instructions in-flight
9
Resources Study VI – Int Instructions in-flight
10
Resources Study VII – FP Register File
Early Release
Virtual Registers
11
Resources Study VIII – Integer Register File
12
Resources Study IX – FP Queue
Late assignment
Remove – Reinsert Dependence Chain
13
Resources Study X – Integer Queue
14
Resources Study XI – FP Load Queue
Early Release
Checkpointing
15
Resources Study XII – Integer Load Queue
16
Resources Study XIII – FP Store Queue
17
Resources Study XIV – Integer Store Queue
18
Virtual ROBs (commit out of order)
Target: to create large reorder buffers using a smaller one.
Characteristics:
Based on checkpoints
Uses a CAM map table scheme
Less than 1 KB for 8 checkpoints (256 registers)
Allows early release of registers
Allows early release of loads
Allows out-of-order commit
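The storage figure above can be sanity-checked with simple arithmetic. In a CAM-style map table, a checkpoint can be taken by saving a few state bits per physical register (for instance a valid bit) instead of copying the whole table. The slide does not say which bits are saved, so `bits_per_reg` below is an assumption and `checkpoint_storage_bytes` is a hypothetical helper.

```python
def checkpoint_storage_bytes(num_checkpoints, num_phys_regs, bits_per_reg=1):
    """Bytes needed if every checkpoint saves bits_per_reg bits per physical register."""
    total_bits = num_checkpoints * num_phys_regs * bits_per_reg
    return (total_bits + 7) // 8  # round up to whole bytes


one_bit = checkpoint_storage_bytes(8, 256, bits_per_reg=1)     # 256 bytes
three_bits = checkpoint_storage_bytes(8, 256, bits_per_reg=3)  # 768 bytes
```

Under this assumption, 8 checkpoints over 256 registers stay below 1 KB for up to 3 saved bits per register.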
19
Early Release + Virtual Registers I
Early release of registers
Needs a mechanism to recover from exceptions: checkpointing.
Needs a pending counter for each register: when an instruction is decoded, the pending counter of each of its source registers is incremented; when the instruction is issued, those counters are decremented.
Wrong-path instructions are "nullified" but still issued, so that the pending counters stay consistent.
Highly coupled with the renaming logic: CAM map table scheme.
A register can be freed if it is not referenced in any map table and its pending counter is zero.
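The release rule above (no map table references the register, and its pending counter is zero) can be sketched as a small software model. This is only an illustration under assumptions: the class `EarlyReleaseFile` and its method names are hypothetical, map tables are modeled as plain dictionaries rather than CAMs, and the model ignores whether the producer has already written the register.

```python
class EarlyReleaseFile:
    def __init__(self, num_phys):
        self.pending = [0] * num_phys    # consumers decoded but not yet issued
        self.free = set(range(num_phys))
        # map_tables[0] is the current logical -> physical map; the rest are checkpoints.
        self.map_tables = [{}]

    def _referenced(self, p):
        return any(p in table.values() for table in self.map_tables)

    def _try_free(self, p):
        # Release rule from the slide: unreferenced and no pending readers.
        if p not in self.free and self.pending[p] == 0 and not self._referenced(p):
            self.free.add(p)

    def decode(self, dest, sources):
        src_phys = [self.map_tables[0][s] for s in sources]
        for p in src_phys:
            self.pending[p] += 1         # one more reader to wait for
        old = self.map_tables[0].get(dest)
        new = min(self.free)
        self.free.remove(new)
        self.map_tables[0][dest] = new
        if old is not None:
            self._try_free(old)          # previous mapping may already be dead
        return new, src_phys

    def issue(self, src_phys):
        # Nullified wrong-path instructions would also call this to fix the counters.
        for p in src_phys:
            self.pending[p] -= 1
            self._try_free(p)

    def take_checkpoint(self):
        self.map_tables.append(dict(self.map_tables[0]))

    def release_oldest_checkpoint(self):
        old = self.map_tables.pop(1)
        for p in set(old.values()):
            self._try_free(p)


rf = EarlyReleaseFile(4)
p_a, _ = rf.decode("r1", [])         # r1 -> physical 0
_, srcs = rf.decode("r2", ["r1"])    # reader of physical 0 is now pending
rf.decode("r1", [])                  # r1 redefined; physical 0 still has a reader
held_while_pending = p_a not in rf.free
rf.issue(srcs)                       # reader issues; physical 0 can be released
freed_after_issue = p_a in rf.free
```

In this model a register is released as soon as its last reader issues, without waiting for the redefining instruction to commit. A register still named by a checkpoint stays allocated until that checkpoint is released.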
20
Early Release + Virtual Registers II
Decouples renaming from physical register allocation.
Needs a new map table from virtual registers to physical registers.
The CAM map tables map from logical registers to virtual registers.
Each entry is extended with a field for the physical register.
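The two-level mapping above can be sketched as follows. This is a minimal model under assumptions: the class `VirtualRenamer` and its methods are hypothetical, virtual tags are treated as an unbounded counter, and the point at which a physical register is finally allocated (for example, when the result is produced) is not stated on the slide and is left to the caller.

```python
class VirtualRenamer:
    def __init__(self, num_phys):
        self.next_virtual = 0
        self.logical_to_virtual = {}   # filled at rename time
        self.virtual_to_physical = {}  # filled later, when storage is really needed
        self.free_phys = list(range(num_phys))

    def rename(self, dest):
        """Give dest a fresh virtual tag; no physical register is consumed yet."""
        v = self.next_virtual
        self.next_virtual += 1
        self.logical_to_virtual[dest] = v
        return v

    def allocate_physical(self, v):
        """Late assignment: bind a physical register to virtual tag v, if one is free."""
        if not self.free_phys:
            return None                # caller has to stall or retry
        p = self.free_phys.pop(0)
        self.virtual_to_physical[v] = p
        return p

    def lookup(self, logical):
        v = self.logical_to_virtual[logical]
        return v, self.virtual_to_physical.get(v)  # physical is None until allocated


vr = VirtualRenamer(num_phys=1)
v0 = vr.rename("r1")
v1 = vr.rename("r2")                 # two renames succeed with one physical register
before = vr.lookup("r1")             # (0, None): renamed but not yet backed by storage
p = vr.allocate_physical(v1)         # r2's result gets the only physical register
exhausted = vr.allocate_physical(v0) # None: no physical register left
```

The point of the split is visible in the example: renaming proceeds even when physical registers are scarce, and only the later allocation step can run out of them.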
21
Early Release + Virtual Registers III
Evaluation
22
Early Release + Virtual Registers IV
Evaluation FP
23
Early Release + Virtual Registers V
Evaluation Integer
24
Future work
Finish the evaluation of Early Register Release + Virtual Registers.
Work on a model to remove and reinsert instructions in the queue.
Combine Virtual ROBs, Early Release + Virtual Registers, and Remove and Reinsert.