Tolerating Long Latency Instructions


1 Tolerating Long Latency Instructions
PhD student: Adrián Cristal. Advisors: Mateo Valero, Antonio González and Josep Llosa

2 Motivation
4-way superscalar; 32 KB L1, 1 port; 256 KB L2; 2048 bimodal branch predictor

3 Tolerating Long Latency Instructions
Resources Study Large ROBs Early Release + Virtual Registers

4 Resources Study I – FP Programs
2.2X
For a 2K-entry ROB, the processor needs 1K-entry queues and more than 1K registers.
Motivation: 4-way superscalar processor; 32 KB L1; 256 KB L2; 16K-entry gshare branch predictor; 500-cycle memory latency

5 Resources Study IV – Integer
0.98 X 1.44 X

6 Resources Study II – Methodology
Two basic strategies:
Early Recycle: free resources as soon as possible.
Late Assignment: assign resources as late as possible.
Instructions:
Short Term: instructions that depend only on short-latency instructions or on short-term instructions.
Long Term: instructions that depend on a long-latency instruction (e.g. a load miss) or on a long-term instruction.
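The short-term / long-term split can be illustrated with a small sketch. It is only one reading of the definitions on this slide: the `Instr` record, the `classify` function and the example instruction names are hypothetical, and the sketch assumes instructions are visited in program order and that values produced outside the program count as short.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    name: str                                     # hypothetical label for the instruction
    sources: list = field(default_factory=list)   # names of the producer instructions
    long_latency: bool = False                    # e.g. a load that misses in the cache


def classify(program):
    """Label each instruction 'long' or 'short' term, in program order."""
    long_latency_seen = set()   # names of long-latency instructions seen so far
    term = {}
    for ins in program:
        # Long term: some producer is long latency, or is itself long term.
        is_long = any(
            src in long_latency_seen or term.get(src) == "long"
            for src in ins.sources
        )
        term[ins.name] = "long" if is_long else "short"
        if ins.long_latency:
            long_latency_seen.add(ins.name)
    return term


program = [
    Instr("ld1", [], long_latency=True),   # load miss
    Instr("add1", ["ld1"]),                # depends on the miss
    Instr("mul1", ["add1"]),               # depends on a long-term instruction
    Instr("add2", []),                     # independent of the miss
]
labels = classify(program)
```

Under this reading the load miss itself is labelled short term, because the label describes what an instruction waits for, not its own latency; only its dependants `add1` and `mul1` become long term.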

7 Resources Study III – Methodology
Processor:
4-way superscalar
2048 ROB entries
2080 physical registers
2048-entry queues (Integer, FP, Load, Store)
16K-entry gshare branch predictor
32 KB L1 – 2 ports – 1 cycle
256 KB L2 – 2 ports – 10 cycles
500 cycles main memory

8 Resources Study V – FP Instructions in-flight

9 Resources Study VI – Int Instructions in-flight

10 Resources Study VII – FP Register File
Early Release Virtual Registers

11 Resources Study VIII – Integer Register File

12 Resources Study IX – FP Queue
Late assignment Remove – Reinsert Dependence Chain

13 Resources Study X – Integer Queue

14 Resources Study XI – FP Load Queue
Early Release Checkpointing

15 Resources Study XII – Integer Load Queue

16 Resources Study XIII – FP Store Queue

17 Resources Study XIV – Integer Store Queue

18 Virtual ROBs (commit out of order)
Target: to create large reorder buffers using a smaller one.
Characteristics:
Based on checkpoints
Uses a CAM map table scheme
Less than 1 KB for 8 checkpoints (256 registers)
Allows early release of registers
Allows early release of loads
Allows out-of-order commit

19 Early Release + Virtual Registers I
Early release of registers:
Needs a mechanism to recover from exceptions: checkpointing.
Needs a pending counter for each register. When an instruction is decoded, the pending counter of each of its source registers is incremented; when the instruction is issued, those counters are decremented.
Wrong-path instructions are “nullified” but still issued, so that the pending counters stay consistent.
Highly coupled with the renaming logic: CAM map table scheme.
A register can be freed if it is not referenced in any map table and its pending counter is zero.
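The pending-counter rule on this slide can be sketched in a few lines. This is a simplified software model, not the hardware design: the class name `EarlyReleaseModel`, its methods, and the representation of map tables as dictionaries (current map plus checkpoints) are all assumptions made for illustration.

```python
class EarlyReleaseModel:
    """Toy model of pending counters plus map-table references."""

    def __init__(self, num_physical):
        # One pending counter per physical register: consumers decoded but not yet issued.
        self.pending = [0] * num_physical
        # Every live map table (current map and checkpoints): logical -> physical register.
        self.map_tables = []

    def decode(self, sources):
        # At decode, each source register gains one pending consumer.
        for reg in sources:
            self.pending[reg] += 1

    def issue(self, sources):
        # At issue, the consumer is no longer pending. Nullified wrong-path
        # instructions would also pass through here so the counts stay balanced.
        for reg in sources:
            self.pending[reg] -= 1

    def can_free(self, reg):
        # Free only if no map table references the register and nobody is waiting on it.
        referenced = any(reg in table.values() for table in self.map_tables)
        return not referenced and self.pending[reg] == 0


model = EarlyReleaseModel(num_physical=4)
model.map_tables.append({"r1": 0})     # logical r1 currently lives in physical 0
model.decode([0])                      # a consumer of physical 0 is decoded
free_while_mapped = model.can_free(0)
model.map_tables[0]["r1"] = 2          # r1 is redefined and now maps to physical 2
free_while_pending = model.can_free(0)
model.issue([0])                       # the consumer issues
free_after_issue = model.can_free(0)
```

In the example, physical register 0 becomes free only once both conditions hold: the redefinition removes its last map-table reference, and the issue of its consumer brings the pending counter back to zero.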

20 Early Release + Virtual Registers II
Decouple renaming from physical register allocation.
Needs a new map table from virtual registers to physical registers.
The CAM map tables map from logical registers to virtual registers.
Each entry is extended with a field for the physical register.
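A minimal sketch of the two-level mapping follows, assuming that a virtual tag is handed out at rename time and a physical register is bound to it at some later point. The slide does not say when that binding happens, so the timing, the class name `VirtualRenamer` and its methods are all hypothetical.

```python
class VirtualRenamer:
    """Toy model: logical -> virtual at rename, virtual -> physical later."""

    def __init__(self, num_physical):
        self.logical_to_virtual = {}     # filled at rename time
        self.virtual_to_physical = {}    # filled when storage is actually assigned
        self.free_physical = list(range(num_physical))
        self.next_virtual = 0

    def rename_dest(self, logical):
        # Renaming consumes only a virtual tag, not a physical register.
        tag = self.next_virtual
        self.next_virtual += 1
        self.logical_to_virtual[logical] = tag
        return tag

    def allocate_physical(self, tag):
        # Late assignment: bind a physical register to an existing virtual tag.
        if not self.free_physical:
            return None                  # no storage available; the caller must retry
        phys = self.free_physical.pop(0)
        self.virtual_to_physical[tag] = phys
        return phys

    def lookup(self, logical):
        # Returns (virtual tag, physical register or None if not yet assigned).
        tag = self.logical_to_virtual[logical]
        return tag, self.virtual_to_physical.get(tag)


renamer = VirtualRenamer(num_physical=2)
tag = renamer.rename_dest("f1")
before = renamer.lookup("f1")            # renamed, but no physical register yet
phys = renamer.allocate_physical(tag)
after = renamer.lookup("f1")
```

The point of the split is visible in `before`: the instruction is already renamed and can be tracked by its virtual tag while no physical register is occupied on its behalf.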

21 Early Release + Virtual Registers III
Evaluation

22 Early Release + Virtual Registers IV
Evaluation FP

23 Early Release + Virtual Registers V
Evaluation Integer

24 Future work
Finish the evaluation of Early Register Release + Virtual Registers.
Work on a model to remove and reinsert instructions in the queue.
Combine Virtual ROBs, Early Release + Virtual Registers, and Remove and Reinsert.

