1
It’s all about latency Henk Neefs Dept. of Electronics and Information Systems (ELIS) Ghent University
2
Overview
– Introduction of the processor model
– The importance of latency
– Techniques to handle latency
– Quantifying the memory latency effect
– Why consider optical interconnects?
– Latency of an optical interconnect
– Conclusions
3
Out-of-order processor pipeline
[diagram: I-cache → fetch → decode → rename → instruction window → execution units (INT, LD, ST); results go to the ‘future’ register file, with in-order retirement into the architectural register file]
4
Branch latency
[diagram: the same pipeline with a branch (BR) in flight; instruction streams such as ADD OR ST XOR LD and OR BR ST XOR LD enter over time, and the BR latency spans the cycles between fetching the branch and resolving it]
5
Eliminate branch latency
– By prediction: predict the outcome of the branch => eliminate the dependency (with a high probability)
– By predication: convert the control dependency into a data dependency => eliminate the control dependency
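As a toy illustration of the predication idea (a Python sketch of the principle, not the hardware mechanism on the slide): instead of branching on a condition, both candidate values feed into a data-dependent select, so the control dependency on the branch outcome disappears.

```python
def max_with_branch(a, b):
    # Control dependency: the comparison outcome steers execution.
    if a > b:
        return a
    return b

def max_predicated(a, b):
    # Data dependency only: the predicate is computed like any other
    # value and selects between the two results.
    cond = a > b
    return a * cond + b * (not cond)
```

Both functions compute the same result; the second exposes no branch for the pipeline to predict.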
6
Load latency
while (pointer != 0) pointer = pointer.next;
Loop: LD R1, R1(32)
      BNE R1, Loop
load latency = 2 cycles, branch latency = 1 cycle
CPI = 2 cycles / 2 instructions = 1 cycle/instruction
[timing diagram: LD and BNE alternate on the execution units, each LD waiting on the previous one]
7
When longer load latency
When the L1 cache misses and the L2 cache hits: load latency = 2 + 6 cycles
CPI = 8 cycles / 2 instructions = 4 cycles/instruction
When the L2 cache misses and main memory hits: load latency = 2 + 6 + 60 cycles
CPI = 34 cycles/instruction
[timing diagram: the LD/BNE chain on the execution units, now dominated by load stalls]
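The CPI numbers on these two slides follow directly from the dependence chain: each loop iteration is serialized on its load, so one iteration takes one full load latency. A minimal sketch of that calculation (the function name is mine):

```python
def loop_cpi(load_latency_cycles, instructions_per_iteration=2):
    """CPI of the pointer-chasing loop: each iteration is serialized
    on the load, so iteration time equals the load latency."""
    return load_latency_cycles / instructions_per_iteration

# L1 hit:                   2 cycles       -> CPI 1
# L1 miss, L2 hit:          2 + 6 cycles   -> CPI 4
# L2 miss, main memory hit: 2 + 6 + 60     -> CPI 34
```

The same two-instruction loop thus swings from 1 to 34 cycles per instruction purely through load latency.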
8
Memory hierarchy
[diagram: register file / execution units → L1 cache → L2 cache → main memory → hard drive; storage capacity and latency grow down the hierarchy]
9
L1 cache latency
[chart: IPC vs. L1 cache latency; IPC = Instructions Per clock Cycle, 1 GHz processor, SPEC95 programs]
10
Main memory latency
[chart: IPC vs. main memory latency; IPC = Instructions Per clock Cycle, 1 GHz processor, SPEC95 programs]
11
Performance and latency
performance change = sensitivity × load latency change
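The relation above is a simple proportionality; as a one-line sketch (the sensitivity value here is hypothetical, not a measurement from the talk):

```python
def performance_change(sensitivity, load_latency_change):
    # performance change = sensitivity * load latency change
    return sensitivity * load_latency_change

# With a hypothetical sensitivity of 0.5% performance per cycle,
# adding 4 cycles of load latency costs about 2% performance.
change = performance_change(0.005, 4)
```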
12
Increase performance
by eliminating/reducing load latency:
– By prefetching: predict the next miss and fetch the data into e.g. the L1 cache
– By address prediction: the address is known earlier => the load executes earlier => the data arrives early in the register file
or by reducing sensitivity to load latency:
– By fine-grain multithreading
13
Some prefetch techniques
– Stride prefetching: search for a pattern with a constant stride, e.g. walking through a matrix (in row- or column-order). Example: misses at addresses 20, 31, 42, 53, 64 have stride 11.
– Markov prefetching: exploit recurring patterns of misses. [table: miss history → predicted next miss, with prediction probabilities]
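A toy sketch of the stride-prefetching idea, assuming the prefetcher watches the last few miss addresses (the helper name is mine):

```python
def predict_next_stride(miss_addresses):
    """Predict the next miss address if the recent misses show a
    constant stride; return None if no stride pattern is present."""
    if len(miss_addresses) < 3:
        return None  # too little history to trust a pattern
    strides = [b - a for a, b in zip(miss_addresses, miss_addresses[1:])]
    if all(s == strides[0] for s in strides):
        return miss_addresses[-1] + strides[0]
    return None
```

On the slide’s example, misses at 20, 31, 42, 53 have constant stride 11, so the prefetcher would fetch address 64 ahead of the demand miss.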
14
Stride prefetching
[chart: IPC = Instructions Per clock Cycle, 1 GHz processor, program: compress]
15
Prefetching and sensitivity
[chart: factors by which “performance sensitivity to latency” increases with stride prefetching]
16
Latency is important: generalization to other processor architectures
Consider the schedule of a program over time. Present in every program execution:
– latency of instruction execution
– latency of communication
=> latency is important whatever the processor architecture
17
Optical interconnects (OI)
– Mature components: Vertical-Cavity Surface-Emitting Lasers (VCSELs), Light-Emitting Diodes (LEDs)
– Very high bandwidths
– Already replacing electronic interconnects in telecom and networks
– Useful for short inter-chip and even intra-chip interconnects?
18
OI in processor context
At levels close to the processor core, latency is very important => the latency of OI determines how far OI penetrates into the memory hierarchy.
What is the latency of an optical interconnect?
19
An optical link
Total latency = buffer latency + VCSEL/LED latency + time of flight + receiver latency
[diagram: buffer/modulation/bias circuit → LED/VCSEL → fiber or light conductor → receiver diode → transimpedance amplifier]
20
VCSEL characteristics
– A small semiconductor laser
– The carrier density must be high enough for lasing action
21
Total VCSEL link latency consists of:
– buffer latency
– parasitic capacitances and series resistances of the VCSEL and pads
– threshold carrier density build-up
– rise from low optical output to final optical output (intrinsic latency)
– time of flight (TOF)
– receiver latency
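The breakdown above is additive, so it can be sketched as a simple sum. The component values below are placeholders for illustration, not measurements from the talk:

```python
def optical_link_latency_ps(buffer_ps, parasitic_ps, threshold_ps,
                            intrinsic_ps, tof_ps, receiver_ps):
    """Total VCSEL link latency as the sum of the slide's components
    (all values in picoseconds)."""
    return (buffer_ps + parasitic_ps + threshold_ps +
            intrinsic_ps + tof_ps + receiver_ps)

# Hypothetical component values, for illustration only:
total = optical_link_latency_ps(buffer_ps=100, parasitic_ps=50,
                                threshold_ps=200, intrinsic_ps=150,
                                tof_ps=50, receiver_ps=150)
```

A model like this makes it easy to see which component dominates and therefore where lower-latency devices, drivers, or receivers pay off most.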
22
Total optical link latency
[chart: total link latency for 0.6 µm and 0.25 µm CMOS processes, at 1 mW optical power]
23
Latency as a function of power
24
Conclusions
Combining performance sensitivity and optical latency, we conclude:
– optical interconnects are feasible to main memory and for multiprocessors
– for interconnects close to the processor core, optical interconnects have too high a latency with present (telecom) devices, drivers and receivers
=> but an evolution toward lower-latency devices, drivers and receivers is now taking place...
For more information on the presented results:
Henk Neefs, Latentiebeheersing in processors (Latency management in processors), PhD thesis, Universiteit Gent, January 2000
www.elis.rug.ac.be/~neefs