Anshul Kumar, CSE IITD CSL718 : Pipelined Processors PipelineTimings 12th Jan, 2006.

1 Anshul Kumar, CSE IITD. CSL718: Pipelined Processors. Pipeline Timings. 12th Jan, 2006

2 Pipelined Processors
Parallel architectures:
–Data-parallel
–Function-parallel: instruction level (ILP), thread level, process level
–Instruction level (ILP): pipelined processors, VLIWs, superscalar processors
Intel’s terminology: intra ILP, inter ILP

3 Processor Performance
–MIPS and MFLOPS may not truly represent performance
–Execution time of a program is the true measure of performance
–SPEC rating is acceptable

4 Execution Time and Clock Period
Program execution time: $T_{prog} = N \cdot T_{inst} = N \cdot CPI \cdot \Delta t$
$N$: number of instructions
$CPI$: cycles per instruction (average)
$\Delta t$: clock cycle time
Instruction execution time: $T_{inst} = CPI \cdot \Delta t$
(Figure: instruction stages IF, D, RF, EX/AG, M, WB, each taking $\Delta t$)
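A minimal sketch of the execution-time relation above. The function name and the workload numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def program_time_ns(n_instructions, cpi, clock_ns):
    """T_prog = N * CPI * dt, with dt given in nanoseconds."""
    return n_instructions * cpi * clock_ns

# Hypothetical workload (not from the slides):
# 1 million instructions, average CPI of 1.5, 2 ns clock.
t_prog = program_time_ns(1_000_000, 1.5, 2.0)
print(t_prog / 1e6, "ms")  # 3.0 ms
```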

5 What influences clock period?
$T_{prog} = N \cdot CPI \cdot \Delta t$
–Technology: $\Delta t$
–Software: $N$
–Architecture: $N \cdot CPI \cdot \Delta t$
Instruction set architecture (ISA): trade-off of $N$ vs $CPI \cdot \Delta t$
Micro architecture ($\mu$A): trade-off of $CPI$ vs $\Delta t$

6 Determining Clock Period
Clock period $= \Delta t = P_{max}$
$P_{max}$ = max propagation delay
(Figure: combinational logic between clocked registers, with delay $P_{max}$)

7 Ideal Pipelining
$\Delta t = T_{inst} / S$
$CPI = 1$
Effective time per instruction: $T_{eff} = 1 \cdot T_{inst} / S$
(Figure: $T_{inst}$ divided into $S$ stages)

8 Pipelining with hazards
$\Delta t = T_{inst} / S$
$CPI = 1 + (S - 1) \cdot b$
$T_{eff} = (1 + (S - 1) \cdot b) \cdot T_{inst} / S$
$b$: frequency of interruptions
(Figure: $T_{inst}$ divided into $S$ stages)
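A small sketch of the hazard model above, evaluated for a few stage counts. The values T_inst = 90 ns and b = 0.2 are borrowed from the worked example later in this deck; the function name is hypothetical. With no clocking overhead in the model, T_eff keeps falling as S grows and approaches b * T_inst.

```python
def t_eff_hazards(t_inst, stages, b):
    """Effective time per instruction with hazards, no clocking overhead."""
    dt = t_inst / stages          # ideal clock period
    cpi = 1 + (stages - 1) * b    # each interruption costs S - 1 cycles
    return cpi * dt

# T_inst = 90 ns and b = 0.2, as in the example on slide 16.
for s in (1, 5, 10, 100):
    print(s, round(t_eff_hazards(90.0, s, 0.2), 2))
```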

10 A more realistic view
$\Delta t = P_{max} + C$
$P_{max}$ = max propagation delay
$C$ = clocking overhead
(Figure: combinational logic between clocked registers, with delays $P_{max}$ and $C$)

11 Clocking Overhead
Fixed overhead $c$
–Setup time
–Output delay
Variable overhead (stretching factor) $k$
–Clock skew
$\Delta t = T_{inst} / S + k \cdot T_{inst} / S + c = (1 + k) \cdot T_{inst} / S + c$

12 Pipelining with Clocking Overhead
$T_{eff} = [1 + (S - 1) \cdot b] \cdot [(1 + k) \cdot T_{inst} / S + c]$
$S_{opt} = \sqrt{(1 - b) \cdot (1 + k) \cdot T_{inst} / (b \cdot c)}$
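The slide states the optimum without the intermediate steps. One way to reach it, treating S as a continuous variable (an assumption made here for the derivation), is to expand T_eff and set its derivative with respect to S to zero:

```latex
\begin{align*}
T_{eff}(S) &= \bigl[1 + (S-1)b\bigr]\left[\frac{(1+k)\,T_{inst}}{S} + c\right] \\
           &= \frac{(1-b)(1+k)\,T_{inst}}{S} + b(1+k)\,T_{inst} + (1-b)\,c + b\,c\,S \\
\frac{dT_{eff}}{dS} &= -\frac{(1-b)(1+k)\,T_{inst}}{S^{2}} + b\,c = 0 \\
S_{opt} &= \sqrt{\frac{(1-b)(1+k)\,T_{inst}}{b\,c}}
\end{align*}
```

The second derivative is positive for S > 0, so this stationary point is a minimum of T_eff.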

14 Partitioning instruction into cycles with non-uniform stage times
(Figure: actions IF, D, RF, AG, T, DF, EX, PA)
One action per pipeline stage => large quantization overhead
Multiple actions per stage?
Multiple stages per action?

15 Example
Action delays: Put Away 2 ns; Data - ALU 3 ns; Addr - MAR 3 ns; Data - IR 3 ns; PC - MAR 4 ns; Cache Dir 6 ns; Cache Data 10 ns; Decode 6+6 ns; Gen Addr 9 ns; Cache Data 10 ns; Execute 7+7+8 ns

16 Optimal Pipelining
$T_{inst} = 4+6+10+3+12+9+3+6+10+3+22+2 = 90$ ns
$b = 0.2$, $c = 4$ ns, $k = 5\%$
$S_{opt} = \sqrt{(1 - b) \cdot (1 + k) \cdot T_{inst} / (b \cdot c)} = 9.7 \rightarrow 9$
$T_{seg} = 10$ ns
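The numbers on this slide can be checked with a short sketch. The function names are hypothetical; the formulas and parameter values are the ones given in the slides. Note that `t_eff` assumes perfectly uniform stages, which the later partitioning slides show is not achievable for this example.

```python
import math

def s_opt(t_inst, b, c, k):
    """Optimal stage count from the closed-form expression on the slide."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))

def t_eff(t_inst, s, b, c, k):
    """Effective time per instruction, assuming S equal-length stages."""
    return (1 + (s - 1) * b) * ((1 + k) * t_inst / s + c)

T_INST, B, C, K = 90.0, 0.2, 4.0, 0.05  # values from the slide

print(round(s_opt(T_INST, B, C, K), 2))  # 9.72
for s in (5, 9, 10):
    print(s, round(t_eff(T_INST, s, B, C, K), 2))
```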

17 Example (same action delays as slide 15)
$T_{seg} = 10$ ns, $S = 10$, $\Delta t = 14.5$ ns, $S \cdot \Delta t = 145$ ns

18 Example (same action delays as slide 15)
$T_{seg} = 13$ ns, $S = 9$, $\Delta t = 17.65$ ns, $S \cdot \Delta t = 159$ ns

19 Example (same action delays as slide 15)
$T_{seg} = 20$ ns, $S = 5$, $\Delta t = 25$ ns, $S \cdot \Delta t = 125$ ns
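The three partitionings on slides 17 to 19 can be reproduced with a greedy packing sketch. Two things here are assumptions rather than statements from the slides: the order of the actions (taken from the order of the terms in the T_inst sum on slide 16, with Decode split as 6+6 and Execute as 7+7+8), and the greedy packing rule itself. The clock period uses the deck's own model, Δt = (1 + k) * T_seg + c.

```python
# Action delays in ns. ASSUMED ORDER: follows the T_inst sum on slide 16,
# with Decode expanded to 6+6 and Execute to 7+7+8 as listed on slide 15.
ACTIONS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]

def partition(actions, t_seg):
    """Greedily pack consecutive actions into stages of at most t_seg ns."""
    stages, current = [], 0
    for delay in actions:
        if delay > t_seg:
            raise ValueError("an action is longer than the segment time")
        if current + delay > t_seg:
            stages.append(current)  # close the current stage
            current = 0
        current += delay
    stages.append(current)
    return stages

def clock_period(t_seg, k=0.05, c=4.0):
    """dt = (1 + k) * T_seg + c, with k and c from slide 16."""
    return (1 + k) * t_seg + c

for t_seg in (10, 13, 20):
    s = len(partition(ACTIONS, t_seg))
    dt = clock_period(t_seg)
    print(t_seg, s, round(dt, 2), round(s * dt, 2))
```

Under these assumptions the stage counts come out as 10, 9 and 5, matching the values of S on the three slides.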

20 Comparison

21 Cycle Quantization
Delays are not integral multiples of the clock period
Total overhead = clocking overhead + quantization overhead
$S \cdot \Delta t \ge T_{inst} + S \cdot C$ (ignoring $k$)
Quantization overhead $= S \cdot (\Delta t - C) - T_{inst}$; it reduces as the clock period becomes small
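A brief sketch of the quantization-overhead formula, applied to the three partitionings from the earlier example. As on the slide, k is ignored, so the clock period is taken as T_seg + C with C = 4 ns; the function name is hypothetical.

```python
def quantization_overhead(stages, dt, c, t_inst):
    """S * (dt - C) - T_inst: stage time not used by any action."""
    return stages * (dt - c) - t_inst

T_INST, C = 90, 4  # ns, from the example slides

# (S, T_seg) pairs from slides 17 to 19; k is ignored, so dt = T_seg + C.
for stages, t_seg in ((10, 10), (9, 13), (5, 20)):
    dt = t_seg + C
    print(stages, t_seg, quantization_overhead(stages, dt, C, T_INST))
```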

22 Other Timing Approaches
Self Timed Circuits
–No centralized free running clock
–An operation begins as soon as its inputs are available, that is, all its predecessors have completed
–Higher speed, lower power consumption
Wave Pipelining
–Omit inter-stage registers
–Reduced clocking overhead

23 Conventional vs Wave Pipelining
Conventional Pipeline
–Registers separate adjoining stages
–Clock period > max prop delay
–Inter-stage data stored in registers
Wave Pipeline
–No registers between adjoining stages
–Clock period less than max prop delay
–Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)

24 No pipelining
(Figure: clocked registers and timing diagram; signals X, X', Y)

25 Conventional pipelining
(Figure: clocked registers and timing diagram; signals X, X', Y, Y', Z, Z', W)

26 Wave pipelining
(Figure: clocked registers and timing diagram; signals X, Z', W)

27 Timing
$p$: propagation delay
$s$: set-up time
$T$: clock period
$T \ge p + s$
(Figure: register X, combinational circuit, register Y, common clock)

28 Timing with clock skew
Clock skew $= \delta$
$T \ge p + s + 2\delta$
(Figure: register X, combinational circuit, register Y, common clock)

29 Variation in propagation delay
–Different delays in different paths
–Delay variation due to process / temperature / power variations
–Data-dependent delay variations

30 Timing for wave pipelining
$T \ge \Delta p + s + 4\delta$
(Figure: register X, combinational circuit, register Y; timing diagram marking $p_{min}$, $p_{max}$, $\Delta p$ and $T$)

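31 Timing for wave pipelining (expanded view)
$p_{min} \ge (n-1)T + 2\delta$
$nT \ge p_{max} + s + 2\delta$
$\Rightarrow T \ge \Delta p + s + 4\delta$, where $\Delta p = p_{max} - p_{min}$
(Figure: timing diagram of X and Y marking $(n-1)T$, $nT$, $p_{min}$, $p_{max}$, $\Delta p$ and $T$)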
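The two inequalities on this slide bound the clock period from both sides for a given n, which is one way to see why wave pipelining tolerates only narrow ranges of clock frequency. The sketch below computes that window. The function name and the delay values are hypothetical; n is interpreted here as the number of clock periods between launching a value at X and capturing it at Y, which is an assumption about the figure.

```python
def wave_clock_window(p_min, p_max, s, delta, n):
    """Return (T_low, T_high) allowed for a given n, or None if empty.

    Lower bound from  n*T >= p_max + s + 2*delta.
    Upper bound from  p_min >= (n-1)*T + 2*delta  (no bound when n == 1).
    """
    lower = (p_max + s + 2 * delta) / n
    upper = float("inf") if n == 1 else (p_min - 2 * delta) / (n - 1)
    return (lower, upper) if lower <= upper else None

# Hypothetical delays in ns (not from the slides).
P_MIN, P_MAX, S, DELTA = 18.0, 20.0, 1.0, 0.5

for n in range(1, 6):
    print(n, wave_clock_window(P_MIN, P_MAX, S, DELTA, n))
```

With these illustrative numbers the allowed windows for successive n are separated by gaps, and no window exists once n is large enough that T would have to drop below Δp + s + 4δ = 5 ns.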

32 Comparison
Conventional Pipeline
$T \ge p_{max}/n + s + 2\delta$ (plus cycle quantization overhead)
$nT \ge p_{max} + ns + 2n\delta$
Wave Pipeline
$T \ge \Delta p + s + 4\delta$
$nT \ge p_{max} + s + 2\delta$
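As a rough numerical illustration of the two lower bounds on T, the sketch below evaluates both for one set of delays. All values and function names are hypothetical, and the conventional bound omits the cycle quantization overhead mentioned on the slide.

```python
def min_period_conventional(p_max, s, delta, n):
    """T >= p_max / n + s + 2*delta (quantization overhead not included)."""
    return p_max / n + s + 2 * delta

def min_period_wave(p_min, p_max, s, delta):
    """T >= (p_max - p_min) + s + 4*delta."""
    return (p_max - p_min) + s + 4 * delta

# Hypothetical delays in ns (not from the slides).
print(min_period_conventional(p_max=20.0, s=1.0, delta=0.5, n=4))  # 7.0
print(min_period_wave(p_min=18.0, p_max=20.0, s=1.0, delta=0.5))   # 5.0
```

The wave-pipeline bound depends on the spread between the longest and shortest path rather than on the longest path alone, which is why balancing path delays matters so much for this technique.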

33 Problems with wave pipelining
–Need to balance delays
–Narrow range of clock frequencies
–Control difficult
–Not very suitable for non-linear pipelines

34 Additional Reading
Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, “Wave-Pipelining: A Tutorial and Research Survey”, IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464–474.

