CSL718: Pipelined Processors
Pipeline Timings
Anshul Kumar, CSE IITD
12th Jan, 2006


Slide 2: Pipelined Processors
Parallel architectures
– Data-parallel
– Function-parallel
  – Instruction level (ILP): pipelined processors, VLIWs, superscalar processors
  – Thread level
  – Process level
Intel's terminology: intra ILP, inter ILP

Slide 3: Processor Performance
– MIPS and MFLOPS may not truly represent performance
– Execution time of a program is the true measure of performance
– SPEC rating is acceptable

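Slide 4: Execution Time and Clock Period
Program execution time: $T_{prog} = N \cdot T_{inst} = N \cdot \mathrm{CPI} \cdot \Delta t$
– $N$: number of instructions
– $\mathrm{CPI}$: cycles per instruction (average)
– $\Delta t$: clock cycle time
Instruction execution time: $T_{inst} = \mathrm{CPI} \cdot \Delta t$
[Figure: an instruction divided into the stages IF, D, RF, EX/AG, M, WB, with $\Delta t$ marked]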

Slide 5: What influences clock period?
$T_{prog} = N \cdot \mathrm{CPI} \cdot \Delta t$
– Technology: $\Delta t$
– Software: $N$
– Architecture: $N \cdot \mathrm{CPI} \cdot \Delta t$
  – Instruction set architecture (ISA): trade-off of $N$ vs $\mathrm{CPI} \cdot \Delta t$
  – Microarchitecture ($\mu$A): trade-off of $\mathrm{CPI}$ vs $\Delta t$

Slide 6: Determining Clock Period
Clock period $= \Delta t = P_{max}$, where $P_{max}$ is the maximum propagation delay.
[Figure: combinational logic (Comb) feeding a register (Reg), with the clock and $P_{max}$ marked]

Slide 7: Ideal Pipelining
– $\Delta t = T_{inst} / S$
– $\mathrm{CPI} = 1$
– Effective time per instruction: $T_{eff} = 1 \cdot T_{inst} / S$
[Figure: $T_{inst}$ divided into $S$ stages]

Slide 8: Pipelining with hazards
– $\Delta t = T_{inst} / S$
– $\mathrm{CPI} = 1 + (S - 1) \cdot b$, where $b$ is the frequency of interruptions
– $T_{eff} = (1 + (S - 1) \cdot b) \cdot T_{inst} / S$
[Figure: $T_{inst}$ divided into $S$ stages]
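As a rough numerical sketch of the hazard formula above (not part of the slides), the following Python evaluates CPI and T_eff for a few stage counts. The values T_inst = 90 ns and b = 0.2 are borrowed from the worked example later in the deck, and the function names are illustrative.

```python
def cpi_with_hazards(stages: int, b: float) -> float:
    """CPI = 1 + (S - 1) * b, where b is the frequency of interruptions."""
    return 1 + (stages - 1) * b


def effective_time(t_inst: float, stages: int, b: float) -> float:
    """T_eff = CPI * (T_inst / S), with no clocking overhead."""
    return cpi_with_hazards(stages, b) * t_inst / stages


if __name__ == "__main__":
    T_INST = 90.0  # ns, assumed for illustration
    B = 0.2        # assumed frequency of interruptions
    for s in (1, 5, 10, 20):
        print(f"S={s:2d}  CPI={cpi_with_hazards(s, B):.2f}  "
              f"T_eff={effective_time(T_INST, s, B):.2f} ns")
```

Without clocking overhead, T_eff = (1 - b) * T_inst / S + b * T_inst, so it keeps falling toward b * T_inst as S grows; there is no finite optimum until overhead is added to the clock period.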

Slide 10: A more realistic view
$\Delta t = P_{max} + C$
– $P_{max}$: maximum propagation delay
– $C$: clocking overhead
[Figure: combinational logic (Comb) feeding a register (Reg), with the clock period split into $P_{max}$ and $C$]

Slide 11: Clocking Overhead
– Fixed overhead, $c$
  – Setup time
  – Output delay
– Variable overhead (stretching factor), $k$
  – Clock skew
$\Delta t = T_{inst}/S + k \cdot T_{inst}/S + c = (1 + k) \cdot T_{inst}/S + c$

Slide 12: Pipelining with Clocking Overhead
$T_{eff} = [1 + (S - 1) \cdot b] \cdot [(1 + k) \cdot T_{inst}/S + c]$
$S_{opt} = \sqrt{(1 - b)(1 + k)\,T_{inst} / (b \cdot c)}$
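The expression for the optimal stage count follows from minimizing T_eff with respect to S, treating S as a continuous variable. A sketch of the steps, using only the symbols already defined (b, k, c, T_inst, S):

```latex
\begin{align*}
T_{eff} &= \bigl[(1-b) + bS\bigr]\left[\frac{(1+k)\,T_{inst}}{S} + c\right] \\
        &= \frac{(1-b)(1+k)\,T_{inst}}{S} + (1-b)\,c + b\,(1+k)\,T_{inst} + b\,c\,S \\
\frac{dT_{eff}}{dS} &= -\frac{(1-b)(1+k)\,T_{inst}}{S^{2}} + b\,c = 0 \\
S_{opt} &= \sqrt{\frac{(1-b)(1+k)\,T_{inst}}{b\,c}}
\end{align*}
```

The second derivative, $2(1-b)(1+k)T_{inst}/S^{3}$, is positive for $b < 1$, so this stationary point is a minimum.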

Slide 14: Partitioning instruction into cycles with non-uniform stage times
Actions: IF, D, RF, AG, T, DF, EX, PA
– One action per pipeline stage => large quantization overhead
– Multiple actions per stage?
– Multiple stages per action?

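Slide 15: Example
Actions and their delays, listed in the order in which they appear in the $T_{inst}$ sum on slide 16:
– PC - MAR: 4 ns
– Cache Dir: 6 ns
– Cache Data: 10 ns
– Data - IR: 3 ns
– Decode: 6+6 ns
– Gen Addr: 9 ns
– Addr - MAR: 3 ns
– Cache Dir: 6 ns
– Cache Data: 10 ns
– Data - ALU: 3 ns
– Execute: 7+7+8 ns
– Put Away: 2 ns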

Slide 16: Optimal Pipelining
$T_{inst} = 4+6+10+3+12+9+3+6+10+3+22+2 = 90$ ns
$b = 0.2$, $c = 4$ ns, $k = 5\%$
$S_{opt} = \sqrt{(1 - b)(1 + k)\,T_{inst} / (b \cdot c)} \approx 9.7 \rightarrow 9$
$T_{seg} = 10$ ns
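The arithmetic on this slide can be checked with a short Python sketch. The parameter values are the ones given above; the function names are illustrative, and the effective-time function assumes perfectly uniform stages (no quantization).

```python
import math


def optimal_stages(t_inst: float, b: float, c: float, k: float) -> float:
    """S_opt = sqrt((1 - b) * (1 + k) * T_inst / (b * c))."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))


def effective_time(t_inst: float, stages: int, b: float, c: float, k: float) -> float:
    """T_eff = [1 + (S - 1) * b] * [(1 + k) * T_inst / S + c]."""
    return (1 + (stages - 1) * b) * ((1 + k) * t_inst / stages + c)


if __name__ == "__main__":
    s_opt = optimal_stages(t_inst=90.0, b=0.2, c=4.0, k=0.05)
    print(f"S_opt = {s_opt:.2f}")
    # T_eff for integer stage counts around the continuous optimum
    for s in (8, 9, 10, 11):
        print(s, round(effective_time(90.0, s, 0.2, 4.0, 0.05), 2))
```

The curve is quite flat near the optimum, so neighbouring integer stage counts give nearly the same idealized T_eff; the choice between them is then driven by how well the actual action delays pack into stages.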

Slide 17: Example (actions and delays as on slide 15)
$T_{seg} = 10$ ns, $S = 10$, $\Delta t = 14.5$ ns, $S \cdot \Delta t = 145$ ns

Slide 18: Example (actions and delays as on slide 15)
$T_{seg} = 13$ ns, $S = 9$, $\Delta t = 17.65$ ns, $S \cdot \Delta t = 159$ ns

Slide 19: Example (actions and delays as on slide 15)
$T_{seg} = 20$ ns, $S = 5$, $\Delta t = 25$ ns, $S \cdot \Delta t = 125$ ns
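The three partitionings above can be reproduced with a simple greedy packing, sketched below in Python. This is an illustration and not necessarily the method used to draw the slides. It assumes the actions occur in the order of the T_inst sum on slide 16, that "6+6" and "7+7+8" mark points where Decode and Execute may be split across stages, and that the clock period is Δt = (1 + k) * T_seg + c with k = 5% and c = 4 ns. All names are hypothetical.

```python
# Action delays in ns, in the order of the T_inst sum on slide 16.
# Assumption: Decode (6+6) and Execute (7+7+8) may be split at the "+" signs.
DELAYS = [4, 6, 10, 3, 6, 6, 9, 3, 6, 10, 3, 7, 7, 8, 2]

K = 0.05  # variable clocking overhead (stretching factor)
C = 4.0   # fixed clocking overhead in ns
B = 0.2   # frequency of interruptions


def partition(delays, t_seg):
    """Greedily pack consecutive actions into stages of at most t_seg ns."""
    stages, current = [], 0
    for d in delays:
        if d > t_seg:
            raise ValueError(f"action of {d} ns does not fit in {t_seg} ns")
        if current + d > t_seg:
            stages.append(current)  # close the current stage
            current = 0
        current += d
    stages.append(current)
    return stages


def clock_period(t_seg):
    """Delta t = (1 + k) * T_seg + c."""
    return (1 + K) * t_seg + C


def effective_time(num_stages, dt):
    """T_eff = [1 + (S - 1) * b] * Delta t."""
    return (1 + (num_stages - 1) * B) * dt


if __name__ == "__main__":
    for t_seg in (10, 13, 20):
        s = len(partition(DELAYS, t_seg))
        dt = clock_period(t_seg)
        print(f"T_seg={t_seg} ns  S={s}  dt={dt:.2f} ns  "
              f"S*dt={s * dt:.2f} ns  T_eff={effective_time(s, dt):.2f} ns")
```

Under these assumptions the sketch yields S = 10, 9 and 5 for T_seg = 10, 13 and 20 ns, matching the stage counts and clock periods on slides 17 to 19. The T_eff column is an extra figure computed from the slide 12 formula and does not appear on those slides.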

Slide 20: Comparison

Slide 21: Cycle Quantization
– Delays are not integral multiples of the clock period
– Total overhead = clocking overhead + quantization overhead
– $S \cdot \Delta t \ge T_{inst} + S \cdot C$ (ignoring $k$)
– Quantization overhead $= S \cdot (\Delta t - C) - T_{inst}$; it reduces as the clock period becomes small
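To make the quantization term concrete, the sketch below applies the expression S * (Δt - C) - T_inst to the three partitionings of the worked example (slides 17 to 19). With k ignored, Δt - C is simply T_seg. The function name is hypothetical.

```python
def quantization_overhead(stages: int, t_seg: float, t_inst: float) -> float:
    """S * (dt - C) - T_inst with k ignored, so that dt - C equals T_seg."""
    return stages * t_seg - t_inst


if __name__ == "__main__":
    T_INST = 90.0  # ns, from the worked example
    for stages, t_seg in ((10, 10.0), (9, 13.0), (5, 20.0)):
        q = quantization_overhead(stages, t_seg, T_INST)
        print(f"S={stages}  T_seg={t_seg} ns  quantization overhead={q} ns")
```

For this particular set of action delays the overhead is not strictly monotonic in the segment time, so the slide's statement is best read as a general trend: a finer clock leaves less unused time per stage.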

Slide 22: Other Timing Approaches
Self-Timed Circuits
– No centralized free-running clock
– An operation begins as soon as its inputs are available, that is, once all its predecessors have completed
– Higher speed, lower power consumption
Wave Pipelining
– Omit inter-stage registers
– Reduced clocking overhead

Slide 23: Conventional vs Wave Pipelining
Conventional Pipeline
– Registers separate adjoining stages
– Clock period > max propagation delay
– Inter-stage data stored in registers
Wave Pipeline
– No registers between adjoining stages
– Clock period less than max propagation delay
– Waves of data propagate through the combinational network (effectively, data is stored in the combinational circuit delay!)

Slide 24: No pipelining
[Timing diagram: Clock, register (Reg), signals X, X', Y]

Slide 25: Conventional pipelining
[Timing diagram: Clock, registers (Reg), signals X, X', Y, Y', Z, Z', W]

Slide 26: Wave pipelining
[Timing diagram: Clock, registers (Reg), signals X, Z', W]

Slide 27: Timing
– $p$: propagation delay
– $s$: set-up time
– $T$: clock period
$T \ge p + s$
[Figure: combinational circuit between registers, input X, output Y, with the clock waveform]

Slide 28: Timing with clock skew
Clock skew $= \delta$
$T \ge p + s + 2\delta$
[Figure: combinational circuit between registers, input X, output Y, with $p$, $s$, $T$ and the skew $\delta$ marked on the clock waveform]

Slide 29: Variation in propagation delay
– Different delays in different paths
– Delay variation due to process / temperature / power variations
– Data-dependent delay variations

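Slide 30: Timing for wave pipelining
$T \ge \Delta p + s + 4\delta$, where $\Delta p = p_{max} - p_{min}$
[Figure: combinational circuit between registers, input X, output Y, with $p_{min}$, $p_{max}$, $\Delta p$ and $T$ marked]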

Slide 31: Timing for wave pipelining (expanded view)
$p_{min} \ge (n-1)\,T + 2\delta$
$n\,T \ge p_{max} + s + 2\delta$
$\Rightarrow T \ge \Delta p + s + 4\delta$
[Figure: waveforms of X and Y with $(n-1)T$, $nT$, $p_{min}$, $p_{max}$, $\Delta p$ and $T$ marked]
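The bound on T follows by combining the two constraints on this slide. A sketch of the step, assuming $\delta$ denotes the clock skew, $n$ the number of clock periods a wave spends in the combinational network, and $\Delta p = p_{max} - p_{min}$:

```latex
\begin{align*}
p_{min} &\ge (n-1)\,T + 2\delta
  &&\Rightarrow\quad -(n-1)\,T \ge -p_{min} + 2\delta \\
n\,T &\ge p_{max} + s + 2\delta \\
\text{adding:}\quad T &\ge (p_{max} - p_{min}) + s + 4\delta = \Delta p + s + 4\delta
\end{align*}
```

The first constraint keeps the earliest new data from overwriting the previous wave before it is captured; the second ensures the slowest data arrives in time for the capturing clock edge.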

Slide 32: Comparison
Conventional Pipeline
– $T \ge p_{max}/n + s + 2\delta$ (plus cycle quantization overhead)
– $n\,T \ge p_{max} + n\,s + 2n\,\delta$
Wave Pipeline
– $T \ge \Delta p + s + 4\delta$
– $n\,T \ge p_{max} + s + 2\delta$
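The two sets of bounds can be compared numerically with a small Python sketch. The delay values used in the usage example are hypothetical and chosen only for illustration; the function names are likewise not from the slides, and delta stands for the clock skew.

```python
def conventional_min_period(p_max, n, s, delta):
    """T >= p_max / n + s + 2 * delta (quantization overhead ignored)."""
    return p_max / n + s + 2 * delta


def wave_min_period(p_min, p_max, s, delta):
    """T >= (p_max - p_min) + s + 4 * delta."""
    return (p_max - p_min) + s + 4 * delta


def wave_period_window(p_min, p_max, s, delta, n):
    """Range of T that allows n waves (n >= 2), or None if no such T exists.

    Lower bound from n * T >= p_max + s + 2 * delta,
    upper bound from p_min >= (n - 1) * T + 2 * delta.
    """
    low = (p_max + s + 2 * delta) / n
    high = (p_min - 2 * delta) / (n - 1)
    return (low, high) if low <= high else None


if __name__ == "__main__":
    # Hypothetical delays in ns, for illustration only
    P_MIN, P_MAX, S_SETUP, DELTA = 18.0, 20.0, 1.0, 0.5
    print("conventional, n=4:", conventional_min_period(P_MAX, 4, S_SETUP, DELTA))
    print("wave bound:", wave_min_period(P_MIN, P_MAX, S_SETUP, DELTA))
    for n in (2, 3, 4, 5):
        print(n, wave_period_window(P_MIN, P_MAX, S_SETUP, DELTA, n))
```

With these made-up numbers the usable window for n = 4 is narrow and no valid period exists for n = 5, which illustrates why balanced path delays matter and why a wave pipeline runs only over a limited range of clock frequencies.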

Slide 33: Problems with wave pipelining
– Need to balance delays
– Narrow range of clock frequencies
– Control is difficult
– Not very suitable for non-linear pipelines

Slide 34: Additional Reading
Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, "Wave-Pipelining: A Tutorial and Research Survey", IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464–474.
