Anshul Kumar, CSE IITD CSL718 : Pipelined Processors Pipeline Timings 12th Jan, 2006.

Anshul Kumar, CSE IITD slide 2 Pipelined Processors
Parallel architectures
– Data-parallel
– Function-parallel
   – Instr level (ILP): pipelined processors, VLIWs, superscalar processors
   – Thread level
   – Process level
Intel's terminology: intra ILP, inter ILP

Anshul Kumar, CSE IITD slide 3 Processor Performance
– MIPS and MFLOPS may not truly represent performance
– Execution time of a program is the true measure of performance
– SPEC rating is acceptable

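Anshul Kumar, CSE IITD slide 4 Execution Time and Clock Period
Program exec time: $T_{prog} = N \cdot T_{inst} = N \cdot CPI \cdot \Delta t$
– $N$: number of instructions
– $CPI$: cycles per instruction (average)
– $\Delta t$: clock cycle time
Instruction execution time: $T_{inst} = CPI \cdot \Delta t$
(Figure: instruction timeline through the stages IF, D, RF, EX/AG, M, WB, with the clock period $\Delta t$ marked.)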
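As a rough illustration of the execution-time relation on this slide, the sketch below evaluates T_prog = N * CPI * Δt. The function name and the sample numbers (one million instructions, an average CPI of 1.5, a 2 ns clock) are illustrative assumptions, not values from the lecture.

```python
def program_exec_time_ns(n_instructions: int, cpi: float, clock_period_ns: float) -> float:
    """Return T_prog = N * CPI * dt, in nanoseconds."""
    return n_instructions * cpi * clock_period_ns


# Hypothetical workload: the numbers are made up for illustration only.
t_prog_ns = program_exec_time_ns(n_instructions=1_000_000, cpi=1.5, clock_period_ns=2.0)
print(f"T_prog = {t_prog_ns / 1e6:.1f} ms")
```

Halving any one of the three factors halves T_prog, which is why the later slides look at how N, CPI and Δt trade off against each other.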

Anshul Kumar, CSE IITD slide 5 What influences clock period?
$T_{prog} = N \cdot CPI \cdot \Delta t$
– Technology: $\Delta t \downarrow$
– Software: $N \downarrow$
– Architecture: $N \cdot CPI \cdot \Delta t \downarrow$
   – Instruction set architecture (ISA): trade-off $N$ vs $CPI \cdot \Delta t$
   – Microarchitecture ($\mu$A): trade-off $CPI$ vs $\Delta t$

Anshul Kumar, CSE IITD slide 6 Determining Clock Period
Clock period $= \Delta t = P_{max}$, where $P_{max}$ = max propagation delay
(Figure: combinational logic (Comb) with delay $P_{max}$ feeding a clocked register (Reg).)

Anshul Kumar, CSE IITD slide 7 Ideal Pipelining
$\Delta t = T_{inst} / S$
$CPI = 1$
Effective time per instruction: $T_{eff} = 1 \cdot T_{inst} / S$
(Figure: $T_{inst}$ divided into $S$ stages.)

Anshul Kumar, CSE IITD slide 8 Pipelining with hazards
$\Delta t = T_{inst} / S$
$CPI = 1 + (S - 1) \cdot b$
$T_{eff} = (1 + (S - 1) \cdot b) \cdot T_{inst} / S$
$b$: frequency of interruptions
(Figure: $T_{inst}$ divided into $S$ stages.)
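The effect of hazards on the effective time per instruction can be sketched as follows. The function name is an assumption; T_inst = 90 ns and b = 0.2 are borrowed from the worked example later in the lecture, and b = 0 reproduces the ideal pipeline.

```python
def effective_time_per_inst(t_inst: float, stages: int, b: float = 0.0) -> float:
    """T_eff = (1 + (S - 1) * b) * T_inst / S, with no clocking overhead."""
    cpi = 1 + (stages - 1) * b      # each interruption costs S - 1 extra cycles
    clock_period = t_inst / stages  # ideal, perfectly balanced stages
    return cpi * clock_period


for b in (0.0, 0.2):
    t_eff = effective_time_per_inst(t_inst=90.0, stages=9, b=b)
    print(f"b = {b:.1f}: T_eff = {t_eff:.1f} ns")
```

With these inputs the ideal case gives 10 ns per instruction, while b = 0.2 raises CPI to 2.6 and T_eff to about 26 ns.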

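Anshul Kumar, CSE IITD slide 10 A more realistic view
$\Delta t = P_{max} + C$
– $P_{max}$: max propagation delay
– $C$: clocking overhead
(Figure: combinational logic (Comb) with delay $P_{max}$, followed by clocking overhead $C$ at the register (Reg).)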

Anshul Kumar, CSE IITD slide 11 Clocking Overhead
Fixed overhead $c$
– Setup time
– Output delay
Variable overhead (stretching factor) $k$
– Clock skew
$\Delta t = T_{inst} / S + k \cdot T_{inst} / S + c = (1 + k) \cdot T_{inst} / S + c$

Anshul Kumar, CSE IITD slide 12 Pipelining with Clocking Overhead
$T_{eff} = [1 + (S - 1) \cdot b] \cdot [(1 + k) \cdot T_{inst} / S + c]$
$S_{opt} = \sqrt{(1 - b) \cdot (1 + k) \cdot T_{inst} / (b \cdot c)}$
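The expression for S_opt can be checked by minimising T_eff over S. The steps below are a sketch that treats S as a continuous variable (an assumption; the slide states only the result) and introduces the shorthand A = (1 + k) T_inst, which does not appear in the original.

```latex
\begin{align*}
T_{eff}(S) &= \bigl[1 + (S-1)\,b\bigr]\left[\frac{A}{S} + c\right],
              \qquad A = (1+k)\,T_{inst} \\
           &= \frac{(1-b)\,A}{S} + (1-b)\,c + b\,A + b\,c\,S \\
\frac{dT_{eff}}{dS} &= -\frac{(1-b)\,A}{S^{2}} + b\,c = 0 \\
S_{opt} &= \sqrt{\frac{(1-b)\,A}{b\,c}}
         = \sqrt{\frac{(1-b)\,(1+k)\,T_{inst}}{b\,c}}
\end{align*}
```

The second derivative, 2(1 - b)A / S^3, is positive for b < 1, so the stationary point is a minimum. Since the stage count must be an integer, the result is rounded to a neighbouring whole number in practice.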

Anshul Kumar, CSE IITD slide 14 Partitioning instruction into cycles with non-uniform stage times
(Figure: instruction actions IF, D, RF, AG, T, DF, EX, PA.)
– One action per pipeline stage => large quantization overhead
– Multiple actions per stage?
– Multiple stages per action?

Anshul Kumar, CSE IITD slide 15 Example
Action delays:
– Put Away: 2 ns
– Data - ALU: 3 ns
– Addr - MAR: 3 ns
– Data - IR: 3 ns
– PC - MAR: 4 ns
– Cache Dir: 6 ns
– Cache Data: 10 ns
– Decode: 6+6 ns
– Gen Addr: 9 ns
– Cache Data: 10 ns
– Execute (delay not shown)

Anshul Kumar, CSE IITD slide 16 Optimal Pipelining
$T_{inst} = 90$ ns, $b = 0.2$, $c = 4$ ns, $k = 5\%$
$S_{opt} = \sqrt{(1 - b) \cdot (1 + k) \cdot T_{inst} / (b \cdot c)} = 9.7 \Rightarrow 9$
$T_{seg} = 10$ ns
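The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch: the function name is an assumption, while the inputs (T_inst = 90 ns, b = 0.2, k = 5%, c = 4 ns) are the ones quoted above.

```python
import math


def optimal_stages(t_inst: float, b: float, k: float, c: float) -> float:
    """S_opt = sqrt((1 - b) * (1 + k) * T_inst / (b * c)), before rounding."""
    return math.sqrt((1 - b) * (1 + k) * t_inst / (b * c))


s_opt = optimal_stages(t_inst=90.0, b=0.2, k=0.05, c=4.0)
print(f"S_opt = {s_opt:.2f}")                      # about 9.7 before rounding
print(f"T_inst / 9 = {90.0 / 9:.1f} ns per stage")  # segment time for S = 9
```

The square root of 94.5 is roughly 9.72, matching the 9.7 on the slide; dividing 90 ns over 9 stages gives the 10 ns segment time.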

Anshul Kumar, CSE IITD slide 17 Example
Partition of the slide 15 action delays with $T_{seg} = 10$ ns:
$S = 10$, $\Delta t = 14.5$ ns, $S \cdot \Delta t = 145$ ns

Anshul Kumar, CSE IITD slide 18 Example
Partition of the slide 15 action delays with $T_{seg} = 13$ ns:
$S = 9$, $S \cdot \Delta t = 159$ ns (so $\Delta t \approx 17.7$ ns)

Anshul Kumar, CSE IITD slide 19 Example
Partition of the slide 15 action delays with $T_{seg} = 20$ ns:
$S = 5$, $\Delta t = 25$ ns, $S \cdot \Delta t = 125$ ns
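The three partitionings in slides 17 to 19 can be compared side by side. The sketch below assumes that the clock period follows the clocking-overhead formula of slide 11 with T_inst / S replaced by the longest segment time T_seg, that is Δt = (1 + k) * T_seg + c with k = 5% and c = 4 ns. That substitution is an inference, but it does reproduce the quoted 14.5 ns and 25 ns clock periods and, after rounding, the 159 ns total.

```python
def clock_period_ns(t_seg_ns: float, k: float = 0.05, c_ns: float = 4.0) -> float:
    """Clock period dt = (1 + k) * T_seg + c for the longest segment T_seg."""
    return (1 + k) * t_seg_ns + c_ns


# (number of stages S, longest segment T_seg in ns) as quoted on slides 17 to 19
partitions = [(10, 10.0), (9, 13.0), (5, 20.0)]

for stages, t_seg in partitions:
    dt = clock_period_ns(t_seg)
    total = stages * dt  # latency of one instruction through the whole pipeline
    print(f"S = {stages:2d}  T_seg = {t_seg:4.1f} ns  dt = {dt:5.2f} ns  S*dt = {total:6.2f} ns")
```

Note that S * Δt measures the latency of a single instruction, not throughput; the effective time per instruction still depends on CPI as in slide 12.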

Anshul Kumar, CSE IITD slide 20 Comparison

Anshul Kumar, CSE IITD slide 21 Cycle Quantization
– Delays are not integral multiples of the clock period
– Total overhead = clocking overhead + quantization overhead
– $S \cdot \Delta t \ge T_{inst} + S \cdot C$ (ignoring $k$)
– Quantization overhead $= S \cdot (\Delta t - C) - T_{inst}$; it reduces as the clock period becomes small
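To make the split between clocking and quantization overhead concrete, the sketch below applies the formula to the three partitionings of the worked example (T_inst = 90 ns, C = 4 ns). As on the slide, k is ignored, so the clock period is taken as Δt = T_seg + C; the function and constant names are assumptions.

```python
def quantization_overhead_ns(stages: int, dt_ns: float, c_ns: float, t_inst_ns: float) -> float:
    """Quantization overhead = S * (dt - C) - T_inst."""
    return stages * (dt_ns - c_ns) - t_inst_ns


T_INST_NS = 90.0  # total instruction time from the worked example
C_NS = 4.0        # fixed clocking overhead per stage; k is ignored here

for stages, t_seg in [(10, 10.0), (9, 13.0), (5, 20.0)]:
    dt = t_seg + C_NS          # clock period without the stretching factor k
    clocking = stages * C_NS   # clocking overhead summed over all stages
    quantization = quantization_overhead_ns(stages, dt, C_NS, T_INST_NS)
    print(f"S = {stages:2d}: clocking = {clocking:4.1f} ns, quantization = {quantization:4.1f} ns")
```

In each case the two overheads add up to S * Δt - T_inst, the total time an instruction spends in the pipeline beyond its raw logic delay.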

Anshul Kumar, CSE IITD slide 22 Other Timing Approaches
Self Timed Circuits
– No centralized free running clock
– An operation begins as soon as its inputs are available, that is, all its predecessors have completed
– Higher speed, lower power consumption
Wave Pipelining
– Omit inter-stage registers
– Reduced clocking overhead

Anshul Kumar, CSE IITD slide 23 Conventional vs Wave Pipelining
Conventional Pipeline
– Registers separate adjoining stages
– Clock period > max prop delay
– Inter-stage data stored in registers
Wave Pipeline
– No registers between adjoining stages
– Clock period less than max prop delay
– Waves of data propagate through combinational network (effectively, data is stored in the combinational circuit delay!)

Anshul Kumar, CSE IITD slide 24 No pipelining
(Figure: block and timing diagram with Clock, registers (Reg) and signals X, X', Y.)

Anshul Kumar, CSE IITD slide 25 Conventional pipelining
(Figure: block and timing diagram with Clock, registers (Reg) and signals X, X', Y, Y', Z, Z', W.)

Anshul Kumar, CSE IITD slide 26 Wave pipelining
(Figure: block and timing diagram with Clock, registers (Reg) and signals X, Z', W.)

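Anshul Kumar, CSE IITD slide 27 Timing
$T \ge p + s$
– $p$: propagation delay
– $s$: set-up time
– $T$: clock period
(Figure: combinational circuit (Comb ckt) from X to Y with clocked registers (Reg), and timing diagram of X and Y.)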

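Anshul Kumar, CSE IITD slide 28 Timing with clock skew
$T \ge p + s + 2\delta$
– Clock skew $= \delta$
(Figure: combinational circuit (Comb ckt) from X to Y with clocked registers (Reg), and timing diagram showing $p$, $s$ and $T$.)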

Anshul Kumar, CSE IITD slide 29 Variation in propagation delay
– Different delays in different paths
– Delay variation due to process, temperature and power variations
– Data-dependent delay variations

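Anshul Kumar, CSE IITD slide 30 Timing for wave pipelining
$T \ge \Delta p + s + 4\delta$
(Figure: combinational circuit (Comb ckt) from X to Y with clocked registers (Reg), and timing diagram showing $p_{min}$, $p_{max}$, $\Delta p$ and $T$.)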

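Anshul Kumar, CSE IITD slide 31 Timing for wave pipelining (expanded view)
$p_{min} \ge (n-1) \cdot T + 2\delta$
$n \cdot T \ge p_{max} + s + 2\delta$
$\Rightarrow T \ge \Delta p + s + 4\delta$
(Figure: timing diagram of X and Y showing $p_{min}$, $p_{max}$, $\Delta p$, $(n-1) \cdot T$ and $n \cdot T$.)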
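The bound on T follows from the two constraints on this slide by subtraction. In the sketch below the clock skew is written δ (the symbol choice is an assumption), Δp = pmax - pmin, and n is the number of clock periods a wave takes to cross the combinational network. The first constraint is read here as keeping the earliest data of the next wave from arriving before the current wave has been captured, and the second as making the latest data arrive a set-up time before the n-th clock edge.

```latex
\begin{align*}
p_{min} \ge (n-1)\,T + 2\delta
  &\;\Longrightarrow\; (n-1)\,T \le p_{min} - 2\delta \\
n\,T &\ge p_{max} + s + 2\delta \\
T = n\,T - (n-1)\,T
  &\ge (p_{max} + s + 2\delta) - (p_{min} - 2\delta) \\
  &= \Delta p + s + 4\delta, \qquad \Delta p = p_{max} - p_{min}
\end{align*}
```

The result is a necessary condition only: a given T is usable only if some integer n satisfies both of the original constraints.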

Anshul Kumar, CSE IITD slide 32 Comparison
Conventional Pipeline
– $T \ge p_{max}/n + s + 2\delta$ (plus cycle quantization overhead)
– $n \cdot T \ge p_{max} + n \cdot s + 2n\delta$
Wave Pipeline
– $T \ge \Delta p + s + 4\delta$
– $n \cdot T \ge p_{max} + s + 2\delta$
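The two sets of bounds can be compared numerically. The sketch below is illustrative only: the delay values (pmin = 18 ns, pmax = 20 ns, s = 1 ns, skew 0.5 ns, n = 4) are made-up assumptions, the function names are hypothetical, and the feasibility check simply searches for an integer n that satisfies both wave-pipelining constraints from slide 31.

```python
import math


def conventional_min_period(p_max: float, n: int, s: float, delta: float) -> float:
    """Lower bound T >= pmax / n + s + 2 * delta (quantization overhead ignored)."""
    return p_max / n + s + 2 * delta


def wave_min_period(p_min: float, p_max: float, s: float, delta: float) -> float:
    """Lower bound T >= (pmax - pmin) + s + 4 * delta."""
    return (p_max - p_min) + s + 4 * delta


def feasible_wave_counts(T: float, p_min: float, p_max: float, s: float, delta: float) -> list[int]:
    """Integers n with pmin >= (n - 1) * T + 2 * delta and n * T >= pmax + s + 2 * delta."""
    n_low = math.ceil((p_max + s + 2 * delta) / T)
    n_high = math.floor((p_min - 2 * delta) / T) + 1
    return list(range(n_low, n_high + 1))


# Hypothetical delays in ns, chosen only to illustrate the formulas.
P_MIN, P_MAX, S_SETUP, DELTA = 18.0, 20.0, 1.0, 0.5

print("conventional bound:", conventional_min_period(P_MAX, 4, S_SETUP, DELTA))
print("wave bound:        ", wave_min_period(P_MIN, P_MAX, S_SETUP, DELTA))
for T in (5.0, 5.5, 7.0, 7.5):
    print(f"T = {T}: feasible n = {feasible_wave_counts(T, P_MIN, P_MAX, S_SETUP, DELTA)}")
```

With these numbers the wave-pipelining lower bound is below the conventional one, but only some clock periods above it admit a valid n, which is one way to see why wave pipelining works over a narrow range of clock frequencies.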

Anshul Kumar, CSE IITD slide 33 Problems with wave pipelining
– Need to balance delays
– Narrow range of clock frequencies
– Control difficult
– Not very suitable for non-linear pipelines

Anshul Kumar, CSE IITD slide 34 Additional Reading Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, “Wave-Pipelining: A Tutorial and Research Survey”, IEEE Trans. on VLSI Systems, vol. 6, no. 3, September 1998, pp. 464 – 474.