Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.1. Basic idea of instruction pipelining.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.2. A 4-stage pipeline.
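The four stages used throughout these figures are F (fetch), D (decode), E (execute) and W (write). As a minimal sketch of what a 4-stage timing diagram looks like, the following assumes an ideal pipeline: one cycle per stage, one instruction entering per cycle, and no hazards. The function names are illustrative, not taken from the slides.

```python
STAGES = ["F", "D", "E", "W"]  # fetch, decode, execute, write (one cycle each, assumed)


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction index: {stage: clock cycle}} for an ideal pipeline."""
    schedule = {}
    for i in range(1, num_instructions + 1):
        # Instruction i enters stage number s (0-based) in clock cycle i + s.
        schedule[i] = {stage: i + s for s, stage in enumerate(stages)}
    return schedule


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed for n instructions: fill the pipeline once, then one per cycle."""
    return num_stages + num_instructions - 1


def render(num_instructions):
    """Draw the timing diagram as text, one row per instruction."""
    width = total_cycles(num_instructions)
    lines = []
    for i, cycles in pipeline_schedule(num_instructions).items():
        row = ["  "] * width
        for stage, cycle in cycles.items():
            row[cycle - 1] = f"{stage}{i}"
        lines.append(f"I{i}: " + " ".join(row).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(4))
    print("cycles, pipelined:", total_cycles(4))
    print("cycles, sequential:", 4 * len(STAGES))
```

Under these assumptions, four instructions finish in 7 cycles instead of the 16 a strictly sequential machine would need.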

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.4. Pipeline stall caused by a cache miss in F2.

Figure 8.6. Pipeline stalled by data dependency between D2 and W1.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.7. Operand forwarding in a pipelined processor.
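The stall in Figure 8.6 and the operand forwarding in Figure 8.7 can be compared with a small calculation. This is a sketch under stated assumptions, not the textbook's model: stages F, D, E, W take one cycle each; without forwarding, the dependent instruction reads its operand in the cycle after the producer's W stage and executes one cycle later; with forwarding, it can execute in the cycle right after the producer's E stage. The function name and parameters are hypothetical.

```python
def consumer_execute_cycle(producer_fetch, distance, forwarding):
    """Return (execute cycle, stall cycles) for an instruction that depends on
    the result of a producer fetched in cycle `producer_fetch`.

    `distance` is how many instructions later the consumer is fetched
    (1 means it immediately follows the producer).
    """
    producer_execute = producer_fetch + 2
    producer_write = producer_fetch + 3

    # Where the consumer would execute if there were no dependency.
    ideal_execute = producer_fetch + distance + 2

    if forwarding:
        # Assumed: the ALU result is forwarded and usable one cycle after E.
        earliest_execute = producer_execute + 1
    else:
        # Assumed: operand read happens the cycle after W, execute follows.
        earliest_execute = producer_write + 2

    actual = max(ideal_execute, earliest_execute)
    return actual, actual - ideal_execute


if __name__ == "__main__":
    print(consumer_execute_cycle(1, 1, forwarding=False))
    print(consumer_execute_cycle(1, 1, forwarding=True))
```

With the producer fetched in cycle 1 and the consumer right behind it, these assumptions give a two-cycle stall without forwarding and none with it.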

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure 8.9. Branch timing.

Figure Use of an instruction queue in the hardware organization of Figure 8.2b. Labels: instruction fetch unit, instruction queue, dispatch/decode unit. Stages: F: fetch instruction; D: dispatch/decode; E: execute instruction; W: write results.

Figure 8.11. Branch timing in the presence of an instruction queue. Branch target address is computed in the D stage. (Timing diagram: clock cycle and queue length along the time axis; instructions I1 to I6 and Ik pass through stages F, D, E, W; I1 occupies the E stage for three cycles; I5 is a branch; the instruction marked X is discarded.)

Figure Reordering of instructions for a delayed branch.

(a) Original program loop
LOOP    Shift_left  R1
        Decrement   R2
        Branch=0    LOOP
NEXT    Add         R1,R3

(b) Reordered instructions
LOOP    Decrement   R2
        Branch=0    LOOP
        Shift_left  R1
NEXT    Add         R1,R3
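To check that moving Shift_left into the delay slot preserves the loop's result, here is a minimal sketch of a toy interpreter. Everything about it is an assumption made for illustration: the tuple encoding of instructions, the reading of Branch=0 as "taken when R2 is zero after the Decrement", the reading of Add R1,R3 as R3 = R3 + R1, and a single delay slot whose instruction always executes.

```python
# (a) original order, for a machine without a delay slot
ORIGINAL = [
    ("shift_left", "R1"),   # LOOP
    ("decrement", "R2"),
    ("branch", 0),          # target index 0 = LOOP
    ("add", "R1", "R3"),    # NEXT
]

# (b) reordered, for a machine with one branch delay slot
REORDERED = [
    ("decrement", "R2"),    # LOOP
    ("branch", 0),
    ("shift_left", "R1"),   # delay slot: always executed after the branch
    ("add", "R1", "R3"),    # NEXT
]


def run(program, delay_slot, r1, r2, r3, taken=lambda r2: r2 == 0, max_steps=1000):
    """Execute the toy program and return the final register values."""
    regs = {"R1": r1, "R2": r2, "R3": r3}
    pc = 0
    pending_target = None  # set by a taken branch when a delay slot is in use
    for _ in range(max_steps):
        if pc >= len(program):
            return regs
        op, *args = program[pc]
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction sits in the delay slot; jump after it completes.
            next_pc, pending_target = pending_target, None
        if op == "shift_left":
            regs[args[0]] <<= 1
        elif op == "decrement":
            regs[args[0]] -= 1
        elif op == "add":
            regs[args[1]] += regs[args[0]]
        elif op == "branch" and taken(regs["R2"]):
            if delay_slot:
                pending_target = args[0]
            else:
                next_pc = args[0]
        pc = next_pc
    raise RuntimeError("step limit reached")


if __name__ == "__main__":
    print(run(ORIGINAL, delay_slot=False, r1=1, r2=1, r3=10))
    print(run(REORDERED, delay_slot=True, r1=1, r2=1, r3=10))
```

Both runs end with the same register values, which is the property the reordering relies on: the instruction placed in the delay slot must be one that executes on every pass regardless of the branch outcome.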

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Timing when a branch decision has been incorrectly predicted as not taken.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure State-machine representation of branch prediction algorithms.
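A branch prediction state machine of this kind can be sketched in a few lines. The version below is the common 2-bit saturating counter with four states; it is an assumption that this matches the general idea of the figure, and the exact transitions drawn there may differ. The class and function names are illustrative.

```python
class TwoBitPredictor:
    """Four-state predictor: 0 = strongly not taken, 1 = weakly not taken,
    2 = weakly taken, 3 = strongly taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        """Predict taken in the two upper states."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, predictor):
    """Feed a sequence of actual branch outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


if __name__ == "__main__":
    # A loop branch taken 9 times then not taken once, executed twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_correct(outcomes, TwoBitPredictor()), "of", len(outcomes))
```

The point of using four states rather than two is visible on the second pass through the loop: a single not-taken outcome at loop exit only weakens the prediction, so the first iteration of the next pass is still predicted as taken.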

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Equivalent operations using complex and simple addressing modes.

Figure Instruction reordering.

(a) A program fragment
Add       R1,R2
Compare   R3,R4
Branch=0  ...

(b) Instructions reordered
Compare   R3,R4
Add       R1,R2
Branch=0  ...

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU.

Figure 8.20. An example of instruction execution flow in the processor of Figure 8.19, assuming no hazards are encountered. Stage sequence for each instruction (time axis in clock cycles):

I1 (Fadd)  F1 D1 E1A E1B E1C W1
I2 (Add)   F2 D2 E2 W2
I3 (Fsub)  F3 D3 E3 E3 E3 W3
I4 (Sub)   F4 D4 E4 W4

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Instruction completion in program order.

Figure 8.22. An addition loop showing the use of the branch delay slot and branch prediction.

(a) Desired program loop
            LDX     R3,0,R6         Load number of items in the list.
            OR      R0, R4          to be used as offset in the list
            OR      R0, R7          Clear R7 to be used as accumulator.
LOOPSTART   LDX     R3,R4,R5        Load list item into R5.
            ADD     R5,R7,R7        Add number to accumulator.
            ADD     R4,8,R4         Point to the next entry.
            SUBcc   R6,1,R6         Decrement R6 and set condition flags.
            BG      xcc,LOOPSTART   Loop if more items in the list.
NEXT        ...

(b) Instructions reorganized to use the delay slot
            LDX     R3,0,R6
            OR      R0, R4
            OR      R0, R7
LOOPSTART   LDX     R3,R4,R5
            ADD     R4,8,R4
            SUBcc   R6,1,R6
            BG,pt   xcc,LOOPSTART   Predicted taken, Annul bit = 0
            ADD     R5,R7,R7
NEXT        ...

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Main building blocks of the UltraSPARC II processor.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Example of instruction grouping.

Figure 8.26. Dispatch delays due to hazards. Each instruction passes through stages G, E, C, N1, N2, N3, W.

(a) Instructions with common destination
ADD     R3,R5,R6    G E C N1 N2 N3 W
LDSW    R4,R7,R6    G E C N1 N2 N3 W

(b) Delay caused by MOVR instruction
MOVRZ   R1,R6,R7    G E C N1 N2 N3 W
OR      R7,R8,R9    G E C N1 N2 N3 W

Figure Integer execution unit. Labels: integer register file, annex, IEU0, IEU1, ALU, interstage buffers.

Figure Worst-case timing for an incorrectly predicted branch. Stages reached by each instruction before the abort:

I1 (Icc)    G E C
I2 (BRcc)   G E C
I3          G E C
I4          G E C
I5 to I8    G E
I9 to I12   G

Figure Load and store unit. Labels: integer register file/annex; pipeline stages G, E, C, N1; dTLB; D-Cache (data, tags); compare; load/store queue; miss to E-Cache.

Please see “portrait orientation” PowerPoint file for Chapter 8 Figure Execution flow.

Please see “portrait orientation” PowerPoint file for Chapter 8 Table 8.1. Examples of SPARC instructions.