Part IV Data Path and Control

Slides:

Advertisements

Similar presentations

1 Pipelining Part 2 CS Data Hazards Data hazards occur when the pipeline changes the order of read/write accesses to operands that differs from.

Advertisements

Lecture 8 Dynamic Branch Prediction, Superscalar and VLIW Advanced Computer Architecture COE 501.

Computer Structure 2014 – Out-Of-Order Execution 1 Computer Structure Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Dynamic Branch PredictionCS510 Computer ArchitecturesLecture Lecture 10 Dynamic Branch Prediction, Superscalar, VLIW, and Software Pipelining.

1 Advanced Computer Architecture Limits to ILP Lecture 3.

Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2013 Zhao Zhang Iowa State University Revised from original.

Pipeline Hazards Pipeline hazards These are situations that inhibit that the next instruction can be processed in the next stage of the pipeline. This.

Chapter 8. Pipelining. Instruction Hazards Overview Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline.

Computer Architecture 2011 – Out-Of-Order Execution 1 Computer Architecture Out-Of-Order Execution Lihu Rappoport and Adi Yoaz.

Chapter 12 Pipelining Strategies Performance Hazards.

Pipelining III Andreas Klappenecker CPSC321 Computer Architecture.

EENG449b/Savvides Lec /17/04 February 17, 2004 Prof. Andreas Savvides Spring EENG 449bG/CPSC 439bG.

1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.

1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:

Csci4203/ece43631 Review Quiz. 1)It is less expensive 2)It is usually faster 3)Its average CPI is smaller 4)It allows a faster clock rate 5)It has a simpler.

7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.

Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.

Pipelining. Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization.

Intel Architecture. Changes in architecture Software architecture: –Front end (Feature changes such as adding more graphics, changing the background colors,

Chapter 2 Summary Classification of architectures Features that are relatively independent of instruction sets “Different” Processors –DSP and media processors.

July 2005Computer Architecture, Data Path and ControlSlide 1 Part IV Data Path and Control.

Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.

CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.

Branch.1 10/14 Branch Prediction Static, Dynamic Branch prediction techniques.

CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S

Chapter 6 Pipelined CPU Design. Spring 2005 ELEC 5200/6200 From Patterson/Hennessey Slides Pipelined operation – laundry analogy Text Fig. 6.1.

Winter 2002CSE Topic Branch Hazards in the Pipelined Processor.

1 Lecture 7: Speculative Execution and Recovery Branch prediction and speculative execution, precise interrupt, reorder buffer.

1 (Based on text: David A. Patterson & John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3 rd Ed., Morgan Kaufmann,

1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.

Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 13: Branch prediction (Chapter 4/6)

Use of Pipelining to Achieve CPI < 1

CS 352H: Computer Systems Architecture

Dynamic Scheduling Why go out of style?

Advanced Architectures

Computer Architecture Chapter (14): Processor Structure and Function

Computer Organization CS224

CSL718 : Superscalar Processors

Instruction Level Parallelism

Computer Architecture

Dynamic Branch Prediction

Morgan Kaufmann Publishers

The University of Adelaide, School of Computer Science

Chapter 14 Instruction Level Parallelism and Superscalar Processors

Samira Khan University of Virginia Nov 13, 2017

Pipeline Implementation (4.6)

Morgan Kaufmann Publishers The Processor

Chapter 4 The Processor Part 4

ECS 154B Computer Architecture II Spring 2009

Pipelining: Advanced ILP

Morgan Kaufmann Publishers The Processor

Part IV Data Path and Control

Instruction Level Parallelism and Superscalar Processors

Morgan Kaufmann Publishers The Processor

Lecture 6: Advanced Pipelines

The processor: Pipelining and Branching

Lecture 8: ILP and Speculation Contd. Chapter 2, Sections 2. 6, 2

Branch statistics Branches occur every 4-6 instructions (16-25%) in integer programs; somewhat less frequently in scientific ones Unconditional branches.

Computer Architecture

Krste Asanovic Electrical Engineering and Computer Sciences

Lecture 7: Dynamic Scheduling with Tomasulo Algorithm (Section 2.4)

CS203 – Advanced Computer Architecture

ECE 445 – Computer Organization

CSC3050 – Computer Architecture

Dynamic Hardware Prediction

Part IV Data Path and Control

Presentation transcript:

Part IV Data Path and Control July 2005 Computer Architecture, Data Path and Control

IV Data Path and Control Design a simple computer (MicroMIPS) to learn about: Data path – part of the CPU where data signals flow Control unit – guides data signals through data path Pipelining – a way of achieving greater performance Topics in This Part Chapter 13 Instruction Execution Steps Chapter 14 Control Unit Synthesis Chapter 15 Pipelined Data Paths Chapter 16 Pipeline Performance Limits July 2005 Computer Architecture, Data Path and Control

16 Pipeline Performance Limits Pipeline performance limited by data & control dependencies Hardware provisions: data forwarding, branch prediction Software remedies: delayed branch, instruction reordering Topics in This Chapter 16.1 Data Dependencies and Hazards 16.2 Data Forwarding 16.3 Pipeline Branch Hazards 16.4 Delayed Branch and Branch Prediction 16.5 Dealing with Exceptions 16.6 Advanced Pipelining July 2005 Computer Architecture, Data Path and Control

16.1 Data Dependencies and Hazards Fig. 16.1 Data dependency in a pipeline. July 2005 Computer Architecture, Data Path and Control

Resolving Data Dependencies via Forwarding Fig. 16.2 When a previous instruction writes back a value computed by the ALU into a register, the data dependency can always be resolved through forwarding. July 2005 Computer Architecture, Data Path and Control

Certain Data Dependencies Lead to Bubbles Fig. 16.3 When the immediately preceding instruction writes a value read out from the data memory into a register, the data dependency cannot be resolved through forwarding (i.e., we cannot go back in time) and a bubble must be inserted in the pipeline. July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 16.2 Data Forwarding Fig. 16.4 Forwarding unit for the pipelined MicroMIPS data path. July 2005 Computer Architecture, Data Path and Control

Design of the Data Forwarding Units Let’s focus on designing the upper data forwarding unit Table 16.1 Partial truth table for the upper forwarding unit in the pipelined MicroMIPS data path. Fig. 16.4 Forwarding unit for the pipelined MicroMIPS data path. RegWrite3 RegWrite4 s2matchesd3 s2matchesd4 RetAddr3 RegInSrc3 RegInSrc4 Choose x x2 1 x4 y4 x3 y3 Incorrect in textbook July 2005 Computer Architecture, Data Path and Control

Hardware for Inserting Bubbles Fig. 16.5 Data hazard detector for the pipelined MicroMIPS data path. July 2005 Computer Architecture, Data Path and Control

16.3 Pipeline Branch Hazards Software-based solutions Compiler inserts a “no-op” after every branch (simple, but wasteful) Branch is redefined to take effect after the instruction that follows it Branch delay slot(s) are filled with useful instructions via reordering Hardware-based solutions Mechanism similar to data hazard detector to flush the pipeline Constitutes a rudimentary form of branch prediction: Always predict that the branch is not taken, flush if mistaken More elaborate branch prediction strategies possible July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 16.4 Branch Prediction Predicting whether a branch will be taken  Always predict that the branch will not be taken  Use program context to decide (backward branch is likely taken, forward branch is likely not taken)  Allow programmer or compiler to supply clues  Decide based on past history (maintain a small history table); to be discussed later  Apply a combination of factors: modern processors use elaborate techniques due to deep pipelines July 2005 Computer Architecture, Data Path and Control

Branch-Prediction Buffer (BPB) Also called a “branch history table” Low-order n bits of branch address are used to index a table of branch history data. May have “collisions” between distant branches. Associative tables also possible In each entry, k bits of information about the history (past behavior) of that branch are stored. Common values of k: 1, 2, and larger Entry is used to predict what the branch will do. The actual behavior of the branch will be used to update the entry.

Simple 1-bit Branch Prediction The entry for a branch has only two states: Bit = 1 “The last time this branch was encountered, it was taken. I predict it will be taken next time.” Bit = 0 “The last time this branch was encountered, it was not taken. I predict it will not be taken next time.” This predictor will make 2 mistakes each time a loop is encountered. At the end of the first & last iterations. May always mispredict in pathological cases!

2-bit Branch Prediction 2 bits  4 states Commonly used to code the most recent branch outcome, & the most recent run of 2 consecutive identical outcomes. Strategy: Prediction mirrors most recent run of 2. Only 1 misprediction per loop execution, after the first time the loop is reached. On last iteration

State Transition Diagram State N1 (10) State T1 (01) State N2 (00)

A Simple Branch Prediction Algorithm Fig. 16.6 Four-state branch prediction scheme. Example 16.1 L1: ---- ---- L2: ---- br <c2> L2 br <c1> L1 20 iter’s 10 iter’s Impact of different branch prediction schemes Solution Always taken: 11 mispredictions, 94.8% accurate 1-bit history: 20 mispredictions, 90.5% accurate 2-bit history: Same as always taken July 2005 Computer Architecture, Data Path and Control

Misprediction rate for 2-bit BPB (with 4,096 entries)

Hardware Implementation of Branch Prediction Fig. 16.7 Hardware elements for a branch prediction scheme. The mapping scheme used to go from PC contents to a table entry is the same as that used in direct-mapped caches (Chapter 18) July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control 16.5 Advanced Pipelining Deep pipeline = superpipeline; also, superpipelined, superpipelining Parallel instruction issue = superscalar, j-way issue (2-4 is typical) Fig. 16.8 Dynamic instruction pipeline with in-order issue, possible out-of-order completion, and in-order retirement. July 2005 Computer Architecture, Data Path and Control

Performance Improvement for Deep Pipelines Hardware-based methods Lookahead past an instruction that will/may stall in the pipeline (out-of-order execution; requires in-order retirement) Issue multiple instructions (requires more ports on register file) Eliminate false data dependencies via register renaming Predict branch outcomes more accurately, or speculate Software-based method Pipeline-aware compilation Loop unrolling to reduce the number of branches Loop: Compute with index i Loop: Compute with index i Increment i by 1 Compute with index i + 1 Go to Loop if not done Increment i by 2 Go to Loop if not done July 2005 Computer Architecture, Data Path and Control

CPI Variations with Architectural Features Table 16.2 Effect of processor architecture, branch prediction methods, and speculative execution on CPI. Architecture Methods used in practice CPI Nonpipelined, multicycle Strict in-order instruction issue and exec 5-10 Nonpipelined, overlapped In-order issue, with multiple function units 3-5 Pipelined, static In-order exec, simple branch prediction 2-3 Superpipelined, dynamic Out-of-order exec, adv branch prediction 1-2 Superscalar 2- to 4-way issue, interlock & speculation 0.5-1 Advanced superscalar 4- to 8-way issue, aggressive speculation 0.2-0.5 July 2005 Computer Architecture, Data Path and Control

Development of Intel’s Desktop/Laptop Micros In the beginning, there was the 8080; led to the 80x86 = IA32 ISA Half a dozen or so pipeline stages 80286 80386 80486 Pentium (80586) A dozen or so pipeline stages, with out-of-order instruction execution Pentium Pro Pentium II Pentium III Celeron Two dozens or so pipeline stages Pentium 4 More advanced technology Instructions are broken into micro-ops which are executed out-of-order but retired in-order More advanced technology July 2005 Computer Architecture, Data Path and Control

16.6 Dealing with Exceptions Exceptions present the same problems as branches How to handle instructions that are ahead in the pipeline? (let them run to completion and retirement of their results) What to do with instructions after the exception point? (flush them out so that they do not affect the state) Precise versus imprecise exceptions Precise exceptions hide the effects of pipelining and parallelism by forcing the same state as that of strict sequential execution (desirable, because exception handling is not complicated) Imprecise exceptions are messy, but lead to faster hardware (interrupt handler can clean up to offer precise exception) July 2005 Computer Architecture, Data Path and Control

Computer Architecture, Data Path and Control Where Do We Go from Here? Memory Design: How to build a memory unit that responds in 1 clock Input and Output: Peripheral devices, I/O programming, interfacing, interrupts Higher Performance: Vector/array processing Parallel processing July 2005 Computer Architecture, Data Path and Control