The Optimum Pipeline Depth for a Microprocessor
Fang Pang, Oct 01, 2002

The choice of pipeline structure is a fundamental decision in the design of a microprocessor. Is there an optimum pipeline depth that gives the best performance?

There is a tradeoff between the greater throughput of a deeper pipeline and the larger penalty for hazards in the deeper pipeline. This tradeoff leads to an optimum design point.

Two intuitive ways to see that performance is optimal at a specific pipeline depth: the CPI (cycles per instruction), which grows as the pipeline deepens because each hazard costs more stall cycles, and the cycle time of the processor, which shrinks as the pipeline deepens.

The true measure of processor performance is the average time it takes to execute an instruction. This is the time per instruction (TPI), the inverse of the MIPS (million instructions per second) number. The TPI is just the product of the cycle time and the CPI.
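
As a quick worked example (with made-up numbers, not values from the slides), the sketch below computes TPI from an assumed cycle time and CPI and converts it to MIPS:

```python
# A minimal sketch: TPI as the product of cycle time and CPI, and its
# relation to MIPS. Both input values are illustrative assumptions.
cycle_time_ns = 0.5          # assumed processor cycle time in nanoseconds
cpi = 1.5                    # assumed average cycles per instruction

tpi_ns = cycle_time_ns * cpi     # time per instruction
mips = 1e3 / tpi_ns              # 1 / TPI, expressed in millions of instructions per second

print(f"TPI  = {tpi_ns:.2f} ns/instruction")
print(f"MIPS = {mips:.0f}")
```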

How does a processor spend its time? T = T_BZ + T_NBZ (T_BZ: the time the execution unit is doing useful work; T_NBZ: the time the execution unit is stalled by pipeline hazards).

Processor’s busy time (T_BZ)
T_BZ = N_I * t_s (N_I: the number of instructions; t_s: the time for an instruction to pass through each stage of the pipeline)
t_s = t_o + t_p / p (t_o: the latch overhead of the technology used; t_p: the total logic delay of the processor; p: the number of pipeline stages in the design)
T_BZ = N_I * (t_o + t_p / p)
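
A minimal Python sketch of this busy-time model, using illustrative parameter values rather than measured ones:

```python
# Busy-time model T_BZ = N_I * (t_o + t_p / p); all numbers below are
# illustrative assumptions, not values from the slides.
def busy_time(n_i: float, t_o: float, t_p: float, p: int) -> float:
    """While the pipeline is flowing, one instruction completes per cycle of
    length t_o + t_p / p, so n_i instructions take n_i such cycles."""
    return n_i * (t_o + t_p / p)

# Deeper pipelines shrink the per-stage logic delay t_p / p but still pay the
# latch overhead t_o on every stage.
for p in (5, 10, 20, 40):
    print(f"p = {p:2d}  T_BZ = {busy_time(n_i=1e6, t_o=0.05, t_p=2.0, p=p):,.0f} ns")
```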

For a superscalar processor, multiple instructions may be executed at the same time.
T_BZ = (N_I / a) * (t_o + t_p / p) (a: a measure of the average degree of superscalar processing whenever the execution unit is busy)

Processor’s not-busy time (T_NBZ)
Considering that each pipeline hazard causes a full pipeline stall:
T_NBZ = N_H * t_pipe (N_H: the number of pipeline hazards; t_pipe: the total pipeline delay)
t_pipe = t_s * p = (t_o + t_p / p) * p = t_o * p + t_p
The total pipeline delay is just the product of the delay of each pipeline stage, t_s, and the number of pipeline stages in the processor.
T_NBZ = N_H * (t_o * p + t_p)

Considering that each pipeline hazard has its own not-busy time:
t_hazard = γ_h * t_pipe (t_hazard: the not-busy time caused by a particular hazard; γ_h: the fraction of the total pipeline delay encountered by that hazard, between 0 and 1)
Averaging over all hazards, with γ the fraction of the total pipeline delay averaged over all hazards:
T_NBZ = N_H * (t_o * p + t_p) * γ
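
A matching sketch of the stall-time model, again with assumed parameter values:

```python
# Stall-time model T_NBZ = N_H * gamma * (t_o * p + t_p); the parameter
# values are assumptions chosen only for illustration.
def not_busy_time(n_h: float, t_o: float, t_p: float, p: int, gamma: float) -> float:
    """Each of the n_h hazards stalls the pipeline for, on average, the
    fraction gamma of the full pipeline delay t_pipe = t_o * p + t_p."""
    t_pipe = t_o * p + t_p
    return n_h * gamma * t_pipe

# Deeper pipelines make each stall more expensive, because t_pipe grows with p.
for p in (5, 10, 20, 40):
    print(f"p = {p:2d}  T_NBZ = {not_busy_time(n_h=2e5, t_o=0.05, t_p=2.0, p=p, gamma=0.4):,.0f} ns")
```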

Processor time
T = T_BZ + T_NBZ = (N_I / a) * (t_o + t_p / p) + N_H * (t_o * p + t_p) * γ
N_H / N_I: depends on the workload being executed and on the microarchitecture (e.g., branch prediction accuracy)
a, γ: depend on the microarchitecture and the workload
t_o: depends on the technology
t_p: depends on the technology and the microarchitecture
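
Putting the two terms together, a small sketch (still with hypothetical parameter values) shows the characteristic shape of T as a function of pipeline depth: the busy term falls with p while the stall term rises, so T passes through a minimum.

```python
# Total-time model T(p) = (N_I / a) * (t_o + t_p / p) + N_H * (t_o * p + t_p) * gamma.
# Parameter values are hypothetical and chosen only to show the shape of the curve.
def total_time(p, n_i=1e6, n_h=2e5, a=2.0, t_o=0.05, t_p=2.0, gamma=0.4):
    busy = (n_i / a) * (t_o + t_p / p)          # falls as p grows
    not_busy = n_h * (t_o * p + t_p) * gamma    # rises as p grows
    return busy + not_busy

for p in (4, 8, 12, 16, 24, 32):
    print(f"p = {p:2d}  T = {total_time(p):,.0f} ns")   # minimum near p = 16 for these numbers
```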

Optimum pipeline depth
p_opt^2 = (N_I * t_p) / (a * N_H * γ * t_o)
When can we have a deeper pipeline?
N_H decreases: the workload has fewer hazards.
t_o decreases: the technology reduces the latch overhead relative to the total logic delay, t_p.
γ decreases: the fraction of the total pipeline delay that hazards stall decreases.
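
The sketch below checks this closed-form expression against a direct scan of T(p), using the same hypothetical parameters as above; the two agree on the optimum depth.

```python
import math

# Closed-form optimum p_opt^2 = (N_I * t_p) / (a * N_H * gamma * t_o),
# evaluated with the same hypothetical parameters used above.
n_i, n_h, a, t_o, t_p, gamma = 1e6, 2e5, 2.0, 0.05, 2.0, 0.4

p_opt = math.sqrt((n_i * t_p) / (a * n_h * gamma * t_o))
print(f"analytic p_opt = {p_opt:.1f}")      # about 15.8 stages

# Numeric cross-check: scan integer depths and pick the one that minimizes T(p).
def total_time(p):
    return (n_i / a) * (t_o + t_p / p) + n_h * (t_o * p + t_p) * gamma

best_p = min(range(1, 100), key=total_time)
print(f"numeric  p_opt = {best_p}")         # 16 stages
```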

Simulation results

Optimum pipeline depth’s various dependencies

Dependence on the degree of superscalar processing (a)

Dependence on the degree of pipeline hazards (N_H, γ)

Summary
A theory of the optimum pipeline depth for a microprocessor has been presented. The theory has been tested by simulating a variable-depth pipeline model, and the two are found to be in excellent agreement. It is found that the competition between storing more instructions in a deeper pipeline to increase throughput and limiting the number of pipeline stalls from the various pipeline hazards results in an optimum pipeline depth.

Discussion