Lecture 4: CPU Performance

A Modern Processor: Intel Core i7

Processor Performance
Lower bounds on execution time that characterize the maximum performance:
Latency bound: occurs when operations must be performed in strict sequence (e.g. because of a data dependency). It is the minimum time needed to perform the operations sequentially.
Throughput bound: characterizes the raw computing capacity of the processor's functional units, i.e. the maximum number of operations per cycle.

Pipelining
[Figure: stage-versus-time diagrams for three stages s1, s2, s3, without pipeline and with pipeline]

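Pipelining
Without pipeline: \( T_1 = s \cdot t \cdot n \)
With pipeline: \( T_p = s \cdot t + (n-1) \cdot t \)
\[ \text{Speedup} = \frac{T_1}{T_p} = \frac{s \cdot n}{s + (n-1)} = \frac{s}{s/n + (1 - 1/n)} \]
\[ \text{Speedup} \to s \quad \text{as } n \to \infty \]
Here s is the number of stages, n the number of tasks, and t the time per stage.
\[ \text{Throughput} = \frac{n}{T_p} \]
[Figure: stage-versus-time diagrams for stages s1, s2, s3, without pipeline and with pipeline]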
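As a quick numerical check of the timing formulas on this slide, the following minimal sketch evaluates T1, Tp, and their ratio for a growing number of tasks. The function names and the sample values (s = 5 stages, t = 1 time unit) are illustrative assumptions, not taken from the lecture.

```python
def pipeline_times(s, n, t=1.0):
    """Return (T1, Tp) for n tasks on s stages of t time units each."""
    t1 = s * t * n              # without pipeline: every task takes s*t
    tp = s * t + (n - 1) * t    # with pipeline: fill once, then one task per t
    return t1, tp


def pipeline_speedup(s, n, t=1.0):
    """Speedup T1 / Tp of the pipelined over the unpipelined execution."""
    t1, tp = pipeline_times(s, n, t)
    return t1 / tp


# Assumed example: a 5-stage pipeline with an increasing number of tasks.
for n in (1, 10, 100, 10_000):
    print(n, round(pipeline_speedup(5, n), 3))
```

With these assumed values the printed speedups are 1.0, 3.571, 4.808, and 4.998, approaching but never reaching s = 5.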

Pipelining
The slowest stage determines the pipeline performance.
[Figure: stages s1, s2, s3 with stage times 10, 30, and 20; stage-versus-time diagrams without pipeline and with pipeline]

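Computational Pipelines
[Figure: a single block of combinational logic followed by a register (Reg) driven by a clock, and a pipelined version in which Comb. logic A, B, and C are each followed by a register R on a shared clock]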

Limitations of Pipelining
Nonuniform partitioning: stage delays may be nonuniform, and throughput is limited by the slowest stage.
Deep pipelining: a large number of stages. Modern processors have deep pipelines (15 or more stages) to increase the clock rate.
[Figure: three-stage pipeline with logic delays of 50 ps, 150 ps, and 100 ps for Comb. logic A, B, and C, each followed by a 20 ps register R; a deeper pipeline of 50 ps logic stages, each followed by a 20 ps register]
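The effect of nonuniform partitioning can be worked through with the delays shown in the figure (50, 150, and 100 ps of logic, 20 ps per register). This is a minimal sketch assuming one operation completes per clock cycle; the function names and the balanced 100/100/100 ps split used for comparison are hypothetical.

```python
def clock_period_ps(stage_delays_ps, register_delay_ps):
    """Clock period is set by the slowest stage plus the register overhead."""
    return max(stage_delays_ps) + register_delay_ps


def throughput_gops(stage_delays_ps, register_delay_ps):
    """Operations per nanosecond, i.e. giga-operations per second."""
    return 1000.0 / clock_period_ps(stage_delays_ps, register_delay_ps)


def latency_ps(stage_delays_ps, register_delay_ps):
    """Time for one operation to pass through every stage."""
    period = clock_period_ps(stage_delays_ps, register_delay_ps)
    return len(stage_delays_ps) * period


nonuniform = [50, 150, 100]   # delays from the figure
balanced = [100, 100, 100]    # assumed ideal split of the same 300 ps of logic
for name, delays in (("nonuniform", nonuniform), ("balanced", balanced)):
    print(name, clock_period_ps(delays, 20), latency_ps(delays, 20),
          round(throughput_gops(delays, 20), 2))
```

Under these assumptions the nonuniform pipeline is clocked at 170 ps (about 5.88 operations per nanosecond), while the balanced split would allow 120 ps (about 8.33), which is the sense in which the slowest stage limits throughput.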

Pipelined Parallel Adder
[Figure sequence, five frames: the operand pairs (a1..a4, b1..b4), (c1..c4, d1..d4), (e1..e4, f1..f4), and (g1..g4, h1..h4) enter a four-stage adder on successive cycles. In each cycle every pair in flight advances one stage and has one more position summed (a1+b1, then a2+b2, and so on). In the last frame a1+b1 through a4+b4 are complete, c has three positions summed, e has two, and g has one (g1+h1).]
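The frame sequence above can be imitated with a small simulation. This is a sketch under the assumption that the adder is a ripple-carry design in which stage k adds bit k of one operand pair and passes the carry to the next stage; the function name and the dictionary layout are hypothetical.

```python
def pipelined_add(pairs, width=4):
    """Simulate a `width`-stage adder; stage k adds bit k of one operand pair.

    A new pair enters stage 0 every cycle. Operands must fit in `width` bits.
    Returns (sums, cycles).
    """
    stages = [None] * width
    pending = list(pairs)
    results = []
    cycles = 0
    while pending or any(job is not None for job in stages):
        # Advance: every job moves one stage; a new pair enters stage 0.
        new_job = None
        if pending:
            a, b = pending.pop(0)
            new_job = {"a": a, "b": b, "sum": 0, "carry": 0}
        stages = [new_job] + stages[:-1]
        # Each occupied stage k adds bit k of its operands plus the carry-in.
        for k, job in enumerate(stages):
            if job is None:
                continue
            total = ((job["a"] >> k) & 1) + ((job["b"] >> k) & 1) + job["carry"]
            job["sum"] |= (total & 1) << k
            job["carry"] = total >> 1
        cycles += 1
        # The job in the last stage is complete; free the slot.
        done = stages[-1]
        if done is not None:
            results.append(done["sum"] | (done["carry"] << width))
            stages[-1] = None
    return results, cycles


sums, cycles = pipelined_add([(3, 5), (15, 15), (9, 7), (0, 1)])
print(sums, cycles)
```

Four pairs through four stages take 4 + (4 - 1) = 7 cycles, which matches Tp = s.t + (n-1).t with t = 1.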

Instruction Execution Pipeline
Instruction fetch cycle (IF): fetch the current instruction from memory; increment the PC.
Instruction decode / register fetch cycle (ID): decode the instruction; compute the possible branch target; read registers from the register file.
Execution / effective address cycle (EX): form the effective address; the ALU performs the operation specified by the opcode.
Memory access (MEM): memory read for a load instruction; memory write for a store instruction.
Write-back cycle (WB): write the result into the register file.
IF ID EX MEM WB

Instruction Execution Pipeline
[Figure: stage-versus-time diagram of overlapped instructions passing through IF, ID, EX, MEM, and WB]
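The overlapped execution in this diagram can be reproduced as text. The sketch below assumes an ideal pipeline with no stalls, in which instruction i enters IF in cycle i + 1; the function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction; column c is clock cycle c + 1."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        cells = [""] * total_cycles
        for k, name in enumerate(stages):
            cells[i + k] = name       # instruction i is in stage k at cycle i + k
        rows.append(" ".join(f"{c:<3}" for c in cells).rstrip())
    return rows


for row in pipeline_diagram(3):
    print(row)
```

Each row is shifted one cycle to the right of the previous one, so three instructions finish in 5 + (3 - 1) = 7 cycles instead of 15.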

Pipeline Hazards
Structural hazards
Data hazards
Control hazards

Pipeline Hazards: Structural Hazards
Arise from resource conflicts, when the hardware cannot support all possible combinations of instructions simultaneously in overlapped execution.
[Figure: stage-versus-time diagram (IF, ID, EX, MEM, WB using Mem, Reg, ALU, Mem, Reg) with a stall (bubble) inserted]

Pipeline Hazards: Data Hazards
Arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions.
    ADD R1, R2, R3
    SUB R4, R1, R5
    AND R6, R1, R7
    OR  R8, R1, R9
    XOR R10, R1, R11
[Figure: stage-versus-time diagram (IF, ID, EX, MEM, WB using Mem, Reg, ALU, Mem, Reg) for the five instructions]
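The dependences in this instruction sequence can be found mechanically. The sketch below assumes the first operand of each instruction is its destination register and the rest are sources; how many following instructions are actually affected (the `window` parameter) depends on the datapath, so it is left to the caller. All function names are hypothetical.

```python
def parse(instr):
    """Split 'ADD R1, R2, R3' into (op, dest, [sources])."""
    op, operands = instr.split(None, 1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def raw_dependences(program):
    """List (producer, consumer, register, distance) read-after-write pairs."""
    deps = []
    last_writer = {}                  # register -> index of its latest writer
    for j, instr in enumerate(program):
        _, dest, sources = parse(instr)
        for reg in sources:
            if reg in last_writer:
                i = last_writer[reg]
                deps.append((i, j, reg, j - i))
        last_writer[dest] = j
    return deps


def hazards(program, window):
    """Dependences close enough to be hazards for an assumed window size."""
    return [d for d in raw_dependences(program) if d[3] <= window]


program = [
    "ADD R1, R2, R3",
    "SUB R4, R1, R5",
    "AND R6, R1, R7",
    "OR R8, R1, R9",
    "XOR R10, R1, R11",
]
print(raw_dependences(program))
print(hazards(program, window=2))
```

All four later instructions read R1 written by the ADD; with an assumed window of 2, only the SUB and the AND would be flagged.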

Pipeline Hazards: Data Hazards
Forwarding (bypassing)
[Figure: four overlapped instructions, each passing through IF, ID, EX, MEM, WB (Mem, Reg, ALU, Mem, Reg), with forwarding paths between them]

Pipeline Hazards: Control (Branch) Hazards
Arise from pipelining instructions (e.g. branches) that change the PC.
for i = n to 1: c_i = a_i + b_i
LOOP: LOAD 100,X
      ADD 200,X
      STORE 300,X
      DECX
      BNE LOOP
      ...
[Figure: stage-versus-time diagram (IF, ID, EX, MEM, WB) for the loop]

Pipeline Hazards: Control (Branch) Hazards
Freeze (flush)
    BRA L1
    ...
L1: NEXT
    NEXT
[Figure: stage-versus-time diagram (IF, ID, EX, MEM, WB)]

Pipeline Hazards: Control (Branch) Hazards
Predicted-not-taken
    BNE L1
    NEXT
    ...
L1: NEXT
[Figure: stage-versus-time diagrams (IF, ID, EX, MEM, WB) for the not-taken and taken cases]

Pipeline Hazards: Control (Branch) Hazards
Predicted-taken
    BNE L1
    NEXT
    ...
L1: NEXT
[Figure: stage-versus-time diagrams (IF, ID, EX, MEM, WB) for the not-taken and taken cases]

Pipeline Hazards: Control (Branch) Hazards
Delayed branch
Original order, with an empty delay slot:
    ADD R1,R2,R3
    if (R2=0) branch L1
    (delay slot)
    NEXT
    ...
L1: NEXT
Reordered, with the ADD in the delay slot:
    if (R2=0) branch L1
    ADD R1,R2,R3
    NEXT
    ...
L1: NEXT
Execution order: branch instruction, sequential successor, branch target if taken.
[Figure: stage-versus-time diagrams (IF, ID, EX, MEM, WB) for the not-taken and taken cases]
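The reordering on this slide is safe because the branch condition reads R2 while the ADD writes R1. A minimal sketch of that check follows, assuming a single delay slot and considering only the instruction immediately before the branch; the function name and the tuple layout are hypothetical.

```python
def schedule_delay_slot(instr, branch):
    """Return the instruction order after trying to fill the branch delay slot.

    instr  : (text, dest_register) for the instruction just before the branch
    branch : (text, set_of_registers_read_by_the_branch_condition)
    """
    instr_text, dest = instr
    branch_text, reads = branch
    if dest in reads:
        # The branch needs the result first, so the slot gets a no-op instead.
        return [instr_text, branch_text, "NOP"]
    # Safe: the instruction executes in the delay slot, taken or not taken.
    return [branch_text, instr_text]


print(schedule_delay_slot(("ADD R1,R2,R3", "R1"), ("if (R2=0) branch L1", {"R2"})))
```

If the ADD wrote R2 instead, the same call would keep the original order and pad the slot with a NOP.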

Levels of Parallelism
Bit-level parallelism: within arithmetic logic circuits.
Instruction-level parallelism: multiple instructions execute per clock cycle.
Memory system parallelism: overlap of memory operations with computation.
Operating system parallelism: more than one processor; multiple jobs run in parallel on SMP; loop level; procedure level.