Lecture: Pipelining Basics

Slides:



Advertisements
Similar presentations
1 Lecture 2: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM Video 1: Using AM.
Advertisements

1 Lecture: Pipelining Hazards Topics: Basic pipelining implementation, hazards, bypassing HW2 posted, due Wednesday.
Adding the Jump Instruction
CS1104: Computer Organisation School of Computing National University of Singapore.
Pipelining Hwanmo Sung CS147 Presentation Professor Sin-Min Lee.
Lecture: Pipelining Basics
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
1 Lecture 3: Pipelining Basics Biggest contributors to performance: clock speed, parallelism Today: basic pipelining implementation (Sections A.1-A.3)
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
CSCE 212 Quiz 4 – 2/16/11 *Assume computes take 1 clock cycle, loads and stores take 10 cycles and branches take 4 cycles and that they are running on.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Quantitative principles of computer design  Measuring cost.
1 Lecture 2: System Metrics and Pipelining Today’s topics: (Sections 1.6, 1.7, 1.9, A.1)  Performance summaries  Quantitative principles of computer.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
Computer ArchitectureFall 2007 © October 24nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Lecture 16: Basic CPU Design
1 Introduction Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design Text for CS/EE 6810: Hennessy and.
ENGS 116 Lecture 51 Pipelining and Hazards Vincent H. Berk September 30, 2005 Reading for today: Chapter A.1 – A.3, article: Patterson&Ditzel Reading for.
1 Lecture 10: FP, Performance Metrics Today’s topics:  IEEE 754 representations  FP arithmetic  Evaluating a system Reminder: assignment 4 due in a.
Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.
1 Lecture 2: Metrics to Evaluate Systems Topics: Power and technology trends wrap-up, benchmark suites, performance equation, summarizing performance with.
What have mr aldred’s dirty clothes got to do with the cpu
CSC 4250 Computer Architectures September 15, 2006 Appendix A. Pipelining.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
CS.305 Computer Architecture Enhancing Performance with Pipelining Adapted from Computer Organization and Design, Patterson & Hennessy, © 2005, and from.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Performance Performance
TEST 1 – Tuesday March 3 Lectures 1 - 8, Ch 1,2 HW Due Feb 24 –1.4.1 p.60 –1.4.4 p.60 –1.4.6 p.60 –1.5.2 p –1.5.4 p.61 –1.5.5 p.61.
Lecture 16: Basic Pipelining
1 Lecture: Metrics to Evaluate Performance Topics: Benchmark suites, Performance equation, Summarizing performance with AM, GM, HM  Video 1: Using AM.
11 Pipelining Kosarev Nikolay MIPT Oct, Pipelining Implementation technique whereby multiple instructions are overlapped in execution Each pipeline.
1 Lecture 3: Pipelining Basics Today: chapter 1 wrap-up, basic pipelining implementation (Sections C.1 - C.4) Reminders:  Sign up for the class mailing.
1 Lecture: Pipelining Extensions Topics: control hazards, multi-cycle instructions, pipelining equations.
Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.
1 Lecture 20: OOO, Memory Hierarchy Today’s topics:  Out-of-order execution  Cache basics.
1 Lecture: Benchmarks, Pipelining Intro Topics: Performance equations wrap-up, Intro to pipelining.
1 Lecture: Pipeline Wrap-Up and Static ILP Topics: multi-cycle instructions, precise exceptions, deep pipelines, compiler scheduling, loop unrolling, software.
Lecture 18: Pipelining I.
Lecture 16: Basic Pipelining
Lecture: Pipelining Basics
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards
ECE232: Hardware Organization and Design
Lecture: Pipelining Basics
Chapter 4 The Processor Part 2
Lecture 16: Basic Pipelining
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
Lecture: Pipelining Hazards
Lecture 18: Pipelining Today’s topics:
Lecture: Static ILP Topics: compiler scheduling, loop unrolling, software pipelining (Sections C.5, 3.2)
Lecture: Pipelining Hazards
Lecture 5: Pipelining Basics
Serial versus Pipelined Execution
Lecture 18: Pipelining Today’s topics:
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
Lecture: Static ILP Topics: loop unrolling, software pipelines (Sections C.5, 3.2) HW3 posted, due in a week.
Lecture 19: Branches, OOO Today’s topics: Instruction scheduling
An Introduction to pipelining
Lecture: Pipelining Extensions
Guest Lecturer TA: Shreyas Chand
Lecture 20: OOO, Memory Hierarchy
Lecture: Pipelining Extensions
Lecture 17: Pipelining Today’s topics: 5-stage pipeline Hazards.
Addressing mode summary
Lecture: Pipelining Hazards
Pipelining Appendix A and Chapter 3.
Lecture: Pipelining Basics
Presentation transcript:

Lecture: Pipelining Basics Topics: Performance equations wrap-up, Basic pipelining implementation Video 1: What is pipelining? Video 2: Clocks and latches Video 3: An example 5-stage pipeline Video 4: Loads/Stores and RISC/CISC Turn in HW1 Guest teacher, Manju Shevgoor, on Monday

An Alternative Perspective - I Each program is assumed to run for an equal number of cycles, so we’re fair to each program The number of instructions executed per cycle is a measure of how well a program is doing on a system The appropriate summary measure is sum of IPCs or AM of IPCs = 1.2 instr + 1.8 instr + 0.5 instr cyc cyc cyc This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

An Alternative Perspective - II Each program is assumed to run for an equal number of instructions, so we’re fair to each program The number of cycles required per instruction is a measure of how well a program is doing on a system The appropriate summary measure is sum of CPIs or AM of CPIs = 0.8 cyc + 0.6 cyc + 2.0 cyc instr instr instr This measure implicitly assumes that 1 instr in prog-A has the same importance as 1 instr in prog-B

AM vs. GM GM of IPCs = 1 / GM of CPIs AM of IPCs represents thruput for a workload where each program runs sequentially for 1 cycle each; but high-IPC programs contribute more to the AM GM of IPCs does not represent run-time for any real workload (what does it mean to multiply instructions?); but every program’s IPC contributes equally to the final measure

Problem 6 My new laptop has a clock speed that is 30% higher than the old laptop. I’m running the same binaries on both machines. Their IPCs are listed below. I run the binaries such that each binary gets an equal share of CPU time. What speedup is my new laptop providing? P1 P2 P3 AM GM Old-IPC 1.2 1.6 2.0 1.6 1.57 New-IPC 1.6 1.6 1.6 1.6 1.6 AM of IPCs is the right measure. Could have also used GM. Speedup with AM would be 1.3.

Speedup Vs. Percentage “Speedup” is a ratio = old exec time / new exec time “Improvement”, “Increase”, “Decrease” usually refer to percentage relative to the baseline = (new perf – old perf) / old perf A program ran in 100 seconds on my old laptop and in 70 seconds on my new laptop What is the speedup? (1/70) / (1/100) = 1.42 What is the percentage increase in performance? ( 1/70 – 1/100 ) / (1/100) = 42% What is the reduction in execution time? 30%

Building a Car Unpipelined Start and finish a job before moving to the next Jobs Time

The Assembly Line Pipelined A B C A B C A B C A B C Break the job into smaller stages A B C A B C A B C Jobs A B C Time

Clocks and Latches Stage 1 Stage 2

Clocks and Latches Stage 1 L Stage 2 L Clk

Some Equations Unpipelined: time to execute one instruction = T + Tovh For an N-stage pipeline, time per stage = T/N + Tovh Total time per instruction = N (T/N + Tovh) = T + N Tovh Clock cycle time = T/N + Tovh Clock speed = 1 / (T/N + Tovh) Ideal speedup = (T + Tovh) / (T/N + Tovh) Cycles to complete one instruction = N Average CPI (cycles per instr) = 1

Problem 1 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline. What are the cycle times in the two processors? What are the clock speeds? What are the IPCs? How long does it take to finish one instr? What is the speedup from pipelining?

Problem 1 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 equal sequential pipeline stages. Answer the following, assuming that there are no stalls in the pipeline. What are the cycle times in the two processors? 5.2ns and 1.2ns What are the clock speeds? 192 MHz and 833 MHz What are the IPCs? 1 and 1 How long does it take to finish one instr? 5.2ns and 6ns What is the speedup from pipelining? 833/192 = 4.34

Problem 2 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline. What is the cycle time in the new processor? What is the clock speed? What is the IPC? How long does it take to finish one instr? What is the speedup from pipelining? What is the max speedup from pipelining?

Problem 2 An unpipelined processor takes 5 ns to work on one instruction. It then takes 0.2 ns to latch its results into latches. I was able to convert the circuits into 5 sequential pipeline stages. The stages have the following lengths: 1ns; 0.6ns; 1.2ns; 1.4ns; 0.8ns. Answer the following, assuming that there are no stalls in the pipeline. What is the cycle time in the new processor? 1.6ns What is the clock speed? 625 MHz What is the IPC? 1 How long does it take to finish one instr? 8ns What is the speedup from pipelining? 625/192 = 3.26 What is the max speedup from pipelining? 5.2/0.2 = 26

A 5-Stage Pipeline Source: H&P textbook

A 5-Stage Pipeline Use the PC to access the I-cache and increment PC by 4

A 5-Stage Pipeline Read registers, compare registers, compute branch target; for now, assume branches take 2 cyc (there is enough work that branches can easily take more)

A 5-Stage Pipeline ALU computation, effective address computation for load/store

A 5-Stage Pipeline Memory access to/from data cache, stores finish in 4 cycles

A 5-Stage Pipeline Write result of ALU computation or load into register file

RISC/CISC Loads/Stores

Problem 3 For the following code sequence, show how the instrs flow through the pipeline: ADD R1, R2,  R3 BEZ R4, [R5] LD [R6]  R7 ST [R8]  R9

Pipeline Summary RR ALU DM RW ADD R1, R2,  R3 Rd R1,R2 R1+R2 -- Wr R3 BEZ R1, [R5] Rd R1, R5 -- -- -- Compare, Set PC LD 8[R3]  R6 Rd R3 R3+8 Get data Wr R6 ST 8[R3]  R6 Rd R3,R6 R3+8 Wr data --

Problem 4 Convert this C code into equivalent RISC assembly instructions a[i] = b[i] + c[i];

Problem 4 Convert this C code into equivalent RISC assembly instructions a[i] = b[i] + c[i]; LD [R1], R2 # R1 has the address for variable i MUL R2, 8, R3 # the offset from the start of the array ADD R4, R3, R7 # R4 has the address of a[0] ADD R5, R3, R8 # R5 has the address of b[0] ADD R6, R3, R9 # R6 has the address of c[0] LD [R8], R10 # Bringing b[i] LD [R9], R11 # Bringing c[i] ADD R10, R11, R12 # Sum is in R12 ST [R7], R12 # Putting result in a[i]

Problem 5 Design your own hypothetical 8-stage pipeline.

Title Bullet