1 ECE369 ECE369 Pipelining. 2 ECE369 “toupper” :converts any lowercase characters (with ASCII codes between 97 and 122) in the null-terminated argument.

Slides:



Advertisements
Similar presentations
Morgan Kaufmann Publishers The Processor
Advertisements

1 ECE369 ECE369 Pipelining. 2 ECE369 addm (rs), rt # Memory[R[rs]] = R[rt] + Memory[R[rs]]; Assume that we can read and write the memory in the same cycle.
Pipeline Summary Try to put everything together for pipelines Before going onto caches. Peer Instruction Lecture Materials for Computer Architecture by.
RISC and Pipelining Prof. Sin-Min Lee Department of Computer Science.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Pipelining and Control Hazards Oct
Lecture Objectives: 1)Define branch prediction. 2)Draw a state machine for a 2 bit branch prediction scheme 3)Explain the impact on the compiler of branch.
1 Lecture 4: Procedure Calls Today’s topics:  Procedure calls  Large constants  The compilation process Reminder: Assignment 1 is due on Thursday.
Intro to Computer Org. Pipelining, Part 2 – Data hazards + Stalls.
Lecture Objectives: 1)Define pipelining 2)Calculate the speedup achieved by pipelining for a given number of instructions. 3)Define how pipelining improves.
Forwarding and Hazards MemberRole William ElliottTeam Leader Jessica Tyler ShulerWiki Specialist Tyler KimseyLead Engineer Cameron CarrollEngineer Danielle.
The University of Adelaide, School of Computer Science
Assembly Language Working with the CPU.
1 A few words about the quiz Closed book, but you may bring in a page of handwritten notes. –You need to know what the “core” MIPS instructions do. –I.
Kevin Walsh CS 3410, Spring 2010 Computer Science Cornell University Pipeline Hazards See: P&H Chapter 4.7.
Pipeline Hazards Hakim Weatherspoon CS 3410, Spring 2011 Computer Science Cornell University See P&H Appendix 4.7.
S. Barua – CPSC 440 CHAPTER 6 ENHANCING PERFORMANCE WITH PIPELINING This chapter presents pipelining.
ENEE350 Ankur Srivastava University of Maryland, College Park Based on Slides from Mary Jane Irwin ( )
1  2004 Morgan Kaufmann Publishers Chapter Six. 2  2004 Morgan Kaufmann Publishers Pipelining The laundry analogy.
1 Chapter Six - 2nd Half Pipelined Processor Forwarding, Hazards, Branching EE3055 Web:
1 Stalling  The easiest solution is to stall the pipeline  We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes.
1 Lecture 5: MIPS Examples Today’s topics:  the compilation process  full example – sort in C Reminder: 2 nd assignment will be posted later today.
CSCE 212 Quiz 9 – 3/30/11 1.What is the clock cycle time based on for single-cycle and for pipelining? 2.What two actions can be done to resolve data hazards?
King Fahd University of Petroleum and Minerals King Fahd University of Petroleum and Minerals Computer Engineering Department Computer Engineering Department.
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
Pipelined Datapath and Control (Lecture #15) ECE 445 – Computer Organization The slides included herein were taken from the materials accompanying Computer.
Lecture 24: CPU Design Today’s topic –Multi-Cycle ALU –Introduction to Pipelining 1.
1 Pipelining Reconsider the data path we just did Each instruction takes from 3 to 5 clock cycles However, there are parts of hardware that are idle many.
Pipelining Enhancing Performance. Datapath as Designed in Ch. 5 Consider execution of: lw $t1,100($t0) lw $t2,200($t0) lw $t3,300($t0) Datapath segments.
Comp Sci pipelining 1 Ch. 13 Pipelining. Comp Sci pipelining 2 Pipelining.
Chapter 4 The Processor. Chapter 4 — The Processor — 2 Introduction We will examine two MIPS implementations A simplified version A more realistic pipelined.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
CS 352 : Computer Organization and Design University of Wisconsin-Eau Claire Dan Ernst Pipelining Basics.
CS 1104 Help Session IV Five Issues in Pipelining Colin Tan, S
5/13/99 Ashish Sabharwal1 Pipelining and Hazards n Hazards occur because –Don’t have enough resources (ALU’s, memory,…) Structural Hazard –Need a value.
1  1998 Morgan Kaufmann Publishers Chapter Six. 2  1998 Morgan Kaufmann Publishers Pipelining Improve perfomance by increasing instruction throughput.
LECTURE 7 Pipelining. DATAPATH AND CONTROL We started with the single-cycle implementation, in which a single instruction is executed over a single cycle.
CMPE 421 Parallel Computer Architecture Part 3: Hardware Solution: Control Hazard and Prediction.
Lecture 9. MIPS Processor Design – Pipelined Processor Design #1 Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System.
CSCE 212 Chapter 6 Enhancing Performance with Pipelining Instructor: Jason D. Bakos.
1 Pipelining CDA 3101 Discussion Section Question 1 – 6.1 Suppose that time for an ALU operation can be shortened by 25% in the following figure.
ECE/CS 552: Pipeline Hazards © Prof. Mikko Lipasti Lecture notes based in part on slides created by Mark Hill, David Wood, Guri Sohi, John Shen and Jim.
Pipelining: Implementation CPSC 252 Computer Organization Ellen Walker, Hiram College.
EECS 322 April 10, 2000 EECS 322 Computer Architecture Pipeline Control, Data Hazards and Branch Hazards.
EECS 370 Discussion smbc-comics.com.
Chapter Six.
Speed up on cycle time Stalls – Optimizing compilers for pipelining
Lecture 6: Assembly Programs
Computer Organization CS224
CS2100 Computer Organization
CSCI206 - Computer Organization & Programming
Lecture: Pipelining Basics
Chapter 4 The Processor Part 4
Pipelining: Advanced ILP
Morgan Kaufmann Publishers The Processor
Pipelining review.
The processor: Pipelining and Branching
Pipelined Processor Design
CSCI206 - Computer Organization & Programming
CSCI206 - Computer Organization & Programming
Data Hazards Data Hazard
Chapter Six.
The Processor Lecture 3.6: Control Hazards
Chapter Six.
Other ISAs Next, we’ll first we look at a longer example program, starting with some C code and translating it into our assembly language. Then we discuss.
Other ISAs Next, we’ll first we look at a longer example program, starting with some C code and translating it into our assembly language. Then we discuss.
Pipelining: Basic Concepts
Lecture 6: Assembly Programs
Systems Architecture II
Problem ??: (?? marks) Consider executing the following code on the MIPS pipelined datapath: add $t5, $t6, $t8 add $t9, $t5, $t4 lw $t3, 100($t9) sub $t2,
Presentation transcript:

1 ECE369 ECE369 Pipelining

2 ECE369 “toupper” :converts any lowercase characters (with ASCII codes between 97 and 122) in the null-terminated argument string to uppercase. Assume that this function is called with a string that contains exactly 100 lowercase letters, followed by the null terminator. toupper: lb $t2, 0($a0) beq $t2, $0, exit # Stop at end of string blt $t2, 97, next # Not lowercase bgt $t2, 122, next # Not lowercase sub $t2, $t2, 32 # Convert to uppercase sb $t2, 0($a0) # Store back in string next: addi $a0, $a0, 1 j toupper exit: jr $ra (a) How many instructions would be executed for this function?

3 ECE369 “toupper” :converts any lowercase characters (with ASCII codes between 97 and 122) in the null-terminated argument string to uppercase. Assume that this function is called with a string that contains exactly 100 lowercase letters, followed by the null terminator. toupper: lb $t2, 0($a0) beq $t2, $0, exit # Stop at end of string blt $t2, 97, next # Not lowercase bgt $t2, 122, next # Not lowercase sub $t2, $t2, 32 # Convert to uppercase sb $t2, 0($a0) # Store back in string next: addi $a0, $a0, 1 j toupper exit: jr $ra (b) Assume that we implement a single-cycle processor, with a cycle time of 8ns. How much time would be needed to execute the function?

4 ECE369 “toupper” :converts any lowercase characters (with ASCII codes between 97 and 122) in the null-terminated argument string to uppercase. toupper: lb $t2, 0($a0) beq $t2, $0, exit # Stop at end of string blt $t2, 97, next # Not lowercase bgt $t2, 122, next # Not lowercase sub $t2, $t2, 32 # Convert to uppercase sb $t2, 0($a0) # Store back in string next: addi $a0, $a0, 1 j toupper exit: jr $ra (c) assume a 5-stage pipeline, each stage takes one clock cycle, register file can be read and written in the same cycle, forwarding is done whenever possible, and stalls are inserted otherwise, branches are resolved in the ID stage and are predicted correctly 90% of the time, jump instructions are fully pipelined, so no stalls or flushes are needed, how many total cycles are needed for the “toupper” with these assumptions?

5 ECE369 (d) If the cycle time of the pipelined machine is 2ns, how would its performance compare to that of the single cycle processor from Part (b)?

6 ECE369 Solution (a) loop will be executed 100 times. three more instructions are executed to process the final null character. That’s 803 instructions!

7 ECE369 Solution (b) each instruction in one cycle, for a CPI of instructions are executed in all,: CPU time = Number of instructions executed × CPI × Clock cycle time = 803 instructions × 1 cycle/instruction × 8 ns/cycle = 6424 ns

8 ECE369 Solution (c) The “base” number of cycles needed, assuming no stalls or flushes, would be four cycles to fill the pipelineand 803 more cycles to complete the function, for a total of 807 cycles. However, the given code also includes 301 branches (three for each of the first 100 loop iterations, and one for the end of the string). If 10% of these are mispredicted and require one instruction to be flushed, there will be a total of 0.10 x 301 x 1 = 30 wasted cycles for flushes. There is also a hazard between the “lb” instruction and the “beq” right after it. If we assume forwarding is done whenever possible and that registers can be written and read in the same cycle, we’d still need to insert a two-cycle stall between these instructions, since branches are determined in the ID stage. The “lb/beq” sequence is executed 101 times, so we need a total of 202 stall cycles. The total number of cycles would then be = 1039.

9 ECE369 Solution (d) At 2 ns per cycle, the pipelined system will require 1039 x 2 = 2078 ns. This is a performance improvement of 6424/2078, or roughly three times!