EENG449b/Savvides Lec 3.1 1/20/04 January 20, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG Computer.

Presentation transcript:

EENG449b/Savvides Lec 3.1 1/20/04 January 20, 2004 Prof. Andreas Savvides Spring 2004 EENG 449bG/CPSC 439bG Computer Systems Lecture 3: MIPS Instruction Set & Intro to Pipelining

The MIPS Architecture
Features:
GPRs with load-store
Displacement, Immediate and Register Indirect addressing modes
Data sizes: 8-, 16-, 32-, 64-bit integers and 64-bit floating point numbers
Simple instructions: load, store, add, sub, move register-register, shift
Compare equal, compare not equal, compare less, branch, jump, call and return
Fixed instruction encoding for performance, variable instruction encoding for size
Provide at least 16 general purpose registers

MIPS Architecture Features
Registers:
32 64-bit GPRs (R0, R1…R31)
– Note: R0 is always 0
32 64-bit Floating Point Registers (F0, F1…F31)
Data types: 8-bit bytes, 16-bit half words, 32-bit single-precision and 64-bit double-precision floating point
Addressing modes:
Immediate: Add R4, #3 means Regs[R4] <- Regs[R4] + 3
Displacement: Add R4, 100(R1) means Regs[R4] <- Regs[R4] + Mem[100 + Regs[R1]]
Register indirect (place 0 in the displacement field)
– E.g. Add R4, 0(R1)
Absolute addressing (place R0 as the base register)
– E.g. Add R4, 1000(R0)
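The last three addressing modes are all the same displacement calculation with different operands. A minimal sketch of that idea follows; the register contents and the helper name `effective_address` are made up for illustration and are not part of the slides.

```python
# Toy register file; the values are arbitrary illustration data.
regs = {"R0": 0, "R1": 200, "R4": 7}

def effective_address(displacement, base):
    """Displacement addressing: address = displacement + Regs[base]."""
    return displacement + regs[base]

displacement_addr = effective_address(100, "R1")   # 100(R1)
indirect_addr = effective_address(0, "R1")         # 0(R1): register indirect
absolute_addr = effective_address(1000, "R0")      # 1000(R0): R0 is always 0

print(displacement_addr, indirect_addr, absolute_addr)
```

Because R0 always reads as 0, a single hardware addressing mode is enough to cover register indirect and absolute addressing as special cases.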

MIPS Instruction Format
op – opcode (basic operation of the instruction)
rs – first register operand
rt – second register operand
rd – register destination operand
shamt – shift amount
funct – function
Example: LW $t0, 1200($t1)
Note: The numbers for these examples are from “Computer Organization & Design”, Chapter 3

MIPS Instruction Format
op – opcode (basic operation of the instruction)
rs – first register operand
rt – second register operand
rd – register destination operand
shamt – shift amount
funct – function
Example: Add $t0, $s2, $t
Note: The numbers for these examples are from “Computer Organization & Design”, Chapter 3
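As a rough sketch of how the fields pack into a 32-bit word, the helpers below assemble an R-format and an I-format instruction. The function names are hypothetical. The field widths (6, 5, 5, 5, 5, 6 bits) and register numbers ($t0 = 8, $t1 = 9, $s1 = 17, $s2 = 18) follow the standard MIPS conventions. The load uses the I-format, where a 16-bit immediate replaces rd, shamt and funct. The add shown is the textbook's `add $t0, $s1, $s2`, an assumption chosen because the operands of the slide's own example are cut off.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack I-format fields: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

T0, T1, S1, S2 = 8, 9, 17, 18   # standard MIPS register numbers

# lw $t0, 1200($t1): op = 35, base register in rs, destination in rt
lw_word = encode_i(35, T1, T0, 1200)

# add $t0, $s1, $s2: op = 0, funct = 32, destination in rd
add_word = encode_r(0, S1, S2, T0, 0, 32)

print(f"lw : {lw_word:032b}")
print(f"add: {add_word:032b}")
```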

MIPS Instruction Format
op – opcode (basic operation of the instruction)
rs – first register operand
rt – second register operand
rd – register destination operand
shamt – shift amount
funct – function
Example: j ? ? (binary: you fill it in!)

MIPS Operations
Four broad classes supported:
1. Loads and stores (figure 2.28)
– Different data sizes: LD, LW, LH, LB, LBU …
2. ALU operations (figure 2.29)
– Add, sub, and, or …
– They are all register-register operations
3. Control flow instructions (figure 2.30)
– Branches (conditional) and jumps (unconditional)
4. Floating point operations

Levels of Representation
High Level Language Program → (Compiler) → Assembly Language Program → (Assembler) → Machine Language Program → (Machine Interpretation) → Control Signal Specification
High-level language example:
temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;
Assembly language example:
lw $t15, 0($t2)
lw $t16, 4($t2)
sw $t16, 0($t2)
sw $t15, 4($t2)

Execution Cycle
Instruction Fetch: obtain instruction from program storage
Instruction Decode: determine required actions and instruction size
Operand Fetch: locate and obtain operand data
Execute: compute result value or status
Result Store: deposit results in storage for later use
Next Instruction: determine successor instruction

5 Steps of MIPS Datapath
Stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back
[Figure: datapath showing Next PC, Adder (+4), Next SEQ PC, Memory (Address, Inst), Reg File (RS1, RS2, RD), Sign Extend (Imm), MUXes, ALU, Zero? test, Data Memory, LMD and WB Data]


Announcements
Homework 1 is out
– Chapter 1: Problems 1.2, 1.3, 1.17
– Chapter 2: Problems 2.5, 2.11, 2.12, 2.19
– Appendix A: Problems A.1, A.5, A.6, A.7, A.11
– Due Thursday, Feb 5, 2:00pm
Note the paper on DSP processors on the website
Reading for this week: Patterson and Hennessy Appendix A
– This lecture we are covering A.1 and A.2; next lecture will cover the rest of the appendix
Need to form teams for projects
– Select a topic
– Sign up for group appointments with me

List of Possible Projects
Power saving schemes in embedded microprocessors
Embedded operating system enhancements and scheduling schemes for sensor interfaces
– Available operating systems: TinyOS, PALOS, uCOS-II
Time synchronization in sensor networks and its hardware implications
Efficient microcontroller interfaces and control mechanisms for articulated nodes
Network protocols and/or data memory management for sensor networks
I also encourage you to propose your own project

Introduction to Pipelining
Pipelining: leverage parallelism in hardware by overlapping instruction execution

Fast, Pipelined Instruction Interpretation
Stages and the state each one produces:
Next Instruction (NI) → Instruction Address
Instruction Fetch (IF) → Instruction Register
Decode & Operand Fetch (D) → Operand Registers
Execute (E) → Result Registers
Store Results (W) → Registers or Mem
[Figure: five instructions, each passing through NI, IF, D, E, W, with each instruction starting one step after the previous one along the Time axis]

Sequential Laundry
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
[Figure: loads A, B, C, D in task order along a time axis from 6 PM to midnight]

Pipelined Laundry
Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: loads A, B, C, D overlapped in task order along a time axis from 6 PM to midnight]
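The 6-hour and 3.5-hour figures can be checked with a small simulation. The stage times below (30 minutes to wash, 40 to dry, 20 to fold) are an assumption taken from the usual version of this laundry example; the transcript itself does not state them. The sketch also assumes a load can wait between stages for as long as needed.

```python
WASH, DRY, FOLD = 30, 40, 20   # minutes per stage (assumed, see above)
STAGES = (WASH, DRY, FOLD)

def sequential_minutes(loads):
    """Each load finishes all three stages before the next one starts."""
    return loads * sum(STAGES)

def pipelined_minutes(loads):
    """Each stage starts a load as soon as the load and the stage are both free."""
    stage_free_at = [0] * len(STAGES)
    finish = 0
    for _ in range(loads):
        ready = 0
        for i, duration in enumerate(STAGES):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish

print(sequential_minutes(4) / 60, "hours sequential")
print(pipelined_minutes(4) / 60, "hours pipelined")
```

With these stage times the dryer is the slowest stage, so once the pipeline is full a load finishes every 40 minutes.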

Pipelining Lessons
Pipelining doesn’t help latency of a single task; it helps throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduce speedup
[Figure: loads A, B, C, D in task order along a time axis from 6 PM to 9 PM]
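To see how fill and drain time keep the speedup below the number of stages, consider an idealized pipeline with perfectly balanced one-cycle stages and no stalls (an assumption made for this sketch, not a claim about any real machine). The function names are illustrative.

```python
def pipelined_cycles(n_tasks, n_stages):
    """The first task takes n_stages cycles; each later task finishes one cycle after the previous."""
    return n_stages + n_tasks - 1

def ideal_speedup(n_tasks, n_stages):
    """Unpipelined time (n_tasks * n_stages cycles) divided by pipelined time."""
    return (n_tasks * n_stages) / pipelined_cycles(n_tasks, n_stages)

for n in (1, 4, 1000):
    print(n, "tasks:", round(ideal_speedup(n, 5), 2))
```

A single task sees no speedup at all, and the speedup only approaches the stage count as the number of tasks grows large.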

Instruction Pipelining
Execute billions of instructions, so throughput is what matters
– except when?
What is desirable in instruction sets for pipelining?
– Variable length instructions vs. all instructions same length?
– Memory operands part of any operation vs. memory operands only in loads or stores?
– Register operand many places in instruction format vs. registers located in same place?

Requirements for Pipelining
Goal: Start a new instruction at every cycle
What are the hardware implications?
Two different tasks should not attempt to use the same datapath resource on the same clock cycle
Instructions should not interfere with each other
Need to have separate data and instruction memories
Need increased memory bandwidth
– A 5-stage pipeline operating at the same clock rate as the unpipelined version requires 5 times the bandwidth
Need to introduce pipeline registers
Register file is used in two places: the ID and WB stages
– Perform writes in the first half of the cycle and reads in the second half

Pipeline Requirements…
Need separate instruction and data memories (structural hazard)
Register file: write in the first half of the cycle, read in the second half

Add registers between pipeline stages
Prevent interference between 2 instructions
Carry data from one stage to the next
Edge triggered

Pipelining Hazards
Hazards: circumstances that would cause incorrect execution if the next instruction were launched
Structural hazards: attempting to use the same hardware to do two different things at the same time
Data hazards: instruction depends on the result of a prior instruction still in the pipeline
Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
Common solution: “stall” the pipeline until the hazard is resolved, by inserting one or more “bubbles” in the pipeline

Data Hazards
Occur when the relative timing of instructions is altered because of pipelining
Consider the following code:
DADD R1, R2, R3
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
XOR R10, R1, R11
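A rough way to find the problem instructions in this sequence is to look for reads of a register within a short window after the instruction that writes it. The sketch below assumes a 5-stage pipeline without forwarding in which a value written back in WB can be read by an instruction in ID during the same cycle, so only the two instructions directly after the producer read a stale value. The tuple layout and the name `raw_hazards` are illustrative.

```python
# (opcode, destination register, source registers)
program = [
    ("DADD", "R1", ("R2", "R3")),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
    ("XOR", "R10", ("R1", "R11")),
]

def raw_hazards(instructions, window=2):
    """Return (producer, consumer, register) for reads that come too soon after a write."""
    found = []
    for i, (producer, dest, _) in enumerate(instructions):
        for consumer, _, sources in instructions[i + 1 : i + 1 + window]:
            if dest in sources:
                found.append((producer, consumer, dest))
    return found

hazards = raw_hazards(program)
for producer, consumer, reg in hazards:
    print(f"{consumer} reads {reg} before {producer} has written it back")
```

Under these assumptions DSUB and AND are flagged, while OR and XOR read R1 late enough to see the new value.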

Data Hazard

Data Hazards: Data Forwarding

Data Hazards Requiring Stalls
LD R1, 0(R2)
DSUB R4, R1, R5
AND R6, R1, R7
OR R8, R1, R9
HAVE to stall for 1 cycle…
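The one-cycle stall arises because the loaded value only becomes available after the memory stage, too late to forward to the instruction immediately behind the load. A minimal sketch of counting such load-use stalls follows; it assumes a 5-stage pipeline with full forwarding, and the set `LOAD_OPS`, the tuple layout and the function name are illustrative choices.

```python
LOAD_OPS = {"LD", "LW", "LH", "LB", "LBU"}

# (opcode, destination register, source registers)
program = [
    ("LD", "R1", ("R2",)),
    ("DSUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]

def load_use_stalls(instructions):
    """Count instructions that use a load's result in the very next slot."""
    stalls = 0
    for (op, dest, _), (_, _, next_sources) in zip(instructions, instructions[1:]):
        if op in LOAD_OPS and dest in next_sources:
            stalls += 1
    return stalls

stall_count = load_use_stalls(program)
print(stall_count, "stall cycle(s)")
```

Only DSUB forces a stall here; AND and OR are far enough behind the load for forwarding to supply R1.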

Four Branch Hazard Alternatives
#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
– Execute successor instructions in sequence
– “Squash” instructions in pipeline if branch actually taken
– Advantage of late pipeline state update
– 47% MIPS branches not taken on average
– PC+4 already calculated, so use it to get next instruction
#3: Predict Branch Taken
– 53% MIPS branches taken on average
– But haven’t calculated branch target address in MIPS
» MIPS still incurs 1 cycle branch penalty
» Other machines: branch target known before outcome

Four Branch Hazard Alternatives
#4: Delayed Branch
– Define branch to take place AFTER a following instruction
branch instruction
sequential successor 1
sequential successor 2
...
sequential successor n
branch target if taken
(sequential successors 1 through n form a branch delay of length n)
– 1 slot delay allows proper decision and branch target address in 5 stage pipeline
– MIPS uses this

Delayed Branch
Where to get instructions to fill branch delay slot?
– Before branch instruction
– From the target address: only valuable when branch taken
– From fall through: only valuable when branch not taken
– Canceling branches allow more slots to be filled
Compiler effectiveness for single branch delay slot:
– Fills about 60% of branch delay slots
– About 80% of instructions executed in branch delay slots useful in computation
– About 50% (60% x 80%) of slots usefully filled
Delayed Branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)
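The percentages quoted for the branch schemes can be combined into a rough average penalty per branch. This is only a back-of-the-envelope sketch: it assumes a one-cycle penalty whenever a scheme guesses wrong, and that every delay slot not usefully filled costs a full cycle. The variable names are illustrative.

```python
TAKEN = 0.53           # fraction of MIPS branches taken (from the slides)
PENALTY_CYCLES = 1     # one-cycle branch penalty (from the slides)
SLOTS_FILLED = 0.60    # delay slots the compiler manages to fill
FILLED_USEFUL = 0.80   # filled slots that do useful work

# Average penalty cycles per branch under each scheme
stall_always = PENALTY_CYCLES
predict_not_taken = TAKEN * PENALTY_CYCLES
useful_slots = SLOTS_FILLED * FILLED_USEFUL
delayed_branch = (1 - useful_slots) * PENALTY_CYCLES

print(f"stall: {stall_always:.2f}")
print(f"predict not taken: {predict_not_taken:.2f}")
print(f"useful delay slots: {useful_slots:.2f}")
print(f"delayed branch: {delayed_branch:.2f}")
```

The product 0.60 x 0.80 = 0.48 is the "about 50%" of slots usefully filled.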

Pipelining Performance Issues
Consider an unpipelined processor with a 1 ns clock cycle:
ALU operations: 4 cycles, frequency 40%
Branches: 4 cycles, frequency 20%
Memory operations: 5 cycles, frequency 40%
Pipelining overhead: 0.2 ns
For the unpipelined processor:
Average instruction execution time = 1 ns × (40% × 4 + 20% × 4 + 40% × 5) = 4.4 ns

Speedup from Pipelining
Now if we had a pipelined processor, we assume that each instruction takes 1 cycle
BUT we also have overhead, so instructions take 1 ns + 0.2 ns = 1.2 ns
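The speedup for this example can be worked through numerically. The sketch takes 1 ns as the clock cycle time of the unpipelined processor, as in the Hennessy and Patterson version of this example, and uses the cycle counts and frequencies listed for the unpipelined processor. The variable names are illustrative.

```python
CLOCK_NS = 1.0       # unpipelined clock cycle time (assumed reading of the 1 ns figure)
OVERHEAD_NS = 0.2    # extra time pipelining adds to each cycle

# instruction class -> (cycles, frequency)
MIX = {"alu": (4, 0.40), "branch": (4, 0.20), "memory": (5, 0.40)}

# Average time per instruction without pipelining
unpipelined_ns = CLOCK_NS * sum(cycles * freq for cycles, freq in MIX.values())

# With pipelining, one instruction completes per (slightly longer) cycle
pipelined_ns = CLOCK_NS + OVERHEAD_NS

speedup = unpipelined_ns / pipelined_ns
print(f"{unpipelined_ns:.1f} ns / {pipelined_ns:.1f} ns = {speedup:.2f}x")
```

The speedup comes out at about 3.7, noticeably below the number of cycles per instruction, because the overhead lengthens every pipelined cycle.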

Considering the stall overhead
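A common form of the relation between stalls and speedup, as given in Hennessy and Patterson Appendix A, is speedup = pipeline depth / (1 + stall cycles per instruction). It assumes an ideal CPI of 1, balanced stages and no clock overhead; whether the slide used exactly this form is an assumption. A small sketch with an illustrative function name:

```python
def speedup_with_stalls(pipeline_depth, stall_cycles_per_instruction):
    """Speedup over the unpipelined machine under the assumptions stated above."""
    return pipeline_depth / (1 + stall_cycles_per_instruction)

for stalls in (0.0, 0.25, 1.0):
    print(stalls, "stall cycles/instr:", speedup_with_stalls(5, stalls))
```

With no stalls the speedup equals the pipeline depth, and one stall cycle per instruction on average halves it.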