CS 230: Computer Organization and Assembly Language. Aviral Shrivastava, Department of Computer Science and Engineering, School of Computing and Informatics.

Presentation transcript:

CS 230: Computer Organization and Assembly Language
Aviral Shrivastava
Department of Computer Science and Engineering, School of Computing and Informatics, Arizona State University
Slides courtesy: Prof. Yann Hang Lee, ASU; Prof. Mary Jane Irwin, PSU; Ande Carle, UCB

Announcements
Project 3
– MIPS Assembler
Project 4
– MIPS Simulator
– Due Nov 10, 2009
Quiz 4
– Nov 5, 2009
– Single-cycle implementation
Finals
– Tuesday, Dec 08, 2009
– Please come on time (you'll need all the time)
– Open book, notes, and internet
– No communication with any other human

Single Cycle - Abstract View
Abstract View
– elements that operate on data values (combinational)
– elements that contain state (sequential)
Implementation
– Design the datapath
– Design the control
[Figure: abstract view of the datapath showing the PC, Instruction Memory, Register File, ALU, and Data Memory]

Single-cycle Datapath
[Figure: single-cycle datapath with the PC, Instruction Memory, Register File, ALU (ovf and zero outputs), ALU control, Data Memory, Sign Extend (16 to 32 bits), two Shift-left-2 units, and adders for PC + 4 and the branch target; the jump address combines PC+4[31-28] with a 28-bit shifted field; control signals shown are RegWrite, ALUSrc, MemRead, MemWrite, MemtoReg, PCSrc, and Jump]

Single-cycle Datapath + Control
[Figure: the single-cycle datapath with a Control Unit driven by Instr[31-26] and an ALU control block driven by ALUOp and Instr[5-0]; register addresses come from Instr[25-21], Instr[20-16], and Instr[15-11], the immediate from Instr[15-0], and the jump target from Instr[25-0] and PC+4[31-28]; control signals shown are RegDst, Branch, Jump, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, RegWrite, and PCSrc]

Single-cycle Control Unit
Completely determined by the instruction opcode field
– Note that a multiplexor whose control input is 0 has a definite action, even if it is not used in performing the operation
Instr    RegDst  ALUSrc  MemtoReg  RegWr  MemRd  MemWr  Branch  ALUOp1  ALUOp0
R-type   X001X
lw
sw       X       1       X         0      X      1      0       0       0
beq      X       0       X         0      X      0      1       X       1
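A minimal sketch of what "completely determined by the opcode field" means in practice: the control unit is a lookup keyed on Instr[31-26]. Only the sw and beq rows (`X1X0X1000` and `X0X0X01X1`) survive in this transcript, so only those are encoded. The opcode values 101011 (sw) and 000100 (beq) are the standard MIPS encodings rather than something stated on the slide, and the names `SIGNALS`, `CONTROL_ROWS`, and `control_for` are hypothetical.

```python
# Column order taken from the table header on the slide.
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWr", "MemRd",
           "MemWr", "Branch", "ALUOp1", "ALUOp0")

# Opcode -> one character per signal; "X" means don't care.
# Opcodes are the standard MIPS encodings (an assumption, not from the slide).
CONTROL_ROWS = {
    0b101011: "X1X0X1000",  # sw
    0b000100: "X0X0X01X1",  # beq
}


def control_for(instruction):
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instruction >> 26) & 0x3F  # Instr[31-26]
    row = CONTROL_ROWS[opcode]           # KeyError for rows not in the transcript
    return dict(zip(SIGNALS, row))


sw_word = 0xAD280004   # sw $t0, 4($t1)
beq_word = 0x11090003  # beq $t0, $t1, +3
print(control_for(sw_word))
print(control_for(beq_word))
```

No register or immediate bits influence the result, which is the point of the slide: two different sw instructions always get the same control settings.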

Disadvantages of Single Cycle Implementation
Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
– especially problematic for more complex instructions like floating point multiply
Wastes area, because some functional units must be duplicated: they cannot be shared during an instruction's execution
– e.g., separate adders are needed for the PC update and the branch target address calculation, in addition to the ALU that does R-type arithmetic/logic operations and data memory address calculations

How to make it fast?
Parallelism
Short-cuts, or Caching, or Bypassing
Prediction
Skip some work
First form of parallelism is Pipelining

Pipelining: It's Natural!
Laundry Example
– Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
"Folder" takes 20 minutes

Sequential Laundry
Sequential laundry takes 6 hours for 4 loads
[Figure: timeline ending at midnight; task order A, B, C, D on the vertical axis, time on the horizontal axis]

Pipelined Laundry
Pipelined laundry takes 3.5 hours for 4 loads
[Figure: timeline from 6 PM to midnight; task order A, B, C, D on the vertical axis, time on the horizontal axis]
Note: More time to do project 4
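The 6-hour and 3.5-hour figures can be checked with a small simulation. This is a sketch under one assumption not stated on the slide: a finished load can wait between machines (a hamper) until the next machine is free. The function name `pipeline_finish_time` is hypothetical; the stage times and the number of loads are the ones from the laundry example.

```python
def pipeline_finish_time(stage_minutes, loads):
    """Return the minute at which the last load leaves the last stage.

    Each stage handles one load at a time. A load enters a stage only after
    it has left the previous stage and the stage itself is free.
    """
    free_at = [0] * len(stage_minutes)  # when each stage next becomes free
    for _ in range(loads):
        ready = 0  # when the current load is ready for its next stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])
            ready = start + minutes
            free_at[s] = ready
    return free_at[-1]


stages = [30, 40, 20]  # washer, dryer, "folder" (minutes)
loads = 4              # Ann, Brian, Cathy, Dave

sequential = loads * sum(stages)                 # one load at a time
pipelined = pipeline_finish_time(stages, loads)  # overlapped

print(f"sequential: {sequential / 60} hours")  # 6.0
print(f"pipelined:  {pipelined / 60} hours")   # 3.5
```

The pipelined total works out to 30 + 4 × 40 + 20 = 210 minutes: the dryer is busy back to back, and only the first wash and the last fold fall outside it.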

Pipelining Lessons
Multiple tasks operating simultaneously
Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Potential speedup = number of pipe stages
Unbalanced lengths of pipe stages reduce speedup
Also, need time to "fill" and "drain" the pipeline
[Figure: pipelined laundry timeline starting at 6 PM; task order A, B, C, D on the vertical axis, time on the horizontal axis]
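These lessons can be made concrete with a closed-form estimate: for identical tasks, the pipe fills once and then one task completes every slowest-stage time. The sketch below compares the laundry stage times with a hypothetical balanced split of the same 90 minutes; `pipelined_time`, `speedup`, and the `balanced` figures are illustrative names and values, not from the slides.

```python
def pipelined_time(stage_times, n_tasks):
    """Total time for n identical tasks: the first task takes sum(stage_times),
    and each later task finishes max(stage_times) after the previous one."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)


def speedup(stage_times, n_tasks):
    """Sequential time divided by pipelined time."""
    sequential = n_tasks * sum(stage_times)
    return sequential / pipelined_time(stage_times, n_tasks)


laundry = [30, 40, 20]   # washer, dryer, folder (from the laundry example)
balanced = [30, 30, 30]  # hypothetical: same total work, equal stages

for n in (4, 1000):
    print(n, round(speedup(laundry, n), 2), round(speedup(balanced, n), 2))
```

With only 4 loads, fill and drain time keeps both speedups well below 3. With 1000 loads the balanced pipeline approaches the number of stages (3), while the unbalanced one levels off near 90 / 40 = 2.25 because the dryer sets the rate.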

Pipelining: Some terms
If you're doing laundry or implementing a µP, each stage where something is done is called a pipe stage
– In the laundry example, the washer, dryer, and folding table are pipe stages; clothes enter at one end and exit at the other
– In a µP, instructions enter at one end and have been executed when they leave
– Another example: auto assembly line
Throughput is how often stuff comes out of a pipeline

Technical details
If times for all S stages are equal to T:
– Time for one initiation to complete is still ST
– Time between 2 initiations = T, not ST
– Initiations per second = 1/T
Pipelining: overlap multiple executions of the same sequence
– Improves THROUGHPUT, not the time to perform a single operation
Other examples:
– Automobile assembly plant, chemical factory, garden hose, cooking

More technical details
Book's approach to draw pipeline timing diagrams:
– Time runs left-to-right, in units of stage time
– Each "row" below corresponds to a distinct initiation
– Boundary between 2 column entries: pipeline register (i.e. hamper)
– Must look at column contents to see what stage is doing what
Wash 1  Dry 1   Fold 1  Pack 1
        Wash 2  Dry 2   Fold 2  Pack 2
                Wash 3  Dry 3   Fold 3  Pack 3
                        Wash 4  Dry 4   Fold 4  Pack 4
                                Wash 5  Dry 5   Fold 5
                                        Wash 6  Dry 6
Time for N initiations to complete: NT + (S-1)T
Throughput: time per initiation = T + (S-1)T/N → T!
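A short sketch of the two formulas on this slide, NT + (S-1)T for the total time and T + (S-1)T/N for the time per initiation. The function names are hypothetical, and S = 4 with T = 1 time unit is chosen to match the four-stage wash/dry/fold/pack diagram.

```python
def total_time(n, stages, t):
    """Time for n initiations to complete: N*T + (S-1)*T."""
    return n * t + (stages - 1) * t


def time_per_initiation(n, stages, t):
    """Average time per initiation: T + (S-1)*T/N."""
    return total_time(n, stages, t) / n


S, T = 4, 1.0  # four stages, one time unit per stage
for n in (1, 6, 100, 10_000):
    print(n, total_time(n, S, T), time_per_initiation(n, S, T))
```

For a single initiation the cost is the full S × T. As N grows, the (S-1)T fill cost is spread over more initiations and the per-initiation time approaches T.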

Ideal pipeline speedup
Unpipelined: Latch, then four blocks of combinational logic (delay = τ each), then Latch
– delay for 1 piece of data = 4τ + latch setup (assume small)
– approximate delay for 1000 pieces of data = 4000τ
Pipelined: a Latch before each of the four combinational logic blocks (delay = τ each), and a Latch at the end
– delay for 1 piece of data = 4(τ + latch setup)
– approximate delay for 1000 pieces of data = 3τ + 1000τ
Ideal speedup = # of pipeline stages
– speedup for 1000 pieces of data = 4000τ / 1003τ ≈ 4
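The arithmetic on this slide can be reproduced with a few lines. This sketch assumes four stages of equal combinational delay and, as the slide does for its approximate figures, a negligible latch setup time by default; the function and parameter names are hypothetical.

```python
def unpipelined_delay(n_items, n_stages, tau, latch=0.0):
    """Each item passes through all stages before the next one starts."""
    return n_items * (n_stages * tau + latch)


def pipelined_delay(n_items, n_stages, tau, latch=0.0):
    """(S - 1) cycles to fill the pipe, then one item completes per cycle."""
    cycle = tau + latch
    return (n_stages - 1 + n_items) * cycle


n, stages, tau = 1000, 4, 1.0
slow = unpipelined_delay(n, stages, tau)  # 4000 tau
fast = pipelined_delay(n, stages, tau)    # 3 tau + 1000 tau
print(slow, fast, slow / fast)
```

Passing a nonzero `latch` value shows why the ideal speedup is an upper bound: every pipeline stage pays the latch overhead, so the measured speedup falls below the number of stages.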

The "new look" dataflow
[Figure: pipelined datapath with the PC, instruction memory, + 4 adder, register file, sign extend (16 to 32 bits), branch comparator, ALU, data memory, and multiplexers, divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
Data must be stored from one stage to the next in pipeline registers/latches, which hold temporary values between clocks and the information needed for execution.

Another way to look at it…
Inst. #     Clock number (time runs left to right)
Inst. i     IF   ID   EX   MEM  WB
Inst. i+1        IF   ID   EX   MEM  WB
Inst. i+2             IF   ID   EX   MEM  WB
Inst. i+3                  IF   ID   EX   MEM  WB
[Figure: the same four instructions drawn as hardware units IM, Reg, ALU, DM, Reg; program execution order (in instructions) on the vertical axis, time on the horizontal axis]
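A small sketch that generates this kind of staggered diagram for any list of instructions, assuming the ideal case where a new instruction enters IF every clock and nothing stalls. The function name `pipeline_diagram` and the column width are illustrative choices.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(instructions):
    """Return a header row plus one row per instruction.

    Instruction number i (counting from 0) occupies stage s during
    clock i + s + 1, so each row is shifted one column to the right.
    """
    n_cycles = len(instructions) + len(STAGES) - 1
    label_width = max((len(name) for name in instructions), default=0)
    header = " " * label_width + "".join(f"{c:>5}" for c in range(1, n_cycles + 1))
    rows = [header]
    for i, name in enumerate(instructions):
        cells = [""] * n_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(f"{name:<{label_width}}" + "".join(f"{c:>5}" for c in cells))
    return rows


for row in pipeline_diagram(["Inst. i", "Inst. i+1", "Inst. i+2", "Inst. i+3"]):
    print(row)
```

Reading one column of the output top to bottom shows which stage each instruction occupies in that clock, which is how the diagram on the slide is meant to be read.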

Questions about control signals
Following discussion relevant to a single instruction
Q: Are all control signals active at the same time?
Q: Can we generate all these signals at the same time?

Passing control w/pipe registers
Analogy: send instruction with car on assembly line
– "Install Corinthian leather interior on car stage 3"
[Figure: control generation takes the instruction from IF/ID and produces three groups of signals, EX, M, and WB, which travel through the ID/EX, EX/MEM, and MEM/WB pipeline registers]
– EX (RegDst, ALUOp, ALUSrc): strip off signals for execution phase
– M (Branch, MemRead, MemWrite): strip off signals for memory phase
– WB (MemtoReg, RegWrite): strip off signals for write-back phase
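The idea of stripping off signal groups as an instruction moves down the pipe can be sketched as follows. The grouping of signals into EX, M, and WB follows the slide; the specific values shown for lw are the usual textbook settings and are an assumption here, since the transcript does not list them. `Control`, `decode_lw`, and `advance` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    ex: dict   # consumed in the execution phase
    mem: dict  # consumed in the memory phase
    wb: dict   # consumed in the write-back phase


def decode_lw():
    """Control generated in ID for lw (assumed textbook values)."""
    return Control(
        ex={"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        mem={"Branch": 0, "MemRead": 1, "MemWrite": 0},
        wb={"MemtoReg": 1, "RegWrite": 1},
    )


def advance(control):
    """Return the control groups held by ID/EX, EX/MEM, and MEM/WB."""
    id_ex = {"EX": control.ex, "M": control.mem, "WB": control.wb}
    ex_mem = {k: v for k, v in id_ex.items() if k != "EX"}  # EX group used up
    mem_wb = {k: v for k, v in ex_mem.items() if k != "M"}  # M group used up
    return id_ex, ex_mem, mem_wb


for name, reg in zip(("ID/EX", "EX/MEM", "MEM/WB"), advance(decode_lw())):
    print(name, list(reg))
```

All signals are generated once, in the decode stage, and each pipeline register carries only what later stages still need, just as the work order travels with the car on the assembly line.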

Pipelined datapath w/control signals
[Figure: pipelined datapath with control signals]

A Pipelined Processor
Pipeline latches: pass the status and result of the current instruction to the next stage
Comparison:
[Figure: timing over clock cycles 1 to 10. Single-cycle: lw (Ifetch, Dec/Reg, Exec, Mem, Wr) followed by sw (Ifetch, Dec/Reg, Exec, Mem). Pipelined: three instructions, each going through Ifetch, Dec/Reg, Exec, Mem, Wr, overlapped in time.]

Yoda says…
Ohhh. Great warrior. Wars not make one great.