CS 61C: Great Ideas in Computer Architecture Control and Pipelining

Slides:

Advertisements

Similar presentations

Instructor: Justin Hsia

Advertisements

CS152 Lec9.1 CS152 Computer Architecture and Engineering Lecture 9 Designing Single Cycle Control.

inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 28 – CPU Design : Pipelining to Improve Performance The College Board.

CS61C L29 CPU Design : Pipelining to Improve Performance (1) Garcia, Spring 2007 © UCB Wirelessly recharge batt  Powercast & Philips have developed a.

CS61C L26 Single Cycle CPU Datapath II (1) Garcia © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.

CS61C L20 Single-Cycle CPU Control (1) Beamer, Summer 2007 © UCB Scott Beamer Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.

CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2007 © UCB 3.6 TB DVDs? Maybe!  Researchers at Harvard have found a way to use.

Savio Chau Single Cycle Controller Design Last Time: Discussed the Designing of a Single Cycle Datapath Control Datapath Memory Processor (CPU) Input Output.

Ceg3420 control.1 ©UCB, DAP’ 97 CEG3420 Computer Design Lecture 9.2: Designing Single Cycle Control.

CS 61C L34 Single Cycle CPU Control I (1) Garcia, Spring 2004 © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.

CS61C L28 CPU Design : Pipelining to Improve Performance I (1) Garcia, Fall 2006 © UCB 100 Msites!  Sometimes it’s nice to stop and reflect. The web was.

Microprocessor Design

ECE 232 L13. Control.1 ©UCB, DAP’ 97 ECE 232 Hardware Organization and Design Lecture 13 Control Design

CS152 / Kubiatowicz Lec8.1 2/22/99©UCB Spring 1999 CS152 Computer Architecture and Engineering Lecture 8 Designing Single Cycle Control Feb 22, 1999 John.

CS 61C L17 Control (1) A Carle, Summer 2006 © UCB inst.eecs.berkeley.edu/~cs61c/su06 CS61C : Machine Structures Lecture #17: CPU Design II – Control

CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.

Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 25 CPU design (of a single-cycle CPU) Intel is prototyping circuits that.

1 COMP 206: Computer Architecture and Implementation Montek Singh Mon., Sep 9, 2002 Topic: Pipelining Basics.

CS61C L27 Single-Cycle CPU Control (1) Garcia, Spring 2010 © UCB inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures Lecture 27 Single-cycle.

CS 61C L16 Datapath (1) A Carle, Summer 2004 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #16 – Datapath Andy.

CS61C L20 Single Cycle Datapath, Control (1) Chae, Summer 2008 © UCB Albert Chae, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.

361 control Computer Architecture Lecture 9: Designing Single Cycle Control.

CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Spring 2010 © UCB inst.eecs.berkeley.edu/~cs61c UC Berkeley CS61C : Machine Structures.

ECE 232 L12.Datapath.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 12 Datapath.

CS61C L27 Single Cycle CPU Control (1) Garcia, Fall 2006 © UCB Wireless High Definition?  Several companies will be working on a “WirelessHD” standard,

CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.

Instructor: Sagar Karandikar

Computer Organization CS224 Fall 2012 Lesson 26. Summary of Control Signals addsuborilwswbeqj RegDst ALUSrc MemtoReg RegWrite MemWrite Branch Jump ExtOp.

EEM 486: Computer Architecture Designing Single Cycle Control.

Designing a Single Cycle Datapath In this lecture, slides from lectures 3, 8 and 9 from the course Computer Architecture ECE 201 by Professor Mike Schulte.

CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.

EEM 486: Computer Architecture Designing a Single Cycle Datapath.

CPE 442 single-cycle datapath.1 Intro. To Computer Architecture CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath.

CS3350B Computer Architecture Winter 2015 Lecture 5.7: Single-Cycle CPU: Datapath Control (Part 2) Marc Moreno Maza [Adapted.

Instructor: Justin Hsia 7/23/2012Summer Lecture #201 CS 61C: Great Ideas in Computer Architecture MIPS CPU Datapath, Control Introduction.

Csci 136 Computer Architecture II –Single-Cycle Datapath Xiuzhen Cheng

EEM 486: Computer Architecture Lecture 3 Designing Single Cycle Control.

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Single-Cycle CPU Datapath & Control Part 2 Instructors: Krste Asanovic & Vladimir Stojanovic.

Single Cycle Controller Design

CS 110 Computer Architecture Lecture 11: Single-Cycle CPU Datapath & Control Instructor: Sören Schwertfeger School of Information.

Lecture 18: Pipelining I.

Pipelines An overview of pipelining

Nicholas Weaver & Vladimir Stojanovic

Pipelining concepts, datapath and hazards

Performance of Single-cycle Design

IT 251 Computer Organization and Architecture

(Chapter 5: Hennessy and Patterson) Winter Quarter 1998 Chris Myers

CPU Control Lecture 16 CDA

Chapter 4 The Processor Part 2

(Chapter 5: Hennessy and Patterson) Winter Quarter 1998 Chris Myers

Vladimir Stojanovic and Nicholas Weaver

Exhausted TA Ben Sussman

Lecturer PSOE Dan Garcia

Inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture 20 CPU Design: Control II & Pipelining I TA Noah Johnson Greet class.

Lecturer: Alan Christopher

Instructors: Randy H. Katz David A. Patterson

Serial versus Pipelined Execution

CS152 Computer Architecture and Engineering Lecture 8 Designing a Single Cycle Datapath Start: X:40.

An Introduction to pipelining

COMS 361 Computer Organization

inst.eecs.berkeley.edu/~cs61c-to

Prof. Giancarlo Succi, Ph.D., P.Eng.

Pipelining Appendix A and Chapter 3.

Instructors: Randy H. Katz David A. Patterson

Guest Lecturer: Justin Hsia

A relevant question Assuming you’ve got: One washer (takes 30 minutes)

The Processor: Datapath & Control.

COMS 361 Computer Organization

What You Will Learn In Next Few Sets of Lectures

Processor: Datapath and Control

Presentation transcript:

CS 61C: Great Ideas in Computer Architecture Control and Pipelining Instructors: Vladimir Stojanovic and Nicholas Weaver http://inst.eecs.Berkeley.edu/~cs61c/sp16

Datapath Control Signals ExtOp: “zero”, “sign” ALUsrc: 0  regB; 1  immed ALUctr: “ADD”, “SUB”, “OR” MemWr: 1  write memory MemtoReg: 0  ALU; 1  Mem RegDst: 0  “rt”; 1  “rd” RegWr: 1  write register RegDst Rd Rt ALUctr MemtoReg MemWr Imm16 clk PC 00 4 nPC_sel & Equal PC Ext Adder Mux Inst Address 1 1 RegWr Rs Rt 5 5 5 busA 32 Rw Ra Rb ALU busW 32 RegFile busB 32 1 32 clk 32 WrEn Adr Imm16 Data In Extender 1 Data Memory 16 32 clk ALUSrc ExtOp

Summary of the Control Signals (1/2) inst Register Transfer add R[rd]  R[rs] + R[rt]; PC  PC + 4 ALUsrc=RegB, ALUctr=“ADD”, RegDst=rd, RegWr, nPC_sel=“+4” sub R[rd]  R[rs] – R[rt]; PC  PC + 4 ALUsrc=RegB, ALUctr=“SUB”, RegDst=rd, RegWr, nPC_sel=“+4” ori R[rt]  R[rs] + zero_ext(Imm16); PC  PC + 4 ALUsrc=Im, Extop=“Z”, ALUctr=“OR”, RegDst=rt,RegWr, nPC_sel=“+4” lw R[rt]  MEM[ R[rs] + sign_ext(Imm16)]; PC  PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr=“ADD”, MemtoReg, RegDst=rt, RegWr, nPC_sel = “+4” sw MEM[ R[rs] + sign_ext(Imm16)]  R[rs]; PC  PC + 4 ALUsrc=Im, Extop=“sn”, ALUctr = “ADD”, MemWr, nPC_sel = “+4” beq if (R[rs] == R[rt]) then PC  PC + sign_ext(Imm16)] || 00 else PC  PC + 4 nPC_sel = “br”, ALUctr = “SUB”

Summary of the Control Signals (2/2) See func 10 0000 10 0010 We Don’t Care :-) Appendix A op 00 0000 00 0000 00 1101 10 0011 10 1011 00 0100 00 0010 add sub ori lw sw beq jump RegDst ALUSrc MemtoReg RegWrite MemWrite nPCsel Jump ExtOp ALUctr<2:0> 1 x Add Subtract Or ? Here is a table summarizing the control signals setting for the seven (add, sub, ...) instructions we have looked at. Instead of showing you the exact bit values for the ALU control (ALUctr), I have used the symbolic values here. The first two columns are unique in the sense that they are R-type instrucions and in order to uniquely identify them, we need to look at BOTH the op field as well as the func fiels. Ori, lw, sw, and branch on equal are I-type instructions and Jump is J-type. They all can be uniquely idetified by looking at the opcode field alone. Now let’s take a more careful look at the first two columns. Notice that they are identical except the last row. So we can combine these two rows here if we can “delay” the generation of ALUctr signals. This lead us to something call “local decoding.” +3 = 42 min. (Y:22) op target address rs rt rd shamt funct 6 11 16 21 26 31 immediate R-type I-type J-type add, sub ori, lw, sw, beq jump

Boolean Expressions for Controller RegDst = add + sub ALUSrc = ori + lw + sw MemtoReg = lw RegWrite = add + sub + ori + lw MemWrite = sw nPCsel = beq Jump = jump ExtOp = lw + sw ALUctr[0] = sub + beq (assume ALUctr is 00 ADD, 01 SUB, 10 OR) ALUctr[1] = or Where: rtype = ~op5  ~op4  ~op3  ~op2  ~op1  ~op0, ori = ~op5  ~op4  op3  op2  ~op1  op0 lw = op5  ~op4  ~op3  ~op2  op1  op0 sw = op5  ~op4  op3  ~op2  op1  op0 beq = ~op5  ~op4  ~op3  op2  ~op1  ~op0 jump = ~op5  ~op4  ~op3  ~op2  op1  ~op0 add = rtype  func5  ~func4  ~func3  ~func2  ~func1  ~func0 sub = rtype  func5  ~func4  ~func3  ~func2  func1  ~func0 ADD 0000 00ss ssst tttt dddd d000 0010 0000 SUB 0000 00ss ssst tttt dddd d000 0010 0010 ORI 0011 01ss ssst tttt iiii iiii iiii iiii LW 1000 11ss ssst tttt iiii iiii iiii iiii SW 1010 11ss ssst tttt iiii iiii iiii iiii BEQ 0001 00ss ssst tttt iiii iiii iiii iiii JUMP 0000 10ii iiii iiii iiii iiii iiii iiii How do we implement this in gates?

Controller Implementation opcode func RegDst add ALUSrc sub MemtoReg ori RegWrite “AND” logic “OR” logic lw MemWrite nPCsel sw Jump beq ExtOp jump ALUctr[0] ALUctr[1]

P&H Figure 4.17

Summary: Single-cycle Processor Five steps to design a processor: 1. Analyze instruction set  datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Assemble the control logic Formulate Logic Equations Design Circuits Control Datapath Memory Processor Input Output

Single Cycle Performance Morgan Kaufmann Publishers 10 November, 2018 Single Cycle Performance Assume time for actions are 100ps for register read or write; 200ps for other events Clock period is? Instr Instr fetch Register read ALU op Memory access Register write Total time lw 200ps 100 ps 800ps sw 700ps R-format 600ps beq 500ps 1.25 GHz Clock rate (cycles/second = Hz) = 1/Period (seconds/cycle) Chapter 4 — The Processor

Single Cycle Performance Morgan Kaufmann Publishers 10 November, 2018 Single Cycle Performance Assume time for actions are 100ps for register read or write; 200ps for other events Clock period is? Instr Instr fetch Register read ALU op Memory access Register write Total time lw 200ps 100 ps 800ps sw 700ps R-format 600ps beq 500ps 1.25 GHz What can we do to improve clock rate? Will this improve performance as well? Want increased clock rate to mean faster programs Chapter 4 — The Processor

Levels of Representation/Interpretation temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; High Level Language Program (e.g., C) Compiler lw $t0, 0($2) lw $t1, 4($2) sw $t1, 0($2) sw $t0, 4($2) Anything can be represented as a number, i.e., data or instructions Assembly Language Program (e.g., MIPS) Assembler 0000 1001 1100 0110 1010 1111 0101 1000 1010 1111 0101 1000 0000 1001 1100 0110 1100 0110 1010 1111 0101 1000 0000 1001 0101 1000 0000 1001 1100 0110 1010 1111 Machine Language Program (MIPS) Machine Interpretation Hardware Architecture Description (e.g., block diagrams) Architecture Implementation Logic Circuit Description (Circuit Schematic Diagrams)

No More Magic! CS61A CS61B CS61C ✔ CS61C  EE40 Phys 7B Software I/O system Processor Compiler Operating System (Mac OSX) Application (ex: browser) Digital Design Circuit Design Instruction Set Architecture Datapath & Control transistors Memory Hardware Software Assembler

Administrivia Project 2-2 due 3/8 @ 23:59:59 (Tue) Guerrilla Sessions: MIPS CPU Wed 3/09 3 - 5 PM @ 241 Cory Sat 3/12 1 - 3 PM @ 651 @ 611 Soda

Gotta Do Laundry Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, fold, and put away Washer takes 30 minutes Dryer takes 30 minutes “Folder” takes 30 minutes “Stasher” takes 30 minutes to put clothes into drawers A B C D

Sequential Laundry Sequential laundry takes 8 hours for 4 loads 30 Time 6 PM 7 8 9 10 11 12 1 2 AM T a s k O r d e B C D A

Pipelined Laundry Pipelined laundry takes 3.5 hours for 4 loads! 12 2 AM 6 PM 7 8 9 10 11 1 Time 30 T a s k O r d e B C D A

Pipelining Lessons (1/2) 6 PM 7 8 9 Time B C D A 30 T a s k O r d e Pipelining doesn’t help latency of single task, it helps throughput of entire workload Multiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages Time to “fill” pipeline and time to “drain” it reduces speedup: 2.3x (8/3.5) v. 4x (8/2) in this example

Pipelining Lessons (2/2) 6 PM 7 8 9 Time B C D A 30 T a s k O r d e Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline? Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup

Execution Steps in MIPS Datapath 1) IFtch: Instruction Fetch, Increment PC 2) Dcd: Instruction Decode, Read Registers 3) Exec: Mem-ref: Calculate Address Arith-log: Perform Operation 4) Mem: Load: Read Data from Memory Store: Write Data to Memory 5) WB: Write Data Back to Register

Single Cycle Datapath instruction memory +4 rt rs rd registers ALU PC instruction memory +4 rt rs rd registers ALU Data imm 2. Decode/ Register Read 1. Instruction Fetch 5. Write Back 3. Execute 4. Memory

Pipeline registers Need registers between stages PC instruction memory +4 rt rs rd registers ALU Data imm 2. Decode/ Register Read 1. Instruction Fetch 5. Write Back 3. Execute 4. Memory Need registers between stages To hold information produced in previous cycle

More Detailed Pipeline Morgan Kaufmann Publishers 10 November, 2018 More Detailed Pipeline Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 IF for Load, Store, … Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 ID for Load, Store, … Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 EX for Load Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 MEM for Load Chapter 4 — The Processor

Morgan Kaufmann Publishers 10 November, 2018 WB for Load – Oops! Wrong register number! Chapter 4 — The Processor

Corrected Datapath for Load Morgan Kaufmann Publishers 10 November, 2018 Corrected Datapath for Load Chapter 4 — The Processor

Pipelined Execution Representation Time IF ID EX MEM WB Each instruction has identical latency! Every instruction must take same number of steps, so some stages will idle e.g. MEM stage for any arithmetic instruction

Graphical Pipeline Diagrams 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Write Back PC instruction memory +4 Register File rt rs rd ALU Data imm MUX Use datapath figure below to represent pipeline: IF ID EX Mem WB ALU I$ Reg D$

Graphical Pipeline Representation RegFile: left half is write, right half is read Time (clock cycles) I n s t r O d e I$ ALU Reg Reg I$ D$ ALU I$ Reg I$ ALU Reg D$ I$ Load Add Store Sub Or D$ Reg ALU

Pipelining Performance (1/3) Morgan Kaufmann Publishers Pipelining Performance (1/3) Use Tc (“time between completion of instructions”) to measure speedup Equality only achieved if stages are balanced (i.e. take the same amount of time) If not balanced, speedup is reduced Speedup due to increased throughput Latency for each instruction does not decrease Chapter 4 — The Processor

Pipelining Performance (2/3) Morgan Kaufmann Publishers Pipelining Performance (2/3) Assume time for stages is 100ps for register read or write 200ps for other stages What is pipelined clock rate? Compare pipelined datapath with single-cycle datapath Instr Instr fetch Register read ALU op Memory access Register write Total time lw 200ps 100 ps 800ps sw 700ps R-format 600ps beq 500ps Chapter 4 — The Processor

Pipelining Performance (3/3) Morgan Kaufmann Publishers Pipelining Performance (3/3) Single-cycle Tc = 800 ps f = 1.25GHz Here using Tc as “time between completion of instructions.” Pipelined Tc = 200 ps f = 5GHz Chapter 4 — The Processor

Clicker/Peer Instruction Logic in some stages takes 200ps and in some 100ps. Clk-Q delay is 30ps and setup-time is 20ps. What is the maximum clock frequency at which a pipelined design can operate? A: 10GHz B: 5GHz C: 6.7GHz D: 4.35GHz E: 4GHz