11/4/091 Implementing an ISA, part II - Control David E. Culler CS61CL Nov 4, 2009 Lecture 10 UCB CS61CL F09 Lec 10.

Slides:



Advertisements
Similar presentations
Lecture 4: CPU Performance
Advertisements

CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
CIS 314 Fall 2005 MIPS Datapath (Single Cycle and Multi-Cycle)
1 IKI20210 Pengantar Organisasi Komputer Kuliah no. 25: Pipeline 10 Januari 2003 Bobby Nazief Johny Moningka
Computer Architecture
CS252/Patterson Lec 1.1 1/17/01 Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer.
ELEN 350 Multi-Cycle Datapath Adapted from the lecture notes of John Kubiatowicz (UCB) and Hank Walker (TAMU)
10/28/091 Implementing an Instruction Set David E. Culler CS61CL Oct 28, 2009 Lecture 9 UCB CS61CL F09 Lec 9.
ECE 232 L22.Pipeline3.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 22 Pipelining,
11/4/251 Interactions between Processor Design and Memory System Design David E. Culler CS61CL Nov 25, 2009 Lecture 12 UCB CS61CL F09 Lec 12.
361 datapath Computer Architecture Lecture 8: Designing a Single Cycle Datapath.
CS 61C L19 Pipelining II (1) A Carle, Summer 2005 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #19: Pipelining II
Mary Jane Irwin ( ) [Adapted from Computer Organization and Design,
CS61C L26 Single Cycle CPU Datapath II (1) Garcia © UCB Lecturer PSOE Dan Garcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent ISA => RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
EECC550 - Shaaban #1 Lec # 5 Winter CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
©UCB CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan
ECE 232 L15.Miulticycle.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 15 Multi-cycle.
CS61C L28 CPU Design : Pipelining to Improve Performance I (1) Garcia, Fall 2006 © UCB 100 Msites!  Sometimes it’s nice to stop and reflect. The web was.
ECE 232 L19.Pipeline2.1 Adapted from Patterson 97 ©UCBCopyright 1998 Morgan Kaufmann Publishers ECE 232 Hardware Organization and Design Lecture 19 Pipelining,
CS61CL Machine Structures Lec 6 – Number Representation David Culler Electrical Engineering and Computer Sciences University of California, Berkeley.
CS61C L25 CPU Design : Designing a Single-Cycle CPU (1) Garcia, Fall 2006 © UCB T-Mobile’s Wi-Fi / Cell phone  T-mobile just announced a new phone that.
Computer ArchitectureFall 2007 © October 22nd, 2007 Majd F. Sakr CS-447– Computer Architecture.
Pipelining Datapath Adapted from the lecture notes of Dr. John Kubiatowicz (UC Berkeley) and Hank Walker (TAMU)
CS61C L26 CPU Design : Designing a Single-Cycle CPU II (1) Garcia, Fall 2006 © UCB Lecturer SOE Dan Garcia inst.eecs.berkeley.edu/~cs61c.
Inst.eecs.berkeley.edu/~cs61c UCB CS61C : Machine Structures Lecture 25 CPU design (of a single-cycle CPU) Intel is prototyping circuits that.
CS152 / Kubiatowicz Lec13.1 3/17/03©UCB Spring 2003 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.
EEM 486: Computer Architecture Lecture 3 Designing a Single Cycle Datapath.
Pipelining - II Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
CS 61C L16 Datapath (1) A Carle, Summer 2004 © UCB inst.eecs.berkeley.edu/~cs61c/su05 CS61C : Machine Structures Lecture #16 – Datapath Andy.
CS61CL Machine Structures Lec 8 – State and Register Transfers David Culler Electrical Engineering and Computer Sciences University of California, Berkeley.
Computer ArchitectureFall 2008 © October 6th, 2008 Majd F. Sakr CS-447– Computer Architecture.
CS152 / Kubiatowicz Lec /17/01©UCB Fall 2001 CS152 Computer Architecture and Engineering Lecture 13 Introduction to Pipelining: Datapath and Control.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
Pipelining - II Rabi Mahapatra Adapted from CS 152C (UC Berkeley) lectures notes of Spring 2002.
EECC550 - Shaaban #1 Lec # 5 Spring CPU Design Steps 1. Analyze instruction set operations using independent RTN => datapath requirements.
Lecture 12: Pipeline Datapath Design Professor Mike Schulte Computer Architecture ECE 201.
CS3350B Computer Architecture Winter 2015 Lecture 5.6: Single-Cycle CPU: Datapath Control (Part 1) Marc Moreno Maza [Adapted.
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
Pipelining (I). Pipelining Example  Laundry Example  Four students have one load of clothes each to wash, dry, fold, and put away  Washer takes 30.
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
EEM 486: Computer Architecture Designing a Single Cycle Datapath.
CS61CL Machine Structures Lec 5 – Instruction Set Architecture
W.S Computer System Design Lecture 4 Wannarat Suntiamorntut.
CSIE30300 Computer Architecture Unit 04: Basic MIPS Pipelining Hsin-Chou Chi [Adapted from material by and
Oct. 18, 2000Machine Organization1 Machine Organization (CS 570) Lecture 4: Pipelining * Jeremy R. Johnson Wed. Oct. 18, 2000 *This lecture was derived.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
By Wannarat Computer System Design Lecture 4 Wannarat Suntiamorntut.
CS252/Patterson Lec 1.1 1/17/01 معماري کامپيوتر - درس نهم pipeline برگرفته از درس : Prof. David A. Patterson.
EE524/CptS561 Jose G. Delgado-Frias 1 Processor Basic steps to process an instruction IFID/OFEXMEMWB Instruction Fetch Instruction Decode / Operand Fetch.
Introduction to Computer Organization Pipelining.
CMSC 611: Advanced Computer Architecture Pipelining Some material adapted from Mohamed Younis, UMBC CMSC 611 Spr 2003 course slides Some material adapted.
Advanced Computer Architecture CS 704 Advanced Computer Architecture Lecture 10 Computer Hardware Design (Pipeline Datapath and Control Design) Prof. Dr.
Lecture 18: Pipelining I.
Computer Organization
CMSC 611: Advanced Computer Architecture
5 Steps of MIPS Datapath Figure A.2, Page A-8
ECE232: Hardware Organization and Design
CDA 3101 Spring 2016 Introduction to Computer Organization
ECE232: Hardware Organization and Design
School of Computing and Informatics Arizona State University
Chapter 4 The Processor Part 2
Lecturer: Alan Christopher
Pipelining in more detail
A Multiple Clock Cycle Instruction Implementation
Single Cycle vs. Multiple Cycle
CMCS Computer Architecture Lecture 20 Pipelined Datapath and Control April 11, CMSC411.htm Mohamed.
COMS 361 Computer Organization
Recall: Performance Evaluation
Presentation transcript:

11/4/091 Implementing an ISA, part II - Control David E. Culler CS61CL Nov 4, 2009 Lecture 10 UCB CS61CL F09 Lec 10

Review: TinyMIPS Reg-Reg instructions (op == 0) –adduR[rd] := R[rs] + R[rt]; pc:=pc+4 –subuR[rd] := R[rs] - R[rt]; pc:=pc+4 Reg-Immed (op != 0) –lw R[rt] := Mem[ R[ rs ] + signEx(Im16) ] –swMem[ R[ rs ] + signEx(Im16) ] := R[rt] Jumps –jPC := PC || addr || 00 –jrPC := R[rs] Branches –BEQPC := (R[rs] == R[rt]) ? PC + signEx(im16) : PC+4 –BLTZPC := (R[rs] < 0) ? PC + signEx(im16) : PC+4 11/4/09 UCB CS61CL F09 Lec 10 2

Review: DataPath + Control 11/4/09 UCB CS61CL F09 Lec 10 3 °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Control State Machine (abstract) 11/4/09 UCB CS61CL F09 Lec 10 4 I-Fetch IR := Mem[pc] AddU R[rd]:=R[rs]+R[rt]; pc := pc+4 SubU R[rd] := R[rs]-R[rt]; pc := pc+4 LW R[rt] := mem[R[rs]+sx16]; pc := pc+4 SW mem[R[rs]+sx16] := R[rt]; pc := pc+4 J pc := pc ||addr||00 JR pc := R[rs] BR-taken pc := pc + sx16||00 BR-not taken pc := pc + 4 reset ~reset&OP==addu ~reset&OP==lw ~reset & ( (OP==beq) & ~EQ)) | (OP==Bneg) & ~N)) )

Ifetch: IR := mem[pc] RAM_addr <- A <- PC;(pc2A, ~s2A) IR_in <- D <- RAM_data;(~i2D,m2D,~b2D,~s2D) IR := IR_in;(ld_ir,~ld_pc,~ld_reg, ~wrt) 11/4/09 UCB CS61CL F09 Lec 10 5 °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Control State Machine 11/4/09 UCB CS61CL F09 Lec 10 6 I-Fetch IR := Mem[pc] pc2A,~s2A,~ir2D,m2D, ~b2D,~s2D,ld_ir,~ld_pc, ~ld_reg, ~wrt AddU R[rd]:=R[rs]+R[rt]; pc := pc+4 SubU R[rd] := R[rs]-R[rt]; pc := pc+4 LW R[rt] := mem[R[rs]+sx16]; pc := pc+4 SW mem[R[rs]+sx16] := R[rt]; pc := pc+4 J pc := pc ||addr||00 JR pc := R[rs] BR-taken pc := pc + sx16||00 BR-not taken pc := pc + 4 reset ~reset&OP==addu ~reset&OP==lw ~reset & ( (OP==beq) & ~EQ)) | (OP==Bneg) & ~N)) )

Exec: R[rd]:=R[rs]+R[rt]; pc:=pc+4; npc_sel=0,ld_pc,~pc2A,~ld_ir,~i2D,~wrt,~m2D,~rt_sel,ld_reg,~b2D,~sx_sel,~ comp,~s2A,s2D 11/4/09 UCB CS61CL F09 Lec 10 7 °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Control State Machine 11/4/09 UCB CS61CL F09 Lec 10 8 I-Fetch IR := Mem[pc] pc2A,~s2A,~ir2D,m2D, ~b2D,~s2D,ld_ir,~ld_pc, ~ld_reg, ~wrt AddU R[rd]:=R[rs]+R[rt]; pc := pc+4 npc_sel=0,ld_pc,~pc2A,~ld_ir,~i2D,~wrt,~m2D,~rt_sel,ld_reg,~b2D,~sx_sel,~comp,~s2A,s2D SubU R[rd] := R[rs]-R[rt]; pc := pc+4 LW R[rt] := mem[R[rs]+sx16]; pc := pc+4 SW mem[R[rs]+sx16] := R[rt]; pc := pc+4 J pc := pc ||addr||00 JR pc := R[rs] BR-taken pc := pc + sx16||00 BR-not taken pc := pc + 4 reset ~reset&OP==addu ~reset&OP==lw ~reset & ( (OP==beq) & ~EQ)) | (OP==Bneg) & ~N)) )

Exec: R[rd]:=R[rs]-R[rt]; pc:=pc+4; npc_sel=0,ld_pc,~pc2A,~ld_ir,~i2D,~wrt,~m2D,~rt_sel,ld_reg,~b2D,~sx_sel,c omp,~s2A,s2D 11/4/09 UCB CS61CL F09 Lec 10 9 °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D || + npc_sel

Exec LW: R[rt]:= Mem[R[rs]+SXim16]; npc_sel=0, ld_pc, m2D, rt_sel, ld_reg, sx_sel, s2A 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D || + npc_sel

Exec SW: Mem[R[rs]+SXim16] := R[rt] 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D || + npc_sel

Exec J: PC := PC || addr || 00 npc_sel=1, ld_pc, i2D 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Exec JR: PC := R[rs] npc_sel=2, ld_pc, s2D, sx_sel=2 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Exec Br Taken: PC := PC + SX16 npc_sel=3, ld_pc, i2D 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM A D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel

Controller Specification 11/4/09 UCB CS61CL F09 Lec 10 15

Adminstration HW7 due midnight Mid Term 2 Monday 11/9 –5:30 – 7:30 RM: 145 Dwinelle –alternate Friday 11/4 3:00-5:00 rm – 310 Soda –Review session Sunday Soda Project 3 –incremental lab check offs Flex lab mon (9-1) and tues (9-5) –midterm final prep –project 3 help 11/4/09 UCB CS61CL F09 Lec 10 16

Controller Implementation 11/4/09 UCB CS61CL F09 Lec clk reset exec op eq n npc_sel s2D

Combinational Logic per Ctrl Point 11/4/09 UCB CS61CL F09 Lec execpc2A execldPC wrt exec op

Multiplexor Control I /4/09 UCB CS61CL F09 Lec op I x00xx00 I

Faster Clock Clock Period > Longest path from reg out to input + reg delay 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM Addr D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel A B S

Multi-Cycle Controller State Machine 11/4/09 UCB CS61CL F09 Lec Op / Dcd A := R[rs], B := R[rt] AddU S:=A+B; pc := pc+4 SubU S := A – B; pc := pc+4 LW S=A+sx16; pc := pc+4 SW S=A+sx16; pc := pc+4 J pc := pc ||addr||00 JR pc := R[rs] BR-taken pc := pc + sx16||00 BR-not taken pc := pc + 4 I-Fetch IR := Mem[pc] R[rd]:=S; read MAR:=S; R[rd]:=D wrt MAR:=S; MDR:=B’

Time State control Move control word through the stages Decode per stage Active stage moves “around the ring” 11/4/09 UCB CS61CL F09 Lec °°° PC + A BCi IR IR_ex IR_mem IR_wb mem

Time State Control 11/4/09 UCB CS61CL F09 Lec °°° Asel Bsel Dsel ld PC + A BCi ~ IR RAM Addr D xtxt +4 pc2Ald_pcld_irwrtm2Dld_regsx_selcomps2As2D rs rtrd b2D rt_sel i2D ||+ npc_sel A B S ifetch decode exec mem wb

More regular multi-cycle execution 11/4/09 UCB CS61CL F09 Lec Op / Dcd A := R[rs], B := R[rt] AddU S:=A+B; pc := pc+4 SubU S := A – B; pc := pc+4 LW S=A+sx16; pc := pc+4 SW S=A+sx16; pc := pc+4 J pc := pc ||addr||00 JR pc := R[rs] BR-taken pc := pc + sx16||00 BR-not taken pc := pc + 4 I-Fetch IR := Mem[pc] R[rd]:=S; R[rd]:=S’; read MAR:=S; R[rd]:=D wrt MAR:=S; MDR:=B’ S’:=S;

Sequence of Multi-step Operations Operation implemented as sequence of step on distinct resources –wash => dry => fold Multiple independent Operations 11/4/09 UCB CS61CL F09 Lec IF DCD Ex Mem WB

6/27/2015 cs61cl f09 lec 5 26 Technology Trends Clock Rate: ~30% per year Transistor Density: ~35% Chip Area: ~15% Transistors per chip: ~55% Total Performance Capability: ~100% by the time you graduate... –3x clock rate (>10 GHz) –10x transistor count (100 Billion transistors) –30x raw capability plus 16x dram density, 32x disk density (60% per year) Network bandwidth, …

Pipelining Overlap consecutive operations 11/4/09 UCB CS61CL F09 Lec 10 27

6/27/2015 cs61cl f09 lec 5 28 Performance(X) Execution_time(Y) n == Performance(Y) Execution_time(X) Definition: Performance Performance is in units of things per sec –bigger is better If we are primarily concerned with response time performance(x) = 1 execution_time(x) " X is n times faster than Y" means

Pipeline Performance N operations performed in k steps each Sequential Time: N*k Lower bound: N (1 every cycle) Pipeline Time: k – 1 + N Bound on Speedup on k-stage pipeline < k Speedup(k,N) = Time(1,N)/Time(k,N) = N*k / (N+k-1) ≈ N / (1+k/N) StartUp Cost: k-1 Peak Rate Half Power point 11/4/09 UCB CS61CL F09 Lec 10 29

6/27/2015 cs61cl f09 lec 5 30 Performance Trends MIPS R3000

6/27/2015 cs61cl f09 lec 5 31 Processor Performance (1.35X before, 1.55X now) 1.54X/yr

Pipelined control 11/4/09 UCB CS61CL F09 Lec °°° PC + A BCi IR IR_ex IR_mem IR_wb imem Dmem

Pipelined Instruction Execution Fetch Instruction Every cycle Launch into a pipeline What if they are not independent? –structural hazards »two operations need to use same resource –data dependence »later instruction needs to use the value produce by an earlier on Detect Wait till hazard clears 11/4/09 UCB CS61CL F09 Lec 10 33

Pipelined “Bubble” 11/4/09 UCB CS61CL F09 Lec °°° PC + A BCi IR IR_ex IR_mem IR_wb imem Dmem