Multicycle and Microcode


Multicycle and Microcode CS/COE 0447 Jarrett Billingsley

Class announcements how's the project coming along? :^) my god, last week of lecture thank god CS447

Multicycle CPUs

Chop chop
not all instructions take the same amount of time, so… make different instructions take different amounts of time! and by that, we mean different numbers of clock cycles.
[figure: the instructions j, or, and lw drawn with different lengths; labels: 3 cycles, 4 cycles, 100 cycles]

The instructions' steps
remember the five phases of execution? all instructions have IF, ID, and EX, but only some access memory (M) or write back to registers (W).
  beq/j:         F D X
  add/sub etc.:  F D X W
  lw:            F D X M …….. W

The multicycle datapath from a bird's-eye view
each phase of execution has its own unit, and between phases, we insert registers to hold onto the data for the next phase.
[figure: multicycle datapath; labels: Instruction Memory, Control, Register File, Data Memory, and the stages F, D, X, M, W]

Watching an add (animated)
let's watch an add instruction flow through the datapath!
[animation: the add advances one stage per "Clock!" through the datapath (Instruction Memory, Control, Register File, Data Memory; stages F, D, X, M, W); captions: "set all control signals...", "data flows back to registers..."]

CPI (and IPC)
CPI (Cycles Per Instruction) measures the average number of cycles it takes to complete one instruction.
IPC (Instructions Per Cycle) is its reciprocal. multi-issue CPUs can execute multiple instructions in one clock cycle! WOAH 8O
so, what's the CPI for the single-cycle implementation? uh, 1. by definition.
what about for a multicycle implementation? ……????? hmmm

So what the heck has this bought us?
let's say our clock cycle time decreased from 15ns to 1ns! that's from about 66.7 MHz to 1 GHz! :D
...buuut our CPI (cycles per instruction) increased a lot. with the single-cycle datapath, CPI was always 1. now the CPI is... well... uh... variable?
[figure: beq/j, add/sub etc., and lw drawn as sequences of IF, ID, EX, M, and WB phases of different lengths]
if instructions vary in length, how do we calculate CPI? (let's just say lw is 10 cycles)

Calculating Average CPI
every program is different, and every program has a different instruction mix: how many of each kind of instruction it uses. let's say we have a program where 60% of the instructions are ALU, 20% are branches, 15% are loads, and 5% are stores.
          ALU    Branches  Loads  Stores
  %       60%    20%       15%    5%
  Cycles  4      3         10     9
  CPI     2.4    0.6       1.5    0.45
1. for each category, multiply the proportion (percentage) by the number of cycles for that category to get the per-category CPI.
2. now sum the CPIs: 2.4 + 0.6 + 1.5 + 0.45 = 4.95
this is the Average CPI for THIS program. different mixes give different CPIs!
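The two steps above are just a weighted average, so here's a minimal sketch of them in Python. The percentages and cycle counts come from the slide; the names `mix`, `cycles`, and `average_cpi` are made up for this example.

```python
# Instruction mix (fraction of all instructions) and cycles per category,
# copied from the slide. The variable and function names are hypothetical.
mix = {"ALU": 0.60, "Branches": 0.20, "Loads": 0.15, "Stores": 0.05}
cycles = {"ALU": 4, "Branches": 3, "Loads": 10, "Stores": 9}


def average_cpi(mix, cycles):
    """Step 1: fraction x cycles per category. Step 2: sum the results."""
    return sum(mix[kind] * cycles[kind] for kind in mix)


cpi = average_cpi(mix, cycles)
print(f"average CPI = {cpi:.2f}")
```

Changing the fractions in `mix` (they should still add up to 1) shows how a load-heavy program drags the average CPI up.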

The performance equation
if we have n instructions, and each instruction takes CPI cycles, and each cycle takes t seconds, how long does it take to execute all the instructions?
$$\text{Total time} = n\ \text{instructions} \times CPI\ \frac{\text{cycles}}{\text{instruction}} \times t\ \frac{\text{seconds}}{\text{cycle}} = n \times CPI \times t\ \text{seconds}$$
or in English, it's the product of the instruction count, the CPI, and the length of one clock cycle.

So how much better is it?!??!?
say we execute 500 million (5 × 10^8) instructions.
for the single-cycle datapath: CPI = 1, cycle time = 15ns (15 × 10^-9 s)
  total time = n × CPI × cycle time = (5 × 10^8) × (1) × (15 × 10^-9) = 7.5 seconds.
for the multicycle datapath: CPI = 4.95 (much higher!), cycle time = 1ns (much lower!)
  total time = (5 × 10^8) × (4.95) × (1 × 10^-9) = 2.475 seconds!
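To play with these numbers, here's a small sketch of the same arithmetic. It assumes the slide's values (500 million instructions, 15ns vs. 1ns cycles, CPI of 1 vs. 4.95); the function name `total_time` is invented for this example.

```python
def total_time(n, cpi, cycle_time_s):
    """Total time in seconds = instruction count x CPI x seconds per cycle."""
    return n * cpi * cycle_time_s


n = 500_000_000  # 5 x 10^8 instructions

single = total_time(n, cpi=1, cycle_time_s=15e-9)    # single-cycle datapath
multi = total_time(n, cpi=4.95, cycle_time_s=1e-9)   # multicycle datapath

print(f"single-cycle: {single:.3f} s")
print(f"multicycle:   {multi:.3f} s")
print(f"speedup:      {single / multi:.2f}x")
```

The speedup comes out at roughly 3x, even though the clock got 15x faster, because the CPI went up by almost 5x.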

doo-doo-doo-doo-doo-doo
not bad! I guess? I mean, we had to increase the clock speed by a factor of 15... and we only got about triple the performance out of it (7.5 s / 2.475 s ≈ 3.03).
if our CPI were also close to 1, it'd be 15 times as fast as the single-cycle machine... HMMMMMMMMMMMMMMMMMMMMMM.
Pipelining!

Multi-cycle control

Things are more complicated now
each instruction has multiple steps.
[figure: an add in the multicycle datapath; labels: Instruction Memory, Control, Register File, Data Memory, and the stages F, D, X, M, W]

One problem solved
loads and stores are no longer trying to access memory at the same time as the instruction fetch!
[figure: the datapath with Instruction Memory and Data Memory shown alongside a single Memory; other labels: Control, Register File, and the stages F, D, X, M, W]
we've transformed into a Von Neumann architecture.

What's a load look like now? (animated)
the numbers are clock cycles, shown for lw t0, 4(s0):
1. Fetch using PC
2. Decode
3. Calculate s0 + 4
4. Load value using the computed address
5. copy into register
[figure: datapath labels: PC, (pipeline registers), Control, Register File, Memory, W; values in flight: instruction, s0, 4, s0+4 (the computed address), data 17]

So how do we control it all?
what about other instructions? loads, stores, branches… they all have different sequences of steps.
[figure: a Control block with the fetched instruction and a step register as inputs, and MemWrite, ALUSrc, ALUOp, RegDataSrc, etc… as outputs]
this "step register" keeps track of what step of the instruction we're on.
oh no this looks familiar

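these FSMs just will not leave us alone
in a multicycle implementation, control is an FSM.
what's the first step of ANY instruction? Fetch. and then? Decode. and then? it depends on the instruction (lw, add, etc…), but eventually…
[figure: state diagram starting at Fetch, then Decode, then separate paths for lw, add, etc…]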
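Here's a minimal sketch of control as an FSM. It assumes the per-instruction phases listed earlier in the deck (beq/j: F D X, add/sub: F D X W, lw: F D X M W), treats every phase as a single state, and assumes the machine returns to Fetch once an instruction is done. The names `next_state`, `trace`, `USES_MEMORY`, and `USES_WRITEBACK` are hypothetical.

```python
FETCH, DECODE, EXECUTE, MEMORY, WRITEBACK = "F", "D", "X", "M", "W"

# Which instructions need the optional phases (taken from the earlier slide).
USES_MEMORY = {"lw"}
USES_WRITEBACK = {"add", "sub", "lw"}


def next_state(state, op):
    """Return the state that follows `state` while executing `op`."""
    if state == FETCH:
        return DECODE              # every instruction starts the same way
    if state == DECODE:
        return EXECUTE
    if state == EXECUTE:
        if op in USES_MEMORY:
            return MEMORY
        if op in USES_WRITEBACK:
            return WRITEBACK
        return FETCH               # beq/j are done after X
    if state == MEMORY:
        return WRITEBACK if op in USES_WRITEBACK else FETCH
    return FETCH                   # after W, fetch the next instruction


def trace(op):
    """List the states one instruction passes through, starting at Fetch."""
    states = [FETCH]
    while True:
        following = next_state(states[-1], op)
        if following == FETCH:     # assumption: done means back to Fetch
            return states
        states.append(following)


for op in ("beq", "add", "lw"):
    print(op, " ".join(trace(op)))
```

The step register from the previous slide plays the role of `state` here: the control signals for a cycle depend on both the current step and the fetched instruction.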

Generalizing
if we were to write a program to describe this control unit…
    while(true) {
        inst = memory[PC]                       // fetch
        PC = PC + 4                             // move on to the next instruction
        switch(inst.opcode) {                   // decode
            case LW:
                addr = REG[inst.rs] + inst.imm  // compute the address
                value = MEM[addr]               // load from memory
                REG[inst.rt] = value            // lw's destination register is rt
                break
            case ADD:
                ...
        }
    }
this isn't just an illustration. we could literally make our control unit run programs.

Microprogramming

What is microprogramming?
it's making a control unit into a tiny CPU that runs programs. these programs are called microcode (µCode).
the sequencer decides what step to do next.
the µCode ROM holds the µPrograms for each instruction.
each µInstruction is mostly a set of control signals.
[figure: the MIPS instruction feeds the Sequencer, which drives the µPC; the µPC addresses the µCode ROM, whose outputs are MemWrite, ALUOp, RegDataSrc, ALUSrc, etc…]
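A toy version of this arrangement, as a sketch only: the ROM layout, the `DISPATCH` table, and the sequencing hints ("next", "dispatch", "fetch") are assumptions made up for this example. MemWrite, ALUOp, ALUSrc, and RegDataSrc are the signal names from the slide; IRWrite, MemRead, and RegWrite are hypothetical extras.

```python
# Each µInstruction is (control signals, what the sequencer does next).
UCODE_ROM = [
    ({"IRWrite": 1}, "next"),                         # 0: fetch
    ({}, "dispatch"),                                 # 1: decode, pick a routine
    ({"ALUSrc": 0, "ALUOp": "add"}, "next"),          # 2: add, execute
    ({"RegWrite": 1, "RegDataSrc": "alu"}, "fetch"),  # 3: add, write back
    ({"ALUSrc": 1, "ALUOp": "add"}, "next"),          # 4: lw, compute address
    ({"MemRead": 1, "MemWrite": 0}, "next"),          # 5: lw, read memory
    ({"RegWrite": 1, "RegDataSrc": "mem"}, "fetch"),  # 6: lw, write back
]

# Where each instruction's µProgram starts in the ROM.
DISPATCH = {"add": 2, "lw": 4}


def run_instruction(opcode):
    """Step the µPC through one instruction; return the (µPC, signals) pairs."""
    upc = 0
    steps = []
    while True:
        signals, sequencing = UCODE_ROM[upc]
        steps.append((upc, signals))
        if sequencing == "next":
            upc += 1                   # fall through to the next µInstruction
        elif sequencing == "dispatch":
            upc = DISPATCH[opcode]     # jump to this instruction's µProgram
        else:                          # "fetch": this instruction is finished
            return steps


for upc, signals in run_instruction("lw"):
    print(upc, signals)
```

Supporting a new instruction in this sketch only means appending rows to `UCODE_ROM` and adding an entry to `DISPATCH`, which is the flexibility microcode is known for.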

Read-only memory? Well…
the real power comes when we make it possible to reprogram the µCode ROM! it's inside the CPU, and it can be made of EEPROM or Flash, so we can change it!
[figure: firmware.bin next to a CPU containing the µCode ROM]
firmware is software which serves a very important function and is hard to change (get it? it's softer than hardware but harder than software…)
so what could you do if you could reprogram how your CPU works?

The sky ROM size is the limit
add new instructions! (vpmultishiftqb, phminposuw, sqrmaxrstdec)
fix gigantic security holes in old ones!
improve performance! (CPI before: 3.3, CPI now: 2.8)
make add instructions subtract instead!! haha pranked

So what's the catch?
microcoded control is flexible, but that comes at a cost: speed. hardwired control (like your project) can be very fast.
WHO WOULD WIN? a complicated mini-CPU inside your CPU (Sequencer, µPC, µCode ROM), or a handful of G A T E Y B O Y S

The origins of RISC
by the late 70s and early 80s, CISC architectures were reaching their peak of complexity, and that meant big complex control units. it wasn't unusual for the control to take up the majority of the chip.
RISC and MIPS took the… risk of simplifying control, and this was the right risk to take!
[figure: a CISC CPU and a RISC CPU, each divided into control, registers, and ALU]

One tool of many
microcode is absolutely still in use, mostly in CISC architectures.
what the CPU fetches:
    inc [eax + ecx*16 + 4]
translation to "µOps" (microcode assisted)
what the CPU core might actually execute:
    shl   tmp1, ecx, 4
    add   tmp1, eax
    add   tmp1, 4
    load  tmp2, [tmp1]
    add   tmp2, 1
    store [tmp1], tmp2
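To check that the six µOps really do what the single inc does, here's a small sketch that interprets them. It is a simplification under stated assumptions: memory is a Python dict keyed by address, registers hold plain integers with no 32-bit wraparound or flags, and the tuple encoding in `UOPS` plus the names `run` and `value` are invented for this example.

```python
# The µOp sequence from the slide, encoded as (operation, operands...).
UOPS = [
    ("shl", "tmp1", "ecx", 4),   # tmp1 = ecx * 16
    ("add", "tmp1", "eax"),      # tmp1 += eax
    ("add", "tmp1", 4),          # tmp1 += 4, now the full address
    ("load", "tmp2", "tmp1"),    # tmp2 = memory[tmp1]
    ("add", "tmp2", 1),          # the actual increment
    ("store", "tmp1", "tmp2"),   # memory[tmp1] = tmp2
]


def value(regs, operand):
    """Register names are strings; anything else is an immediate."""
    return regs[operand] if isinstance(operand, str) else operand


def run(uops, regs, mem):
    """Execute the µOps in order, updating regs and mem in place."""
    for op, *args in uops:
        if op == "shl":
            dst, src, amount = args
            regs[dst] = regs[src] << amount
        elif op == "add":
            dst, src = args
            regs[dst] = regs[dst] + value(regs, src)
        elif op == "load":
            dst, addr = args
            regs[dst] = mem[regs[addr]]
        elif op == "store":
            addr, src = args
            mem[regs[addr]] = regs[src]


regs = {"eax": 0x1000, "ecx": 2}
mem = {0x1000 + 2 * 16 + 4: 41}   # the word at eax + ecx*16 + 4
run(UOPS, regs, mem)
print(hex(regs["tmp1"]), mem[regs["tmp1"]])
```

After the run, the word at address eax + ecx*16 + 4 has gone from 41 to 42, which is what inc [eax + ecx*16 + 4] asks for.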