1 EECS 150 - Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer.

Slides:



Advertisements
Similar presentations
Adding the Jump Instruction
Advertisements

Control path Recall that the control path is the physical entity in a processor which: fetches instructions, fetches operands, decodes instructions, schedules.
Processor System Architecture
1  1998 Morgan Kaufmann Publishers Chapter Five The Processor: Datapath and Control.
The Control Unit: Sequencing the Processor Control Unit: –provides control signals that activate the various microoperations in the datapath the select.
EECS Components and Design Techniques for Digital Systems Lec 19 – Fixed Point Arithmetic & Register Transfers Level (RTL) Design 11/2/2004 David.
Levels in Processor Design
1 COMP541 Sequencing – III (Sequencing a Computer) Montek Singh April 9, 2007.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ input/output and clock inputs Sequence of control signal combinations.
Chapter Five The Processor: Datapath and Control.
L15 – Control & Execution 1 Comp 411 – Spring /25/08 Control & Execution Finite State Machines for Control MIPS Execution.
CPEN Digital System Design Chapter 9 – Computer Design
Chapter 7 - Part 2 1 CPEN Digital System Design Chapter 7 – Registers and Register Transfers Part 2 – Counters, Register Cells, Buses, & Serial Operations.
CS61CL Machine Structures Lec 8 – State and Register Transfers David Culler Electrical Engineering and Computer Sciences University of California, Berkeley.
Processor I CPSC 321 Andreas Klappenecker. Midterm 1 Thursday, October 7, during the regular class time Covers all material up to that point History MIPS.
CS61C L15 Synchronous Digital Systems (1) Beamer, Summer 2007 © UCB Scott Beamer, Instructor inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture.
The Processor Andreas Klappenecker CPSC321 Computer Architecture.
The Processor Data Path & Control Chapter 5 Part 1 - Introduction and Single Clock Cycle Design N. Guydosh 2/29/04.
More Basics of CPU Design Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University.
Computer Organization and Architecture
CS-334: Computer Architecture
Some Useful Circuits Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University.
Rabie A. Ramadan Lecture 3
1 Lecture 32 Datapath Analysis. 2 Overview °Datapaths must deal with input and output data values Implement with tri-state buffers °Necessary to control.
COSC 3430 L08 Basic MIPS Architecture.1 COSC 3430 Computer Architecture Lecture 08 Processors Single cycle Datapath PH 3: Sections
Lecture 9. MIPS Processor Design – Instruction Fetch Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System Education &
EKT 221/4 DIGITAL ELECTRONICS II  Registers, Micro-operations and Implementations - Part3.
Lecture 8: Processors, Introduction EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014,
Chap 7. Register Transfers and Datapaths. 7.1 Datapaths and Operations Two types of modules of digital systems –Datapath perform data-processing operations.
Chapter 4 Computer Design Basics. Chapter Overview Part 1 – Datapaths  Introduction  Datapath Example  Arithmetic Logic Unit (ALU)  Shifter  Datapath.
REGISTER TRANSFER & MICROOPERATIONS By Sohaib. Digital System Overview  Each module is built from digital components  Registers  Decoders  Arithmetic.
General Concepts of Computer Organization Overview of Microcomputer.
Module : Algorithmic state machines. Machine language Machine language is built up from discrete statements or instructions. On the processing architecture,
Lec 15Systems Architecture1 Systems Architecture Lecture 15: A Simple Implementation of MIPS Jeremy R. Johnson Anatole D. Ruslanov William M. Mongan Some.
Gary MarsdenSlide 1University of Cape Town Chapter 5 - The Processor  Machine Performance factors –Instruction Count, Clock cycle time, Clock cycles per.
EE204 L12-Single Cycle DP PerformanceHina Anwar Khan EE204 Computer Architecture Single Cycle Data path Performance.
CDA 3101 Fall 2013 Introduction to Computer Organization The Arithmetic Logic Unit (ALU) and MIPS ALU Support 20 September 2013.
Chapter 4 MARIE: An Introduction to a Simple Computer.
IT253: Computer Organization Lecture 9: Making a Processor: Single-Cycle Processor Design Tonga Institute of Higher Education.
 Lecture 2 Processor Organization  Control needs to have the  Ability to fetch instructions from memory  Logic and means to control instruction sequencing.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
EKT 221 : Chapter 4 Computer Design Basics
Introduction to ASIC flow and Verilog HDL
Register Transfer Languages (RTL)
Datapath and Control AddressInstruction Memory Write Data Reg Addr Register File ALU Data Memory Address Write Data Read Data PC Read Data Read Data.
1  1998 Morgan Kaufmann Publishers Simple Implementation Include the functional units we need for each instruction Why do we need this stuff?
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
1 EECS Components and Design Techniques for Digital Systems Lec 20 – RTL Design Optimization 11/6/2007 Shauki Elassaad Electrical Engineering and.
Recap – Our First Computer WR System Bus 8 ALU Carry output A B S C OUT F 8 8 To registers’ read/write and clock inputs Sequence of control signal combinations.
Functions of Processor Operation Addressing modes Registers i/o module interface Memory module interface Interrupts.
CS161 – Design and Architecture of Computer Systems
The 8085 Microprocessor Architecture
Control & Execution Finite State Machines for Control MIPS Execution.
Chap 7. Register Transfers and Datapaths
Register Transfer and Microoperations
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Processor (I).
Instructor: Alexander Stoytchev
Levels in Processor Design
Rocky K. C. Chang 6 November 2017
Computer Organization
Levels in Processor Design
Levels in Processor Design
Levels in Processor Design
ECE 448 Lecture 6 Finite State Machines State Diagrams, State Tables, Algorithmic State Machine (ASM) Charts, and VHDL code ECE 448 – FPGA and ASIC Design.
Levels in Processor Design
Computer Systems An Introducton.
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Instructor: Michael Greenbaum
Presentation transcript:

1 EECS Components and Design Techniques for Digital Systems Lec 21 – RTL Design Optimization 11/16/2004 David Culler Electrical Engineering and Computer Sciences University of California, Berkeley

2 Recall: Register Trans. Level Descriptions A standard high-level representation for describing systems. It follows from the fact that all synchronous digital system can be described as a set of state elements connected by combination logic (CL) blocks: RTL comprises a set of register transfers with optional operators as part of the transfer. Example: regA  regB regC  regA + regB if (start==1) regA  regC Personal style: –use “;” to separate transfers that occur on separate cycles. –Use “,” to separate transfers that occur on the same cycle. Example (2 cycles): regA  regB, regB  0; regC  regA;

3 Components of the data path Storage –Flip-flops –Registers –Register Files –SRAM Arithmetic Units –Adders, subtraters, ALUs (built out of FAs or gates) –Comparators –Counters Interconnect –Wires –Busses –Tri-state Buffers –MUX

4 Arithmetic Circuit Design (MT revisited) Full Adder Adder Relationship of positional notation and operations on it to arithmetic circuits FA A B Cin Co S

5 Recall: Computer Organization Computer design as an application of digital logic design procedures Computer = processing unit + memory system Processing unit = control + datapath Control = finite state machine –Inputs = machine instruction, datapath conditions –Outputs = register transfer control signals, ALU operation codes –Instruction interpretation = instruction fetch, decode, execute Datapath = functional units + registers + interconnect –Functional units = ALU, multipliers, dividers, etc. –Registers = program counter, shifters, storage registers –Interconenct = busses and wires Instruction Interpreter vs Fixed Function Device

6 Datapath vs Control Datapath: Storage, FU, interconnect sufficient to perform the desired functions –Inputs are Control Points –Outputs are signals Controller: State machine to orchestrate operation on the data path –Based on desired function and signals Datapath Controller Control Points signals

7 Vector Op Example (more MT)

8 List Processor Example RTL gives us a framework for making high-level optimizations. –Fixed function unit –Approach extends to instruction interpreters General design procedure outline: 1. Problem, Constraints, and Component Library Spec. 2. “Algorithm” Selection 3. Micro-architecture Specification 4. Analysis of Cost, Performance, Power 5. Optimizations, Variations 6. Detailed Design

9 1. Problem Specification Design a circuit that forms the sum of all the 2's complements integers stored in a linked-list structure starting at memory address 0: All integers and pointers are 8-bit. The link-list is stored in a memory block with an 8-bit address port and 8-bit data port, as shown below. The pointer from the last element in the list is 0. At least one node in list. I/Os: –START resets to head of list and starts addition process. –DONE signals completion –R, Bus that holds the final result

10 1. Other Specifications Design Constraints: –Usually the design specification puts a restriction on cost, performance, power or all. We will leave this unspecified for now and return to it later. Component Library: componentdelay simple logic gates0.5ns n-bit registerclk-to-Q=0.5ns setup=0.5ns (data and LD) n-bit 2-1 multiplexor1ns n-bit adder(2 log(n) + 2)ns memory10ns read (asynchronous read) zero compare0.5 log(n) (single ported memory) Are these reasonable?

11 2. Algorithm Specification In this case the memory only allows one access per cycle, so the algorithm is limited to sequential execution. If in another case more input data is available at once, then a more parallel solution may be possible. Assume datapath state registers NEXT and SUM. –NEXT holds a pointer to the node in memory. –SUM holds the result of adding the node values to this point. If (START==1) NEXT  0, SUM  0; repeat { SUM  SUM + Memory[NEXT+1]; NEXT  Memory[NEXT]; } until (NEXT==0); R  SUM, DONE  1;

12 A_SEL 01 NEXT Memory D A == SUM NEXT_SEL LD_NEXT NEXT_ZERO SUM_SEL LD_SUM Architecture #1 Direct implementation of RTL description: Datapath Controller If (START==1) NEXT  0, SUM  0; repeat { SUM  SUM + Memory[NEXT+1]; NEXT  Memory[NEXT]; } until (NEXT==0); R  SUM, DONE  1;

13 4. Analysis of Cost, Performance, and Power Skip Power for now. Cost: –How do we measure it? # of transistors? # of gates? # of CLBs? –Depends on implementation technology. Usually we are interested in comparing the relative cost of two competing implementations. (Save this for later) Performance: –2 clock cycles per number added. –What is the minimum clock period? –The controller might be on the critical path. Therefore we need to know the implementation, and controller input and output delay.

14 Possible Controller Implementation Based on this, what is the controller input and output delay?

15 MT again… Longest path from any reg out to any reg input

16 4. Analysis of Performance

17 Critical paths A_SEL 01 NEXT Memory D A == SUM NEXT_SEL LD_NEXT NEXT_ZERO SUM_SEL LD_SUM 0 1 0

18 4. Analysis of Performance Detailed timing: clock period (T) = max (clock period for each state) T > 31ns, F < 32 MHz Observation: COMPUTE_SUM state does most of the work. Most of the components are inactive in GET_NEXT state. GET_NEXT does: Memory access + … COMPUTE_SUM does: 8-bit add, memory access, 15-bit add + … Conclusion: Move one of the adds to GET_NEXT.

19 5. Optimization Add new register named NUMA, for address of number to add. Update RTL to reflect our change (note still 2 cycles per iteration): If (START==1) NEXT  0, SUM  0, NUMA  1; repeat { SUM  SUM + Memory[NUMA]; NUMA  Memory[NEXT] + 1, NEXT  Memory[NEXT] ; } until (NEXT==0); R  SUM, DONE  1;

20 5. Optimization Architecture #2: Incremental cost: addition of another register and mux. If (START==1) NEXT  0, SUM  0, NUMA  1; repeat { SUM  SUM + Memory[NUMA]; NUMA  Memory[NEXT] + 1, NEXT  Memory[NEXT] ; } until (NEXT==0); R  SUM, DONE  1;

21 5. Optimization, Architecture #2 New timing: Clock Period (T) = max (clock period for each state) T > 23ns, F < 43Mhz Is this worth the extra cost? Can we lower the cost? Notice that the circuit now only performs one add on every cycle. Why not share the adder for both cycles?

22 5. Optimization, Architecture #3 Incremental cost: –Addition of another mux and control. Removal of an 8-bit adder. Performance: –mux adds 1ns to cycle time. 24ns, 41.67MHz. Is the cost savings worth the performance degradation?