1 2-Hardware Design Basics of Embedded Processors (cont.)

Slides:



Advertisements
Similar presentations
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Parallelism & Locality Optimization.
Advertisements

Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
Stanford University CS243 Winter 2006 Wei Li 1 Register Allocation.
Give qualifications of instructors: DAP
CS 111: Introduction to Programming Midterm Exam NAME _________________ UIN __________________ 10/30/08 1.Who is our hero? 2.Why is this person our hero?
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
Allocating Memory.
1 ITCS 3181 Logic and Computer Systems 2014 B. Wilkinson Slides8.ppt Modification date: Nov 3, 2014 Random Logic Approach The approach described so far.
CS 151 Digital Systems Design Lecture 37 Register Transfer Level
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 2: Custom single-purpose processors.
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 2: Custom single-purpose processors.
361 div.1 Computer Architecture ECE 361 Lecture 7: ALU Design : Division.
The Control Unit: Sequencing the Processor Control Unit: –provides control signals that activate the various microoperations in the datapath the select.
Embedded Systems Design: A Unified Hardware/Software Introduction 1 Chapter 2: Custom single-purpose processors.
Dr. Turki F. Al-Somani VHDL synthesis and simulation – Part 3 Microcomputer Systems Design (Embedded Systems)
Memory Organization.
Chapter 2: Custom single-purpose processors
RT-Level Custom Design. This Week in DIG II  Introduction  Combinational logic  Sequential logic  Custom single-purpose processor design  Review.
Chapter 4 Processor Technology and Architecture. Chapter goals Describe CPU instruction and execution cycles Explain how primitive CPU instructions are.
Introduction to Optimization Copyright 2003, Keith D. Cooper, Ken Kennedy & Linda Torczon, all rights reserved.
ENEE 408C Lab Capstone Project: Digital System Design Fall 2005 Sequential Circuit Design.
Introduction to Data Flow Graphs and their Scheduling Sources: Gang Quan.
Multiplication.
Sequential Circuits Chapter 4 S. Dandamudi To be used with S. Dandamudi, “Fundamentals of Computer Organization and Design,” Springer,  S.
Chapter 1 Algorithm Analysis
5.3 Machine-Independent Compiler Features
Topic #10: Optimization EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
1 Chapter 7 Computer Arithmetic Smruti Ranjan Sarangi Computer Organisation and Architecture PowerPoint Slides PROPRIETARY MATERIAL. © 2014 The McGraw-Hill.
Hussein Bin Sama && Tareq Alawneh
George Mason University ECE 545 – Introduction to VHDL ECE 545 Lecture 5 Finite State Machines.
Program Efficiency & Complexity Analysis. Algorithm Review An algorithm is a definite procedure for solving a problem in finite number of steps Algorithm.
CPS120: Introduction to Computer Science Operations Lecture 9.
1 Code optimization “Code optimization refers to the techniques used by the compiler to improve the execution efficiency of the generated object code”
Module : Algorithmic state machines. Machine language Machine language is built up from discrete statements or instructions. On the processing architecture,
Arithmetic Logic Unit (ALU) Anna Kurek CS 147 Spring 2008.
Mohamed Younis CMCS 411, Computer Architecture 1 CMSC Computer Architecture Lecture 11 Performing Division March 5,
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
1 2-Hardware Design Basics of Embedded Processors.
1 Fundamentals of Computer Science Combinational Circuits.
Threaded Programming Lecture 1: Concepts. 2 Overview Shared memory systems Basic Concepts in Threaded Programming.
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Dr. Turki F. Al-Somani VHDL synthesis and simulation.
1 2-Hardware Design Basics of Embedded Processors (cont.)
Processor Organization and Architecture Module III.
Reconfigurable Computing - Options in Circuit Design John Morris Chung-Ang University The University of Auckland ‘Iolanthe’ at 13 knots on Cockburn Sound,
C Program Control September 15, OBJECTIVES The essentials of counter-controlled repetition. To use the for and do...while repetition statements.
Introduction to Optimization
Finite state machine optimization
Finite state machine optimization
CS2100 Computer Organisation
Register Transfer Specification And Design
2-Hardware Design Basics of Embedded Processors (cont.)
Introduction Introduction to VHDL Entities Signals Data & Scalar Types
Chap 7. Register Transfers and Datapaths
Chapter 2: Custom single-purpose processors
Optimization Code Optimization ©SoftMoore Consulting.
Fundamentals & Ethics of Information Systems IS 201
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
Data Structures Recursion CIS265/506: Chapter 06 - Recursion.
Introduction to Optimization
Introduction to Optimization
UNIVERSITY OF MASSACHUSETTS Dept
Chapter 2: Custom single-purpose processors
Chapter 2: Custom single-purpose processors
RTL Design Methodology Transition from Pseudocode & Interface
Controllers and Datapaths
UNIVERSITY OF MASSACHUSETTS Dept
Chapter 2: Custom single-purpose processors
ECE 448 Lecture 6 Finite State Machines State Diagrams vs. Algorithmic State Machine (ASM) Charts.
332:437 Lecture 14 Turing Machines and State Machine Sequences
Presentation transcript:

1 2-Hardware Design Basics of Embedded Processors (cont.)

2 Outline Introduction Combinational logic Sequential logic Custom single-purpose processor design RT-level custom single-purpose processor design Optimizing the custom single-purpose processor

3 Optimizing single-purpose processors Optimization is the task of making design metric values the best possible Optimization opportunities  original program  FSMD  datapath  FSM

4 Optimizing the original program Analyze program attributes and look for areas of possible improvement  number of computations  size of variable  time and space complexity  operations used multiplication and division very expensive How about shift right and left operations?

5 Optimizing the original program (cont’) 0: int x, y; 1: while (1) { 2: while (!go_i); 3: x = x_i; 4: y = y_i; 5: while (x != y) { 6: if (x < y) 7: y = y - x; else 8: x = x - y; } 9: d_o = x; } 0: int x, y, r; 1: while (1) { 2: while (!go_i); // x must be the larger number 3: if (x_i >= y_i) { 4: x=x_i; 5: y=y_i; } 6: else { 7: x=y_i; 8: y=x_i; } 9: while (y != 0) { 10: r = x % y; 11: x = y; 12: y = r; } 13: d_o = x; } original programoptimized program replace the subtraction operation(s) with modulo operation in order to speed up program GCD(42, 8) - 9 iterations to complete the loop x and y values evaluated as follows : (42, 8), (34, 8), (26,8), (18,8), (10, 8), (2,8), (2,6), (2,4), (2,2). GCD(42,8) - 3 iterations to complete the loop x and y values evaluated as follows: (42, 8), (8,2), (2,0)

6 Optimizing the FSMD Areas of possible improvements  merge states states with constants on transitions can be eliminated, transition taken is already known states with independent operations can be merged  separate states states which require complex operations (a*b*c*d) can be broken into smaller states to reduce hardware size Address problem in HW#2 [2.20, 2.21] Assume we already have two input adders and two input multiplier blocks

7 Optimizing the FSMD (cont.) int x, y; 2: go_i !go_i x = x_i y = y_i x<yx>y y = y -xx = x - y 3: 5: 7: 8: d_o = x 9: y = y -x 7: x = x - y 8: 6-J: x!=y 5: !(x!=y) x<y!(x<y) 6: 5-J: 1: 1 !1 x = x_i y = y_i 4: 2: 2-J: !go_i !(!go_i) d_o = x 1-J: 9: int x, y; 3: original FSMD optimized FSMD eliminate state 1 – transitions have constant values merge state 2 and state 2J – no loop operation in between them merge state 3 and state 4 – assignment operations are independent of one another merge state 5 and state 6 – transitions from state 6 can be done in state 5 eliminate state 5J and 6J – transitions from each state can be done from state 7 and state 8, respectively eliminate state 1-J – transition from state 1-J can be done directly from state 9 No state separation needed

8 Optimizing the datapath Sharing of functional units  one-to-one mapping, as done previously, is not necessary  if same operation occurs in different states, they can share a single functional unit Multi-functional units  ALUs support a variety of operations, it can be shared among operations occurring in different states

9 Optimizing the FSM State encoding  What are our common options?  task of assigning a unique bit pattern to each state in an FSM  size of state register and combinational logic vary  can be treated as an ordering problem State minimization  task of merging equivalent states into a single state state equivalent if for all possible input combinations the two states generate the same outputs and transitions to the next same state

10 Summary Custom single-purpose processors  Straightforward design techniques  Can be built to execute algorithms  Typically start with FSMD  Consider Advanced design models  Optimize the design: Speed, size, power consumption