HW/SW Synthesis. 2 Outline u Synthesis u CFSM Optimization u Software synthesis s Problem s Task synthesis s Performance analysis s Task scheduling s.

Slides:



Advertisements
Similar presentations
Hybrid BDD and All-SAT Method for Model Checking Orna Grumberg Joint work with Assaf Schuster and Avi Yadgar Technion – Israel Institute of Technology.
Advertisements

© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
Representing Boolean Functions for Symbolic Model Checking Supratik Chakraborty IIT Bombay.
Hardware/ Software Partitioning 2011 年 12 月 09 日 Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These.
ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
ECE 454 Computer Systems Programming Compiler and Optimization (I) Ding Yuan ECE Dept., University of Toronto
1 CS 201 Compiler Construction Machine Code Generation.
SOFTWARE TESTING. INTRODUCTION  Software Testing is the process of executing a program or system with the intent of finding errors.  It involves any.
Computer Organization and Architecture
PART 4: (2/2) Central Processing Unit (CPU) Basics CHAPTER 13: REDUCED INSTRUCTION SET COMPUTERS (RISC) 1.
Chapter 10 The Traditional Approach to Design
CMSC 345, Version 11/07 SD Vick from S. Mitchell Software Testing.
Efficient Software Performance Estimation Methods for Hardware/Software Codesign Kei Suzuki Alberto Sangiovanni-Vincentelli Present: Yanmei Li.
Detailed Design Kenneth M. Anderson Lecture 21
Synthesis of Software Programs for Embedded Control Application Felice Balarin, Massimiliano Chiodo, Paolo Giusto, Harry Hsieh, ASV, etc. Presented by.
1 HW/SW Partitioning Embedded Systems Design. 2 Hardware/Software Codesign “Exploration of the system design space formed by combinations of hardware.
Scheduling for Embedded Real-Time Systems Amit Mahajan and Haibo.
Chapter 16 Control Unit Operation No HW problems on this chapter. It is important to understand this material on the architecture of computer control units,
CS 536 Spring Intermediate Code. Local Optimizations. Lecture 22.
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
A High Performance Application Representation for Reconfigurable Systems Wenrui GongGang WangRyan Kastner Department of Electrical and Computer Engineering.
ECE Synthesis & Verification1 ECE 667 Spring 2011 Synthesis and Verification of Digital Systems Verification Introduction.
Ritu Varma Roshanak Roshandel Manu Prasanna
Chapter 16 Control Unit Implemntation. A Basic Computer Model.
Chapter 15 IA 64 Architecture Review Predication Predication Registers Speculation Control Data Software Pipelining Prolog, Kernel, & Epilog phases Automatic.
Intermediate Code. Local Optimizations
Testing an individual module
Center for Embedded Computer Systems University of California, Irvine and San Diego SPARK: A Parallelizing High-Level Synthesis.
October 18, 2001Cho & Kim 1 Software Synthesis EE202A Presentation October 18, 2001 Young H. Cho and Seung Hyun Kim.
Chapter 3 Memory Management: Virtual Memory
An Introduction Chapter Chapter 1 Introduction2 Computer Systems  Programmable machines  Hardware + Software (program) HardwareProgram.
Digitaalsüsteemide verifitseerimise kursus1 Formal verification: BDD BDDs applied in equivalence checking.
CMSC 345 Fall 2000 Unit Testing. The testing process.
CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
Automated Design of Custom Architecture Tulika Mitra
Winter-Spring 2001Codesign of Embedded Systems1 Co-Synthesis Algorithms: HW/SW Partitioning Part of HW/SW Codesign of Embedded Systems Course (CE )
Section 10: Advanced Topics 1 M. Balakrishnan Dept. of Comp. Sci. & Engg. I.I.T. Delhi.
CSc 453 Final Code Generation Saumya Debray The University of Arizona Tucson.
Computer architecture Lecture 11: Reduced Instruction Set Computers Piotr Bilski.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #21 – HW/SW.
Computer Architecture And Organization UNIT-II General System Architecture.
1 Text Reference: Warford. 2 Computer Architecture: The design of those aspects of a computer which are visible to the programmer. Architecture Organization.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
1 Memory Management Chapter 7. 2 Memory Management Subdividing memory to accommodate multiple processes Memory needs to be allocated to ensure a reasonable.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
On the Relation between SAT and BDDs for Equivalence Checking Sherief Reda Rolf Drechsler Alex Orailoglu Computer Science & Engineering Dept. University.
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
Computer Organization CDA 3103 Dr. Hassan Foroosh Dept. of Computer Science UCF © Copyright Hassan Foroosh 2002.
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
Verification & Validation By: Amir Masoud Gharehbaghi
Software Engineering1  Verification: The software should conform to its specification  Validation: The software should do what the user really requires.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
SOFTWARE TESTING. Introduction Software Testing is the process of executing a program or system with the intent of finding errors. It involves any activity.
Basic Elements of Processor ALU Registers Internal data pahs External data paths Control Unit.
Question What technology differentiates the different stages a computer had gone through from generation 1 to present?
Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.
SOFTWARE TESTING LECTURE 9. OBSERVATIONS ABOUT TESTING “ Testing is the process of executing a program with the intention of finding errors. ” – Myers.
Code Optimization.
Software Testing.
Register Transfer Specification And Design
COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE
CSCI1600: Embedded and Real Time Software
Chapter 12 Pipelining and RISC
CSc 453 Final Code Generation
CSCI1600: Embedded and Real Time Software
Presentation transcript:

HW/SW Synthesis

2 Outline u Synthesis u CFSM Optimization u Software synthesis s Problem s Task synthesis s Performance analysis s Task scheduling s Compilation

3 Aptix Board Consists of – micro of choice – FPGA’s – FPIC’s Aptix Board Consists of – micro of choice – FPGA’s – FPIC’s POLIS Methodology Graphical EFSM+ Esterel Graphical EFSM+ Esterel Java CFSMs Partitioning SW Synthesis SW Code + RTOS Logic Netlist HW Synthesis SW Estimation HW Estimation Physical Prototyping HW/SW Co-Simulation Performance/trade-off Evaluation Formal Verification Compilers EC

4 Hardware - Software Architecture u Hardware: s Currently: t Programmable processors (micro-controllers, DSPs) t ASICs (FPGAs) u Software: s Set of concurrent tasks s Customized Real-Time Operating System u Interfaces: s Hardware modules s Software procedures (polling, interrupt handlers,...)

5 System Partitioning CFSM1 CFSM7 CFSM6 CFSM5 CFSM4 CFSM3 CFSM2 e2 e8 e3 e2 e1 e9 e3 e5 e7 e9 port5 port1 port2 port3 HW partition 1 HW partition2 SW partition 3 Scheduler port6 port7

6 Software Synthesis u Two-level process s “Technology” (processor) independent: t best decision/assignment sequence given CFSM s “Technology” (processor) dependent: t conversion into machine code u instruction selection u instruction scheduling u register assignment (currently left to compiler) * need performance and cost analysis u Worst Case Execution Time u code and data size

7 Software Synthesis u Technology-independent phase: s Construction of Control-Data Flow Graph from CFSM (based on BDD representation of Transition Function) s Optimization of CDFG for t execution speed t code size (based on BDD sifting algorithm) u Technology-dependent phase: s Creation of (restricted) C code s Cost and performance analysis s Compilation

8 Software Implementation Problem u Input: s Set of tasks (specified by CFSMs) s Set of timing constraints (e.g., input event rates and response constraints) u Output: s Set of procedures that implement the tasks s Scheduler that satisfies the timing constraints u Minimizing: s CPU cost s Memory size s Power, etc.

9 Software Implementation u How to do it ? u Traditional approach: s Hand-coding of procedures s Hand-estimation of timing input to scheduling algorithms u Long and error-prone u Our approach: three-step automated procedure: s Synthesize each task separately s Extract (estimated) timing s Schedule the tasks u Customized RT-OS (scheduler + drivers)

10 Software Implementation u Current strategy: s Iterate between synthesis, estimation and scheduling s Designer chooses the scheduling algorithm u Future work: s Top-down propagation of timing constraints s Software synthesis under constraints s Automated scheduling selection (based on CPU utilization estimates)

11 Software Synthesis Procedure Specification, partitioning S-graph synthesis Timing estimation Scheduling, validation not feasible Code generation Compilation Testing, validation Production pass fail

12 Task Implementation u Goal: quick response time, within timing and size constraints u Problem statement: s Given a CFSM transition function and constraints s Find a procedure implementing the transition function while meeting the constraints u The procedure code is acyclic: s Powerful optimization and analysis techniques s Looping, state storage etc. are implemented outside (in the OS)

13 SW Modeling Issues u The software model should be: s Low-level enough to allow detailed optimization and estimation s High-level enough to avoid excessive details e.g. register allocation, instruction selection u Main types of “user-mode” instructions: s Data movement s ALU s Conditional/unconditional branches s Subroutine calls u RTOS handles I/O, interrupts and so on

14 SW Modeling Issues u Focus on control-dominated applications s Address only CFSM control structure optimization s Data path left as “don’t touch” u Use Decision Diagrams (Bryant ‘86) s Appropriate for control-dominated tasks s Well-developed set of optimization techniques s Augmented with arithmetic and Boolean operators, to perform data computations

15ROBDDs Reduced Ordered BDDs [Bryant 86] A node represents a function given by the Shannon decomposition f = x f x + x f x Variable appears once on any path from root to terminal Variables are ordered No two vertices represent the same function Canonical Two functions are equal if and only if their BDDs are isomorphic Þ direct application in equivalence checking f = x 1 + x 2 x 3 x1x1x1x1 x2x2x2x2 x3x3x3x3 1 0       ROBDD x1x1x1x1 f x1x1x1x1 f

16 ROBDDs and Combinational Verification u Given two circuits: s Build the ROBDDs of the outputs in terms of the primary inputs s Two circuits are equivalent if and only if the ROBDDs are isomorphic u Complexity of verification depends on the size of ROBDDs s Compact in many cases

17 ROBDDs and Memory Explosion u ROBDDs are not always compact s Size of an ROBDD can be exponential in number of variables s Can happen for real life circuits also t e.g. Multipliers Commonly known as: Memory Explosion Problem of ROBDDs

18 Technique for Handling ROBDD Memory Explosion ROBDDs Enhancements VariableOrdering Free BDDs OFDDs,OKFDDs PartitionedROBDDs RelaxOrdering Node Node Decomp. Decomp. Partitioning All the representations are canonical  combinational equivalence checking

19 b3b3b3b3 Handling Memory Explosion: Variable Ordering u BDD size very sensitive to variable ordering a 1 b 1 + a 2 b 2 + a 3 b 3 a1a1a1a1 b1b1b1b1 a2a2a2a2 b2b2b2b2 a3a3a3a3 b3b3b3b3 1 0 Good Ordering: 8 nodes 1 0 Bad Ordering: 16 nodes   a1a1a1a1 a2a2a2a2 a2a2a2a2 a3a3a3a3 a3a3a3a3 a3a3a3a3 a3a3a3a3 b1b1b1b1 b1b1b1b1 b1b1b1b1 b1b1b1b1 b2b2b2b2 b2b2b2b2

20 a1a1a1a1 b1b1b1b1 a2a2a2a2 b2b2b2b2 1 0   Handling Memory Explosion: Variable Ordering l Good static as well as dynamic ordering techniques exist l Dynamic variable reordering [Rudell 93] l Change variable order automatically during computations l Repeatedly swap a variable with adjacent variable l Swapping can be done locally l Select the best location a 1 b 1 + a 2 b 2 a1a1a1a1 a2a2a2a2 b2b2b2b2 10   a2a2a2a2 b1b1b1b1 b1b1b1b1

21 SW Model: S-graphs u Acyclic extended decision diagram computing a transition function u S-graph structure: s Directed acyclic graph s Set of finite-valued variables s TEST nodes evaluate an expression and branch accordingly s ASSIGN nodes evaluate an expression and assign its result to a variable s Basic block + branch is a general CDFG model (but we constrain it to be acyclic for optimization)

22 An Example of S-graph a := a + 1 a := 0 detect(c) a<b BEGIN END F T TF – input event c – output event y – state int a – input int b – forever if (detect(c)) if (a < b) a := a + 1 emit(y) else a := 0 emit(y) emit(y)

23 S-graphs and Functions u Execution of an s-graph computes a function from a set of input and state variables to a set of output and state variables: s Output variables are initially undefined s Traverse the s-graph from BEGIN to END u Well-formed s-graph: s Every time a function depending on a variable is evaluated, that variable has a defined value u How do we derive an s-graph implementing a given function ?

24 S-graphs and Functions u Problem statement: s Given: a finite-valued multi-output function over a set of finite-valued variables s Find: an s-graph implementing it u Procedure based on Shannon expansion f = x f x + x’ f x’ u Result heavily depends on ordering of variables in expansion s Inputs before outputs: TESTs dominate over ASSIGNs s Outputs before inputs: ASSIGNs dominate over TESTs

25 Example of S-graph Construction x = a b + c y = a b + d a b c d x := 1 y := d 0 x := 1 y := 0 x := 0 y := 1 x := 0 y := Order: a, b, c, d, x, y (inputs before outputs)

26 Example of S-graph Construction x = a b + c y = a b + d a b x := 1 y := x := c y := d Order: a, b, x, y, c, d (interleaving inputs and outputs)

27 S-graph Optimization u General trade-off: s TEST-based is faster than ASSIGN-based (each variable is visited at most once) s ASSIGN-based is smaller than TEST-based (there is more potential for sharing) u Implemented as constrained sifting of the Transition Function BDD u The procedure can be iterated over s-graph fragments: s Local optimization, depending on fragment criticality (speed versus size) s Constraint-driven optimization (still to be explored)

28 From S-graphs to Instructions  TEST nodes  conditional branches  ASSIGN nodes  ALU ops and data moves u No loops in a single CFSM transition s (User loops handled at the RTOS level) u Data flow handling: s “Don’t touch” them (except common sub-expression extraction) s Map expression DAGs to C expressions s C compiler allocates registers and select op-codes u Need source-level debugging environment (with any of the chosen entry languages)

29 Software Synthesis Procedure Specification, partitioning S-graph synthesis Timing estimation Scheduling, validation notfeasible feasible Code generation Compilation Testing, validation Production pass fail

30 POLIS : S-graph Level Estimation

31 Problems in Software Performance Estimation How to link behavior to assembly code? -> Model C code generated from S-graph and use a set of cost parameters How to handle the variety of compilers and CPUs?

32 Software Model

33 Execution Time of a Path and the Code Size Property : Form of each statement is determined by type of corresponding node. T struct = ƒ°pi Ct( pi: pi: takes value 1 if node i is on a path, otherwise 0. Ct(n,v): Ct(n,v): execution time for node type n and variable type v. and variable type v. S struct = ƒ°Cs( node_type_of (i), variable_type_of (i)) Cs(n,v): Cs(n,v): code size for node type n and variable type v. and variable type v. node_type_of (i), variable_type_of (i))

34 Cost Parameters * Pre-calculated cost parameters for: (1) Ct(n,v), Cs(n,v): Execution time and code size for node type n Execution time and code size for node type n and variable type v. and variable type v. (2) T pp, S pp : Pre- and post- execution time and code size. Pre- and post- execution time and code size. (3) T init, S init : Execution time and code size for local variable Execution time and code size for local variable initialization. initialization.

35 Problems in Software Performance Estimation How to link behavior to assembly code? How to handle the variety of compilers and CPUs? -> prepare cost parameters for each target

36 Extraction of Cost Parameters

37 Algorithm  Preprocess: extracting set of cost parameters.  Weighting nodes and edges in given S-graph with cost parameters.  Traversing weighted S-graph.  Finding maximum cost path and minimum cost path using Depth-First Search on S-graph.  Accumulating 'size' costs on all nodes.

38 S-graph Level Estimation :Algorithm Cost C is a triple (min_time, max_time, code_size) Algorithm: SGtrace (sg i ) if (sg i == NULL) return (C(0,,0)); if (sg i == NULL) return (C(0,,0)); if (sg i has been visited) if (sg i has been visited) return ( pre-calculated Ci(*,*,0) associated with sg i ); return ( pre-calculated Ci(*,*,0) associated with sg i ); C i = initialize (max_time = 0, min_time =, code_size = 0); C i = initialize (max_time = 0, min_time =, code_size = 0); for each child sg j of sg i { for each child sg j of sg i { C ij = SGtrace (sg j ) + edge cost for edge e ij ; C ij = SGtrace (sg j ) + edge cost for edge e ij ; C i.max_time = max(C i.max_time, C ij.max_time); C i.max_time = max(C i.max_time, C ij.max_time); C i.min_time = min(C i.min_time, C ij.min_time); C i.min_time = min(C i.min_time, C ij.min_time); C i.code_size += C ij.code_size; C i.code_size += C ij.code_size; } C i += node cost for node sg i ; C i += node cost for node sg i ; return (C i ); return (C i );

39 Experiments * Proposed methods implemented and examined in POLIS system. in POLIS system. * Target CPU and compiler: M68HC11 and Introl C compiler. M68HC11 and Introl C compiler. * Difference D is defined as

40 Experimental Results : S-graph Level

41 Performance and Cost Estimation: Summary u S-graph: low-level enough to allow accurate performance estimation u Cost parameters assigned to each node, depending on: s System type (CPU, memory, bus,...) s Node and expression type u Cost parameters evaluated via simple benchmarks s Need timing and size measurements for each target system s Currently implemented for MIPS, and 68HC11 processors

42 Performance and Cost Estimation u Example: 68HC11 timing estimation u Cost assigned to s-graph edges (Different for taken/not taken branches) u Estimated time: s Min: 26 cycles s Max: 126 cycles u Accuracy: within 20% of profiling a := a + 1 a := 0 detect(c) a<b BEGIN END F T TF emit(y)

43 Open Problems u Better synthesis techniques s Add state variables to simplify s-graph s Performance-driven synthesis of critical paths s Exact memory/speed trade-off u Estimation of caching and pipelining effects s May have little impact on control-dominated systems (frequent branches and context switches) s Relatively easy during co-simulation