Efficient Software Performance Estimation Methods for Hardware/Software Codesign
Kei Suzuki, Alberto Sangiovanni-Vincentelli
Presented by: Yanmei Li


10/29/2002 EE249 Discussion Session

Introduction
- One of the most important purposes of hw/sw codesign is to find the optimum hw/sw partition of a system-level specification under particular criteria
- Criteria
  - Performance (speed, or the number of clock cycles)
  - Cost (number of components, die size, or code size)
- Estimation
  - At a lower abstraction level: easy and accurate, but with a long design iteration time
  - At a higher abstraction level: reduces the exploration time
  - Plays an important role in synthesis and optimization

Software Performance Estimation
- The cost of a mixed hw/sw system based on a standard microprocessor depends on the hw size
- Solution: implement a given functionality as a program on the microprocessor
- Problem: a software implementation often fails to meet the performance requirement
- Tradeoff: implement the critical portion of the program in hardware
- Software performance estimation is the key

POLIS System
- CFSMs (Codesign Finite State Machines)
  - Do not discriminate between hw and sw
  - Estimation provides preliminary timing information and also a measure for hw/sw partitioning
  - A partitioning process takes place to identify the candidate components for sw implementation
- S-Graph (software graph)
  - Used to optimize the trade-off between the performance and the code size of the final implementation
  - Estimation is helpful for s-graph optimization and sw module scheduling

Related Work
- Software performance depends on the structure of the software program as well as on the components of the target system
- The structure of the software program is more difficult to estimate as the abstraction level rises
- Most of the results are from the object-code level, which is the lowest level of abstraction, and are concerned with software that has a limited structure
- A number of approaches have been proposed
  - A simple prediction method
  - Statistical methods
  - ……

Abstraction Models in POLIS
- CFSM
  - HW: mapped into an abstract hardware description format and synthesized into a combinational circuit and a set of latches
  - SW: translated into a data structure called an s-graph

Abstraction Models in POLIS
- S-Graph: a DAG (directed acyclic graph) with one source node and one sink node
- Represents the control flow of a given behavior
- Four types of node: BEGIN, END, TEST, ASSIGN

S-Graph
- Semantics
  - Start with the BEGIN node
  - Traverse each node along its edge until reaching the END node
  - At a TEST node, select the child that corresponds to the value of the associated predicate P(V)
  - At an ASSIGN node, assign the value of the associated function A(V) to the output variable z
- Translating an s-graph into a C program
  - Traverse the graph in a depth-first manner
  - TEST: if (or switch) statement
  - ASSIGN: assignment statement
  - The resulting C program has the same structure as the s-graph
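The traversal semantics can be illustrated with a minimal Python sketch. The dictionary-based node encoding, the field names (`kind`, `predicate`, `function`, `output`, `next`, `children`) and the tiny example graph are all assumptions made for illustration; they are not the POLIS data structures.

```python
# Hypothetical dict-based s-graph encoding, for illustration only.
def run_sgraph(begin, v):
    """Walk from BEGIN to END and return the output variables assigned on the way."""
    outputs = {}
    node = begin
    while node["kind"] != "END":
        if node["kind"] == "TEST":
            # Select the child that matches the value of the predicate P(V).
            node = node["children"][node["predicate"](v)]
        else:
            if node["kind"] == "ASSIGN":
                # Assign the value of A(V) to the output variable.
                outputs[node["output"]] = node["function"](v)
            # BEGIN and ASSIGN nodes have a single successor.
            node = node["next"]
    return outputs


# Example graph: BEGIN -> TEST(x > 0) -> ASSIGN(z = 2 * x) -> END, or TEST -> END.
END = {"kind": "END"}
ASSIGN_Z = {"kind": "ASSIGN", "output": "z",
            "function": lambda v: 2 * v["x"], "next": END}
TEST_X = {"kind": "TEST", "predicate": lambda v: v["x"] > 0,
          "children": {True: ASSIGN_Z, False: END}}
BEGIN = {"kind": "BEGIN", "next": TEST_X}

print(run_sgraph(BEGIN, {"x": 3}))   # {'z': 6}
print(run_sgraph(BEGIN, {"x": -1}))  # {}
```

Because the graph is acyclic, the walk always terminates at END, which is also why the generated C code contains no loops.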

Performance Estimation Methods
- Modeling the target system
- The structure of the C code generated by POLIS:

Function() ……(1)
{
    Initialization of local variables (assignment statements); ……(2)
    Structure of mixed if or switch statements and assignment statements; ……(3)
    Return; ……(4)
}

Modeling the Target System
- Execution time: $T = T_{pp} + k\,T_{init} + T_{struct}$
- Code size: $S = S_{pp} + k\,S_{init} + S_{struct}$
- $T_{pp}$ ($S_{pp}$): for entering and exiting the function, parts (1) + (4)
- $T_{init}$ ($S_{init}$): for initializing local variables, part (2); $k$ is the number of local variables
- $T_{struct}$ ($S_{struct}$): for the structure of mixed conditional statements generated from TEST nodes and assignment statements generated from ASSIGN nodes, part (3)

Modeling the Target System (cont.)
- $T_{pp}$, $S_{pp}$, $T_{init}$, $S_{init}$ are constants that can be determined beforehand
- $T_{struct} = \sum_i P_i \, C_t(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
- $S_{struct} = \sum_i C_s(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i))$
- $P_i = 1$ if node $i$ is on the execution path being considered, otherwise $P_i = 0$
- $C_t$ and $C_s$ can be obtained by running simple benchmark programs that contain a mix of the C statements appearing in the generated C programs, and analyzing the execution time and code size of those programs on the target compiler and the target CPU
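As a worked illustration of the two formulas, the following Python sketch evaluates $T$ and $S$ for a three-node structure. Every number, the variable-type labels (`event`, `const`, `var`) and the function names are made-up placeholders, not measured POLIS cost parameters.

```python
# All constants below are invented illustration values, not benchmark results.
T_PP, T_INIT = 10, 2   # time for function entry/exit, time per local variable
S_PP, S_INIT = 20, 4   # size of function entry/exit, size per local variable

# Cost tables indexed by (node type, variable type).
C_T = {("TEST", "event"): 4, ("ASSIGN", "const"): 3, ("ASSIGN", "var"): 5}
C_S = {("TEST", "event"): 8, ("ASSIGN", "const"): 6, ("ASSIGN", "var"): 10}


def execution_time(k, nodes):
    """T = T_pp + k * T_init + T_struct; only nodes with P_i = 1 count."""
    t_struct = sum(C_T[(ntype, vtype)]
                   for ntype, vtype, on_path in nodes if on_path)
    return T_PP + k * T_INIT + t_struct


def code_size(k, nodes):
    """S = S_pp + k * S_init + S_struct; every node contributes to the size."""
    s_struct = sum(C_S[(ntype, vtype)] for ntype, vtype, _ in nodes)
    return S_PP + k * S_INIT + s_struct


# Each entry: (node type, variable type, P_i as a boolean).
NODES = [("TEST", "event", True),
         ("ASSIGN", "const", True),
         ("ASSIGN", "var", False)]

print(execution_time(2, NODES))  # 10 + 2*2 + (4 + 3) = 21
print(code_size(2, NODES))       # 20 + 2*4 + (8 + 6 + 10) = 52
```

The asymmetry between the two sums is the point of the example: execution time depends on which path is taken, while code size counts every node once regardless of the path.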

Benchmark Model
- Four attributes to characterize a system: the name of the parameter set, a name for a unit of execution time, a name for a unit of code size, and the size of an integer variable
- Seventeen cost parameters to model the execution time, and fifteen cost parameters to model the code size
  - A TEST node with an event-type variable / a multi-valued variable with a bit mask / a multi-valued variable
  - An ASSIGN node with an event-type variable / which assigns a constant to a variable / which assigns one variable to another
  - Pre-processing and post-processing
  - A branch operation
  - Initialization of a local variable
  - Average execution time and size for pre-defined software library functions
  - The size of pointers
  - The size of integer variables

S-graph Level Estimation
- Properties
  - Property 1. Each node in an s-graph has a one-to-one correspondence with only a few statements in the synthesized C code
  - Property 2. The form of each statement is determined by the type of the corresponding node
  - Property 3. The s-graph is a DAG, hence it does not include loops in its structure
- Each node/edge is weighted according to cost parameters pre-calculated in a pre-processing step

S-graph Level Estimation
Algorithm: SGtrace(sg_i)
    if (sg_i == NULL) return C(0, ∞, 0);
    if (sg_i has been visited) return the pre-calculated C_i(*, *, 0) associated with sg_i;
    C_i = initialize(max_time = 0; min_time = ∞; code_size = 0);
    for each child sg_j of sg_i {
        C_ij = SGtrace(sg_j) + edge cost for edge e_ij;
        if (C_ij.max_time > C_i.max_time) C_i.max_time = C_ij.max_time;
        if (C_ij.min_time < C_i.min_time) C_i.min_time = C_ij.min_time;
        C_i.code_size += C_ij.code_size;
    }
    C_i += node cost for node sg_i;
    return C_i;
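A runnable Python sketch of the same idea follows. It is an interpretation of the slide's pseudocode, not the authors' code: the `Node` class, the `(time, size)` cost tuples and the example graph are hypothetical, and one detail is an explicit assumption of this sketch, namely that a node without children starts from a minimum time of 0 rather than ∞, so that the END node yields a finite minimum.

```python
import math


class Node:
    """Hypothetical s-graph node: cost is (time, size); children hold (node, edge_cost)."""

    def __init__(self, name, cost):
        self.name = name
        self.cost = cost
        self.children = []  # list of (child Node, (edge_time, edge_size))


def sg_trace(node, memo=None):
    """Return (max_time, min_time, code_size) for the sub-graph rooted at node."""
    if memo is None:
        memo = {}
    if node in memo:
        # Already visited: reuse the times, but report size 0 so that
        # code shared between paths is counted only once.
        max_t, min_t, _ = memo[node]
        return (max_t, min_t, 0)
    if node.children:
        max_t, min_t, size = 0, math.inf, 0
        for child, (edge_time, edge_size) in node.children:
            c_max, c_min, c_size = sg_trace(child, memo)
            max_t = max(max_t, c_max + edge_time)
            min_t = min(min_t, c_min + edge_time)
            size += c_size + edge_size
    else:
        # Assumption of this sketch: a leaf contributes no path cost of its own.
        max_t, min_t, size = 0, 0, 0
    node_time, node_size = node.cost
    result = (max_t + node_time, min_t + node_time, size + node_size)
    memo[node] = result
    return result


# Example graph with made-up costs: BEGIN -> TEST -> {ASSIGN -> END, END}.
end = Node("END", (1, 2))
assign = Node("ASSIGN", (5, 8))
test = Node("TEST", (3, 6))
begin = Node("BEGIN", (2, 4))
assign.children = [(end, (0, 0))]
test.children = [(assign, (1, 2)), (end, (0, 0))]
begin.children = [(test, (0, 0))]

print(sg_trace(begin))  # (12, 6, 22)
```

The longest path is BEGIN, TEST, ASSIGN, END (2 + 3 + 1 + 5 + 1 = 12), the shortest skips ASSIGN (2 + 3 + 1 = 6), and the code size 22 counts END once even though two paths reach it.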

S-graph Level Estimation
- Computational complexity: O(E), where E is the number of edges
- Average execution time: $C_{ave} = \sum_{i,j} P_{ij} \, \bigl(C_t(\mathrm{node\_type\_of}(i), \mathrm{variable\_type\_of}(i)) + C_e(i,j)\bigr)$
- $P_{ij}$ is the probability of executing node $i$ and going to node $j$
- $C_e(i,j)$ is the edge cost for edge $e_{ij}$
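The average-time formula can be sketched in Python under a few stated assumptions: branch probabilities are given per edge, $P_{ij}$ is computed as the probability of reaching node $i$ times the branch probability of edge $(i, j)$, and a zero-cost `EXIT` sentinel is added after END so that END's own node cost is included in the sum. The function name, the data layout and all numbers are hypothetical.

```python
def average_time(order, node_time, edges):
    """Sum P_ij * (C_t(i) + C_e(i, j)) over all edges of a DAG.

    order: node names in topological order, source first.
    node_time: {node: time cost C_t of the node}.
    edges: {src: [(dst, branch_probability, edge_time), ...]}.
    """
    reach = {name: 0.0 for name in order}
    reach[order[0]] = 1.0  # the source node is always executed
    total = 0.0
    for i in order:
        for j, prob, edge_time in edges.get(i, []):
            p_ij = reach[i] * prob          # probability of executing i, then going to j
            total += p_ij * (node_time[i] + edge_time)
            reach[j] += p_ij                # accumulate the probability of reaching j
    return total


# Made-up example: the TEST node takes the ASSIGN branch 25% of the time.
ORDER = ["BEGIN", "TEST", "ASSIGN", "END", "EXIT"]
NODE_TIME = {"BEGIN": 2, "TEST": 3, "ASSIGN": 5, "END": 1, "EXIT": 0}
EDGES = {
    "BEGIN": [("TEST", 1.0, 0)],
    "TEST": [("ASSIGN", 0.25, 1), ("END", 0.75, 0)],
    "ASSIGN": [("END", 1.0, 0)],
    "END": [("EXIT", 1.0, 0)],  # sentinel edge so END's node cost is counted
}

print(average_time(ORDER, NODE_TIME, EDGES))  # 7.5
```

With these numbers the long path costs 12 and the short path 6, so the result matches the direct expectation 0.25 × 12 + 0.75 × 6 = 7.5.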

CFSM Level Estimation
- Much more difficult, since a CFSM model does not closely reflect the code structure
- MDDs (multi-valued decision diagrams) are used to represent the transition relation function of a CFSM (a node represents a multi-valued variable; ordering is important)
- The estimation algorithm for the MDD is based on the assumption that the maximum (minimum) cost path in an MDD is usually the maximum (minimum) cost path in the s-graph that is generated from the MDD
- Also based on a recursive DFS traversal algorithm
- There is no relation between the code size and the number of MDD nodes

Experimental Results (1)

Experimental Results (2)

Experimental Results (3)
Compared to an assembly-level analysis:
- S-graph (Table 1)
  - The differences in the maximum execution time are within (-10%, +10%)
  - The differences in the minimum execution time are within (-20%, +20%)
  - The differences in code size are within (-20%, +20%)
- CFSM (Table 2)
  - The differences in the maximum execution time are within (-10%, +25%)
  - The differences in the minimum execution time are within (-20%, +20%)

Conclusions
- The s-graph level method provides an accurate estimation for all analyses: the maximum and minimum execution time, and the code size. Because of its accuracy, it is a useful technique for optimization in software synthesis.
- The CFSM level method is less accurate than the s-graph estimation, but it is still accurate enough when estimating the maximum and minimum execution time. It is important for automatic partitioning of CFSMs into hardware and software parts, and also for scheduler generation.

Conclusions
- Two software performance estimation methods for use with the POLIS hardware/software codesign system are proposed in this paper:
  - S-graph level method
  - CFSM level method
- The experimental results showed that the accuracy of both proposed methods is high enough for use in the POLIS system.