EECS 249, Fall 1999 Task Runtime Response Optimization Using Cost-Based Operation Motion Abdallah Tabbara Bassam Tabbara Alberto Sangiovanni-Vincentelli.

Presentation transcript:

EECS 249, Fall 1999 Task Runtime Response Optimization Using Cost-Based Operation Motion Abdallah Tabbara Bassam Tabbara Alberto Sangiovanni-Vincentelli University of California at Berkeley

© 1999 Tabbara et al. EECS 249, Fall 1999
Embedded System
Electronic “brain” found in many applications, e.g.
  - Consumer electronics
  - Telecommunications
Consists of:
  - Software: flexibility
  - Hardware: performance
Application requirements on the system:
  - Small
  - Efficient
  - Power
  - Other metrics

Hardware/Software Co-design
[Figure: co-design flow diagram. Labels: Design Specification, Design Representation, HW/SW Partitioning, Synthesis, Evaluation, Implementation; SW, HW, Micro-processor, ASIC.]

Problem Statement
Target: heterogeneous control-dominated embedded system applications
  - Functional decomposition captures the design as a network of Finite State Machines extended with data computations (EFSMs) (e.g. Esterel front-end)
Goal: run-time optimization for the synthesis of each individual task
No assumptions on how tasks are composed in the whole system.

Intermediate Design Representation
Function Flow Graph (FFG) / C-Like Intermediate Format (CLIF) [Tabbara 99]
  - Able to represent EFSM
  - Suitable for control and data flow analysis
[Figure: flow diagram. Labels: EFSM, FFG, Data Flow/Control Optimizations, Optimized FFG, SW/HW Synthesis.]

Function Flow Graph (FFG)
An FFG is a triple G = (V, E, N0) where
  - V is a finite set of nodes
  - E ⊆ V × V is the set of edges; (x, y) ∈ E is an edge from x to y, where x ∈ Pred(y), the set of predecessor nodes of y
  - N0 ∈ V is the start node, corresponding to the EFSM initial state
  - Operations are associated with each node N:
      - TESTs performed on the EFSM inputs and internal variables
      - ASSIGNs of computations on the input alphabet (inputs/internal variables) to the EFSM output alphabet (outputs and internal (state) variables)
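The triple above maps naturally onto a small data structure. The following is a minimal Python sketch, not the authors' CLIF implementation: the class names `Node` and `FFG`, the helpers `pred` and `succ`, and the choice to store operations as plain strings are all assumptions made for illustration. The node names S8, S9, S10 are borrowed from the code fragment used later in the slides, and treating S8 as the start node is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One FFG node; ops holds its TEST/ASSIGN operations as strings (illustrative)."""
    name: str
    ops: list = field(default_factory=list)


@dataclass
class FFG:
    """G = (V, E, N0): nodes, directed edges, and the start node."""
    nodes: dict   # V: node name -> Node
    edges: set    # E: set of (x, y) pairs, a subset of V x V
    start: str    # N0: name of the start node

    def pred(self, y):
        """Pred(y): all x such that (x, y) is an edge."""
        return {x for (x, z) in self.edges if z == y}

    def succ(self, x):
        """All y such that (x, y) is an edge."""
        return {z for (w, z) in self.edges if w == x}


def build_example():
    """Hypothetical three-node graph; S8 as start node is an assumption."""
    nodes = {n: Node(n) for n in ("S8", "S9", "S10")}
    nodes["S8"].ops = ["z = a + b", "a = c"]
    nodes["S9"].ops = ["x = a + b"]
    edges = {("S8", "S9"), ("S9", "S10")}
    return FFG(nodes, edges, "S8")
```

Calling `build_example().pred("S9")` returns the predecessor set of S9, which is the relation the edge definition relies on.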

EFSM in FFG Form (An Example in State Tree Form)
[Figure: state tree. Labels: S0 to S2 and F0 to F8.]

Previous Work (1)
Code motion (hoisting) from the software (HLS) domain(s)
  - Avoid unnecessary re-computations at runtime
      - Temporary variables (“registers”) at certain program points
  - Must be safe: Main strategy
      - As early as possible [Morel 1979], [Knoop 1992]
  - Practice: register pressure
      - Temporary lifetime minimization [Knoop 1994]
Limitations: not cost based, laborious, and involves addition of “synthetic nodes” in the control structure

Previous Work (2)
[Hailperin 98]: “cost” extension to [Knoop 94]
  - Metric based on individual operations (+, *, …)
  - No concept of I/O preservation (Embedded Systems)
We need task-level runtime cost
  - [Castellucia 96]: Probabilities of inputs/tests guide ordering/restructuring of EFSM nodes in Esterel single automata
Cost-guided Relaxed Operation Motion
  - Use code motion techniques: safe (correct), fast
  - Guidance from runtime (average/worst-case) statistics

(Cost-guided) Relaxed Operation Motion
Our approach (polynomial complexity in FFG nodes) consists of 4 steps:
1. Data Flow and Control Optimizations
2. Reverse Sweep (as early as possible/cost guided)
   a) Dead operation addition
   b) Normalization
   c) Available operation elimination
   d) Copy propagation
   e) Dead elimination
3. Forward Sweep (register lifetime minimization)
4. Final optimization pass
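To give a feel for step 2c, here is a minimal sketch of available operation elimination restricted to a single straight-line block. It is a textbook-style local version, not the authors' FFG-wide reverse sweep. The function names `names` and `eliminate_available` and the string representation of statements are assumptions made for illustration. Expressions are compared textually, so `a + b` and `b + a` count as different.

```python
import re


def names(expr):
    """Identifiers appearing in an expression string."""
    return set(re.findall(r"[A-Za-z_]\w*", expr))


def eliminate_available(block):
    """Drop 't = e' when t is already known to hold e at that point in the block."""
    avail = {}  # target variable -> expression text it currently holds
    out = []
    for stmt in block:
        lhs, rhs = [s.strip() for s in stmt.split("=", 1)]
        if avail.get(lhs) == rhs:
            continue  # recomputation of an available operation: remove it
        # Assigning lhs invalidates what lhs held and every expression that reads lhs.
        avail = {t: e for t, e in avail.items()
                 if t != lhs and lhs not in names(e)}
        if lhs not in names(rhs):
            avail[lhs] = rhs  # lhs now holds rhs
        out.append(stmt)
    return out
```

For example, in `["_T30 = a + b", "z = _T30", "_T30 = a + b", "x = _T30"]` the third statement is removed, while inserting `a = c` before it keeps the recomputation because the operand `a` has changed.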

Motivating Example [Knoop 94]
    …
    S8:  z = a + b; a = c; goto S9;
    S9:  x = a + b; goto S10;
    S10: …
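The difficulty in this fragment is that `a + b` is computed in S8 but `a` is redefined before S9, so the second `a + b` cannot simply reuse the first result. A minimal sketch of the standard gen/kill reasoning for available expressions on one block illustrates this. The function names `operands` and `available_after` and the string encoding of statements are assumptions for illustration, not part of the original slides.

```python
import re


def operands(expr):
    """Identifiers read by an expression string."""
    return set(re.findall(r"[A-Za-z_]\w*", expr))


def available_after(stmts, avail=frozenset()):
    """Expressions still available after executing stmts in order."""
    avail = set(avail)
    for stmt in stmts:
        lhs, rhs = [s.strip() for s in stmt.split("=", 1)]
        # gen: a right-hand side with an operator is a computed expression
        if any(op in rhs for op in "+-*/"):
            avail.add(rhs)
        # kill: assigning lhs invalidates every expression that reads lhs
        avail = {e for e in avail if lhs not in operands(e)}
    return avail


S8 = ["z = a + b", "a = c"]
print(available_after(S8))
```

The printed set is empty: `a = c` kills `a + b`, so the computation in S9 is a different value (it equals `c + b`) and plain redundancy elimination does not apply.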

Relaxed Operation Motion
After optimization pass:
    S8: _T30 = a + b; z = _T30; a = c; goto S9;
    S9: _T30 = a + b; x = _T30; goto S10;
After dead addition:
    S7: _T30 = a + b; y = _T30; _T30 = a + b; _T29 = c + b; _T30 = a + b; goto S8;
    S8: _T30 = a + b; z = _T30; a = c; _T30 = a + b; _T29 = c + b; _T30 = a + b; goto S9;
    S9: _T30 = a + b; x = _T30; _T30 = a + b; a = c; _T29 = c + b; _T30 = a + b; a = c; goto S10;

Relaxed Operation Motion
After available elimination:
    S8: z = _T30; a = c; _T30 = a + b; goto S9;
    S9: x = _T30; a = c; _T30 = a + b; goto S10;
After copy propagation:
    _T30 = a + b; …
    S8: z = _T30; a = c; _T30 = c + b; goto S9;
    S9: x = _T30; a = c; _T30 = c + b; goto S10;
After optimization pass:
    S1: _T31 = a + b; H = _T31; _T29 = c + b; …
    S8: z = H; H = _T29; goto S9;
    S9: x = H; goto S10;
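One way to sanity-check the final form against the original S8/S9 fragment is to run both as straight-line code and compare the values written to `z` and `x`. The Python sketch below does this under stated assumptions: the statements elided by "…" between S1 and S8 are assumed not to modify `a`, `b`, `c`, `H`, or `_T29`, and only `z` and `x` are treated as observable (the optimized form no longer performs `a = c`). The function names `original` and `optimized` are illustrative.

```python
def original(a, b, c):
    """S8: z = a + b; a = c;   S9: x = a + b;"""
    z = a + b
    a = c
    x = a + b
    return z, x


def optimized(a, b, c):
    """S1 hoists both additions; S8 and S9 only copy from H."""
    t31 = a + b   # S1: _T31 = a + b
    h = t31       # S1: H = _T31
    t29 = c + b   # S1: _T29 = c + b
    z = h         # S8: z = H
    h = t29       # S8: H = _T29
    x = h         # S9: x = H
    return z, x


# Compare the two versions on a small grid of inputs.
same = all(original(a, b, c) == optimized(a, b, c)
           for a in range(-2, 3) for b in range(-2, 3) for c in range(-2, 3))
print(same)
```

Both versions yield `z = a + b` and `x = c + b`; the optimized form leaves S8 and S9 with copies only, with the additions moved to S1.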

Optimization and Synthesis Flow
[Figure: flow diagram. Labels: CDFG (SHIFT), FFG (back-end), Design Optimization, Relaxed Operation Motion, Attributed FFG, Inference Engine, User Input, Profiling, Cost Estimation, HW/SW Co-Synthesis, Software Compilation, Object Code (.o), Hardware Synthesis, Netlist.]

Work In Progress
Cost estimation methodology
Operation motion
  - Guidance
  - Lifetime optimality (forward sweep)
Results collection on motivating example
  - We already beat [Knoop 94]
  - Evaluate with various cost scenarios
  - Collect synthesis results

Cost Estimation Using Bayesian Belief Networks (1)

Cost Estimation Using Bayesian Belief Networks (2)

Conclusions
Novel approach for task runtime response optimization:
  - Code motion from the software domain is limited mostly to loop invariants, with no real task runtime cost guidance
Our approach: Relaxed Code Motion
  - Is “natural” in a control/data flow optimization framework
  - Specializes to embedded domain tasks, e.g. I/O preservation across invocations
  - Applies application/environment driven costs to optimization