Compiling Esterel into Static Discrete-Event Code
Stephen A. Edwards, Columbia University, Computer Science Department, New York, USA
Vimal Kapadia and Michael Halas, IBM, Poughkeepsie, NY, USA
Presented by Michael Halas, SLAP 2004

CEC, the Columbia Esterel Compiler, compiles Esterel into very efficient C code, minimizing runtime overhead by moving scheduling decisions from runtime to compile time wherever possible.
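
Since the output is ordinary C, the generated code is driven by a simple host loop that sets inputs, runs one reaction per cycle, and reads outputs. The sketch below illustrates this; the reaction function and signal-flag names are assumptions for illustration, not CEC's actual interface (the dummy reaction stands in for generated code).

/* Hypothetical driver for CEC-generated code.  The names I, S, O, Q
   and Example() are assumed; real CEC output defines its own API. */
#include <stdio.h>

int I, S, O, Q;                      /* signal presence flags (assumed) */
void Example(void) { O = I; Q = 0; } /* dummy stand-in for the generated
                                        reaction function */

int main(void)
{
    for (int cycle = 0; cycle < 5; cycle++) {
        S = (cycle == 0);            /* S present in the first cycle */
        I = (cycle == 3);            /* I arrives in cycle 3 */
        Example();                   /* run one synchronous cycle */
        if (O) printf("cycle %d: O emitted\n", cycle);
    }
    return 0;
}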

An Example: Modeling a Shared Resource

input I, S;
output O, Q;

every S do
  await I;
  weak abort
    sustain R
  when immediate A;
  emit O
||
  loop
    pause; pause;
    present R then emit A end present
  end loop
||
  loop
    present R then
      pause; emit Q
    else
      pause
    end present
  end loop
end every

The program's three parallel threads:

1. await I; weak abort sustain R when immediate A; emit O
   takes I and passes it to the second thread through R.

2. loop pause; pause; present R then emit A end present end loop
   responds to R with A.

3. loop present R then pause; emit Q else pause end present end loop
   makes Q a delayed version of R.

The GRC Representation, developed by Potop-Butucaru. GRC ("graph code") represents an Esterel program as a selection tree plus an acyclic control-flow graph.

Control Flow Graph

The full program, with R and A declared as local signals (the slide shows its GRC control-flow graph, which is not reproduced here):

input I, S;
output O, Q;
signal R, A in
  every S do
    await I;
    weak abort
      sustain R
    when immediate A;
    emit O
  ||
    loop
      pause; pause;
      present R then emit A end present
    end loop
  ||
    loop
      present R then
        pause; emit Q
      else
        pause
      end present
    end loop
  end every
end signal
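
To make the representation concrete, the sketch below shows one plausible C encoding of a control-flow-graph node for such an IR. The node kinds and field names are assumptions for illustration, not CEC's actual data structures.

/* Hypothetical GRC-style control-flow-graph node (names assumed). */
enum node_kind { TEST, EMIT, FORK, JOIN, SWITCH };

struct grc_node {
    enum node_kind kind;      /* what this node does */
    int signal;               /* signal tested or emitted, if any */
    struct grc_node **succ;   /* successor nodes in the graph */
    int nsucc;                /* number of successors */
    int cluster;              /* cluster index, assigned by clustering */
    int level;                /* level of that cluster */
};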

The control-flow graph executes once per cycle, from entry to exit.

The first thread in isolation:

await I;
weak abort
  sustain R
when immediate A;
emit O

Clustering

1. Group the GRC nodes into clusters that can run without interruption.
2. Assign levels, a partial ordering:
   - Levels execute in order (fixed at compile time).
   - Clusters within the same level can execute in any order (decided at runtime).
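
Level assignment amounts to leveling the cluster dependency DAG: each cluster's level is one more than the deepest cluster it depends on. The sketch below illustrates this under that reading; the array-based graph encoding is an assumption, not CEC's implementation.

/* Hypothetical level assignment over a cluster dependency DAG.
   Assumes clusters are numbered in topological order. */
#include <stdio.h>

#define NCLUSTERS 5

int npred[NCLUSTERS];             /* number of dependencies per cluster */
int pred[NCLUSTERS][NCLUSTERS];   /* pred[c][i]: i-th dependency of c */
int level[NCLUSTERS];

void assign_levels(void)
{
    for (int c = 0; c < NCLUSTERS; c++) {
        level[c] = 0;             /* no dependencies: level 0 */
        for (int i = 0; i < npred[c]; i++)
            if (level[pred[c][i]] + 1 > level[c])
                level[c] = level[pred[c][i]] + 1;
    }
}

int main(void)
{
    /* Example: cluster 2 depends on 0; clusters 3 and 4 depend on 2. */
    pred[2][0] = 0; npred[2] = 1;
    pred[3][0] = 2; npred[3] = 1;
    pred[4][0] = 2; npred[4] = 1;

    assign_levels();
    for (int c = 0; c < NCLUSTERS; c++)
        printf("cluster %d: level %d\n", c, level[c]);
    return 0;
}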

(Four slides step through the example program, grouping the nodes of its control-flow graph into clusters; the figures are not reproduced here.)

Running a Cycle

(Twelve slides animate one cycle of the generated code: clusters execute level by level, Level 0 through Level 2; the figures are not reproduced here.)

Running a Cycle: the linked-list structure with nothing scheduled. Only cluster 0 has to run; control then just jumps to the end of each level in turn.

/* Cluster 0 */
    goto *head1;
C1: goto *next1;
C2: goto *next2;
C3: goto *next3;
C4: goto *next4;
END_LEVEL1: goto *head2;
END_LEVEL2: goto *head3;

To schedule cluster 2 into the empty structure, insert it at the head of its level's linked list:

    next2 = head1;
    head1 = &&C2;

Now, when level 1 starts, control jumps to C2; cluster 2's final goto *next2 then continues to what was previously at the head of the list.

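Putting the pieces together, here is a minimal compilable sketch of the technique, assuming GCC's labels-as-values extension (&&label and goto *ptr). The cluster bodies and printed messages are placeholders; only the list manipulation follows the slides.

/* Minimal sketch of the linked-list, computed-goto scheduler.
   Requires GCC or Clang (labels as values).  Cluster bodies are
   placeholders for generated code. */
#include <stdio.h>

int main(void)
{
    /* Empty lists: each level's head points at its END label. */
    void *head1 = &&END_LEVEL1;   /* level 1 */
    void *head2 = &&END_LEVEL2;   /* level 2 */
    void *next2 = &&END_LEVEL1;   /* cluster 2's successor pointer */
    void *next3 = &&END_LEVEL2;   /* cluster 3's successor pointer */

    /* Cluster 0 always runs.  Suppose it decides that cluster 2
       (level 1) and cluster 3 (level 2) must run this cycle: each is
       pushed onto the head of its level's list. */
    next2 = head1; head1 = &&C2;
    next3 = head2; head2 = &&C3;
    goto *head1;                  /* enter level 1 */

C2: printf("cluster 2 runs in level 1\n");
    goto *next2;                  /* follow level 1's list */

C3: printf("cluster 3 runs in level 2\n");
    goto *next3;                  /* follow level 2's list */

END_LEVEL1:
    goto *head2;                  /* proceed to level 2 */

END_LEVEL2:
    return 0;                     /* cycle complete */
}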

Experimental Results

Five medium-sized examples:

- Potop-Butucaru's grc2c beats us on four of the five examples, but we are substantially faster on the largest example.
- Compared with the SAXO-RT compiler, we are faster on the three largest examples.

Our technique most closely resembles SAXO-RT: basic blocks are sorted topologically and executed according to run-time scheduling decisions. Two main differences:

- We only schedule blocks within the current cycle.
- We use a linked list, which eliminates the conditional test a scoreboard requires.
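
For contrast, a scoreboard-style dispatcher (our reading of the SAXO-RT approach; the names here are illustrative) pays a conditional test for every block, whether or not it runs, while the linked list above visits only the scheduled blocks:

/* Hypothetical scoreboard dispatch: one test per block per cycle,
   even for blocks that do not run this cycle. */
#include <stdbool.h>

#define NBLOCKS 4

bool sched[NBLOCKS];   /* set by earlier blocks during the cycle */

void run_cycle(void)
{
    if (sched[0]) { /* block 0 body */ }
    if (sched[1]) { /* block 1 body */ }
    if (sched[2]) { /* block 2 body */ }
    if (sched[3]) { /* block 3 body */ }
}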

Time in seconds to execute iterations of the generated code on a 1.7 GHz Pentium 4 (bar chart not reproduced; shorter is better).

C/L: clusters per level. The higher the C/L ratio, the better.

Conclusion

- Improved running times over an existing compiler that uses a similar technique (SAXO-RT).
- Faster than the fastest-known compiler (Potop-Butucaru's) on the largest example.

Source and object code for the compiler described in this presentation are freely available as part of the Columbia Esterel Compiler distribution, available from: