Fakultät für informatik informatik 12 technische universität dortmund Worst-Case Execution Time Analysis - Session 19 - Heiko Falk TU Dortmund Informatik.


Worst-Case Execution Time Analysis - Session 19 -
Heiko Falk, TU Dortmund, Fakultät für Informatik, Informatik 12, Germany
Slides use Microsoft cliparts. All Microsoft restrictions apply.

Schedule of the course
(technische universität dortmund, fakultät für informatik, h. falk, informatik 12, 2008)
Time        | Monday | Tuesday | Wednesday | Thursday | Friday
09:30-11:00 | 1: Orientation, introduction | 5: Models of computation + specs | 9: Mapping of applications to platforms | 13: Memory aware compilation | 17: Memory aware compilation
11:00       | Brief break
11:15-12:30 | 2: Models of computation + specs | 6: Lab*: Ptolemy | 10: Lab*: Scheduling | 14: Lab*: Mem. opt. | 18: Lab*: Mem. opt.
12:30       | Lunch
14:00-15:20 | 3: Models of computation + specs | 7: Mapping of applications to platforms | 11: High-level optimizations* | 15: Memory aware compilation | 19: WCET & compilers*
15:20       | Break
15:40-17:00 | 4: Lab*: Kahn process networks | 8: Mapping of applications to platforms | 12: High-level optimizations* | 16: Memory aware compilation | 20: Wrap-up
* Dr. Heiko Falk

Outline
- Worst-Case Execution Time
  - Definition
  - Static WCET Analysis
  - WCET-aware Compilation and Optimization
- Function Specialization / Procedure Cloning
  - Introduction and Code Examples
  - Function Specialization and WCET_EST
  - Results
- References & Summary

Worst-Case Execution Time
Definition of the Worst-Case Execution Time (WCET):
- Upper bound for the execution time of a program
- irrespective of the program's input data.
Problem: A program's actual WCET is not computable in general! (The Halting Problem can be reduced to WCET computation.)

Estimation of Worst-Case Execution Times
Solution: Estimation of upper bounds WCET_EST for the actual (unknown) WCET.
Requirements on WCET estimates:
- Safeness: WCET ≤ WCET_EST!
- Tightness: WCET_EST − WCET → minimal

Workflow of the static WCET-Analyzer aiT
- Input: Binary executable program P to be analyzed.
- exec2crl: Disassembler, translates P into aiT's own LIR CRL2.
- Value Analysis: Determines the potential contents of processor registers at any point during P's execution.
Note: P is never executed, emulated or simulated by aiT! Only static analyses are applied.

Workflow of the static WCET-Analyzer aiT
- Loop Bound Analysis: Tries to determine lower and upper iteration bounds for each loop of P.
- Cache Analysis: Uses a formal cache model and classifies each memory access within P as a guaranteed cache hit or, otherwise, as a cache miss.

Workflow of the static WCET-Analyzer aiT
- Pipeline Analysis: Includes an accurate model of the processor pipeline.
- Depending on initial pipeline states, possible cache states etc., the possible final pipeline states are determined per basic block.
- The result of the pipeline analysis is the WCET_EST of each basic block.

Workflow of the static WCET-Analyzer aiT
- Path Analysis: Models all possible execution paths of P, taking the WCET_EST of all basic blocks into account, and computes the overall longest execution path, which yields the global WCET_EST of P.
- The output of aiT is e.g. the length of this longest path, i.e. the WCET_EST of P.
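The idea behind path analysis can be illustrated with a deliberately simplified sketch. The slides do not describe how aiT searches for the longest path, so the code below swaps in a tree-based timing schema as an assumption: a sequence costs the sum of its parts, a branch costs its more expensive alternative, and a loop costs its upper iteration bound times its body. The types `Kind` and `Node` and the function `schema_wcet_est` are hypothetical names, and loop-header overhead is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical program tree of a structured program (not aiT's data model). */
typedef enum { BLOCK, SEQ, BRANCH, LOOP } Kind;

typedef struct Node {
    Kind kind;
    unsigned cycles;            /* BLOCK: WCET_EST of the basic block     */
    unsigned max_iter;          /* LOOP: upper iteration bound            */
    const struct Node *left;    /* SEQ/BRANCH: first child; LOOP: body    */
    const struct Node *right;   /* SEQ/BRANCH: second child; else unused  */
} Node;

/* Upper bound (in cycles) for the subtree rooted at n. */
unsigned long schema_wcet_est(const Node *n)
{
    assert(n != NULL);
    switch (n->kind) {
    case BLOCK:
        return n->cycles;
    case SEQ:       /* both parts execute: add their bounds */
        return schema_wcet_est(n->left) + schema_wcet_est(n->right);
    case BRANCH: {  /* only one alternative executes: take the longer one */
        unsigned long l = schema_wcet_est(n->left);
        unsigned long r = schema_wcet_est(n->right);
        return l > r ? l : r;
    }
    case LOOP:      /* body executes at most max_iter times */
        return (unsigned long)n->max_iter * schema_wcet_est(n->left);
    }
    return 0;
}
```

For a 10-cycle block followed by a loop of at most 5 iterations whose body branches between a 20-cycle and a 30-cycle block, the sketch yields 10 + 5 * 30 = 160 cycles.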

Complexity of Static WCET-Analysis
Problem: The actual WCET is not computable in general!
- If the WCET were computable, one could solve the Halting Problem by checking in O(1) whether WCET < ∞ holds.
- Reason: It is not computable how long P stays in a loop. Automatic loop bound analysis is only feasible for simple classes of loops.
- (Similar problems arise for recursive functions.)
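As an illustration of a "simple class of loops" whose bound can be derived automatically, the following sketch computes the iteration count of a counting loop of the form `for (i = init; i < limit; i += step)`. It assumes that `step` is positive, that the loop body never modifies `i`, `limit` or `step`, and that no integer overflow occurs. The function name `counting_loop_bound` is hypothetical and is not part of aiT.

```c
#include <assert.h>

/* Iteration count of: for (i = init; i < limit; i += step) { ... }
 * Assumes step > 0 and a body that leaves i, limit and step untouched. */
long counting_loop_bound(long init, long limit, long step)
{
    assert(step > 0);
    if (init >= limit)
        return 0;                             /* loop body never runs */
    return (limit - init + step - 1) / step;  /* ceil((limit - init) / step) */
}
```

As soon as the bound depends on input data or on values computed inside the loop, such a closed form is no longer available, which is where user annotations come in.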

User Annotations for WCET Analysis
Way out: The aiT user must provide information about e.g. minimal / maximal loop iteration counts and recursion depths.
Annotations file: Contains such user annotations ("Flow Facts") and is, alongside the program P to be analyzed, one more input to aiT.
[AbsInt Angewandte Informatik GmbH]

A WCET-Aware C-Compiler (WCC)
[Figure: WCC workflow. Components shown: ANSI C, ICD-C Parser, ICD-C IR, Code Selector, LLIR, Register Allocator, WCET-Opt. ASM; backend coupling LLIR → CRL2, aiT WCET Analysis, CRL2 + WCET_EST, CRL2 → LLIR; Analyses, Optimizations.]
Distinctive Feature: Tight integration of the WCET analyzer aiT into WCC's backend.

Tight Coupling WCC ↔ aiT
- CRL2: Low-level intermediate representation (LIR) of aiT.
- Structure of CRL2: Control Flow Graph (CFG) consisting of routines, basic blocks, instructions and operations.
- WCET analysis is applied to low-level machine code, hence the coupling WCC ↔ aiT is done in the compiler backend.
- Converter LLIR2CRL: translates WCC's LIR into CRL2 and executes aiT in the background.
- After successful WCET analysis: relevant WCET data is extracted from CRL2 and imported into WCC.

Imported WCET Data & Flow Facts in WCC
Most important WCET data imported into WCC:
- WCET_EST of the entire program
- WCET_EST of each single basic block
- Worst-case execution frequencies of each CFG edge
Flow Fact Annotation within WCC:
- Annotation of e.g. loop iteration bounds directly in the C source code: _Pragma( "loopbound min 10 max 10" );
- Since compiler optimizations may restructure loops and thereby change their iteration bounds, WCC automatically keeps Flow Facts consistent during all applied optimizations.
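A minimal sketch of how such an annotation might look in context follows. Only the pragma string itself is taken from the slide. Placing it directly in front of the loop is an assumption, and the macros `LOOPBOUND` and `USE_FLOW_FACTS` as well as the function `sum10` are hypothetical names introduced so that compilers without flow-fact support do not see an unknown pragma.

```c
#include <assert.h>

/* Hypothetical wrapper: emit the flow-fact pragma only if USE_FLOW_FACTS is
 * defined, so that other compilers do not warn about an unknown pragma. */
#ifdef USE_FLOW_FACTS
#define LOOPBOUND(s) _Pragma(s)
#else
#define LOOPBOUND(s)
#endif

/* Sums exactly 10 elements, so the loop runs 10 times on every call. */
int sum10(const int *v)
{
    int sum = 0;
    assert(v != 0);
    /* Assumed placement: the annotation precedes the loop it describes. */
    LOOPBOUND("loopbound min 10 max 10")
    for (int i = 0; i < 10; i++)
        sum += v[i];
    return sum;
}
```

Because the loop limit is the literal 10, minimum and maximum coincide and the annotation is exact.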

Problems during WCET_EST Minimization
The Worst-Case Execution Path (WCEP):
- WCET of a program P = length of the longest execution path of P (WCEP)
- To minimize P's WCET_EST, optimizations must exclusively focus on those parts of P lying on the WCEP.
- Optimization of parts not lying on the WCEP does not reduce WCET_EST at all!
- Optimization strategies for WCET_EST minimization must have detailed knowledge about the WCEP.
The WCET analyzer aiT provides this detailed information about the WCEP, but...

Switching WCEPs (1)
Example: Scratchpad allocation of functions for WCET_EST minimization.
    main() { if ( … ) a(); else d(); return; }
    a() { … b(); … }
    b() { … c(); … }
    d() { … c(); … }
Function call graph: main calls a or d, a calls b, b calls c, d calls c.
WCET_EST per function: main 10 cycles, a 50 cycles, b 80 cycles, c 65 cycles, d 120 cycles (e.g. 50 cycles is the WCET_EST of a for a located in main memory).

Switching WCEPs (2)
Code and call graph as in Switching WCEPs (1): main 10, a 50, b 80, c 65, d 120 cycles.
- Initial WCEP: main, a, b, c
- Length of WCEP = WCET_EST: 10 + 50 + 80 + 65 = 205 cycles
- SPM allocation of b reduces the WCET_EST of b to 40 cycles.

Switching WCEPs (3)
Code and call graph as in Switching WCEPs (1), now with b at 40 cycles: main 10, a 50, b 40, c 65, d 120 cycles.
- Initial WCEP: main, a, b, c
- Length of the initial WCEP = WCET_EST: 205 cycles
- SPM allocation of b has reduced the WCET_EST of b to 40 cycles.

Switching WCEPs (4)
Code and call graph as in Switching WCEPs (1), with b at 40 cycles: main 10, a 50, b 40, c 65, d 120 cycles.
- New WCEP: main, d, c
- New WCET_EST: 10 + 120 + 65 = 195 cycles (the former WCEP main, a, b, c now takes only 165 cycles)
- The WCEP has switched as a side effect of the optimization!

Consequences for Compiler Optimizations
Optimizations aiming at WCET_EST minimization...
- ...always have to consider that the WCEP may switch after each and every decision taken by the optimization.
- ...should not decide where to optimize on the basis of local information. Instead, they should always consider the global effects of a decision. (SPM allocation of b in the previous example leads to a local reduction of b's WCET_EST by 40 cycles. However, a global reduction of only 10 cycles was achieved!)
Challenge: To design completely new optimization strategies that match the above requirements and always consider the global CFG with its current WCEP.
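The numbers of the Switching WCEPs example can be replayed with a few lines of C. The cycle counts and the two call paths are taken from the slides. The struct `Costs` and all function names are hypothetical, and the sketch hard-codes the two paths of this particular call graph instead of analyzing a general CFG.

```c
#include <assert.h>

/* WCET_EST per function in cycles, as given in the example call graph. */
typedef struct { unsigned main_, a, b, c, d; } Costs;

/* Length of the path main -> a -> b -> c. */
unsigned path_abc(Costs k) { return k.main_ + k.a + k.b + k.c; }

/* Length of the path main -> d -> c. */
unsigned path_dc(Costs k) { return k.main_ + k.d + k.c; }

/* Global WCET_EST: length of the longer path, i.e. of the current WCEP. */
unsigned program_wcet_est(Costs k)
{
    unsigned p = path_abc(k), q = path_dc(k);
    return p > q ? p : q;
}

/* Cycles saved globally by an optimization that turns before into after. */
unsigned global_gain(Costs before, Costs after)
{
    assert(program_wcet_est(after) <= program_wcet_est(before));
    return program_wcet_est(before) - program_wcet_est(after);
}
```

With `Costs before = { 10, 50, 80, 65, 120 }` and b lowered to 40 in `after`, `program_wcet_est` drops from 205 to 195, so `global_gain` reports 10 cycles although b itself became 40 cycles faster. An optimizer that re-evaluates the global estimate after every decision notices the switch to main, d, c immediately.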

Function Specialization
Also known as "Procedure Cloning":
- Well-known optimization technique (1993)
- Goals: To enable further standard optimizations and to reduce the overhead for parameter passing during function calls
- Approach: For worthwhile functions, specialized copies ("clones") are created that have fewer parameters than their original versions.

Typical Function Calls
    int f( int *x, int n, int p ) {
      int i;
      for ( i = 0; i < n; i++ ) {
        x[i] = p * x[i];
        if ( i == 10 ) { ... }
      }
      return x[n];
    }
    int main() {
      ...
      f( y, 5, 2 );
      f( z, 5, 2 );
      ...
      return f( a, 5, 2 );
    }
Observations:
- f is called several times.
- f has three parameters.
- The constants 5 and 2 are passed several times for the parameters n and p.

Further Observations
For the functions f and main shown under "Typical Function Calls":
- f is a so-called general purpose function: Via parameter n, an array x of arbitrary size can be processed.
- The control flow within f depends on a function parameter.
- Within main, f is used for a special purpose: Processing of arrays of size n = 5.

Specialization of General Purpose Routines
The original function f( int *x, int n, int p ) is kept unchanged. All calls in main are redirected to the clone f_5_2, in which n and p are replaced by the constants 5 and 2:
    int main() {
      ...
      f_5_2( y );
      f_5_2( z );
      ...
      return f_5_2( a );
    }
    int f_5_2( int *x ) {
      int i;
      for ( i = 0; i < 5; i++ ) {
        x[i] = 2 * x[i];
        if ( i == 10 ) { ... }
      }
      return x[5];
    }
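A small runnable sketch can confirm that a clone behaves like the original for the specialized arguments. It is a hand-written illustration, not output of WCC: the elided `if ( i == 10 )` branch from the slide is left out (in the clone it would be unreachable because i never exceeds 4), and the multiplication by 2 is rewritten as an addition to hint at the strength reduction that cloning enables.

```c
#include <assert.h>

/* General-purpose routine: scales the first n elements of x by p.
 * The caller must provide at least n + 1 elements because x[n] is returned. */
int f(int *x, int n, int p)
{
    assert(x != 0 && n >= 0);
    for (int i = 0; i < n; i++)
        x[i] = p * x[i];
    return x[n];
}

/* Clone of f specialized for n = 5 and p = 2: only one parameter remains. */
int f_5_2(int *x)
{
    assert(x != 0);
    for (int i = 0; i < 5; i++)
        x[i] = x[i] + x[i];   /* 2 * x[i], strength-reduced by hand */
    return x[5];
}
```

Calling `f(y, 5, 2)` and `f_5_2(z)` on two identical arrays of six elements leaves both arrays in the same state and returns the same value, while the clone has a loop bound that is visible as a constant in its own code.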

Expected Effects of Function Specialization
Reduction of the Average-Case Execution Time (ACET):
- Enabling of further standard optimizations in the cloned function:
  - Constant propagation and constant folding
  - Strength reduction
  - Control flow simplification
- Less code has to be executed in order to pass arguments to the cloned function.
Increase of Code Size:
- Potentially, many new highly specialized clones are generated.

Why Function Specialization and WCET_EST?
Motivation:
- In embedded software, general purpose functions are heavily used in special purpose contexts.
- In particular, loop bounds are very often controlled by function parameters.
- Recall: Loop bounds are extremely critical for a precise WCET analysis.
- Function Specialization enables the extremely precise annotation of loop bounds!
[P. Lokuciejewski, H. Falk et al., Influence of Procedure Cloning on WCET Prediction, CODES+ISSS, Salzburg 2007]

Loop Bound Annotations & Cloning
Original Code:
    int f( int *x, int n, int p ) {
      int i;
      // loopbound min 0, max n
      for ( i = 0; i < n; i++ ) {
        x[i] = p * x[i];
        if ( i == 10 ) { ... }
      }
      return x[n];
    }
- Loop annotations are extremely imprecise since a single annotation must cover all calls of f in a safe way.
- If f is called somewhere with n = 2000 as maximal value, the annotation must always assume a maximum of 2000 iterations.
- For calls like e.g. f(a, 5, 2), 2000 iterations are assumed as well ⇒ massive WCET overestimation.

Loop Bound Annotations & Cloning
Specialized Code:
    int f_5_2( int *x ) {
      int i;
      // loopbound min 5, max 5
      for ( i = 0; i < 5; i++ ) {
        x[i] = 2 * x[i];
        if ( i == 10 ) { ... }
      }
      return x[5];
    }
- Loop annotations are extremely precise since the annotation covers all calls of f_5_2 exactly.
- For calls like e.g. f_5_2(a), exactly 5 iterations are assumed ⇒ exact loop bound for the WCET estimation.
- The original function f and any additional clones of f are fully independent of the annotations within f_5_2 ⇒ no interdependencies during WCET analysis between these different clones of f.
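The effect of the two annotations on the estimate can be made concrete with a small calculation. The bounds 2000 and 5 come from the slides. The cost model (annotated maximum times a fixed per-iteration cost) and any per-iteration cycle count, such as 12, are purely hypothetical, as are the function names.

```c
#include <assert.h>

/* Estimated loop cost under a simplistic model: annotated maximum iteration
 * count times the WCET_EST of one iteration (hypothetical model). */
unsigned long loop_wcet_est(unsigned long max_iter, unsigned long cycles_per_iter)
{
    return max_iter * cycles_per_iter;
}

/* Factor by which a shared annotation overestimates a call that really
 * performs actual_iter iterations (independent of the per-iteration cost). */
unsigned long overestimation_factor(unsigned long shared_bound,
                                    unsigned long actual_iter)
{
    assert(actual_iter > 0 && shared_bound >= actual_iter);
    return shared_bound / actual_iter;
}
```

Assuming 12 cycles per iteration, the shared annotation charges 2000 * 12 = 24000 cycles to a call like f(a, 5, 2), whereas the clone's annotation charges 5 * 12 = 60 cycles, a factor of 400 for this loop alone.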

Workflow of Function Specialization
Attention when applying Function Specialization:
- Code size increases should be kept clearly in mind during specialization!
- Cloning every function that is called at least twice with the same constant parameter is usually unacceptable.
Parameterized Application of Function Specialization:
- Size G: Never specialize a function f whose size is larger than G.
- Fraction K of same constant parameters: Clone f iff at least K% of all calls of f use the same constant parameters.
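The two parameters translate into a simple decision rule, sketched below. The function name `should_specialize` and its parameter names are hypothetical, the unit in which function size is measured is left to the caller, and a real implementation would also have to detect which calls share the same constant arguments.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether f is worth cloning.
 *   size             - size of f (same unit as G)
 *   total_calls      - number of call sites of f
 *   same_const_calls - call sites passing the same constant parameters
 *   G                - maximum size of a function that may be cloned
 *   K_percent        - required share of such call sites, in percent */
bool should_specialize(unsigned size, unsigned total_calls,
                       unsigned same_const_calls,
                       unsigned G, unsigned K_percent)
{
    assert(same_const_calls <= total_calls);
    if (size > G || total_calls == 0)
        return false;                 /* too large, or never called */
    /* same_const_calls / total_calls >= K%, without floating point */
    return 100UL * same_const_calls >= (unsigned long)K_percent * total_calls;
}
```

With G = 2000 and K = 50, a function of size 150 whose three calls all pass the constants 5 and 2 is cloned, whereas the same function is left alone if only 4 of 10 calls share their constants.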

Workflow of Function Specialization
[Figure: Optimization sequence. Function Specialization, followed by Const. Folding & Propagation, Strength Reduction and If-Statement Optimization.]
Parameters: G = 2000 ICD-C expressions, K = 50%
Considered Processors:
- Infineon TriCore TC1796
- ARM 7 TDMI (ARM & THUMB instruction sets)

Relative WCET_EST after Function Specialization
[Figure: relative WCET_EST after Function Specialization]
WCET_EST improvements between 13% and 95%!

Relative ACETs after Function Specialization
[Figure: relative ACETs after Function Specialization]
- Only marginal ACET improvements of at most 3%.
- The influence of the parameter-passing overhead and the effect of the subsequent optimizations on the ACET are obviously marginal.

Discussion of Results
Very astonishing: How can one optimization impact two apparently similar objectives like WCET and ACET so differently?
Reasons:
- Code before specialization can only be annotated poorly ⇒ aiT provides very imprecise WCET estimates.
- Code after specialization can be annotated very precisely ⇒ aiT provides highly precise WCET estimates.
Function Specialization is obviously effective in increasing the tightness of WCET_EST (cf. the tightness requirement under "Estimation of Worst-Case Execution Times"). The actual WCET, which is unknown because it cannot be computed, is presumably reduced only by a similar order of magnitude as the ACET.

Relative Code Sizes after Specialization
[Figure: relative code sizes after Function Specialization]
Code size increases between 2% and 325%!

References
Function Specialization:
- D. Bacon, S. Graham, O. Sharp, Compiler Transformations for High-Performance Computing, ACM Computing Surveys 26 (4).
- P. Lokuciejewski, H. Falk, M. Schwarzer et al., Influence of Procedure Cloning on WCET Prediction, CODES+ISSS, Salzburg 2007.

Summary
WCET
- Upper bound of a program's execution time
- Impossible to compute a program's actual WCET
- Static WCET analysis using aiT
Function Specialization / Procedure Cloning
- Specialization of general purpose functions
- Enabling of standard compiler optimizations in specialized functions, simplification of the code required for parameter passing
- Small impact on ACET, huge impact on WCET_EST

12th International Workshop on Software & Compilers for Embedded Systems
- April 23-24, 2009, Acropolis, Nice, France
- Co-located with DATE 2009
- Paper Submission: around mid November 2008

Coffee/tea break (if on schedule)
Q&A?