Approximating the Worst-Case Execution Time of Soft Real-time Applications Matteo Corti.

Goal
WCET analysis: estimation of the longest possible running time
Soft real-time systems:
– allow some approximations
– large applications

Thesis
It is possible to perform the WCET estimation without relying on path enumeration:
– bound the iterations of cyclic structures
– find infeasible paths
– analyze the call graph of object-oriented languages
– estimate the instruction duration on modern architectures

Challenges
Semantic:
– bounds on the iterations of cyclic control-flow structures
– infeasible paths
Hardware-level:
– instruction duration
– modern architectures (caches, pipelines, branch prediction)

Outline
– Goal and thesis
– Semantic analysis
– Hardware-level analysis
– Environment
– Results
– Concluding remarks

Structure: Separated Approach
binary -> semantic analysis -> annotated binary -> HW-level analysis -> WCET

Semantic Analysis
Java bytecode
– Structural analysis
– Partial abstract interpretation
– Loop iteration bounds
– Block iteration bounds
– Call graph analysis
Annotated assembler

Structural Analysis
– Powerful interval analysis
– Recognizes semantic constructs
– Useful when the source code is not available
– Iteratively matches the blocks with predefined patterns

Abstract Interpretation
We perform a limited abstract interpretation pass over linear code segments.
We discover some false paths (not containing cycles).
We gather information on the possible values of variables.

    void foo(int i) {
      if (i > 0) {
        for (; i < 10; i++) {
          bar();
        }
      }
    }
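To make the idea of gathering value information concrete, here is a minimal interval sketch for the foo example above. It is not the analysis from the talk: the Interval and FooIntervals classes and their methods are hypothetical names introduced only for illustration, and the domain tracks a single integer variable.

```java
// Hypothetical sketch: a tiny interval domain for one integer variable.
final class Interval {
    final long lo, hi; // inclusive bounds

    Interval(long lo, long hi) { this.lo = lo; this.hi = hi; }

    // An interval with lo > hi contains no value: the path is infeasible.
    boolean isEmpty() { return lo > hi; }

    // Values of this interval that satisfy "x > c".
    Interval greaterThan(long c) { return new Interval(Math.max(lo, c + 1), hi); }

    // Values of this interval that satisfy "x < c".
    Interval lessThan(long c) { return new Interval(lo, Math.min(hi, c - 1)); }

    @Override public String toString() {
        return isEmpty() ? "empty" : "[" + lo + ", " + hi + "]";
    }
}

final class FooIntervals {
    // Body iterations of "for (; i < limit; i++)" for a given start value.
    static long bodyIterations(long start, long limit) {
        return Math.max(0, limit - start);
    }

    public static void main(String[] args) {
        // On entry to foo, i can be any int.
        Interval i = new Interval(Integer.MIN_VALUE, Integer.MAX_VALUE);
        // Inside "if (i > 0)" the interval is refined to [1, Integer.MAX_VALUE].
        Interval inThen = i.greaterThan(0);
        System.out.println("i inside the if: " + inThen);
        // Smallest start value gives the most iterations, largest gives the fewest.
        System.out.println("max body iterations: " + bodyIterations(inThen.lo, 10));
        System.out.println("min body iterations: " + bodyIterations(inThen.hi, 10));
        // A path that additionally requires i < 0 is infeasible.
        System.out.println("i > 0 and i < 0 infeasible: " + inThen.lessThan(0).isEmpty());
    }
}
```

Under these assumptions the loop body in foo runs at most 9 times, and a branch that needs both i > 0 and i < 0 is reported as a false path.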

Loop Iteration Bounds
Bounds on the loop header are computed similarly to C. Healy [RTAS’98].
Each loop is handled in isolation by analyzing the behavior of induction variables.
– we consider integer local variables
– we handle loops with several induction variables and multiple exit points
– we compute the minimal and maximal number of iterations for each loop header
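As a rough illustration of bounding a loop from its induction variable, the sketch below handles only the simplest case: one integer induction variable, a positive constant step, and a single exit test of the form i < limit. The LoopBounds class and its method names are assumptions made for this example; the analysis in the talk covers several induction variables and multiple exits, which this sketch does not.

```java
// Hypothetical sketch: bounds for "for (i = init; i < limit; i += step)".
final class LoopBounds {
    // Number of times the loop body runs; step must be positive.
    static long bodyIterations(long init, long limit, long step) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        if (init >= limit) return 0;
        return (limit - init + step - 1) / step; // ceiling of (limit - init) / step
    }

    // The header test runs once more than the body: the final, failing test.
    static long headerIterations(long init, long limit, long step) {
        return bodyIterations(init, limit, step) + 1;
    }

    // Minimal and maximal header iterations when init is only known to lie
    // in [initLo, initHi]: a larger start value means fewer iterations.
    static long[] headerRange(long initLo, long initHi, long limit, long step) {
        return new long[] {
            headerIterations(initHi, limit, step),
            headerIterations(initLo, limit, step)
        };
    }

    public static void main(String[] args) {
        // for (int i = 0; i < 100; i++): body 100 times, header 101 times.
        System.out.println(bodyIterations(0, 100, 1));
        System.out.println(headerIterations(0, 100, 1));
    }
}
```

The header count of 101 for a loop from 0 to 99 matches the [101] label used on the following slides.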

Loop Header Iterations
The bounds on the iterations of the header are safe for the whole loop.
But some parts of the loop could be executed less frequently:

    for (int i = 0; i < 100; i++) {
      if (i < 50) { A; } else { B; }
    }

(Figure: control-flow graphs of the loop with blocks A and B, labeled [101] [50] [100] and [101] [1] [100].)

Block Iterations
Block iterations are computed using the CFG root and the iteration branches.
The header and the type of the biggest semantic region that includes all the predecessors of a node determine its number of iterations.
(Figure: CFG with header H, block B, and predecessors P0 and P1.)

Example

    void foo() {
      int i, j;
      for (i = 0; i < 100; i++) {
        if (i < 50) {
          for (j = 0; j < 10; j++)
            ;
        }
      }
    }

(Figure: CFG of foo annotated with iteration bounds [1] [101] [550] [100] [50] [500] [1].)
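The bracketed counts can be checked by instrumenting the example by hand. The sketch below rewrites foo with explicit counters, one per block. The BlockCounts class and the mapping of counters to blocks are assumptions made for this check, since the transcript does not say which label belongs to which CFG node.

```java
import java.util.Arrays;

// Hypothetical sketch: hand-instrumented version of foo from the example.
final class BlockCounts {
    // Returns execution counts for:
    // [0] outer header test, [1] if test, [2] inner loop init,
    // [3] inner header test, [4] inner loop body.
    static long[] run() {
        long[] c = new long[5];
        int i = 0;
        while (true) {
            c[0]++;                      // outer header: i < 100
            if (!(i < 100)) break;
            c[1]++;                      // if test: i < 50
            if (i < 50) {
                c[2]++;                  // inner init: j = 0
                int j = 0;
                while (true) {
                    c[3]++;              // inner header: j < 10
                    if (!(j < 10)) break;
                    c[4]++;              // inner body (empty statement)
                    j++;
                }
            }
            i++;
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [101, 100, 50, 550, 500]
    }
}
```

The inner header count of 550 is 50 entries times 11 tests each, which is smaller than the 1,111 that a bound derived only from nesting (101 × 11) would give.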

Contributions (Semantic Analysis)
We compute bounds on the iterations of basic blocks in quadratic time:
– Structural analysis: O(B^2)
– Loop bounds: O(B)
– Block bounds: O(B)
Related work
– Automatically detected value-dependent constraints [Healy, RTAS’99]
– Abstract interpretation based approaches

Instruction Duration Estimation
Goal: compute the duration of the single instructions
– The maximum number of iterations for each instruction is known
– The duration depends on the context
Limited computational context:
– We assume that the effects on the pipeline and caches of an instruction fade over time.

Partial Traces
– the last n instructions before the instruction i on a given trace
– n is determined experimentally ( instructions)
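A partial trace as defined above is just a bounded window over a trace. The sketch below shows that window on a list of instruction names. The PartialTraces class, the use of strings for instructions, and the values in main are placeholders chosen for illustration, not values from the talk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the last n instructions before position idx on a trace.
final class PartialTraces {
    // Returns at most n instructions that precede trace[idx], in trace order.
    static <T> List<T> partialTrace(List<T> trace, int idx, int n) {
        if (idx < 0 || idx >= trace.size() || n < 0) {
            throw new IllegalArgumentException("idx or n out of range");
        }
        return trace.subList(Math.max(0, idx - n), idx);
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList("load", "add", "cmp", "jmp", "store");
        // Context of "store" (index 4) with n = 2.
        System.out.println(partialTrace(trace, 4, 2)); // prints [cmp, jmp]
    }
}
```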

WCET Estimation
For every partial trace:
– CPU behavior simulation (cycle precise)
– duration according to the context
We account for all the incoming partial traces (contexts) according to their iteration counts
Block duration = ∑ instruction durations
WCET = longest path
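The last two lines of the slide can be sketched as a sum followed by a longest-path search. The code below assumes an acyclic block graph with non-negative durations, for example one where loops have already been folded into block costs using their iteration bounds. That assumption, the LongestPath class, and the array-based graph encoding are all choices made for this illustration, since the transcript does not describe how the longest path is computed.

```java
import java.util.Arrays;

// Hypothetical sketch: block durations and the longest path through a DAG.
final class LongestPath {
    // Block duration = sum of the durations of its instructions.
    static long blockDuration(long[] instructionDurations) {
        long sum = 0;
        for (long d : instructionDurations) sum += d;
        return sum;
    }

    // Longest path starting at entry. succ[b] lists the successors of block b.
    // The graph must be acyclic and durations must be non-negative.
    static long longest(long[] duration, int[][] succ, int entry) {
        long[] memo = new long[duration.length];
        Arrays.fill(memo, -1); // -1 marks "not computed yet"
        return visit(entry, duration, succ, memo);
    }

    private static long visit(int b, long[] duration, int[][] succ, long[] memo) {
        if (memo[b] >= 0) return memo[b];
        long best = 0;
        for (int s : succ[b]) {
            best = Math.max(best, visit(s, duration, succ, memo));
        }
        memo[b] = duration[b] + best;
        return memo[b];
    }

    public static void main(String[] args) {
        // Diamond: block 0 branches to 1 or 2, both join at 3.
        long[] duration = {5, 10, 3, 2};
        int[][] succ = {{1, 2}, {3}, {3}, {}};
        System.out.println(longest(duration, succ, 0)); // prints 17
    }
}
```

Memoizing per block keeps the search linear in the number of blocks and edges, so no explicit path enumeration is needed.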

Data Caches
– Partial traces are too short to gather enough information on data caches
– Data caches are not simulated but estimated using run-time statistics
– The average frequency of data cache misses is measured with a set of test runs of the program
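One plausible way to use a measured miss frequency is to charge each memory access its expected cost. The sketch below does that. The formula, the DataCacheEstimate class, and the cycle numbers in main are assumptions for illustration, since the transcript only says that the miss frequency is measured over test runs and does not say how it enters the estimate.

```java
// Hypothetical sketch: expected cost of a data access from run-time statistics.
final class DataCacheEstimate {
    // Average miss frequency over all test runs: total misses / total accesses.
    static double missFrequency(long[] missesPerRun, long[] accessesPerRun) {
        if (missesPerRun.length != accessesPerRun.length) {
            throw new IllegalArgumentException("one entry per run expected");
        }
        long misses = 0, accesses = 0;
        for (int r = 0; r < missesPerRun.length; r++) {
            misses += missesPerRun[r];
            accesses += accessesPerRun[r];
        }
        if (accesses == 0) throw new IllegalArgumentException("no accesses recorded");
        return (double) misses / accesses;
    }

    // Expected cycles per access: the hit cost plus the miss penalty,
    // weighted by how often a miss was observed.
    static double estimatedAccessCycles(double missFreq, double hitCycles, double missPenaltyCycles) {
        return hitCycles + missFreq * missPenaltyCycles;
    }

    public static void main(String[] args) {
        // Two made-up test runs: 10 and 40 misses out of 100 accesses each.
        double f = missFrequency(new long[] {10, 40}, new long[] {100, 100});
        System.out.println(f);
        System.out.println(estimatedAccessCycles(f, 1.0, 8.0));
    }
}
```

Because this uses an average rather than a bound, it fits the soft real-time setting of the talk, where some approximation is accepted.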

Structure: Separated Approach
binary -> semantic analysis -> annotated binary -> HW-level analysis -> WCET
run-time monitor -> cache behavior -> HW-level analysis

Approximation
– We approximate the duration of single instructions.
– We do not approximate the number of times an instruction is executed.
– Inaccuracies are only due to cache and pipeline effects.
– No severe WCET underestimations are possible.

Contributions (HW-level Analysis)
Partial traces evaluation
– O(B)
– analyzes the instructions in their context
– approximates the effects of instructions over time
– includes run-time data for the analysis of data caches
Related work
– abstract interpretation based
– data flow analyses

Environment
– Java ahead-of-time bytecode to native compiler
– Linux
– Intel Pentium Pro family
– Semantic analysis: language independent
– Hardware-level analysis: architecture independent

Evaluation
It is not possible to test the whole input space to determine the WCET experimentally.
– small applications: known algorithm, the WCET can be forced at run time
– big applications: several runs with random input

Results – Small Kernels
Benchmark | Loops | Measured [cycles] | Estimated [cycles] | Overestimation
BubbleSort | 4 | 9.16· | · | %
Division | 2 | 1.40· | · | %
ExpInt | 3 | 1.28· | · | %
Jacobi | 5 | 0.88· | · | %
JanneComplex | 4 | 1.39· | · | %
MatMult | 6 | 2.67· | ·10^9 | 2%
MatrixInversion | 11 | 1.42· | · | %
Sieve | 4 | 1.29· | · | %

Results – Application Benchmarks
Program | Classes | Methods | Loops | Observed [cycles] | Estimated [cycles] | Overestimation
_201_compress | | | | · | · | %
JavaLayer | | | | · | · | %
Linpack | | | | · | · | %
SciMark | | | | · | · | %
Whetstone | | | | · | · | %

Conclusions
Semantic analysis
– fast partial abstract interpretation pass
– scalable block iteration bounding algorithm that takes into account different path frequencies inside loop bodies
– no restrictions on the analyzed code
Hardware-level analysis
– instruction duration analyzed in the execution context
– architecture independent