CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.

Worst Case Execution Time
- What is it?
  - Longest time a task can take
- Why do we need it?
  - Scheduling algorithms assume it is known
  - Can’t say anything about real time without it
- What is the goal?
  - Manually check each task to get its maximum run time
  - Automatically get the run time of a task using a tool

What is the Problem
- This should be easy
  - Knuth volume 1 does this for a variety of algorithms
  - Just count the number of instructions
- What are the problems?
  - The halting problem
  - Almost anything you want to know about a real program is undecidable
  - Need to understand and limit control flows
  - Need to understand the hardware
  - Need to understand the execution model

Control Flow
- To compute WCET, the control flow must be limited
- Control flow can be modeled as a graph
  - Graph of basic blocks
  - Basic block: straight-line code with no internal branches
  - Once started, will execute to completion
- Suppose we could compute the WCET of each block
  - How could we compute the run time of the program?

Control Flow Graphs
- Loops have to be bounded
  - Bounds can be fixed
  - Can be based on input
  - Need to determine the bounds
- Nested loops
  - Fixed, based on input
  - Based on index of outer loop
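To illustrate why an inner bound that depends on the outer index matters, here is a minimal sketch for a hypothetical triangular loop nest. The function names and the loop shape are assumptions made for illustration; they do not come from the lecture. The sketch compares the exact iteration count with the looser bound obtained by multiplying the two maximum trip counts.

```python
def inner_iterations(n):
    """Count executions of the inner body when the inner bound is the outer index."""
    count = 0
    for i in range(n):          # outer loop: fixed bound n
        for _ in range(i):      # inner loop: bound depends on outer index i
            count += 1
    return count


def naive_bound(n):
    """Bound from multiplying the maximum trip counts of both loops."""
    return n * (n - 1)          # outer runs n times, inner at most n - 1 times


def exact_bound(n):
    """Closed form for 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2
```

For n = 10 the inner body runs 45 times, while the naive product gives 90. Both are safe bounds, but the tighter one wastes less schedule capacity.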

Reducible Control Structures
- Can you compute the time for an arbitrary graph?
  - Can be difficult
- But programs don’t produce arbitrary graphs
  - Clean programs produce reducible graphs
- A reducible graph allows you to cluster nodes
  - WCET of a cluster can be computed
  - The cluster can be replaced with a single node

Reducible
- A graph is reducible iff repeated application of the following actions yields a graph with only one node:
  - Replace a self loop with a single node
  - Replace a sequence of nodes, in which all incoming edges go to the first node and all outgoing edges leave from the last node, with a single node
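A reducibility check can be sketched in a few lines. Note that this sketch uses the classical T1/T2 transformations (delete a self loop; merge a node that has exactly one predecessor into that predecessor) rather than the exact two rules on the slide. The function name and the edge-list representation are assumptions, and every node is assumed to be reachable from the entry.

```python
def is_reducible(edges, entry):
    """Return True if T1/T2 reductions collapse the graph to a single node."""
    # Successor sets for every node mentioned in the edge list.
    succ = {entry: set()}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
        succ.setdefault(b, set())

    changed = True
    while changed:
        changed = False
        # T1: delete self loops.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # Recompute predecessor sets for the current graph.
        preds = {n: set() for n in succ}
        for a, targets in succ.items():
            for b in targets:
                preds[b].add(a)
        # T2: merge one node with a unique predecessor into that predecessor.
        for n in list(succ):
            if n != entry and len(preds[n]) == 1:
                (p,) = preds[n]
                merged = succ.pop(n)
                succ[p].discard(n)
                succ[p] |= merged   # may create a self loop, removed by T1 next pass
                changed = True
                break
    return len(succ) == 1
```

A simple while loop (`entry -> head`, `head -> body`, `body -> head`, `head -> exit`) reduces to one node, whereas a loop with two entry points (`entry -> a`, `entry -> b`, `a -> b`, `b -> a`) does not.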

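Reducible Example
[Figure: step-by-step reduction of a control flow graph with blocks B1 to B6. Blocks B2, B3, B4, B5 are merged into one node, leaving B1, {B2,B3,B4,B5}, B6; these are then merged into the single node {B1,B2,B3,B4,B5,B6}.]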

WCET On Reducible Graphs
- Assume you have WCET for each block
  - This should be easy: a sequence of instructions
- Can compute the WCET for each reduced block
  - Loops are bounded
  - Self loop = WCET(block) * loop count
  - Others can’t have loops
  - Compute MAX(WCET for each path) from start to finish
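The computation above can be sketched as a longest-path search over the graph that remains once self loops have been folded into their blocks. This is a minimal sketch under stated assumptions: the function name, the dictionary-based inputs, and the block names and cycle counts in the usage example are all hypothetical, and the remaining graph is assumed to be acyclic with every path ending at `end`.

```python
from functools import lru_cache


def program_wcet(block_wcet, loop_count, edges, start, end):
    """Longest start-to-end path, with self loops folded into their block."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)

    def node_cost(n):
        # Self loop = WCET(block) * loop count; non-loop blocks run once.
        return block_wcet[n] * loop_count.get(n, 1)

    @lru_cache(maxsize=None)
    def longest_from(n):
        if n == end:
            return node_cost(n)
        # MAX over all paths leaving this node.
        return node_cost(n) + max(longest_from(s) for s in succ[n])

    return longest_from(start)


# Hypothetical example: B2 is a self loop bounded at 4 iterations,
# followed by a branch to B3 or B4 that rejoins at B5.
wcet = program_wcet(
    block_wcet={"B1": 5, "B2": 10, "B3": 3, "B4": 8, "B5": 2},
    loop_count={"B2": 4},
    edges=[("B1", "B2"), ("B2", "B3"), ("B2", "B4"), ("B3", "B5"), ("B4", "B5")],
    start="B1",
    end="B5",
)
```

Here `wcet` is 55 cycles: 5 for B1, 40 for the loop, 8 for the slower branch B4, and 2 for B5.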

Basic Block WCET
- Each instruction takes k cycles
  - Count the number of cycles
  - Multiply by the clock period (that is, divide by the clock frequency)
- If only it were that simple
  - Processor timing can depend on many factors
  - Pipelining, out-of-order execution
  - Memory behavior needs to be considered
  - Caching
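The simple model (fixed cycles per instruction, no pipeline or cache effects) amounts to a sum and a division. A minimal sketch follows; the function name, the cycle counts, and the 16 MHz clock are assumptions chosen for illustration.

```python
def block_time_ns(cycles_per_instruction, clock_hz):
    """Execution time of a basic block in nanoseconds under the fixed-cost model."""
    total_cycles = sum(cycles_per_instruction)
    # cycles * clock period = cycles / clock frequency
    return total_cycles * 1e9 / clock_hz


# Hypothetical block of four instructions on a 16 MHz processor.
t = block_time_ns([1, 1, 2, 4], 16_000_000)
```

Eight cycles at 16 MHz give 500 ns. The remaining bullets on this slide explain why real processors rarely behave this simply.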

Speculation-Based CPU Anomalies
- Instruction A is a conditional branch followed by B or C
- The processor speculates B rather than C, but actually executes C
- C is in the cache
- If A is in the cache, there is time to prefetch B
  - B drives C out of the cache => longer time
- If A is not in the cache, there is no time to prefetch B, so C stays in the cache and the overall time is shorter

Scheduling-Based CPU Anomalies
- Instructions A-B-C-D-E
  - B depends on A, D depends on C, E depends on D
  - A, D, E use resource 1 (CPU unit)
  - B, C use resource 2
  - Resource 2 is initially in use
- A is run first
- If A is quick, then B is run, followed by C, D, E
  - This is linear time, with no overlap
- If A is slow, then C can start (resource 2 freed)
  - B and D can then overlap
  - Result is faster
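The anomaly can be reproduced with a toy scheduler. This is a sketch under assumptions that are not in the lecture: time advances in integer ticks, each unit greedily starts the earliest ready instruction in program order, every instruction other than A takes 2 ticks, and resource 2 is busy until tick 2. The function name and parameters are hypothetical.

```python
def makespan(a_duration, other_duration=2, r2_busy_until=2):
    """Finish time of A..E under greedy, program-order dispatch on two units."""
    duration = {name: other_duration for name in "BCDE"}
    duration["A"] = a_duration
    deps = {"A": [], "B": ["A"], "C": [], "D": ["C"], "E": ["D"]}
    unit = {"A": 1, "B": 2, "C": 2, "D": 1, "E": 1}

    free_at = {1: 0, 2: r2_busy_until}   # resource 2 starts out busy
    finish = {}                          # instruction -> completion time
    pending = ["A", "B", "C", "D", "E"]  # program order
    t = 0
    while pending:
        for u in (1, 2):
            if free_at[u] > t:
                continue                 # unit still occupied
            for ins in pending:
                # Ready when every dependency has completed by time t.
                ready = all(finish.get(d, float("inf")) <= t for d in deps[ins])
                if unit[ins] == u and ready:
                    finish[ins] = t + duration[ins]
                    free_at[u] = finish[ins]
                    pending.remove(ins)
                    break                # at most one start per unit per tick
        t += 1                           # advance one time unit
    return max(finish.values())


quick = makespan(a_duration=2)  # B grabs resource 2 first: fully serial
slow = makespan(a_duration=3)   # C grabs resource 2 first: B and D overlap
```

With these numbers `quick` is 10 ticks and `slow` is 8 ticks, so the longer A produces the shorter overall run. This is why taking the local worst case of each instruction is not automatically safe.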

Memory Behavior
- Caching can change timings considerably
  - Both instruction and data caching
- Why not just assume worst-case time per instruction?
  - What is the cost of an I-cache miss?
  - Can be several orders of magnitude
  - Can’t afford to do this for each instruction
- Need to maintain a complex model of processor and cache state
  - Assume start state is unknown
  - Determining worst case input can be difficult
- Need to handle preemption
  - This could change the processor and cache states at any time
  - But the number of preemptions can be limited

Approaches to WCET
- We need to compute WCET
  - To handle real time scheduling
  - To understand real time limits
- What can we do with real problems?
  - Measurement-based approaches
  - Code-analysis based approaches
  - Hybrid approaches

Measurement-Based Approaches
- Why not just run the code?
  - On multiple inputs, multiple times
  - Recording the time it takes
- Get a graph of execution times
  - Best, worst, distribution
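A measurement harness along these lines might look like the following minimal sketch. The function name `measure`, its parameters, and the choice of Python's built-in `sorted` as the task under test are assumptions for illustration. On a desktop OS the timings include scheduler and cache noise, so this shows only the shape of the approach.

```python
import time


def measure(task, inputs, repeats=5):
    """Run task on each input several times and summarize observed times (ns)."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    samples.sort()
    return {
        "best": samples[0],
        "worst": samples[-1],            # largest OBSERVED time, not a safe WCET
        "median": samples[len(samples) // 2],
        "samples": samples,              # full distribution for plotting
    }


# Hypothetical usage: time sorting on two differently ordered inputs.
stats = measure(sorted, [list(range(100, 0, -1)), list(range(100))], repeats=3)
```

The reported worst value is only a lower bound on the true WCET, because nothing guarantees that the worst-case input and hardware state were among the runs.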

Execution Time Distribution

Practical Measurements
- Break the program into subtasks
  - Input distribution can be better controlled
- Get measurements of the time for each subtask
- Put these together to get total time
- This can be a bit better but is still not safe

How to Get Measurements
- Getting measurements
  - Clock time, CPU cycle counters, etc. are available
  - On real hardware, probes might change processor states
- Simulation
  - Assumes you know everything about the hardware
- On real hardware using hardware probes
  - External triggers on hardware lines
- Picking inputs
  - Randomly (from what space, what distribution?)
  - From sample data (how representative?)
  - Manually (can be difficult)

Static WCET Analysis
- Compiler technology can be used
  - Much of the same type of work that compilers do in the optimization process
- Compilers need to understand control flow
- Compilers want to understand loop bounds
- Compilers need to understand processor state
  - Model the processor when generating instructions
- We can use this to compute WCET

Static WCET Analysis

Static Analysis for WCET
- Build the program model
  - Control flow graph with connected basic blocks
  - Include information on path dependencies
  - Might require programmer annotations
- Compute the loop bounds
  - Have the programmer provide them for you
  - Deduce through symbolic execution and constraints
  - Hybrid approaches

Static Analysis for WCET
- Estimate the time for each basic block
  - Using a model of the CPU/Memory/etc.
  - Tracking processor/cache states
    - Known X, Known not X, unknown
  - Produce a range instead of a single number
  - Typically take into account I-cache, not D-cache
  - Can be done using measurement
- Put the result back together
  - Using reducible control structures
  - Can be formulated as linear programming
  - Still have to handle calls, …
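The three-valued cache knowledge and the resulting time range can be sketched as follows. This is an illustrative simplification and not the method of any particular tool: the constant names, the `join` rule, and the cycle counts are assumptions, and a hit is assumed to cost no more than a miss.

```python
CACHED = "known cached"          # "Known X"
NOT_CACHED = "known not cached"  # "Known not X"
UNKNOWN = "unknown"


def join(state_a, state_b):
    """Merge knowledge about one memory line where two control-flow paths meet."""
    return state_a if state_a == state_b else UNKNOWN


def fetch_time_range(state, hit_cycles, miss_cycles):
    """Return (best, worst) fetch cost in cycles for the given knowledge."""
    if state == CACHED:
        return (hit_cycles, hit_cycles)
    if state == NOT_CACHED:
        return (miss_cycles, miss_cycles)
    return (hit_cycles, miss_cycles)  # unknown: a range, not a single number


# Hypothetical example: one path loaded the line, the other evicted it.
merged = join(CACHED, NOT_CACHED)
cost = fetch_time_range(merged, hit_cycles=1, miss_cycles=40)
```

Here `merged` is unknown and `cost` is the range (1, 40). On processors with timing anomalies, the upper end of such a local range is not guaranteed to produce the global worst case.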

Other Techniques for WCET
- Partition the task into subtasks and analyze them
  - Partitioning can be heuristic or programmer-defined
  - Generally, the smaller the unit, the easier it is to analyze
- Hybrid approaches
  - Use measurements for small units
  - Do both measurement and static analysis to get a better approximation
  - Use dynamics to determine possible initial states

State-of-the-Art Tools
- Tools exist to do this work
  - Using programmer annotations and assistance
- Tools aren’t perfect
  - Don’t handle preemption and scheduling
  - Don’t handle data caching
  - Don’t have the most accurate models of the CPU
  - Models aren’t necessarily correct
- Other tools
  - Languages, compilers and system design for time-prediction

Next Time
- Guest Lecture on Security: Vasilis Kemerlis
- Project Presentations Start FRIDAY
  - Mechanics: Order, volunteers, …