CSCI1600: Embedded and Real Time Software

Lecture 34: Worst Case Execution Time
Steven Reiss, Fall 2016
Note: the image on slide 12 doesn’t show up on Mac.

Worst Case Execution Time
- What is it?
  - The longest time a task can take
- Why do we need it?
  - Scheduling algorithms assume it is known
  - Can’t say anything about real time without it
- What is the goal?
  - Manually check each task to get its maximum run time
  - Automatically get the run time of a task using a tool
Lecture 34: WCET 7/22/2019

What is the Problem
- This should be easy
  - Knuth volume 1 does this for a variety of algorithms
  - Just count the number of instructions
- What are the problems?
  - The halting problem
    - Almost anything you want to know about a real program is undecidable
  - Need to understand and limit control flows
    - Assume loops are bounded
  - Need to understand the hardware
    - The number of instructions does not yield execution time
  - Need to understand the execution model

Control Flow
- To compute WCET, the control flow must be limited
- Control flow can be modeled as a graph
  - Graph of basic blocks
  - Basic block: straight-line code with no internal branches
    - Once started, will execute to completion
- Suppose we could compute the WCET of each block
  - How could we compute the run time of the program?

Control Flow Graphs
- Loops have to be bounded
  - Bounds can be fixed
  - Bounds can be based on input
  - Need to determine the bounds
- Nested loops
  - Fixed, based on input
  - Based on index of outer loop
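To make the nested-loop case concrete, here is a minimal sketch in which the inner bound depends on the outer index. The triangular loop shape and the function names are assumptions chosen for illustration; they are not taken from the lecture.

```python
def inner_iterations(n_max: int) -> int:
    """Worst-case number of inner-body executions for the loop nest

        for i in range(n):
            for j in range(i):
                body

    given the input-dependent bound n <= n_max.
    """
    # Sum the inner trip count over every outer index.
    return sum(i for i in range(n_max))


def naive_bound(n_max: int) -> int:
    """Looser bound: maximum outer trips times maximum inner trips."""
    return n_max * max(n_max - 1, 0)


if __name__ == "__main__":
    print(inner_iterations(10))  # 45
    print(naive_bound(10))       # 90
```

Both numbers are safe bounds, but the one that tracks the dependence on the outer index is half as large, which is why tools try to understand this relationship.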

Reducible Control Structures
- Can you compute the time for an arbitrary graph?
  - Can be difficult
- But programs don’t produce arbitrary graphs
  - Clean programs produce reducible graphs
- A reducible graph allows you to cluster nodes
  - WCET of a cluster can be computed
  - The cluster can be replaced with a single node

Reducible
- A graph is reducible if and only if repeated application of the following actions yields a graph with only one node:
  - Replace a self loop with a single node
  - Replace a sequence of nodes without intermediate back edges (no internal loops), such that all incoming edges go to the first node and all outgoing edges leave from the last node, with a single node
- Does not contain the forbidden subgraph
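A reducibility check can be sketched with the textbook T1/T2 transformations, which are a close relative of the two actions above rather than a literal transcription of them: T1 removes a self loop, and T2 merges a node that has exactly one predecessor into that predecessor. The function name and the edge-list representation are assumptions, and every node is assumed to be reachable from the entry.

```python
def is_reducible(edges, entry):
    """Return True if the flow graph collapses to one node under T1/T2.

    edges: iterable of (source, target) pairs.
    entry: the entry node; all nodes are assumed reachable from it.
    """
    nodes = {entry}
    for u, v in edges:
        nodes.update((u, v))
    succ = {n: set() for n in nodes}
    for u, v in edges:
        succ[u].add(v)

    changed = True
    while changed and len(succ) > 1:
        changed = False
        # T1: remove self loops.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # T2: merge one node that has a unique predecessor into it.
        for n in list(succ):
            if n == entry:
                continue
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)  # p inherits n's outgoing edges
                changed = True
                break
    return len(succ) == 1


if __name__ == "__main__":
    # A while loop: B1 -> B2, B2 -> B3 -> B2 (loop), B2 -> B4.
    loop = [("B1", "B2"), ("B2", "B3"), ("B3", "B2"), ("B2", "B4")]
    print(is_reducible(loop, "B1"))       # True
    # Two entries into the same cycle: the classic forbidden shape.
    forbidden = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]
    print(is_reducible(forbidden, "A"))   # False
```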

Reducible Example
[Figure: control flow graph reduction; node labels B1, (B2,B3,B4,B5), B6]

WCET On Reducible Graphs
- Assume you have the WCET for each block
  - This should be easy: a sequence of instructions
- Can compute the WCET for each reduced block
  - Loops are bounded
  - Self loop = WCET(block) * loop count
  - Others can’t have loops
- Compute MAX(WCET for each path) from start to finish
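The two rules above (a bounded self loop costs WCET(block) * loop count, and the program WCET is the maximum over paths) can be sketched as a longest-path computation on the reduced, loop-free graph. The node names and cycle counts below are hypothetical, and the helper names are assumptions.

```python
from functools import lru_cache


def loop_cost(body_wcet, max_iterations):
    """Cost of a collapsed self loop: WCET(block) * loop count."""
    return body_wcet * max_iterations


def wcet_longest_path(cost, succ, start):
    """Maximum total cost over all paths from start in an acyclic graph.

    cost: dict mapping node -> WCET of that (possibly collapsed) node.
    succ: dict mapping node -> iterable of successor nodes.
    """
    @lru_cache(maxsize=None)
    def longest(node):
        tails = [longest(s) for s in succ.get(node, ())]
        return cost[node] + (max(tails) if tails else 0)

    return longest(start)


if __name__ == "__main__":
    # Hypothetical cycle counts for a reduced graph with one branch.
    cost = {
        "B1": 5,
        "loop": loop_cost(12, 10),  # body of 12 cycles, at most 10 trips
        "then": 7,
        "else": 3,
        "end": 2,
    }
    succ = {
        "B1": ["loop"],
        "loop": ["then", "else"],
        "then": ["end"],
        "else": ["end"],
    }
    print(wcet_longest_path(cost, succ, "B1"))  # 134
```

The memoized recursion visits each node once, so the cost is linear in the size of the reduced graph.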

Basic Block WCET
- Each instruction takes k cycles
  - Count the number of cycles
  - Multiply by the clock period (equivalently, divide by the clock frequency)
- If only it were that simple
  - Processor timing can depend on many factors
    - Pipelining, speculation, out-of-order execution, memory access
    - These introduce anomalies
  - Memory behavior needs to be considered
    - Caching yields different performance levels
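Under the idealized fixed-cost model, the arithmetic is just a cycle count divided by the clock frequency. A minimal sketch, with a hypothetical 16 MHz clock and made-up per-instruction cycle counts:

```python
def block_time_ns(cycles_per_instruction, clock_hz):
    """Execution time in nanoseconds for one basic block.

    Assumes every instruction has a fixed, known cycle count, which is
    exactly the assumption that pipelines and caches break.
    """
    total_cycles = sum(cycles_per_instruction)
    return total_cycles * 1e9 / clock_hz


if __name__ == "__main__":
    # Four instructions costing 1, 1, 2 and 4 cycles at 16 MHz.
    print(block_time_ns([1, 1, 2, 4], 16_000_000))  # 500.0
```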

Speculation-Based CPU Anomalies
- Instruction A does a conditional branch followed by B or C
- Speculate B rather than C, but execute C
  - C is in the I-cache, B is not
- If A is in the I-cache, there is time to prefetch B
  - B drives C out of the cache => longer time
- If A is not in the I-cache, then the overall time is faster
  - The miss on A leaves no time to prefetch B, so C stays in the cache

Scheduling-Based CPU Anomalies
- Instructions A-B-C-D-E
  - B depends on A, D depends on C, E depends on D
  - D, E use resource 1 (CPU unit)
  - B, C use resource 2
  - Resource 2 initially in use
- A is run first
  - If A is quick, then B is run, followed by C, D, E
    - This is linear time, with no overlap
  - If A is slow, then C can start (resource 2 freed)
    - B and D can then overlap
    - Result is faster

Memory Behavior
- Caching can change timings considerably
  - Both instruction and data caching
- Why not just assume worst-case time per instruction?
  - What is the cost of an I-cache miss?
    - Can be several orders of magnitude
    - It doesn’t happen that often
    - Can’t afford to assume this for each instruction
  - D-cache (for memory instructions) is similar, but less often
- Need to maintain a complex model of processor and cache state
  - Assume start state is unknown
  - Determining worst-case input can be difficult
- Need to handle preemption
  - This could change the processor and cache states at any time
  - But the number of preemptions can be limited
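A quick calculation shows why charging every instruction the miss penalty is too pessimistic. The 1-cycle hit, 100-cycle miss, and instruction counts below are hypothetical numbers picked for illustration, not measurements of any processor.

```python
def block_cycles(n_instr, n_misses, hit_cycles=1, miss_cycles=100):
    """Cycle bound for n_instr fetches, of which at most n_misses miss."""
    return (n_instr - n_misses) * hit_cycles + n_misses * miss_cycles


if __name__ == "__main__":
    # Assume every one of 1000 fetches misses the I-cache.
    print(block_cycles(1000, 1000))  # 100000
    # Assume analysis can show that at most 20 fetches miss.
    print(block_cycles(1000, 20))    # 2980
```

With these numbers the all-miss bound is more than thirty times larger than the bound that tracks cache state, which is the motivation for modeling the cache at all.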

Approaches to WCET
- We need to compute WCET
  - To handle real time scheduling
  - To understand real time limits
- What can we do with real programs?
  - Measurement-based approaches
  - Code-analysis-based approaches
  - Hybrid approaches

Measurement-Based Approaches
- Why not just run the code?
  - On multiple inputs, multiple times
  - Recording the time it takes
- Get a graph of execution times
  - Best, worst, distribution
- Choose WCET > worst
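A measurement harness along these lines might look like the sketch below. It uses Python's `time.perf_counter_ns` as the timer; the function names and the 1.2 safety margin are assumptions, and the result is only as good as the inputs that were tried, so it is an estimate rather than a safe bound.

```python
import time


def measure_ns(task, inputs, repeats=5):
    """Run task on every input several times; return all timings in ns."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    return samples


def estimate_wcet(samples, margin=1.2):
    """Choose WCET > worst observed time by applying a margin factor."""
    return max(samples) * margin


if __name__ == "__main__":
    inputs = [list(range(100, 0, -1)), list(range(100))]
    samples = measure_ns(sorted, inputs, repeats=3)
    print(min(samples), max(samples), estimate_wcet(samples))
```

On an embedded target the same structure applies, with a hardware cycle counter or an external probe standing in for the software timer.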

Execution Time Distribution

Practical Measurements
- Break the program into subtasks
  - Input distribution can be better controlled
- Get measurements of the time for each subtask
- Put these together to get total time
- This can be a bit better but is still not safe

How to Get Measurements
- Getting measurements
  - Clock time, CPU cycle counters, etc. are available
  - On real hardware, software probes might change processor states
  - Simulation
    - Assumes you know everything about the hardware
  - On real hardware using hardware probes
    - External triggers on hardware lines
- Picking inputs
  - Randomly (from what space, what distribution?)
  - From sample data (how representative?)
  - Manually (can be difficult)

Static WCET Analysis
- Compiler technology can be used
  - Much of the same type of work that compilers do in the optimization process
- Compilers need to understand control flow
- Compilers want to understand loop bounds
- Compilers need to understand processor state
  - Model the processor when generating instructions
- We can use this to compute WCET

Static WCET Analysis

Static Analysis for WCET
- Build the program model
  - Control flow graph with connected basic blocks
  - Include information on path dependencies
    - What are path dependencies?
    - Might require programmer annotations
- Compute the loop bounds
  - Have the programmer provide them for you
  - Deduce through symbolic execution and constraints
  - Hybrid approaches

Static Analysis for WCET
- Estimate the time for each basic block
  - Using a model of the CPU, memory, etc.
  - Tracking processor/cache states
    - Known X, known not X, unknown
    - Produce a range instead of a single number
  - Typically take into account I-cache, not D-cache
  - Can be done using measurement
- Put the results back together
  - Using reducible control structures
  - Can be formulated as linear programming
- Still have to handle calls, …
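The "known X, known not X, unknown" tracking can be sketched as a three-valued abstract state that is merged wherever control-flow paths join, with each state mapped to a (best, worst) cycle range. The class name, helper names, and the 1-cycle and 100-cycle costs are assumptions for illustration; real analyzers track whole cache sets, not a single line.

```python
from enum import Enum


class InCache(Enum):
    YES = "known cached"
    NO = "known not cached"
    UNKNOWN = "unknown"


def join(a, b):
    """Merge the states arriving from two predecessor paths."""
    # Agreement is preserved; any disagreement degrades to UNKNOWN.
    return a if a is b else InCache.UNKNOWN


def fetch_cycles(state, hit=1, miss=100):
    """(best, worst) cycle range for one fetch in the given state."""
    if state is InCache.YES:
        return (hit, hit)
    if state is InCache.NO:
        return (miss, miss)
    return (hit, miss)


if __name__ == "__main__":
    merged = join(InCache.YES, InCache.NO)
    print(merged)                # InCache.UNKNOWN
    print(fetch_cycles(merged))  # (1, 100)
```

The worst end of each range feeds the WCET; the best end gives a best-case estimate, and the width of the range shows how much precision the merge cost.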

Other Techniques for WCET
- Partition the task into subtasks and analyze them
  - Partitioning can be heuristic or programmer-defined
  - Generally, the smaller the unit, the easier it is to analyze
- Hybrid approaches
  - Use measurements for small units
  - Do both measurement and static analysis to get a better approximation
  - Use dynamics to determine possible initial states

State-of-the-Art Tools
- Tools exist to do this work
  - Using programmer annotations and assistance
- Tools aren’t perfect
  - Don’t handle preemption and scheduling
  - Don’t handle data caching
  - Don’t have the most accurate models of the CPU
  - Models aren’t necessarily correct
- Other tools
  - Languages, compilers, and system design for time prediction

Homework
- Read Chapter 17
- Exercises 17.1, 17.2
