CSCI1600: Embedded and Real Time Software
Lecture 34: Worst Case Execution Time
Steven Reiss, Fall 2016
Worst Case Execution Time
- What is it?
  - The longest time a task can take
- Why do we need it?
  - Scheduling algorithms assume it is known
  - Can’t say anything about real time without it
- What is the goal?
  - Manually check each task to get its maximum run time
  - Automatically get the run time of a task using a tool
Lecture 34: WCET 7/22/2019
What is the Problem
- This should be easy
  - Knuth volume 1 does this for a variety of algorithms
  - Just count the number of instructions
- What are the problems?
  - The halting problem
    - Almost anything you want to know about a real program is undecidable
    - Need to understand and limit control flows
    - Assume loops are bounded
  - Need to understand the hardware
    - The number of instructions does not yield execution time
    - Need to understand the execution model
Control Flow
- To compute WCET, the control flow must be limited
- Control flow can be modeled as a graph
  - Graph of basic blocks
  - Basic block: code with no internal branches (entered only at the top, left only at the bottom)
  - Once started, will execute to completion
- Suppose we could compute the WCET of each block
  - How could we compute the run time of the program?
Control Flow Graphs
- Loops have to be bounded
  - Bounds can be fixed
  - Bounds can be based on input
  - Need to determine the bounds
- Nested loops
  - Inner bound can be fixed or based on input
  - Inner bound can be based on the index of the outer loop
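The difference between a fixed inner bound and one that depends on the outer index can be sketched as below. This is a minimal illustration, not part of the lecture: the function names and the loop shapes (a rectangular `n * m` nest and a triangular nest where the inner loop runs `i` times) are assumptions chosen for the example.

```python
def inner_iterations_rectangular(n: int, m: int) -> int:
    """Inner loop has a fixed bound m, so the body runs n * m times."""
    return n * m


def inner_iterations_triangular(n: int) -> int:
    """Inner loop runs i times for outer index i = 0 .. n-1 (hypothetical shape)."""
    return sum(i for i in range(n))  # equals n * (n - 1) / 2


n = 10
# Safe but pessimistic: pretend the inner loop always runs n times.
pessimistic = inner_iterations_rectangular(n, n)
# Tighter: use the dependence of the inner bound on the outer index.
tight = inner_iterations_triangular(n)
print(pessimistic, tight)
```

Both counts are valid upper bounds on the number of body executions; knowing how the inner bound depends on the outer index simply gives a tighter one.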
Reducible Control Structures
- Can you compute the time for an arbitrary graph?
  - Can be difficult
  - But programs don’t produce arbitrary graphs
- Clean (structured) programs produce reducible graphs
  - A reducible graph allows you to cluster nodes
  - WCET of a cluster can be computed
  - The cluster can be replaced with a single node
Reducible
- A graph is reducible if and only if repeated application of the following actions yields a graph with only one node:
  - Replace a self loop with a single node
  - Replace a sequence of nodes without intermediate back edges (no internal loops), such that all the incoming edges go to the first node and all the outgoing edges leave from the last node, with a single node
- Equivalently, the graph does not contain the forbidden subgraph (a loop with more than one entry point)
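A reducibility check can be sketched with the classical T1/T2 transformations from compiler texts, which are a close relative of the two actions above rather than the exact rules on the slide: T1 deletes a self loop, and T2 merges a node that has exactly one predecessor into that predecessor. The function name `is_reducible` and the edge-set representation are assumptions for this example, and the sketch assumes every node is reachable from the entry.

```python
def is_reducible(edges, entry):
    """T1/T2 reducibility test on a flow graph given as (src, dst) pairs.

    Assumes every node is reachable from `entry`.
    """
    nodes = {entry}
    for src, dst in edges:
        nodes.update((src, dst))
    succ = {n: set() for n in nodes}
    for src, dst in edges:
        succ[src].add(dst)

    changed = True
    while changed and len(succ) > 1:
        changed = False
        # T1: remove self loops.
        for n in succ:
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        # T2: merge a non-entry node with a unique predecessor into it.
        for n in list(succ):
            if n == entry:
                continue
            preds = [p for p in succ if n in succ[p]]
            if len(preds) == 1:
                p = preds[0]
                succ[p].discard(n)
                succ[p] |= succ.pop(n)
                changed = True
                break
    return len(succ) == 1


# An if/else inside a loop: reducible.
structured = {(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 2), (5, 6)}
# A two-node cycle that can be entered at either node: irreducible.
two_entry_loop = {(1, 2), (1, 3), (2, 3), (3, 2)}
print(is_reducible(structured, 1), is_reducible(two_entry_loop, 1))
```

The second graph is the smallest instance of the forbidden pattern: nodes 2 and 3 form a loop that can be entered from node 1 at either point, so neither transformation ever applies.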
Reducible Example
[Figure: control flow graph with blocks B1, B2, B3, B4, B5, B6; blocks B2, B3, B4, B5 are clustered into a single node]
WCET On Reducible Graphs
- Assume you have the WCET for each block
  - This should be easy: a block is a sequence of instructions
- Can compute the WCET for each reduced block
  - Loops are bounded
  - Self loop: WCET = WCET(block) * loop count
  - Other clusters can’t have loops
  - Compute MAX(WCET for each path) from start to finish
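The two rules above (multiply a self loop by its bound, then take the maximum over paths) can be sketched as a longest-path computation. This is an illustration under stated assumptions: the graph has already been reduced so that only self loops remain, those self loops are represented by an entry in `loop_bound` rather than by an edge, and all block names and cycle counts are made up for the example.

```python
from functools import lru_cache


def wcet_longest_path(block_wcet, loop_bound, succ, entry):
    """Longest path through an acyclic graph of (possibly looping) blocks.

    block_wcet: cycles for one execution of each block
    loop_bound: iteration bound for blocks that carry a self loop (default 1)
    succ:       successor map with self loops already removed (must be acyclic)
    """
    @lru_cache(maxsize=None)
    def longest(node):
        # Self loop rule: WCET(block) * loop count.
        cost = block_wcet[node] * loop_bound.get(node, 1)
        # MAX over all paths continuing from this node.
        tails = [longest(s) for s in succ.get(node, ())]
        return cost + max(tails, default=0)

    return longest(entry)


# Hypothetical program: A branches to B or C, both reach looping block D, then E.
block_wcet = {"A": 4, "B": 10, "C": 3, "D": 6, "E": 2}
loop_bound = {"D": 8}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(wcet_longest_path(block_wcet, loop_bound, succ, "A"))  # 4 + 10 + 6*8 + 2
```

The result is a bound in cycles for the worst path; it is only as good as the per-block numbers and loop bounds fed into it.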
Basic Block WCET
- Each instruction takes k cycles
  - Count the number of cycles
  - Multiply by the clock period (equivalently, divide by the clock frequency)
- If only it were that simple
  - Processor timing can depend on many factors
    - Pipelining, speculation, out-of-order execution, memory access
    - These introduce timing anomalies
  - Memory behavior needs to be considered
    - Caching yields different performance levels
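The simple model (before the complications listed above) amounts to one line of arithmetic: time = cycles / clock frequency. A minimal sketch, assuming a fixed cycle count per instruction; the cycle counts and the 16 MHz clock are hypothetical values.

```python
def block_time_ns(cycles_per_instruction, clock_hz):
    """Time for a basic block under the naive fixed-cycles-per-instruction model."""
    total_cycles = sum(cycles_per_instruction)
    # cycles * clock period, with the period expressed in nanoseconds
    return total_cycles * 1_000_000_000 / clock_hz


# Hypothetical block of four instructions on a 16 MHz part.
print(block_time_ns([1, 1, 2, 4], 16_000_000))  # 8 cycles at 62.5 ns each
```

On a simple microcontroller without caches or a deep pipeline this model can be close to accurate; on the processors discussed in the following slides it is not.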
Speculation-Based CPU Anomalies
- Instruction A does a conditional branch, followed by B or C
  - The processor speculates B rather than C, but actually executes C
  - C is in the I-cache, B is not
- If A is in the I-cache, there is time to prefetch B
  - B drives C out of the cache => longer time
- If A is not in the I-cache, then the overall time is faster (there is no time to prefetch B, so C stays in the cache)
Scheduling-Based CPU Anomalies
- Instructions A-B-C-D-E
  - B depends on A, D depends on C, E depends on D
  - D, E use resource 1 (a CPU unit)
  - B, C use resource 2
  - Resource 2 is initially in use
- A is run first
  - If A is quick, then B is run, followed by C, D, E
    - This is linear time, with no overlap
  - If A is slow, then C can start (resource 2 freed)
    - B and D can then overlap
    - Result is faster
Memory Behavior
- Caching can change timings considerably
  - Both instruction and data caching
- Why not just assume the worst-case time for each instruction?
  - What is the cost of an I-cache miss?
    - Can be several orders of magnitude slower than a hit
    - It doesn’t happen that often
    - Can’t afford to assume this for each instruction
  - D-cache (for memory instructions) is similar, but arises less often
- Need to maintain a complex model of processor and cache state
  - Assume the start state is unknown
  - Determining the worst-case input can be difficult
- Need to handle preemption
  - This could change the processor and cache states at any time
  - But the number of preemptions can be limited
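A small calculation shows why assuming a miss on every instruction is unaffordable. All numbers here are hypothetical (a 1-cycle hit, a 100-cycle miss, 1000 instruction fetches, and a cache analysis that proves at most 10 of them can miss); the function names are also made up for the example.

```python
def bound_all_miss(n_fetches, miss_cycles):
    """Bound when every instruction fetch is assumed to miss the I-cache."""
    return n_fetches * miss_cycles


def bound_with_cache_analysis(n_fetches, max_misses, hit_cycles, miss_cycles):
    """Bound when analysis proves at most max_misses fetches can miss."""
    return (n_fetches - max_misses) * hit_cycles + max_misses * miss_cycles


# Hypothetical numbers, chosen only to show the size of the gap.
pessimistic = bound_all_miss(1000, 100)
analyzed = bound_with_cache_analysis(1000, 10, 1, 100)
print(pessimistic, analyzed, pessimistic / analyzed)
```

With these numbers the all-miss bound is about 50 times larger than the analyzed one, which would force the scheduler to reserve far more processor time than the task can ever use.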
Approaches to WCET
- We need to compute WCET
  - To handle real time scheduling
  - To understand real time limits
- What can we do with real programs?
  - Measurement-based approaches
  - Code-analysis-based approaches
  - Hybrid approaches
Measurement-Based Approaches
- Why not just run the code?
  - On multiple inputs, multiple times
  - Recording the time it takes
- Get a graph of execution times
  - Best, worst, distribution
  - Choose WCET > worst observed time
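The measurement procedure above can be sketched as follows. This is an illustration only: the workload `task`, the input list, the repeat count, and the 20% safety margin are all assumptions, and timing Python on a desktop operating system says nothing about an embedded target.

```python
import time


def measure_ns(task, inputs, repeats=5):
    """Run task on each input several times and record elapsed nanoseconds."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    return samples


def estimate_wcet(samples, margin=1.2):
    """Choose an estimate above the worst observed time (margin is a guess)."""
    return max(samples) * margin


def task(n):
    """Hypothetical workload whose run time grows with its input."""
    return sum(i * i for i in range(n))


samples = measure_ns(task, [10, 100, 1000])
print(min(samples), max(samples), estimate_wcet(samples))
```

The estimate is only as good as the inputs tried: nothing guarantees that the true worst case was among the runs, so a margin above the observed maximum is a heuristic, not a proof.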
Execution Time Distribution
[Figure not reproduced]
Practical Measurements
- Break the program into subtasks
  - Input distribution can be better controlled
  - Get measurements of the time for each subtask
  - Put these together to get the total time
- This can be a bit better but is still not safe
How to Get Measurements
- Getting measurements
  - Clock time, CPU cycle counters, etc. are available
  - On real hardware, software probes might change processor states
  - Simulation
    - Assumes you know everything about the hardware
  - On real hardware using hardware probes
    - External triggers on hardware lines
- Picking inputs
  - Randomly (from what space? what distribution?)
  - From sample data (how representative?)
  - Manually (can be difficult)
Static WCET Analysis
- Compiler technology can be used
  - Much of the same type of work that compilers do in the optimization process
  - Compilers need to understand control flow
  - Compilers want to understand loop bounds
  - Compilers need to understand processor state
    - Model the processor when generating instructions
- We can use this to compute WCET
Static WCET Analysis
[Figure not reproduced]
Static Analysis for WCET
- Build the program model
  - Control flow graph with connected basic blocks
  - Include information on path dependencies
    - What are path dependencies?
    - Might require programmer annotations
- Compute the loop bounds
  - Have the programmer provide them for you
  - Deduce through symbolic execution and constraints
  - Hybrid approaches
Static Analysis for WCET
- Estimate the time for each basic block
  - Using a model of the CPU, memory, etc.
  - Tracking processor/cache states
    - Known X, known not X, unknown
    - Produce a range instead of a single number
    - Typically take into account the I-cache, not the D-cache
  - Can be done using measurement
- Put the results back together
  - Using reducible control structures
  - Can be formulated as linear programming
  - Still have to handle calls, …
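The three-valued cache state (known X, known not X, unknown) and the resulting time range can be sketched as below. The `Access` enum, the function name, and the cycle costs are assumptions for illustration. The sketch also assumes that a miss is locally the worst case, which ignores the timing anomalies described earlier in the lecture.

```python
from enum import Enum


class Access(Enum):
    HIT = "known to be in the cache"
    MISS = "known not to be in the cache"
    UNKNOWN = "cache state unknown"


def block_time_range(accesses, hit_cycles, miss_cycles):
    """Return (best, worst) cycles for a block from classified cache accesses."""
    best = worst = 0
    for access in accesses:
        if access is Access.HIT:
            best += hit_cycles
            worst += hit_cycles
        elif access is Access.MISS:
            best += miss_cycles
            worst += miss_cycles
        else:
            # Unknown state: it might hit (best case) or miss (worst case).
            best += hit_cycles
            worst += miss_cycles
    return best, worst


# Hypothetical block with four fetches, 1-cycle hits and 20-cycle misses.
accesses = [Access.HIT, Access.HIT, Access.UNKNOWN, Access.MISS]
print(block_time_range(accesses, hit_cycles=1, miss_cycles=20))
```

The more accesses the analysis can classify as definite hits or definite misses, the narrower the range; the upper end of the range is what feeds into the path computation.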
Other Techniques for WCET
- Partition the task into subtasks and analyze them
  - Partitioning can be heuristic or programmer-defined
  - Generally, the smaller the unit, the easier it is to analyze
- Hybrid approaches
  - Use measurements for small units
  - Do both measurement and static analysis to get a better approximation
  - Use dynamic analysis to determine possible initial states
State-of-the-Art Tools
- Tools exist to do this work
  - Using programmer annotations and assistance
- Tools aren’t perfect
  - Don’t handle preemption and scheduling
  - Don’t handle data caching
  - Don’t have the most accurate models of the CPU
  - Models aren’t necessarily correct
- Other tools
  - Languages, compilers, and system design for time prediction
Homework
- Read Chapter 17
- Exercises 17.1, 17.2