CSCI1600: Embedded and Real Time Software

CSCI1600: Embedded and Real Time Software Lecture 34: Worst Case Execution Time Steven Reiss, Fall 2016

Worst Case Execution Time
What is it? The longest time a task can take. Why do we need it? Scheduling algorithms assume it is known, and we can’t say anything about real time without it. What is the goal? Manually check each task to get its maximum run time, or automatically get the run time of a task using a tool.
Lecture 34: WCET 7/22/2019

What is the Problem
This should be easy: Knuth volume 1 does this for a variety of algorithms by just counting the number of instructions. What are the problems? First, the halting problem: almost anything you want to know about a real program is undecidable, so we need to understand and limit control flow and assume loops are bounded. Second, we need to understand the hardware: the number of instructions does not yield execution time, so we need to understand the execution model.

Control Flow
To compute WCET, the control flow must be limited. Control flow can be modeled as a graph of basic blocks. A basic block is straight-line code with no internal branches; once started, it will execute to completion. Suppose we could compute the WCET of each block. How could we compute the run time of the program?

Control Flow Graphs
Loops have to be bounded. A bound can be fixed or can be based on input; either way, we need to determine the bounds. For nested loops, the inner bound can be fixed, based on input, or based on the index of the outer loop.
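
As a small illustration of why the kind of bound matters, the sketch below compares two ways to bound the inner-loop iterations of a nested loop whose inner bound depends on the outer index. The function names and the triangular loop shape are assumptions chosen for illustration; they are not taken from the lecture.

```python
def naive_bound(n: int) -> int:
    """Bound each loop independently: outer runs n times, inner at most n times."""
    return n * n


def index_aware_bound(n: int) -> int:
    """Use the dependency on the outer index: iteration i runs the inner loop i + 1 times."""
    return n * (n + 1) // 2


def count_iterations(n: int) -> int:
    """Execute the (hypothetical) loop nest and count inner-loop iterations."""
    count = 0
    for i in range(n):
        for _ in range(i + 1):  # inner bound depends on the outer index
            count += 1
    return count
```

Both bounds are safe, but the index-aware one is tight for this loop shape, while the naive one overestimates by almost a factor of two.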

Reducible Control Structures
Can you compute the time for an arbitrary graph? It can be difficult, but programs don’t produce arbitrary graphs: clean programs produce reducible graphs. A reducible graph allows you to cluster nodes. The WCET of a cluster can be computed, and the cluster can then be replaced with a single node.

Reducible
A graph is reducible if and only if repeated application of the following actions yields a graph with only one node. (1) Replace a self loop with a single node. (2) Replace a sequence of nodes without intermediate back edges (no internal loops), such that all the incoming edges are to the first node and all the outgoing edges are from the last node, with a single node. Equivalently, the graph does not contain the forbidden subgraph.
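
A minimal sketch of a reducibility check follows. It uses the classic T1/T2 formulation (remove self loops; merge a node into its unique predecessor), which is a swapped-in variant of the two actions on the slide rather than a literal transcription of them. The graph encoding (a dict from node name to successor set) is an assumption, and the sketch assumes every node appears as a key and is reachable from the entry.

```python
def is_reducible(succ: dict[str, set[str]], entry: str) -> bool:
    """T1/T2 reducibility test on a CFG given as node -> set of successors."""
    # Work on a private copy so the caller's graph is untouched.
    g = {n: set(s) for n, s in succ.items()}
    changed = True
    while changed and len(g) > 1:
        changed = False
        # T1: remove self loops.
        for n in g:
            if n in g[n]:
                g[n].discard(n)
                changed = True
        # T2: merge one non-entry node that has exactly one predecessor.
        for n in list(g):
            if n == entry:
                continue
            preds = [p for p in g if n in g[p]]
            if len(preds) == 1:
                p = preds[0]
                g[p].discard(n)
                g[p] |= g.pop(n)  # p inherits n's outgoing edges
                changed = True
                break
    return len(g) == 1


# Hypothetical example: a loop over B2..B5 between B1 and B6.
loop_cfg = {
    "B1": {"B2"}, "B2": {"B3", "B4"}, "B3": {"B5"},
    "B4": {"B5"}, "B5": {"B2", "B6"}, "B6": set(),
}
# The classic irreducible shape: a two-node cycle entered at both nodes.
two_entry_cycle = {"A": {"B", "C"}, "B": {"C"}, "C": {"B"}}
```

With these inputs, `is_reducible(loop_cfg, "B1")` collapses to a single node, while `is_reducible(two_entry_cycle, "A")` gets stuck with three nodes.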

Reducible Example
[Figure: control flow graph with nodes B1, the cluster B2,B3,B4,B5, and B6]

WCET On Reducible Graphs
Assume you have the WCET for each block; this should be easy, since a block is a sequence of instructions. You can then compute the WCET for each reduced block. Loops are bounded, so for a self loop, WCET = WCET(block) * loop count. Other reduced blocks can’t have loops, so compute MAX(WCET for each path) from start to finish.
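
The computation described here can be sketched as a longest-path search over the reduced graph. The sketch assumes the graph is already acyclic apart from self loops, which are folded in by multiplying a block's cost by its loop bound. All block names, cycle counts, and the loop bound below are hypothetical.

```python
from functools import lru_cache


def wcet(block_cycles, succ, loop_bound, entry):
    """Longest path from entry through an acyclic reduced CFG.

    block_cycles: node -> WCET of one execution of the block (cycles)
    succ:         node -> iterable of successor nodes
    loop_bound:   node -> maximum iteration count for a self loop (default 1)
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        # Self loop = WCET(block) * loop count.
        own = block_cycles[node] * loop_bound.get(node, 1)
        tails = [longest_from(s) for s in succ.get(node, ())]
        # MAX over all paths that continue from this node.
        return own + max(tails, default=0)

    return longest_from(entry)


# Hypothetical reduced graph: a branch, then a bounded loop, then an exit block.
cycles = {"entry": 10, "then": 20, "else": 35, "loop": 40, "exit": 5}
edges = {"entry": ["then", "else"], "then": ["loop"],
         "else": ["loop"], "loop": ["exit"]}
bounds = {"loop": 8}
```

For this example the worst path goes through `else`, giving 10 + 35 + 40 * 8 + 5 = 370 cycles.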

Basic Block WCET
Each instruction takes k cycles: count the number of cycles and multiply by the clock period (equivalently, divide by the clock frequency). If only it were that simple. Processor timing can depend on many factors: pipelining, speculation, out-of-order execution, and memory access. These introduce anomalies. Memory behavior needs to be considered, since caching yields different performance levels.
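
The simple model that this slide starts from can be written down directly: sum the per-instruction cycle counts and convert cycles to time using the clock period. This is only a sketch of that naive model, which ignores the pipeline and cache effects listed above; the cycle counts and the 16 MHz clock are assumed values.

```python
def block_time_ns(cycle_counts, clock_hz):
    """Naive basic-block time: total cycles times the clock period."""
    total_cycles = sum(cycle_counts)
    period_ns = 1e9 / clock_hz  # one cycle, in nanoseconds
    return total_cycles * period_ns


# Hypothetical block of four instructions on an assumed 16 MHz processor.
example_ns = block_time_ns([1, 1, 2, 4], clock_hz=16_000_000)
```

Here the block takes 8 cycles at 62.5 ns per cycle, or 500 ns under the naive model.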

Speculation-Based CPU Anomalies
Instruction A does a conditional branch followed by B or C. The processor speculates B rather than C, but actually executes C. C is in the I-cache; B is not. If A is in the I-cache, there is time to prefetch B, and B drives C out of the cache, giving a longer time. If A is not in the I-cache, then the overall time is shorter.

Scheduling-Based CPU Anomalies
Instructions A-B-C-D-E: B depends on A, D depends on C, and E depends on D. D and E use resource 1 (a CPU unit); B and C use resource 2. Resource 2 is initially in use, and A is run first. If A is quick, then B is run, followed by C, D, E; this is linear time, with no overlap. If A is slow, then C can start as soon as resource 2 is freed, and B and D can then overlap. The result is faster.
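
The anomaly can be reproduced with a toy simulation. The sketch below assumes a greedy, non-preemptive scheduler that, at every time step, starts the earliest ready instruction (in program order) on each free resource. That policy, the resource number used for A, and all durations are assumptions made for illustration; real processors differ.

```python
def simulate(durations, deps, resource, busy_until):
    """Return the finish time of the last instruction under a greedy scheduler."""
    order = list(durations)          # program order
    finish = {}                      # instruction -> completion time (set at start)
    free_at = dict(busy_until)       # resource -> time at which it becomes free
    t = 0
    while len(finish) < len(order):
        for ins in order:
            if ins in finish:
                continue             # already started
            ready = all(d in finish and finish[d] <= t for d in deps.get(ins, ()))
            r = resource[ins]
            if ready and free_at.get(r, 0) <= t:
                finish[ins] = t + durations[ins]
                free_at[r] = finish[ins]
        t += 1
    return max(finish.values())


def total_time(a_cycles):
    """Run the A-B-C-D-E example with a given (hypothetical) duration for A."""
    durations = {"A": a_cycles, "B": 3, "C": 3, "D": 3, "E": 3}
    deps = {"B": ["A"], "D": ["C"], "E": ["D"]}
    resource = {"A": 0, "B": 2, "C": 2, "D": 1, "E": 1}  # A on its own unit (assumed)
    return simulate(durations, deps, resource, busy_until={2: 2})


quick = total_time(1)  # A finishes before resource 2 frees, so B goes first
slow = total_time(3)   # A is still running, so C grabs resource 2 first
```

With these numbers the quick case takes 14 time units and the slow case 11, so a locally slower A produces a shorter overall schedule.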

Memory Behavior
Caching can change timings considerably, for both instruction and data caching. Why not just assume the worst-case time for every instruction? The cost of an I-cache miss can be several orders of magnitude, but a miss doesn’t happen that often, so we can’t afford to assume it for each instruction. The D-cache (for memory instructions) is similar, but comes into play less often. We need to maintain a complex model of processor and cache state: assume the start state is unknown, and note that determining the worst-case input can be difficult. We also need to handle preemption, which could change the processor and cache states at any time, but the number of preemptions can be limited.

Approaches to WCET
We need to compute WCET to handle real time scheduling and to understand real time limits. What can we do with real programs? There are measurement-based approaches, code-analysis based approaches, and hybrid approaches.

Measurement-Based Approaches
Why not just run the code on multiple inputs, multiple times, recording the time it takes? This gives a graph of execution times: best, worst, and the distribution. Then choose WCET > worst observed time.
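
A minimal sketch of this approach follows. It times a task over several inputs and repetitions, then picks an estimate above the worst observed time. The function names and the safety margin of 1.2 are assumptions, not values from the lecture, and an estimate obtained this way is not guaranteed to be a safe upper bound.

```python
import time


def measure(task, inputs, repeats=5):
    """Run task on every input `repeats` times; return elapsed times in ns."""
    samples = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter_ns()
            task(x)
            samples.append(time.perf_counter_ns() - start)
    return samples


def summarize(samples):
    """Best and worst observed times, the endpoints of the distribution."""
    return min(samples), max(samples)


def estimate_wcet(samples, margin=1.2):
    """Choose WCET above the worst observation; `margin` is an assumed factor."""
    return max(samples) * margin


# Hypothetical task and inputs: sorting lists of different shapes.
inputs = [list(range(200)), list(range(200, 0, -1)), [7] * 200]
samples = measure(sorted, inputs, repeats=3)
```

Note that software timing like this can itself perturb the processor state being measured.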

Execution Time Distribution
[Figure: execution time distribution]

Practical Measurements
Break the program into subtasks, for which the input distribution can be better controlled. Get measurements of the time for each subtask, and put these together to get the total time. This can be a bit better but is still not safe.

How to Get Measurements
Getting measurements: clock time, CPU cycle counters, etc. are available. On real hardware, software probes might change processor states. Simulation assumes you know everything about the hardware. On real hardware you can also use hardware probes, with external triggers on hardware lines. Picking inputs: randomly (from what space, with what distribution?), from sample data (how representative?), or manually (can be difficult).

Static WCET Analysis
Compiler technology can be used, since this is much of the same type of work that compilers do in the optimization process. Compilers need to understand control flow, want to understand loop bounds, and need to understand processor state, since they model the processor when generating instructions. We can use this to compute WCET.

Static WCET Analysis
[Figure]

Static Analysis for WCET
Build the program model: a control flow graph with connected basic blocks, including information on path dependencies. What are path dependencies? Identifying them might require programmer annotations. Compute the loop bounds: have the programmer provide them for you, deduce them through symbolic execution and constraints, or use hybrid approaches.

Static Analysis for WCET
Estimate the time for each basic block using a model of the CPU, memory, etc., tracking processor and cache states (known X, known not X, unknown). This produces a range instead of a single number. Analyses typically take into account the I-cache, not the D-cache. The estimate can also be done using measurement. Then put the results back together using reducible control structures; this can be formulated as linear programming. Calls and other constructs still have to be handled.
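
The three-valued cache state and the resulting time range can be sketched as follows. The sketch assumes a simple additive cost model (base cycles plus a fetch cost that depends on hit or miss) and ignores the timing anomalies discussed earlier, so it is an illustration rather than a sound analysis. The enum, function name, and cycle costs are all hypothetical.

```python
from enum import Enum


class CacheState(Enum):
    HIT = "known in cache"
    MISS = "known not in cache"
    UNKNOWN = "unknown"


def block_range(accesses, hit_cycles, miss_cycles):
    """Return (best, worst) cycles for a block.

    accesses: list of (base_cycles, CacheState), one entry per instruction.
    An UNKNOWN access counts as a hit in the best case and a miss in the worst.
    """
    lo = hi = 0
    for base_cycles, state in accesses:
        lo += base_cycles + (miss_cycles if state is CacheState.MISS else hit_cycles)
        hi += base_cycles + (hit_cycles if state is CacheState.HIT else miss_cycles)
    return lo, hi


# Hypothetical three-instruction block with assumed fetch costs.
block = [(1, CacheState.HIT), (1, CacheState.MISS), (2, CacheState.UNKNOWN)]
best, worst = block_range(block, hit_cycles=1, miss_cycles=20)
```

Only the unknown access widens the range here: the block takes between 26 and 45 cycles under this model.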

Other Techniques for WCET
Partition the task into subtasks and analyze them. Partitioning can be heuristic or programmer-defined; generally, the smaller the unit, the easier it is to analyze. Hybrid approaches use measurements for small units, do both measurement and static analysis to get a better approximation, and use dynamics to determine possible initial states.

State-of-the-Art Tools
Tools exist to do this work, using programmer annotations and assistance. The tools aren’t perfect: they don’t handle preemption and scheduling, don’t handle data caching, and don’t have the most accurate models of the CPU, and the models aren’t necessarily correct. Other tools include languages, compilers, and system design for time prediction.

Homework
Read Chapter 17. Exercises 17.1, 17.2.