Processor Pipelines and Static Worst-Case Execution Time Analysis


1 Processor Pipelines and Static Worst-Case Execution Time Analysis
PhD dissertation by: Jakob Engblom. Presented by: Sibin Mohan. Systems Group Seminar, July 4, 2019.

2 Introduction
Topics: Worst-Case Execution Time (WCET), timing analysis (experimental vs. static analysis), and processor pipelines. Processor pipelines increase performance by overlapping the execution of successive instructions.

3 Pipelines
Various types of pipelines: simple scalar pipelines, scalar pipelines, superscalar in-order pipelines, VLIW (Very Long Instruction Word), and superscalar out-of-order pipelines.

4 Goals of this work
Design a static WCET tool. The tool must be retargetable, flexible, and efficient, with broad applicability and correctness.

5 WCET Overview
Components of WCET analysis: flow analysis, global low-level analysis, local low-level analysis, and calculation.

6 Flow Analysis
Analyze the source/object code to determine the possible flow of the program, i.e. its dynamic behavior. Exact analysis is computationally intractable, so an approximate analysis is used. Three stages: flow determination, flow representation, and preparation for calculation. Results include which functions get called, how many times loops iterate, whether dependences exist between different if-statements, etc. Determination is the actual analysis of the code, done by manual annotations, a flow-facts language, automatic flow analysis, etc. Representation captures the information obtained by the analysis, for example as constraints on the code/instructions such as loop bounds (see the sketch below). Preparation for calculation preprocesses the flow information so that it can be used by the particular calculation method. A key problem (the mapping problem): flow information obtained at the source level must be mapped to the object-code level, which is a non-trivial task. One solution is to perform the analysis inside a compiler on intermediate code, which mostly avoids the mapping problem, because the analysis can be performed on the optimized program with full information available to the compiler.
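As a minimal illustration of flow representation by constraints (my notation, not the dissertation's flow-facts language), a loop bound of at most 10 iterations per entry can be written as a constraint on basic-block execution counts:

    x_{body} \le 10 \cdot x_{entry}

where x_B denotes the number of times block B executes on the analysed path.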

7 Low Level Analysis
Low-level analysis: analyze the program code to obtain its timing behavior. There are two types. Global low-level analysis considers the whole program or large parts of it; examples are cache behaviour and branch prediction. For caches, the analysis must consider many instructions, arbitrarily remote from the current instruction; exact analysis is not always possible, so an approximate and safe method must be used. For example, unless we can determine absolutely that an instruction hits in the cache, we assume a miss, resulting in overestimation. A different approach is to run a separate cache-analysis phase and integrate its results in one of two ways: assign an execution-time penalty, or use the results as input to the local low-level analysis, simulating the effect of a cache miss or branch prediction on the actual execution of instructions in the processor pipeline. (A hedged sketch of the penalty view follows.)
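As a hedged illustration of the "execution time penalty" way of integrating the global results (notation assumed here, not taken from the slides), an instruction that cannot be proven to hit in the cache is simply charged the miss penalty on top of its pipeline time:

    t_I = t_{\mathrm{pipe}}(I) + \mathit{miss}(I) \cdot t_{\mathrm{penalty}}, \qquad \mathit{miss}(I) \in \{0,1\}

where miss(I) = 0 only when the cache analysis guarantees a hit; the alternative is to feed the hit/miss classification into the local low-level (pipeline) analysis instead.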

8 Low Level Analysis (contd.)
Local low-level analysis: machine timing of single instructions, e.g. pipeline overlap and memory accesses. It assigns execution times to instructions; pipeline overlap leads to negative times, and overlaps occur between basic blocks that are not necessarily neighbours. It achieves higher precision than the global analysis.

9 Calculation
Calculation methods: tree-based, path-based, IPET (Implicit Path Enumeration Technique), and parameterized WCET calculation. Tree-based: a bottom-up traversal of a tree corresponding to the syntactical parse tree of the program; conceptually simple and computationally cheap, but it has problems handling flow information (it cannot consider dependences between statements/blocks). Path-based: calculates times for different paths in the program, searching for the path with the longest execution time. IPET: expresses program flow and execution times using algebraic and/or logical constraints, usually solved with constraint-solving or integer linear programming techniques (a sketch follows). Parameterized: the result of the WCET analysis is a formula that is evaluated at a later stage, when the missing information (typically execution-time parameters) becomes available.
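For reference, a standard textbook IPET formulation (not necessarily the exact constraint system of the dissertation's IPET module) maximizes the total execution time over basic-block execution counts x_i with per-block times t_i:

    \mathrm{WCET} = \max \sum_{i} x_i \, t_i

subject to flow-conservation constraints (the execution count of each block equals the flow on its incoming and outgoing edges) and the loop-bound constraints supplied by flow analysis, such as the x_{body} \le 10 \cdot x_{entry} example given earlier.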

10 WCET Tool Architecture
The tool takes the program source, manual annotations, and input-data specifications; a compiler, flow analysis, global low-level analysis, local low-level analysis, and a calculation phase then produce the WCET from the object code. Scope graph: represents the structure of function calls and loops. Timing model: communicates the timing of the program from the low-level analysis to the calculation phase. Different flow analyses and global and local low-level analyses can be strung together, each adding information to the scope graph; for example, cache analysis and branch-prediction analysis can both be used, with their results put into the scope graph. The scope graph is a directed acyclic graph of the scopes in the program; each function call and each loop nest is a new scope. "Execution scenarios" can be added to the scope graph to provide additional information, such as hit/miss information for each instruction, memory access times, and bounds on the execution time of variable-latency instructions; these are used mainly by the pipeline analysis phase.

11 WCET Tool Architecture (contd.)
Local low-level analysis takes the object code and the scope graph, constructs a timing graph, and runs pipeline analysis against a CPU simulator to produce the timing model. All analysis is based on dividing the object code into basic blocks. The timing graph is generated from the scope graph as a flat representation of the program; the hierarchy of the scope graph is not present. A CPU simulator is used to obtain timing values for individual instructions or sequences of instructions (a sketch of this measurement idea follows).
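A minimal sketch of the measurement idea, assuming a hypothetical sim_time(sequence) helper that runs an instruction sequence through the trace-driven CPU simulator from an empty pipeline and returns the cycle count (the tool's real simulator interface is not shown in the transcript); basic blocks are represented as lists of instructions so that + concatenates them:

def node_time(sim_time, block):
    # Execution time t_B of a basic block run in isolation on an empty pipeline.
    return sim_time(block)

def pairwise_timing_effect(sim_time, block_a, block_b):
    # delta(AB) = T(AB) - T(A) - T(B): negative when the pipeline overlaps the
    # two blocks, zero when there is no overlap, positive for a stall or hazard.
    return sim_time(block_a + block_b) - sim_time(block_a) - sim_time(block_b)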

12 Timing Model
An abstract representation that captures the execution time of the program and models the timing effects of the pipeline. It is composed from smaller parts, stores concrete execution times, and is based on the timing graph. Composing the model from smaller parts avoids the need to analyse complete execution paths in the program. The hardware is treated as a black box: only concrete execution times are stored, not the state of the pipeline. The timing model is a decoration of the graph with times for nodes and for sequences of nodes.

13 Timing Model (contd.)
I1…Im is a sequence of instructions and T(I1…Im) is the execution time of that sequence; let T(I) = tI be the execution time of a single instruction I. To capture pipeline effects, timing effects (delta values) are defined. T is measured from when I1 enters its first stage (empty pipe) to when Im leaves its last stage. An important assumption is that the same sequence of instructions always yields the same time, which requires that the hardware be deterministic. The delta values are added to the basic execution times, and are therefore negative in the case of a pipeline overlap. The execution time of an arbitrary sequence of nodes is obtained by adding the node times of all nodes in the sequence and the timing effects of all subsequences of the sequence (reconstructed below).
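A reconstruction of the slide's formulas from the definitions above (the equation images themselves are not in the transcript):

    \delta(I_1 I_2) = T(I_1 I_2) - T(I_1) - T(I_2)

    T(I_1 \ldots I_m) = \sum_{i=1}^{m} t_{I_i} + \sum_{1 \le j < k \le m} \delta(I_j \ldots I_k)

i.e. the time of a sequence is the sum of its node times plus the timing effects of all its contiguous subsequences of length at least two.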

14 Timing Model (contd.)
For pairs of instructions, the timing effects correspond to the speedup obtained by pipeline overlap between adjacent instructions. When δ(I1…Im) is non-zero for a longer sequence, this is due to instruction I1 having some effect that disturbs the execution of instruction Im across the intervening sequence I2…Im-1 (a long timing effect; one way to write the general definition is sketched below).
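One way to write the general (long) timing effect, equivalent to defining it as the part of T(I1…Im) not already accounted for by the node times and the timing effects of shorter subsequences (a reconstruction, not a quote from the slide):

    \delta(I_1 \ldots I_m) = T(I_1 \ldots I_m) - T(I_1 \ldots I_{m-1}) - T(I_2 \ldots I_m) + T(I_2 \ldots I_{m-1})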

15 Pipeline Model
A single in-order pipeline with n pipeline stages. Each instruction i is modeled as a sequence of stage times r1i…rni, where rji is the time instruction i spends executing in stage j. There is one instruction per stage, and all instructions use all stages.

16 Pipeline Model (contd.)
Consider the execution of I1…Im. For instruction Ii, let pji be the point in time when Ii enters stage j; p(n+1)i is the point when Ii leaves the last stage. This can be modeled by two kinds of constraints: an instruction cannot enter its next stage before its current stage is complete, and the next instruction cannot enter a given pipeline stage before the current instruction has started its next stage (a reconstruction of these constraints follows).
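A reconstruction of the two constraints from the wording above, writing p^i_j for the time instruction Ii enters stage j and r^i_j for its time in stage j (the slide's equations are not in the transcript):

    p^{i}_{j+1} \ge p^{i}_{j} + r^{i}_{j}      (Ii leaves stage j only after spending r^i_j cycles in it)
    p^{i+1}_{j} \ge p^{i}_{j+1}                (Ii+1 cannot enter stage j before Ii has entered stage j+1)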

17 Pipeline Model (contd.)
The constraints are represented as a weighted acyclic graph; in the graph, each column is one instruction, and the edge weights represent the rji values. Additional dependences between instructions are represented by adding further constraints (edges).
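With p^1_1 taken as time zero, the execution time of the sequence is then the length of the longest path through this graph from the first entry point to the final exit point, i.e. T(I_1 \ldots I_m) = p^{m}_{n+1}; this reading is consistent with the constraints above, though not spelled out in the transcript.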

18 Pipeline Model (contd.): Branches and Data Dependences
Branch instructions generate dependences between the end of the stage in which the branch is decided and the fetch of the next instruction; the slide illustrates a branch decided in stage j of instruction Ii. Data dependences imply that instruction Ii can enter stage j only after some previous instruction Ik has completed stage l. Note: such constraints are meaningful only if they connect points that are not otherwise transitively connected; otherwise they are called irrelevant dependences. (Reconstructed constraint forms follow.)
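Reconstructed constraint forms in the same notation (assumed from the descriptions above, not copied from the slide):

    p^{i+1}_{1} \ge p^{i}_{j+1}     (a branch decided in stage j of Ii delays the fetch of Ii+1)
    p^{i}_{j} \ge p^{k}_{l+1}       (Ii may enter stage j only after Ik has completed stage l)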

19 Multiple Parallel Pipelines
Not all instructions use all stages. Each instruction has points corresponding to its entry into the stages it actually uses. The following functions are used: previ(i,j) and nexti(i,j) give the previous and next instructions that use stage j, and prevs(i,j) and nexts(i,j) give the previous and next stages used by instruction i.

20 Multiple Parallel Pipelines (contd.)
The constraints are reformulated in terms of these functions and again form an acyclic graph (the slide's equations and graph figure are not in the transcript; a hedged reconstruction follows).
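A hedged guess at the shape of the reformulated constraints, using the previ/nexti/prevs/nexts functions from the previous slide (assumed forms derived from the single-pipeline constraints, not quoted from the dissertation):

    p^{i}_{\mathrm{nexts}(i,j)} \ge p^{i}_{j} + r^{i}_{j}           (Ii moves on to the next stage it uses once stage j is complete)
    p^{\mathrm{nexti}(i,j)}_{j} \ge p^{i}_{\mathrm{nexts}(i,j)}     (the next instruction using stage j waits until Ii has left it)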

21 Prototype
No automatic flow analysis and no global low-level analysis. Two CPU models: NEC V850E and ARM9. Two calculation modules: IPET-based and path-based. Cache analysis is implemented but not used. The prototype generates WCET estimates.

22 Results
Actual numbers were obtained from the same simulator used by the tool, by providing worst-case inputs. "No pipeline" assumes all timing effects are zero; "pipeline" uses the timing effects. The "+" columns give the overestimation compared to the actual cycle counts, and the last column gives the difference in overestimation between the no-pipeline and pipeline numbers.

23 Results (contd.)
Load time is the time taken to load the program (i.e., read the input files into the tool). Pipe time is the time for the timing analysis. Sim time is the time required to run the trace corresponding to the worst-case behaviour in the simulator. The conclusion drawn is that the tool set executes in approximately linear time.

24 Conclusions
Contributions: a formal mathematical hardware model, a low-level timing modeling scheme, a timing analysis method, and the overall tool architecture.

25 Thank You!
Further questions?

