1
Accurate Timing Analysis by Modeling Caches, Speculation and their Interaction
Xianfeng Li, Tulika Mitra, Abhik Roychoudhury
National University of Singapore
2
Why Timing Analysis?
Timing guarantees for real-time embedded systems
Real-time scheduling:
  Worst-case bound on execution time
  Tasks are guaranteed to be schedulable irrespective of inputs
  Tight bound to avoid idle processor cycles
Extremely important for safety-critical systems
3
Worst Case Execution Time (WCET)
Maximum execution time of a program on a micro-architecture over all possible inputs
Measurement
  Execute program for all inputs: impractical
  Execute program for selected inputs to get a lower bound on WCET (Observed WCET)
Analysis
  Employ static analysis to compute an upper bound on WCET (Estimated WCET)
Observed WCET ≤ Actual WCET ≤ Estimated WCET
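The ordering of the three quantities can be illustrated with a toy model. The sketch below assumes a hypothetical cost function `exec_time` and a hand-picked input set (neither is from the slides): measuring a few inputs can only underestimate the actual WCET, while a safe analysis may overestimate it.

```python
def exec_time(x: int) -> int:
    """Hypothetical cycle count of a toy program for input x (illustration only)."""
    # One rare input (x == 37) takes a slow path; all others add a small extra cost.
    return 100 + (50 if x == 37 else x % 7)


ALL_INPUTS = range(256)                 # small enough to enumerate in this toy case
SELECTED_INPUTS = [0, 1, 2, 100, 255]   # what a measurement campaign might try

# Actual WCET: maximum over every input (usually impractical for real programs).
actual_wcet = max(exec_time(x) for x in ALL_INPUTS)

# Observed WCET: maximum over the selected inputs only, hence a lower bound.
observed_wcet = max(exec_time(x) for x in SELECTED_INPUTS)

# Estimated WCET: a deliberately pessimistic bound that assumes both extra
# costs could occur together (safe, but not tight).
estimated_wcet = 100 + 50 + 6

print(observed_wcet, actual_wcet, estimated_wcet)
```

The selected inputs miss the slow path, so the observed value falls short of the actual one; the pessimistic estimate stays above it.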
4
WCET Analysis
Program path analysis [Shaw’89, Healy’98,..]
  Not all paths in the program are feasible
Micro-architectural modeling
  Dynamically variable instruction execution time
  Cache, Pipeline [Li’99, Theiling’00, Schneider’99,..]
  Speculative execution (branch prediction) [Mitra’02]
  Combined modeling of cache + speculative execution
5
Speculative Execution
[Figure: execution of blocks B, N, T, S with no speculative execution, with a correct prediction, and with a misprediction; the misprediction penalty is marked]
6
Cache + Speculation: Destructive Effect
[Figure: execution of blocks B, N, T, S]
N and T map to the same cache block
Cache Miss 1: loading into cache from speculated path
Cache Miss 2: loading into cache from correct path
7
Destructive Effect: Extra Cache Misses
Cache miss penalty (CMP) along speculative path
  Fully masked by branch misprediction penalty (BMP)
  Partially masked by BMP: wait for cache miss to be serviced before executing correct path
Cache miss penalty along correct path due to fetch along speculative path
[Figure: timelines showing BMP and CMP]
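The distinction between fully and partially masked misses comes down to simple arithmetic. The sketch below is an assumption-laden illustration, not the authors' model: it assumes the speculative-path miss starts at the very beginning of the misprediction window, and the function names are hypothetical. The values 10 and 5 are the cache miss and branch misprediction penalties quoted on the experimental methodology slide.

```python
def residual_miss_delay(cmp_cycles: int, bmp_cycles: int) -> int:
    """Cycles of a speculative-path cache miss that the misprediction penalty cannot hide.

    Assumes the miss is issued at the start of the misprediction window.
    """
    return max(0, cmp_cycles - bmp_cycles)


def classify(cmp_cycles: int, bmp_cycles: int) -> str:
    """Label a speculative-path miss as fully or partially masked."""
    if residual_miss_delay(cmp_cycles, bmp_cycles) == 0:
        return "fully masked"
    return "partially masked"


# Penalties from the experimental setup: CMP = 10 cycles, BMP = 5 cycles.
print(classify(10, 5), residual_miss_delay(10, 5))

# A hypothetical short miss that fits entirely inside the misprediction window.
print(classify(3, 5), residual_miss_delay(3, 5))
```

With CMP larger than BMP, every such miss leaves a residual delay that must be added on top of the misprediction penalty.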
8
Cache + Speculation: Constructive Effect
[Figure: execution of blocks B, N, S]
B and S map to the same cache block
Cache Miss 1: loading into cache from speculated path
Cache Hit: correct block already loaded into cache
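Both the destructive and the constructive effect can be reproduced with a tiny direct-mapped cache model. This is a minimal sketch under stated assumptions: the block addresses, the four-line cache, the warm-up contents and the fetch traces are all hypothetical, and only misses on the correct path are counted. The constructive case is modeled generically as a wrong-path fetch that acts like a prefetch for a block executed later.

```python
def correct_path_misses(trace, num_lines=4, warm=()):
    """Count cache misses charged to correct-path fetches.

    trace: list of (block_address, is_speculative) fetches in order.
    warm:  block addresses already in the cache before the trace starts.
    """
    cache = {address % num_lines: address for address in warm}
    misses = 0
    for address, is_speculative in trace:
        line = address % num_lines
        if cache.get(line) != address:
            cache[line] = address          # load the block, evicting the old one
            if not is_speculative:
                misses += 1                # only correct-path misses are counted
    return misses


N, T, X = 1, 5, 2   # hypothetical addresses; N and T share line 1 of a 4-line cache

# Destructive: N is cached, the mispredicted path fetches T and evicts N.
destructive_without = correct_path_misses([(N, False)], warm=(N,))
destructive_with = correct_path_misses([(T, True), (N, False)], warm=(N,))

# Constructive: the mispredicted path fetches X, which the correct path needs later.
constructive_without = correct_path_misses([(N, False), (X, False)], warm=(N,))
constructive_with = correct_path_misses([(X, True), (N, False), (X, False)], warm=(N,))

print(destructive_without, destructive_with)      # speculation adds a miss
print(constructive_without, constructive_with)    # speculation removes a miss
```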
9
How serious is the effect?
10
Technique: Integer Linear Programming
Integrate program analysis and micro-architectural modeling in an ILP framework [Li and Malik 1995]
Input:
  Control Flow Graph (CFG) of the program
  User-provided loop bounds, recursion depth, etc.
  Specification of micro-architecture
Objective function: execution time (maximized)
Constraints
  Flow constraints from Control Flow Graph
  Constraints from micro-architectural modeling
ILP formulation of instruction cache + speculative execution
11
Objective Function
\[
\mathit{WCET} = \sum_{B} \big( \mathit{cost}_B \times \mathit{count}_B + \mathit{BMP} \times \mathit{misprediction}_B + \mathit{CMP} \times \mathit{miss}_B + \mathit{mp\_delay}_B \big)
\]
cost_B × count_B: execution time of basic block B without cache miss and branch misprediction
BMP × misprediction_B: penalty due to mispredictions
CMP × miss_B: penalty due to cache misses
  Includes constructive and destructive effect of speculation along correct path
mp_delay_B: penalty due to partially masked cache misses along speculative path (variable CMP)
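To make the four terms concrete, the sketch below evaluates the objective for one fixed assignment of the variables. It is only an illustration: the per-block numbers and the `BlockStats` type are hypothetical, the sum over basic blocks is assumed, and a real analysis would let the ILP solver choose the counts that maximize this expression. BMP and CMP use the penalties from the experimental methodology slide.

```python
from dataclasses import dataclass

BMP = 5    # branch misprediction penalty in cycles
CMP = 10   # cache miss penalty in cycles


@dataclass
class BlockStats:
    """Hypothetical per-block values of the ILP variables."""
    cost: int            # cycles per execution without misses or mispredictions
    count: int           # execution count
    mispredictions: int  # mispredicted executions of the block's branch
    misses: int          # instruction cache misses on the correct path
    mp_delay: int        # cycles from partially masked speculative-path misses


def execution_time(blocks: dict) -> int:
    """Evaluate the objective: the sum of the four terms over all basic blocks."""
    return sum(
        b.cost * b.count + BMP * b.mispredictions + CMP * b.misses + b.mp_delay
        for b in blocks.values()
    )


blocks = {
    "B1": BlockStats(cost=4, count=10, mispredictions=2, misses=1, mp_delay=5),
    "B2": BlockStats(cost=6, count=100, mispredictions=0, misses=3, mp_delay=0),
}
total = execution_time(blocks)
print(total)
```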
12
Flow Constraints: Easy !!
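Bounds count_B
Inflow = Basic Block Execution Count = Outflow
\[
\begin{aligned}
e_{S,1} + e_{3,1} &= \mathit{count}_1 = e_{1,2} + e_{1,4} \\
e_{1,2} + e_{2,2} &= \mathit{count}_2 = e_{2,3} + e_{2,2} \\
e_{2,3} + e_{4,3} &= \mathit{count}_3 = e_{3,1} + e_{3,E} \\
e_{1,4} &= \mathit{count}_4 = e_{4,3}
\end{aligned}
\]
Bound on maximum loop iterations
Loop bounds: \( e_{2,2} \le 100,\; e_{3,1} \le 10 \)
[Figure: control flow graph with basic blocks B1, B2, B4, B3]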
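The flow constraints can be checked mechanically for any candidate edge assignment. The sketch below encodes the example CFG as edge counts and verifies inflow = outflow at every block; it only validates one candidate solution, whereas the ILP solver searches over all of them. The concrete edge values are hypothetical, and the loop bounds are read as upper bounds (an assumption, since the relational operators did not survive in the slide text).

```python
from collections import defaultdict

BLOCKS = (1, 2, 3, 4)


def block_counts(edge_counts):
    """Return count_B for every block, or raise ValueError if flow is not conserved."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for (src, dst), n in edge_counts.items():
        outflow[src] += n
        inflow[dst] += n
    for b in BLOCKS:
        if inflow[b] != outflow[b]:
            raise ValueError(f"flow not conserved at B{b}: in={inflow[b]}, out={outflow[b]}")
    return {b: inflow[b] for b in BLOCKS}


def within_loop_bounds(edge_counts):
    """Assumed reading of the slide: e_{2,2} <= 100 and e_{3,1} <= 10."""
    return edge_counts[(2, 2)] <= 100 and edge_counts[(3, 1)] <= 10


# One candidate assignment for the edges of the example CFG (values are made up).
flows = {
    ("S", 1): 1, (3, 1): 10,
    (1, 2): 11, (1, 4): 0,
    (2, 2): 100, (2, 3): 11,
    (4, 3): 0, (3, "E"): 1,
}
counts = block_counts(flows)
print(counts, within_loop_bounds(flows))
```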
13
Other Constraints
Branch misprediction constraints
  Bounds misprediction_B
  Details appeared in an earlier paper: T. Mitra, A. Roychoudhury and X. Li. Timing Analysis of Embedded Software for Speculative Processors. In ACM Intl. Symposium on System Synthesis (ISSS), 2002.
Instruction cache miss constraints
  Bounds miss_B [Li, Malik and Wolfe 1999]
14
Modeling Cache-Speculation Interaction
Modify instruction cache miss constraints to model constructive/destructive effect of speculation along correct path
Add constraints on mp_delay_B: penalty due to partially masked cache misses along speculative path
15
Modeling Instruction Cache
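[Figure: control flow graph with blocks B1, B2, B4, B3, and the cache conflict graph for B1 and B3 with edges p_{S,1}, p_{1,3}, p_{3,1}, p_{3,E}]
Cache Conflict Graph: flow among blocks mapping to the same cache line
\[
p_{S,1} + p_{3,1} = \mathit{count}_1 = p_{1,3}, \qquad \mathit{miss}_1 = p_{S,1} + p_{3,1}
\]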
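The cache conflict graph flows can be computed from a concrete execution trace, which shows why every flow entering a block from a different conflicting block is a miss. The sketch below is a simplification under stated assumptions: each basic block is taken to occupy exactly one cache line, the trace is hypothetical, and the function names are illustrative. In the ILP formulation these flows are variables rather than values read off a trace.

```python
from collections import Counter


def conflict_flows(trace, same_line_blocks):
    """Count flows p[(i, j)] between consecutive executions of conflicting blocks.

    "S" and "E" stand for program start and end, as in the slide.
    """
    flows = Counter()
    previous = "S"
    for block in trace:
        if block in same_line_blocks:
            flows[(previous, block)] += 1
            previous = block
    flows[(previous, "E")] += 1
    return flows


def misses(flows, block):
    """A block misses whenever the conflicting line last held something else."""
    return sum(n for (src, dst), n in flows.items() if dst == block and src != block)


# Hypothetical trace through the example CFG; B1 and B3 map to the same cache line.
trace = [1, 2, 2, 3, 1, 4, 3]
flows = conflict_flows(trace, same_line_blocks={1, 3})

print(dict(flows))
print(misses(flows, 1), misses(flows, 3))
```

In this trace B3 always runs between two executions of B1, so every execution of B1 is a miss, matching miss_1 = p_{S,1} + p_{3,1} = count_1.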
16
Constructive Effect of Speculation
[Figure: control flow graph with blocks B1 to B4 and cache conflict graph with an extra node B3 (2,T); labels: T, N, Miss, Partially Masked CMP, Speculative Path, Correct Path]
17
Constructive Effect of Speculation
[Figure: control flow graph with blocks B1 to B4 and cache conflict graph with an extra node B3 (2,T); labels: T, N, Miss, Hit, Partially Masked CMP, Speculative Path, Correct Path]
miss_3 will decrease by the amount of flow between B3 (2,T) and B3
18
Destructive Effect of Speculation
[Figure: control flow graph with blocks B1 to B4 and cache conflict graph with an extra node B4 (1,N); labels: T, N, Hit, Miss, Partially Masked CMP, Speculative Path, Correct Path]
miss_2 will increase by the amount of flow between B4 (1,N) and B2
19
General Flow Involving Extra Nodes
[Figure: four cases (Case 1 to Case 4) of flow in the cache conflict graph involving normal nodes (b, b1, n, n1) and extra nodes m (b,X), m1 (b,X), m2 (b1,Y)]
20
Additional Constraints
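[Figure: speculative path from branch b with prediction X through blocks B1, B2, ..., Bn, with the BMP window marked; case shown: CMP > BMP]
\[
\begin{aligned}
\mathit{count}(m_i(b,X)) &= \mathit{misprediction}(b,X) - \sum_{k=1}^{i-1} \mathit{miss}(m_k(b,X)) \\
\mathit{mp\_delay}(b,X) &= \sum_{k=1}^{n} \mathit{miss}(m_k(b,X)) \times \mathit{delay}(m_k(b,X)) \\
\mathit{delay}(m_i(b,X)) &= \mathit{CMP} - \Big( \mathit{BMP} - \sum_{k=1}^{i-1} \mathit{cost}(m_k(b,X)) \Big)
\end{aligned}
\]
And some others ...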
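A small numeric walk-through may help with these constraints. The sketch below reads m_i(b,X) as the i-th block fetched speculatively after branch b is predicted X, which is an interpretation since the slides do not spell it out. The block costs, miss counts and misprediction count are hypothetical; CMP = 10 and BMP = 5 come from the experimental methodology slide.

```python
CMP = 10   # cache miss penalty in cycles
BMP = 5    # branch misprediction penalty in cycles


def delays(costs):
    """delay(m_i) = CMP - (BMP - sum of cost(m_k) for k < i)."""
    result, elapsed = [], 0
    for cost in costs:
        result.append(CMP - (BMP - elapsed))
        elapsed += cost          # cycles of the BMP window used before the next block
    return result


def speculative_counts(mispredictions, miss_counts):
    """count(m_i) = misprediction - sum of miss(m_k) for k < i."""
    result, reached = [], mispredictions
    for missed in miss_counts:
        result.append(reached)
        reached -= missed        # a miss stalls fetch, so later blocks are not reached
    return result


def mp_delay(miss_counts, delay_values):
    """mp_delay = sum over k of miss(m_k) * delay(m_k)."""
    return sum(m * d for m, d in zip(miss_counts, delay_values))


costs = [2, 2, 1]          # hypothetical costs of m_1, m_2, m_3
miss_counts = [1, 0, 2]    # hypothetical number of misses at each speculative block

block_delays = delays(costs)
block_counts_spec = speculative_counts(4, miss_counts)
total_delay = mp_delay(miss_counts, block_delays)
print(block_delays, block_counts_spec, total_delay)
```

The later a miss occurs on the speculative path, the less of the misprediction window remains to hide it, so the per-miss delay grows with i.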
21
Benchmarks
Program   Description                                   Paths   Loops
matsum    Summation of two 100 * 100 matrices           S
matmult   Multiplication of two 10 * 10 matrices
isort     Insertion sort of 100-element array
bsearch   Binary search of 100-element array
fft       1024-point Fast Fourier Transform
fdct      Fast Discrete Cosine Transform
dhry      Dhrystone benchmark
des       Data Encryption Standard
whet      Whetstone benchmark
djpg      Decompress 128 * 96 color JPG image
22
Experimental Methodology
Observed WCET: simulation
  SimpleScalar cycle-accurate architectural simulator
  In-order execution, no pipeline, no data cache misses
  Branch misprediction penalty = 5 cycles
  Cache miss penalty = 10 cycles
Estimated WCET: prototype analyzer
  Input: benchmark in assembly code, µ-arch parameters, loop bounds
  Output: ILP constraints
  Feed the constraints to CPLEX, a commercial ILP solver
23
Accuracy (Smaller Benchmarks)
Program   WCET Obs   WCET Est   Ratio   Misprediction Est/Obs   Cache miss Est/Obs
matsum    105K       106K       1.00    1.33
matmult   25.1K      25.6K      1.02    1.05                    1.03
isort     48.6K      48.8K
bsearch   506        546        1.07    1.25                    1.06
fft       8798       8803
fdct      219K       229K       1.04    1.66                    1.19
24
Accuracy (Larger Benchmarks)
Program   WCET Obs   WCET Est   Ratio   Misprediction Est/Obs   Cache miss Est/Obs
dhry      218.6K     232.5K     1.06    0.96                    1.18
des       87.4K      96.4K      1.10    2.54                    1.07
whet      545.5K     581.5K             2.81                    1.29
djpg      44.9 M     65.2 M     1.44    3.25                    1.37
25
Scalability
26
Summary
Micro-architectural modeling is crucial for tight estimation of Worst Case Execution Time (WCET)
Existing methods typically focus on a single micro-architectural feature
  Cache
  Pipeline
  Speculation
A step towards combining micro-architectural features which affect each other
  Cache misses/hits due to speculation