Published by Neil Franklin. Modified over 9 years ago.
1
A Unified WCET Analysis Framework for Multi-core Platforms
Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury (National University of Singapore)
Timon Kelter, Peter Marwedel (TU Dortmund, Germany)
Heiko Falk (Ulm University, Germany)
RTAS 2012, Beijing
2
Timing Analysis
Hard real-time systems require absolute timing guarantees
System-level analysis
Single-task analysis
Worst-case execution time (WCET) analysis
An upper bound on execution time for all possible inputs
A sound over-approximation is obtained by static analysis
3
WCET Analysis
Flow: Program -> Control flow graph -> Micro-architectural modeling -> WCET of basic blocks -> Path analysis (IPET)
Constraints used by path analysis: infeasible path constraints, loop bounds
IPET = Implicit Path Enumeration Technique
4
Architecture
Core 1 ... Core n
L1 cache
Shared L2 cache
Shared bus
Memory
5
Micro-architectural Modeling
Single core: pipeline, cache, branch predictor
Multi core: shared cache, shared bus
Interactions
Cited work: Rosen et al., RTSS'07; Li et al., RTSS'09; Chattopadhyay et al., SCOPES'10; Kelter et al., ECRTS'11
Unified multi-core timing analysis
6
Timing Anomaly (Shared Cache)
[Figure labels: hit, miss]
May not be the worst-case path
7
Timing Anomaly (Shared Bus)
[Figure labels: min delay, max delay]
May not be the worst-case path
8
Background
Each pipeline stage is represented as a timing interval
Pipeline stages: IF, ID, EX, WB, CM
Example instructions: R1 := R2 + 5; R5 := R1 * R7; R3 := R5 * 5
Dependency labels in the figure: structural dependency, contention
A fixed-point analysis derives the timing of each stage as an interval
Example: start = [3,7], latency = [1,3], finish = [4,10]
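The interval example on this slide (start [3,7], latency [1,3], finish [4,10]) can be reproduced with a few lines of interval arithmetic. This is only a minimal sketch: the helper names `add` and `latest_ready` are hypothetical, and it assumes a stage becomes ready only once all of its predecessors have finished. It is not the fixed-point analysis itself.

```python
from typing import Tuple

# (earliest, latest) time in cycles
Interval = Tuple[int, int]


def add(a: Interval, b: Interval) -> Interval:
    """Add two intervals bound by bound, e.g. start + latency = finish."""
    return (a[0] + b[0], a[1] + b[1])


def latest_ready(a: Interval, b: Interval) -> Interval:
    """Ready time of a stage that must wait for two predecessors.

    Assumption for this sketch: the stage waits for both, so each bound
    is the maximum of the corresponding predecessor bounds.
    """
    return (max(a[0], b[0]), max(a[1], b[1]))


start: Interval = (3, 7)
latency: Interval = (1, 3)
finish = add(start, latency)  # (4, 10), matching the slide
```

A fixed-point analysis would repeat such updates over the dependency graph until no interval changes.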
9
Shared Cache + Pipeline
Abstract interpretation classifies each L1 and L2 access as hit, miss or unclear
Timing interval T is updated per access:
L1 hit: T := T + [1, 1]
L1 miss, L2 hit: T := T + [miss1 + 1, miss1 + 1]
L1 miss, L2 unclear: T := T + [miss1 + 1, miss1 + miss2 + 1]
L1 unclear, L2 unclear: T := T + [1, miss1 + miss2 + 1]
Hit latency = 1 cycle
miss1 = L1 cache miss penalty
miss2 = L2 cache miss penalty (shared)
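The latency updates on this slide can be sketched as a small function. Assumptions: the function name `access_latency`, the string labels for the classifications, and the sample penalties are hypothetical, and the cases not shown on the slide (for example L1 miss with L2 miss) are filled in by the same reasoning rather than taken from the source.

```python
from typing import Tuple

HIT_LATENCY = 1  # cycles, as stated on the slide


def access_latency(l1: str, l2: str, miss1: int, miss2: int) -> Tuple[int, int]:
    """Return the [min, max] latency of one memory access.

    l1 and l2 are each "hit", "miss" or "unclear" (hypothetical labels).
    miss1 is the L1 miss penalty, miss2 the shared L2 miss penalty.
    """
    if l1 == "hit":
        return (HIT_LATENCY, HIT_LATENCY)

    # Extra delay contributed by the L2 access when L1 misses.
    l2_extra = {"hit": (0, 0), "miss": (miss2, miss2), "unclear": (0, miss2)}[l2]

    if l1 == "miss":
        return (HIT_LATENCY + miss1 + l2_extra[0],
                HIT_LATENCY + miss1 + l2_extra[1])

    # l1 == "unclear": the best case is an L1 hit, the worst case an L1 miss.
    return (HIT_LATENCY, HIT_LATENCY + miss1 + l2_extra[1])
```

With hypothetical penalties `miss1 = 6` and `miss2 = 30`, an access that is unclear in both caches gets the interval (1, 37), which is the widest of the four cases.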
10
Shared Bus Analysis
Time Division Multiple Access (TDMA)
Offset abstraction
[Figure: TDMA schedule alternating Core 0 and Core 1 slots, grouped into rounds. Request T (core 1) arrives at an offset within the round and incurs a delay; request T' (core 0) has delay = 0.]
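The relation between offset and delay in the figure can be illustrated with a small helper. This is a sketch under stated assumptions, not the analysis from the talk: slots have equal length, cores are served in index order, an access must fit entirely inside the owner's slot, and `access_len` does not exceed `slot_len`. The function name and parameters are hypothetical.

```python
def tdma_delay(offset: int, core: int, n_cores: int,
               slot_len: int, access_len: int) -> int:
    """Cycles a bus request waits before it can start.

    offset is the request time relative to the start of the TDMA round.
    """
    round_len = n_cores * slot_len
    offset %= round_len
    slot_start = core * slot_len
    slot_end = slot_start + slot_len

    # The request falls inside the core's own slot and still fits.
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0

    # Otherwise wait for the next start of the core's slot,
    # wrapping into the next round if necessary.
    return (slot_start - offset) % round_len
```

With two cores and 10-cycle slots, a core 0 request at offset 6 needing 4 cycles has delay 0, while a core 1 request at offset 3 waits 7 cycles for its slot to begin.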
11
Shared Bus + Pipeline
[Figure: pipeline stage nodes IF1, ID1, IF2, ID2, IF3, ID3 with bus offsets O1, O2 and incoming offset O_in]
Cases for the incoming offset: O_in = O1, O_in = O2, or O_in = O1 ∪ O2 (approximate timing by static analysis), depending on whether IF2 finishes after ID1 or ID1 finishes after IF2
Property: Offset content monotonically decreases over different iterations
12
Loop Construct
Bus contexts: C1, C2, C3, ..., C100
C_i = bus context of the loop body at the i-th iteration
Unrolling loop iterations: EXPENSIVE
13
Loop Construct
Bus context flow graph: C1, C2, C3, C4, with C5 = C3
Property: If C_i = C_j, then C_{i+k} = C_{j+k} for any k > 0
How do we define bus context?
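The property on this slide suggests that the context sequence can be cut off as soon as a context repeats, instead of unrolling every iteration. A minimal sketch of that idea follows, assuming contexts are hashable values and that a hypothetical function `next_context` deterministically maps the context of one iteration to the next. Neither name comes from the talk.

```python
from typing import Callable, Hashable, List, Optional, Tuple


def build_context_chain(
    initial: Hashable,
    next_context: Callable[[Hashable], Hashable],
    loop_bound: int,
) -> Tuple[List[Hashable], Optional[int]]:
    """Collect distinct bus contexts until one repeats.

    Returns the list of distinct contexts in order of first appearance
    and the index that the back edge points to (None if the loop bound
    is reached before any context repeats).
    """
    seen = {}
    order: List[Hashable] = []
    ctx = initial
    for _ in range(loop_bound):
        if ctx in seen:
            return order, seen[ctx]
        seen[ctx] = len(order)
        order.append(ctx)
        ctx = next_context(ctx)
    return order, None


# Toy transition mirroring the slide: the fifth context equals the third.
toy = {"C1": "C2", "C2": "C3", "C3": "C4", "C4": "C3"}
chain, back_edge = build_context_chain("C1", toy.__getitem__, 100)
```

Here only four contexts are analysed for a loop bound of 100, and the back edge returns to index 2 (C3).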
14
Loop Construct
Bus context flow graph: C1, C2, C3, C4
How do we define bus context?
Bus offsets of all pipeline stages of all instructions?
There could be thousands of nodes
15
Loop Construct
How do we define bus context?
[Figure: pipeline stages IF, ID, EX, WB, CM of the previous iteration and the current iteration, connected by cross-iteration edges]
Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change
16
Loop Construct
Bus context flow graph: C1, C2, C3, C4
Compute WCET for each bus context
Generate ILP flow constraints:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)
E(C1) = number of times context C1 is executed
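To see how the flow constraints bound the loop WCET, the toy search below maximises the total cost over all execution counts that satisfy them. It is an illustration only: a real analysis would hand the constraints to an ILP solver, whereas this sketch swaps in brute-force enumeration, and the per-context WCET values and the function name `max_loop_wcet` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def max_loop_wcet(
    wcet: Sequence[int],
    loop_bound: int,
    at_least: List[Tuple[int, int]],
) -> Tuple[int, Tuple[int, ...]]:
    """Maximise sum(wcet[i] * E[i]) subject to the flow constraints.

    Constraints: sum(E) <= loop_bound, and E[a] >= E[b] for every
    pair (a, b) in at_least. Brute force, so only for tiny examples.
    """
    n = len(wcet)
    best_cost, best_counts = 0, (0,) * n
    for counts in product(range(loop_bound + 1), repeat=n):
        if sum(counts) > loop_bound:
            continue
        if any(counts[a] < counts[b] for a, b in at_least):
            continue
        cost = sum(w * e for w, e in zip(wcet, counts))
        if cost > best_cost:
            best_cost, best_counts = cost, counts
    return best_cost, best_counts


# Hypothetical WCETs for C1..C4, loop bound 10, and E(C1) >= E(C2).
cost, counts = max_loop_wcet([50, 70, 55, 40], 10, [(0, 1)])
```

Without the constraint E(C1) ≥ E(C2), the search would charge every iteration at the cost of C2. With it, at most half of the iterations can run in C2, which tightens the bound.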
17
Branch Prediction + Cache
[Figure: memory blocks m and m' conflict in the cache]
Branch correctly predicted: the access to m is a cache hit
Branch incorrectly predicted: m is evicted from the cache, so the access to m is a cache miss
18
Branch Prediction + Cache
[Figure labels: m, m', branch location, maximum number of speculated instructions, cache content (two states), JOIN, unclear cache access]
19
Overall Picture
Micro-architectural modeling: pipeline, cache, branch predictor
Multi core: shared cache, shared bus
WCET of basic blocks
Constraints: infeasible path constraints, loop bounds, bus context constraints
Path analysis (IPET)
20
Experimental Setup (Chronos Toolkit)
Flow: C source -> GCC (SimpleScalar) -> Binary code -> CFG -> Micro-architectural modeling -> ILP -> WCET
Micro-architectural modeling: private cache, pipeline, branch prediction, shared cache, shared bus
ILP inputs: micro-architectural constraints, flow constraints
21
Cache Sharing vs Cache Partitioning
[Figure: three cache layouts, with dimensions labeled 8 and 4]
Shared cache between 2 cores
Vertical partition: Core 1, Core 2
Horizontal partition: Core 1, Core 2
22
Evaluation (Cache + Pipeline)
[Charts: jfdctint, statemate]
Imprecision of shared cache analysis
23
Evaluation (Cache + Pipeline + Speculation)
Imprecision of modeling speculation
24
Evaluation (Bus + Pipeline)
Imprecision of shared bus analysis
Imprecision of path analysis
25
Evaluation (Bus + Pipeline + Speculation)
Imprecision of shared bus analysis
Imprecision of path analysis
26
Conclusion
A unified WCET analysis framework
Handles the interaction of shared cache and shared bus with pipeline and branch prediction
Timing anomalies are possible; state explosion is handled by the timing interval abstraction
Detailed information about the tool and extensive results are available at: http://www.comp.nus.edu.sg/~rpembed/chronos-multi-core.html
27
Questions
Thank You