Timing Analysis of Embedded Software for Speculative Processors
Tulika Mitra, Abhik Roychoudhury, Xianfeng Li
School of Computing, National University of Singapore

Why Timing Analysis? (ISSS'02)
- Timing guarantees for real-time embedded systems
- Real-time scheduling needs a worst-case bound on execution time, so that tasks are guaranteed to be schedulable irrespective of inputs
- The bound must be tight to avoid idle processor cycles
- Extremely important for safety-critical systems

Worst Case Execution Time (WCET)
- Given a program and a micro-architecture, estimate the maximum execution time of the program on that micro-architecture over all possible inputs
- Program path analysis [Shaw'89, Healy'98, ...]
  - Not all paths in the control flow graph are feasible
- Micro-architectural modeling
  - Instruction execution time varies dynamically due to:
    - Cache and pipeline [Li'99, Theiling'00, Schneider'99, ...]
    - Speculative execution (branch prediction)

Speculative Execution
[Figure: execution of a branch b with no speculative execution, with a correct prediction, and with a misprediction, which incurs a misprediction penalty]

Impact of Speculative Execution
- Example: insertion sort of 100 elements
- Worst-case path without speculation, for input: [figure]
- Worst-case path with speculation, for input: [figure]
- The branch misprediction penalty can alter the worst-case execution path
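
To see how a misprediction penalty can change which path is the worst one, the sketch below scores two paths with and without a penalty. The path names, cycle counts, misprediction counts and the 5-cycle penalty are all hypothetical; they are not taken from the insertion-sort experiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    name: str
    base_cycles: int      # hypothetical execution time ignoring branch prediction
    mispredictions: int   # hypothetical number of mispredicted branches on the path


def path_time(path: Path, penalty: int) -> int:
    """Execution time of a path when every misprediction costs `penalty` cycles."""
    return path.base_cycles + penalty * path.mispredictions


def worst_path(paths: List[Path], penalty: int) -> str:
    """Name of the path with the largest execution time under the given penalty."""
    return max(paths, key=lambda p: path_time(p, penalty)).name


paths = [
    Path("A", base_cycles=1000, mispredictions=2),   # longer, but well predicted
    Path("B", base_cycles=900, mispredictions=40),   # shorter, but poorly predicted
]

print(worst_path(paths, penalty=0))   # A
print(worst_path(paths, penalty=5))   # B
```

Without a penalty, path A (1000 cycles) dominates; with a 5-cycle penalty, path B costs 900 + 5 × 40 = 1100 cycles and overtakes A (1010 cycles).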

Branch Prediction Schemes

| Scheme | Feature | WCET work |
|---|---|---|
| Static | Static assignment of a prediction per branch | Chen et al. |
| Dynamic: local | Predict based on the outcome history of this branch only | Colin & Puaut 2000 |
| Dynamic: global [PowerPC, MIPS, AMD, Alpha] | Predict based on the outcome history of neighboring branches | |

Global Branch Prediction
- Stores the outcomes of the last n branches in a shift register, called the Branch History Register (BHR)
- Indexes into the prediction table using the BHR
- The prediction table stores the last outcome corresponding to that history
- Example, where outcome(B3) = 1 if outcome(B1 B2) ∈ {01, 10, 11}:

      b = 0
      B1: if (a == 1) b = 1;
      B2: if (a == 2) b = 1;
      B3: if (b == 1)

[Figure: the BHR indexing into the prediction table]
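
The example can be replayed in a few lines. This is a minimal sketch, assuming a 1-bit predictor whose entries store the last outcome (as described above), a 2-bit BHR initialized to 00, outcome 1 meaning the condition is true, and an unbounded table. `use_address=True` indexes by (branch, BHR), in the spirit of gselect but without truncating any bits, so branches never share an entry; `use_address=False` indexes by the BHR alone, as GAg does. The input sequence is hypothetical.

```python
from typing import Dict, List, Tuple


def run_snippet(a: int) -> List[Tuple[str, int]]:
    """Outcomes (1 = condition true) of B1, B2, B3 for one execution of the snippet."""
    b = 0
    o1 = int(a == 1)          # B1: if (a == 1) b = 1;
    if o1:
        b = 1
    o2 = int(a == 2)          # B2: if (a == 2) b = 1;
    if o2:
        b = 1
    o3 = int(b == 1)          # B3: if (b == 1)
    return [("B1", o1), ("B2", o2), ("B3", o3)]


def simulate(inputs: List[int], n_bits: int = 2, use_address: bool = True) -> Dict[str, int]:
    """Count mispredictions per branch with a 1-bit global-history predictor.

    use_address=True  -> index = (branch, BHR): no two branches share an entry
    use_address=False -> index = BHR only (GAg-style): branches may alias
    """
    bhr: Tuple[int, ...] = (0,) * n_bits      # assumed initial history: all zeros
    table: Dict[object, int] = {}              # entry = last outcome seen; default 0
    mispredictions = {"B1": 0, "B2": 0, "B3": 0}
    for a in inputs:
        for name, outcome in run_snippet(a):
            index = (name, bhr) if use_address else bhr
            if table.get(index, 0) != outcome:
                mispredictions[name] += 1
            table[index] = outcome              # remember the last outcome
            bhr = bhr[1:] + (outcome,)          # shift the outcome into the BHR
    return mispredictions


inputs = [1, 2, 3] * 10
print(simulate(inputs, use_address=True)["B3"])   # 2
print(simulate(inputs, use_address=False)["B3"])
```

With (branch, BHR) indexing, B3 is mispredicted only the first time histories 10 and 01 appear, because the BHR then holds exactly the outcomes of B1 and B2. Indexing by the BHR alone lets B1 and B2 overwrite the entries that B3 relies on, so its misprediction count grows.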

Framework for Branch Prediction
Prediction schemes differ in the index used into the prediction table:

| Prediction scheme | Index |
|---|---|
| Local | Branch address |
| Global: GAg | BHR |
| Global: gshare | BHR ⊕ branch address |
| Global: gselect | {Branch address, BHR} (concatenation) |
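
The index functions in the table can be written down directly. A minimal sketch, assuming a prediction table of 2^k entries, branch addresses given as plain integers (real hardware typically drops the low-order alignment bits first), a BHR held as an integer with the most recent outcome in the least significant bit, and a hypothetical `addr_bits` parameter that fixes how gselect splits its k index bits between address and history.

```python
def mask(bits: int) -> int:
    """Bit mask with the lowest `bits` bits set."""
    return (1 << bits) - 1


def index_local(pc: int, bhr: int, k: int) -> int:
    """Local (as listed in the table): index by branch address only."""
    return pc & mask(k)


def index_gag(pc: int, bhr: int, k: int) -> int:
    """GAg: index by the global branch history register only."""
    return bhr & mask(k)


def index_gshare(pc: int, bhr: int, k: int) -> int:
    """gshare: XOR of branch address and global history."""
    return (pc ^ bhr) & mask(k)


def index_gselect(pc: int, bhr: int, k: int, addr_bits: int) -> int:
    """gselect: addr_bits low address bits concatenated with k - addr_bits history bits."""
    hist_bits = k - addr_bits
    return ((pc & mask(addr_bits)) << hist_bits) | (bhr & mask(hist_bits))


print(index_gshare(0b1010, 0b0110, 4))                  # 12 (0b1100)
print(index_gselect(0b1011, 0b0110, 4, addr_bits=2))    # 14 (0b1110)
```

Only GAg ignores the address entirely, which is why it is the most exposed to aliasing; gshare and gselect spend index bits on the address to separate branches that share a history.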

Modeling Difficulty
- Dynamic mapping: a branch can map to different entries in the prediction table
- Aliasing: different branches can map to the same prediction table entry
- Constructive/destructive conflict:
  - Conflicting branches with the same/different outcomes
  - A single branch with the same/different outcomes
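
Both effects can be measured on a trace. A minimal sketch, assuming a trace of (branch, table index, outcome) triples has already been produced by some predictor; the trace format, the category names and the sample trace are hypothetical. It reports which branches were looked up in more than one entry (dynamic mapping) and classifies each reuse of an entry by whether the previous user was another branch (inter) or the same branch (intra), and whether the stored outcome agrees (constructive) or disagrees (destructive).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def analyze_mapping(trace: Iterable[Tuple[str, int, int]]):
    """trace: (branch name, prediction-table index, outcome) in execution order."""
    entries: Dict[str, Set[int]] = defaultdict(set)   # branch -> table entries it used
    last: Dict[int, Tuple[str, int]] = {}             # entry -> (branch, outcome) last stored
    conflicts = {"inter_constructive": 0, "inter_destructive": 0,
                 "intra_constructive": 0, "intra_destructive": 0}
    for branch, index, outcome in trace:
        entries[branch].add(index)
        if index in last:
            prev_branch, prev_outcome = last[index]
            scope = "intra" if prev_branch == branch else "inter"   # same or different branch
            kind = "constructive" if prev_outcome == outcome else "destructive"
            conflicts[f"{scope}_{kind}"] += 1
        last[index] = (branch, outcome)
    # Dynamic mapping: a branch that was looked up in more than one entry.
    dynamic = {b for b, used in entries.items() if len(used) > 1}
    return dynamic, conflicts


trace = [("B1", 0, 1), ("B2", 0, 1), ("B3", 1, 0), ("B1", 1, 1), ("B2", 0, 0)]
print(analyze_mapping(trace))
```

For a predictor whose entries store only the last outcome, every destructive reuse is a misprediction, which is why the analysis has to track both aliasing and a single branch changing its outcome.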

Our Technique: ILP Formulation
- Obtain linear constraints on the total misprediction count for all possible inputs
- Input: the control flow graph of the program
- Objective function (maximized): $\mathit{WCET} = \sum_{B} \left( \mathit{cost}_B \times \mathit{count}_B + \mathit{penalty} \times \mathit{misprediction}_B \right)$
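
For one fixed assignment of counts, the objective is plain arithmetic; the ILP solver's job is to search over every assignment the constraints allow and keep the largest value. The sketch below only evaluates a single assignment and enforces the sanity bound that a block cannot mispredict more often than it executes. Block names, costs, counts and the penalty are hypothetical.

```python
from typing import Dict


def wcet_objective(cost: Dict[str, int], count: Dict[str, int],
                   mispred: Dict[str, int], penalty: int) -> int:
    """Sum over blocks B of cost_B * count_B + penalty * misprediction_B."""
    for block, n in count.items():
        m = mispred.get(block, 0)
        if not 0 <= m <= n:   # a block cannot mispredict more often than it runs
            raise ValueError(f"block {block}: misprediction count {m} not in [0, {n}]")
    return sum(cost[b] * count[b] + penalty * mispred.get(b, 0) for b in count)


# Hypothetical numbers: two blocks, each executed 101 times.
print(wcet_objective({"1": 5, "2": 3}, {"1": 101, "2": 101}, {"1": 1, "2": 2}, penalty=4))  # 820
```

Here 5 × 101 + 3 × 101 = 808 cycles come from the blocks and 4 × (1 + 2) = 12 cycles from mispredictions.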

Flow Constraints: Easy!
[Figure: CFG with nodes start, blk 1, blk 2 and end; blk 2 has a back edge to blk 1]
- Inflow = basic block execution count = outflow:
  - $c_s = c_e = 1$
  - $e_{s,1} = 1$, $e_{2,e} + e_{1,e} = 1$
  - $e_{s,1} + e_{2,1} = e_{1,2} + e_{1,e} = c_1$
  - $e_{1,2} = e_{2,e} + e_{2,1} = c_2$
- Bound on maximum loop iterations: $e_{2,1} \le 100$
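
The flow constraints of this small loop leave only two free choices: how often the back edge $e_{2,1}$ is taken and which block exits. The sketch below enumerates both as a brute-force stand-in for an ILP solver (practical only for toy graphs) and maximizes the block-cost part of the objective, ignoring mispredictions. The edge keys such as "s1" for $e_{s,1}$ and the block costs of 5 and 3 cycles are hypothetical.

```python
from typing import Dict, Tuple


def flow_ok(e: Dict[str, int], bound: int) -> bool:
    """Flow constraints of the start -> blk 1 -> blk 2 -> end loop."""
    c1 = e["s1"] + e["21"]            # inflow of blk 1
    c2 = e["12"]                      # inflow of blk 2
    return (e["s1"] == 1
            and e["2e"] + e["1e"] == 1
            and c1 == e["12"] + e["1e"]   # outflow of blk 1
            and c2 == e["2e"] + e["21"]   # outflow of blk 2
            and e["21"] <= bound)         # loop bound on the back edge


def max_time(cost1: int, cost2: int, bound: int = 100) -> Tuple[int, Dict[str, int]]:
    """Exhaustive stand-in for the ILP solver: maximize cost1 * c1 + cost2 * c2."""
    best: Tuple[int, Dict[str, int]] = (-1, {})
    for back in range(bound + 1):                 # candidate values of e_{2,1}
        for exit_from_1 in (True, False):         # leave via e_{1,e} or via e_{2,e}
            e = {"s1": 1, "21": back, "1e": int(exit_from_1), "2e": int(not exit_from_1)}
            e["12"] = e["2e"] + e["21"]           # the only value blk 2's balance allows
            if flow_ok(e, bound):
                total = cost1 * (e["s1"] + e["21"]) + cost2 * e["12"]
                if total > best[0]:
                    best = (total, e)
    return best


print(max_time(cost1=5, cost2=3))   # first element is 808
```

The maximum takes the back edge 100 times and exits from blk 2, so both blocks run 101 times: 5 × 101 + 3 × 101 = 808. Exiting from blk 1 instead gives blk 2 only 100 executions (805 cycles).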

Modeling Difficulty 1: Dynamic Mapping
- Identify the possible patterns for each branch
  - Static analysis of the CFG finds all possible patterns π in the branch history register (BHR) at node i
- $c_i^{\pi}$, $e_{i,j}^{\pi}$: number of executions of node i and of edge $e_{i,j}$ with BHR = π
- $m_i^{\pi}$: number of mispredictions of node i with BHR = π
- $m_i = \sum_{\pi} m_i^{\pi}$, $c_i = \sum_{\pi} c_i^{\pi}$, $e_{i,j} = \sum_{\pi} e_{i,j}^{\pi}$
- $m_i^{\pi} \le c_i^{\pi}$
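
The static pattern analysis can be phrased as a worklist fixed point over the CFG. A minimal sketch, assuming a 2-bit BHR that is 00 at the start node, a BHR that shifts left with the newest outcome appended on the right, and a hypothetical labeling of the loop CFG in which blk 1 → blk 2 and blk 2 → end are non-taken (0) while blk 1 → end and blk 2 → blk 1 are taken (1).

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical labeling of the loop CFG: each edge carries the branch outcome
# it represents (1 = taken, 0 = not taken); the start node has no branch (None).
CFG: Dict[str, List[Tuple[str, Optional[int]]]] = {
    "s": [("1", None)],
    "1": [("2", 0), ("e", 1)],
    "2": [("1", 1), ("e", 0)],
    "e": [],
}


def bhr_patterns(cfg: Dict[str, List[Tuple[str, Optional[int]]]],
                 entry: str, init: str) -> Dict[str, Set[str]]:
    """Worklist fixed point: the set of BHR patterns that can reach each node."""
    pats: Dict[str, Set[str]] = {n: set() for n in cfg}
    pats[entry].add(init)
    work = deque([entry])
    while work:
        n = work.popleft()
        for succ, outcome in cfg[n]:
            changed = False
            for p in list(pats[n]):
                # A branch shifts its outcome into the BHR; a non-branch node passes it on.
                q = p if outcome is None else p[1:] + str(outcome)
                if q not in pats[succ]:
                    pats[succ].add(q)
                    changed = True
            if changed:
                work.append(succ)
    return pats


pats = bhr_patterns(CFG, "s", "00")
print(sorted(pats["1"]), sorted(pats["2"]))   # ['00', '01'] ['00', '10']
```

Because each node only accumulates a set of patterns, the analysis stops once no set grows; with n history bits there are at most 2^n patterns per node, so it always terminates.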

Modeling Difficulty 1: Dynamic Mapping
- Model the flow of each pattern among nodes and edges
[Figure: the loop CFG start → blk 1 → blk 2 → end, with back edge blk 2 → blk 1]
- Inflow: $c_1^{01} = e_{2,1}^{00} + e_{2,1}^{10}$
- Outflow: $c_1^{01} = e_{1,2}^{01} + e_{1,e}^{01}$
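
Given the patterns that can reach each node, the pattern-flow constraints can be generated mechanically: the inflow of $c_i^{\pi}$ collects every incoming edge $e_{j,i}^{\pi'}$ whose pattern π' becomes π once the edge's outcome is shifted in, and the outflow collects $e_{i,j}^{\pi}$ for every outgoing edge. A minimal sketch, assuming a 2-bit BHR that shifts in the newest outcome on the right, hypothetical outcome labels on the loop CFG's edges, and per-node pattern sets hard-coded here (in practice they would come from the static analysis of the CFG).

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical edge list (src, dst, outcome); outcome None means src has no branch.
EDGES: List[Tuple[str, str, Optional[int]]] = [
    ("s", "1", None), ("1", "2", 0), ("1", "e", 1), ("2", "1", 1), ("2", "e", 0),
]
# Patterns that can reach each node (hard-coded for this sketch).
PATTERNS: Dict[str, Set[str]] = {
    "s": {"00"}, "1": {"00", "01"}, "2": {"00", "10"}, "e": {"00", "01", "11"},
}


def shift(pattern: str, outcome: Optional[int]) -> str:
    """BHR after an edge: shift in the outcome, dropping the oldest bit."""
    return pattern if outcome is None else pattern[1:] + str(outcome)


def pattern_flow(node: str, pat: str) -> Tuple[List[str], List[str]]:
    """Terms of the constraint inflow = c_node^pat = outflow."""
    inflow = [f"e[{src},{dst}]^{q}"
              for src, dst, out in EDGES if dst == node
              for q in sorted(PATTERNS[src]) if shift(q, out) == pat]
    outflow = [f"e[{src},{dst}]^{pat}" for src, dst, _ in EDGES if src == node]
    return inflow, outflow


inflow, outflow = pattern_flow("1", "01")
print(" + ".join(inflow), "=", "c[1]^01", "=", " + ".join(outflow))
# e[2,1]^00 + e[2,1]^10 = c[1]^01 = e[1,2]^01 + e[1,e]^01
```

Pattern 01 can only reach blk 1 along the taken back edge, from either history that ends in 0 at blk 2, which is why both inflow terms come from $e_{2,1}$.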

Modeling Difficulty 2: Aliasing
- Variable $p^{\pi}_{i \to j}$: number of times pattern π occurs at node i followed by its next occurrence at node j, i.e., π does not appear at any intermediate node

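Modeling Difficulty 2: Aliasing
[Figure: occurrences of one pattern π at nodes 3, 6 and 8 between start s and end e, linked by $p^{\pi}_{s \to 3}$, $p^{\pi}_{3 \to 6}$, $p^{\pi}_{3 \to 8}$, $p^{\pi}_{6 \to 8}$, $p^{\pi}_{8 \to 3}$ and $p^{\pi}_{6 \to e}$]
- $\sum_j p^{\pi}_{j \to i} = \sum_j p^{\pi}_{i \to j} = c_i^{\pi}$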
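
The $p$ variables can be counted on a concrete execution: group the trace by pattern, then link each occurrence to the next occurrence of the same pattern, with s and e acting as pseudo-occurrences before the first and after the last. A minimal sketch, assuming a trace of (node, pattern) pairs whose node names differ from "s" and "e"; the trace is hypothetical and merely reuses the node numbers 3, 6 and 8.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

START, END = "s", "e"


def occurrence_flows(trace: List[Tuple[str, str]]) -> Counter:
    """p[(pi, i, j)]: times pattern pi occurs at node i with its next occurrence at j."""
    nodes_by_pattern: Dict[str, List[str]] = defaultdict(list)
    for node, pat in trace:
        nodes_by_pattern[pat].append(node)
    p: Counter = Counter()
    for pat, nodes in nodes_by_pattern.items():
        chain = [START] + nodes + [END]          # pseudo-occurrences at both ends
        for i, j in zip(chain, chain[1:]):
            p[(pat, i, j)] += 1
    return p


def balanced(trace: List[Tuple[str, str]]) -> bool:
    """Check sum_j p[pi, j -> i] == sum_j p[pi, i -> j] == c_i^pi for every (i, pi)."""
    p = occurrence_flows(trace)
    counts = Counter(trace)                      # c_i^pi, keyed by (node, pattern)
    for (node, pat), c in counts.items():
        inflow = sum(v for (q, i, j), v in p.items() if q == pat and j == node)
        outflow = sum(v for (q, i, j), v in p.items() if q == pat and i == node)
        if not inflow == outflow == c:
            return False
    return True


trace = [("3", "00"), ("6", "00"), ("8", "01"), ("3", "00"), ("8", "00"), ("6", "00")]
print(balanced(trace))   # True
```

Each occurrence of a pattern has exactly one predecessor occurrence and one successor occurrence, so the per-pattern flow balances at every node, exactly as the constraint states.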

Modeling Difficulty 3: Conflict
- Case 1: the branch of block i with history π is taken
  - Mispredictions are at most its total outflow under history π with the outcome of branch i taken: $m_i^{\pi} \le \sum_j p^{\pi,1}_{i \to j}$
  - Mispredictions are at most its total inflow under history π with the last outcome non-taken: $m_i^{\pi} \le \sum_j p^{\pi,0}_{j \to i}$
- Case 2: the branch of block i with history π is non-taken
  - Symmetrically, with taken and non-taken swapped: $m_i^{\pi} \le \sum_j p^{\pi,0}_{i \to j}$ and $m_i^{\pi} \le \sum_j p^{\pi,1}_{j \to i}$
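
The two bounds can be checked on a simulated trace. A minimal sketch, assuming a 1-bit predictor indexed by the pattern alone (so every occurrence of π shares one entry), every entry initially holding 0, and the start pseudo-node counted as a predecessor occurrence with that initial outcome. For each node, pattern and outcome o it reports the mispredictions next to the outflow bound and the inflow bound. The trace is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

INITIAL = 0   # assumed initial content of every table entry (predict non-taken)


def conflict_bounds(trace: List[Tuple[str, str, int]]) -> Dict[Tuple[str, str, int], Tuple[int, int, int]]:
    """Simulate a 1-bit predictor indexed by BHR pattern and report, per (i, pi, o):
    (mispredictions, outflow bound sum_j p^{pi,o}_{i->j}, inflow bound sum_j p^{pi,1-o}_{j->i}).

    trace: (node, BHR pattern, outcome) triples in execution order.
    """
    last: Dict[str, int] = {}          # pattern -> outcome currently stored in its entry
    mispred: Counter = Counter()       # (node, pattern, own outcome)
    out_flow: Counter = Counter()      # occurrences at the node with own outcome o
    in_flow: Counter = Counter()       # occurrences at the node whose predecessor stored o'
    for node, pat, outcome in trace:
        stored = last.get(pat, INITIAL)      # left by the previous occurrence (or start)
        in_flow[(node, pat, stored)] += 1
        out_flow[(node, pat, outcome)] += 1  # each occurrence has exactly one successor
        if stored != outcome:
            mispred[(node, pat, outcome)] += 1
        last[pat] = outcome
    return {(n, pat, o): (m, out_flow[(n, pat, o)], in_flow[(n, pat, 1 - o)])
            for (n, pat, o), m in mispred.items()}


trace = [("B1", "00", 1), ("B2", "00", 1), ("B1", "00", 0), ("B2", "00", 0)]
for key, (m, out_bound, in_bound) in conflict_bounds(trace).items():
    print(key, m, out_bound, in_bound)
```

A taken branch can only be mispredicted when the entry still holds a non-taken outcome left by the previous occurrence of π, which is what the inflow bound captures; the outflow bound simply caps mispredictions by how often that outcome occurs at all.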

Benchmarks

| Program | Description |
|---|---|
| check | Negative-number search in a 100-element array |
| matsum | Summation of two 100 x 100 matrices |
| matmul | Multiplication of two 10 x 10 matrices |
| fft | 1024-point Fast Fourier Transform |
| fdct | Fast Discrete Cosine Transform |
| isort | Insertion sort of a 100-element array |
| bsearch | Binary search of a 100-element array |
| eqntott | Drawn from the SPEC'92 integer benchmarks |
| dhry | Dhrystone benchmark |

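Modeling Accuracy

| Program | WCET (observed) | WCET (estimated) | Ratio | Mispredictions (observed) | Mispredictions (estimated) |
|---|---|---|---|---|---|
| check | … | … | … | … | … |
| matsum | 101,417 | 101,… | … | … | … |
| matmult | 14,732 | 14,… | … | … | … |
| fft | 213,052 | 223,… | … | …,110 | 6,865 |
| fdct | 2,493 | 2,… | … | … | … |
| isort | 74,225 | 74,… | … | …,687 | 9,954 |
| bsearch | … | … | … | … | … |
| eqntott | 2,311 | 2,… | … | … | … |
| dhry | 122,026 | 124,… | … | …,207 | 2,812 |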

Summary
- Modeling of dynamic control speculation for timing analysis of embedded code
- A unified, parameterized framework that can be instantiated for various prediction schemes
- Tight execution time bounds for benchmark programs under various prediction schemes and prediction table sizes