TRACES Code Padding to Improve the WCET Calculability Christine Rochange and Pascal Sainrat Institut de Recherche en Informatique de Toulouse Toulouse.

WCET evaluation
- Static WCET analysis
  - IPET: Implicit Path Enumeration Technique
[Figure: flow analysis, low-level analysis, WCET computation]

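Implicit Path Enumeration Technique
[Figure: control-flow graph with blocks A, B, C, D, E]
Flow constraints:
\[
\begin{aligned}
x_A &= 1 + x_{DA} = 1 + x_{AB} \\
x_B &= x_{AB} = x_{BC} + x_{BE} \\
x_C &= x_{BC} = x_{CD} \\
x_D &= x_{CD} + x_{ED} = x_{DA} \\
x_E &= x_{BE} = x_{ED} \\
x_{BC} &= x_{BE} \\
x_{DA} &\le N
\end{aligned}
\]
Execution time:
\[
T = \sum_{i} x_i \, t_i^{\max} + \sum_{i,j} x_{ij} \, \delta_{ij}
\]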
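
The constraint system on this slide is small enough to solve by exhaustive search, which may help to see what IPET maximises. The sketch below is only an illustration: a real IPET tool would hand the constraints to an ILP solver, and the function name `ipet_wcet`, the block times in `t` and the edge effects in `delta` are all hypothetical values that do not come from the slides.

```python
def ipet_wcet(t, delta, loop_bound):
    """Maximise T = sum(x_i * t_i) + sum(x_ij * delta_ij) by brute force.

    The block and edge counts are derived from the flow constraints of the
    slide; loop_bound plays the role of N in x_DA <= N.
    """
    best = None
    for n in range(loop_bound + 1):      # n = x_DA, bounded by x_DA <= N
        for c in range(n + 1):           # c = x_BC
            # Counts implied by the flow-conservation equalities.
            x = {"A": 1 + n, "B": n, "C": c, "D": n, "E": n - c}
            e = {"DA": n, "AB": n, "BC": c, "BE": n - c, "CD": c, "ED": n - c}
            if e["BC"] != e["BE"]:       # additional constraint x_BC = x_BE
                continue
            total = (sum(x[b] * t[b] for b in x)
                     + sum(e[k] * delta[k] for k in e))
            if best is None or total > best[0]:
                best = (total, x, e)
    return best


# Hypothetical block times and edge effects, for illustration only.
t = {"A": 2, "B": 3, "C": 5, "D": 2, "E": 4}
delta = {"DA": 0, "AB": -1, "BC": 0, "BE": 0, "CD": 0, "ED": 0}
wcet, block_counts, edge_counts = ipet_wcet(t, delta, loop_bound=5)
```

Because of x_BC = x_BE, only even values of x_DA are feasible, so a loop bound of 5 gives the same result as a bound of 4 in this toy instance.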

Pipelined execution
[Figure: pipeline diagram with stages FETCH (F), FU1, FU2 and COMPL (C), annotated with δ_{B1,B2}]

Long Timing Effects (1)
[Figure: pipeline diagrams with stages FETCH (F), FU1, FU2 and COMPL (C)]
Figure annotations: T_{A-B-C} = 7   δ = 8   +1

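Long Timing Effects (2)
\[
t_{1 \dots n} = \sum_{i=1}^{n} t_i + \sum_{1 \le j \le k \le n} \delta_{j \dots k}
\]
[Figure: execution times t_AB, t_ABC and t_ABCD built from block times t_A, t_B, t_C, t_D, t_E and timing effects δ_AB, δ_BC, δ_CD, δ_DE, δ_ABC, δ_BCD, δ_DEF, δ_ABCD, δ_BCDE, δ_ABCDE]
(J. Engblom)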
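
The formula on this slide can be evaluated directly once the block times and the timing effects are known. The following is a minimal sketch under two assumptions that are not stated on the slide: timing effects are looked up only for contiguous sub-sequences of at least two blocks (as in the figure), and a missing entry counts as 0. The function name `sequence_time` and all numeric values are hypothetical.

```python
def sequence_time(blocks, t, delta):
    """Execution time of a block sequence: block times plus timing effects.

    blocks: ordered list of block names
    t:      block name -> execution time of the block alone
    delta:  tuple of block names -> timing effect of that sub-sequence
            (assumption: missing entries are treated as 0)
    """
    total = sum(t[b] for b in blocks)
    n = len(blocks)
    for j in range(n):
        for k in range(j + 1, n):        # sub-sequences of two or more blocks
            total += delta.get(tuple(blocks[j:k + 1]), 0)
    return total


# Hypothetical values: two pairwise overlaps and one long timing effect.
t = {"A": 3, "B": 4, "C": 2}
delta = {("A", "B"): -1, ("B", "C"): -1, ("A", "B", "C"): 1}
t_abc = sequence_time(["A", "B", "C"], t, delta)
```

The entry for ("A", "B", "C") is what the slides call a long timing effect: it only shows up when the three blocks are considered together, and the number of such terms grows with the sequence length.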

Motivation
- Long timing effects are:
  - difficult to quantify
    → they might span very long sequences
  - difficult to integrate into WCET computation
- Long timing effects increase the variability of execution times
- Our goal: eliminate long timing effects

Outline
- Our approach: code padding
- Implementation
  - software framework
  - analysis algorithms
    → to identify resource requirements
    → to compute safe padding lengths
- Experimental results
- Concluding remarks

Code padding
[Figure: pipeline diagrams with stages FETCH, FU1, FU2 and COMPL, showing a filler instruction]

Example (1)
    block i:  inst i_1, inst i_2, …, inst i_ni
    block j:  inst j_1, inst j_2, …, inst j_nj
    block k:  inst k_1, …, inst k_nk
Annotations: requires a 4-cycle delay; requires a 3-cycle delay; requires a 1-cycle delay

Example (2)
    block i:  inst i_1, inst i_2, …, inst i_ni
    block j:  inst j_1, inst j_2, …, inst j_nj
    block k:  inst k_1, …, inst k_nk
Inserted code: nop
Annotations: 4-cycle delay; 3-cycle delay; 1-cycle delay

Example (3)
    block i:  inst i_1, inst i_2, …, inst i_ni
    block j:  inst j_1, inst j_2, …, inst j_nj
    block k:  inst k_1, …, inst k_nk
Inserted code: bl delay4; bl delay3; nop
Annotations: 4-cycle delay; 3-cycle delay; 1-cycle delay
Filler block:
    delay4: nop
            nop
    delay3: nop
            nop
    delay2: blr

Code padding framework
[Figure: tool flow]
C source code → gcc compiler → assembly code → gas assembler → object code
Tools: CFG extractor, cycle-level simulator, interference analysis, code padding
Data: list of basic blocks, execution traces of block sequences, padding lengths, safe padded assembly code

Analyzing resource requirements (1)
- Requirements of a basic block

    foreach block B do {
        ff[B] ← first fetch cycle of B;
        lf[B] ← last fetch cycle of B + 1;
        foreach resource R do {
            n[R] ← cycle at which R is needed;
            r[R] ← cycle at which R is released;   // 0 if R not used by B
            n[R,B] ← n[R] - ff[B];
            r[R,B] ← r[R] - lf[B];                 // 0 if R not used by B
        }
        d[B] ← 0;
    }

[Figure: pipeline diagram with stages FETCH, FU1, FU2 and COMPL]
Example values for block B2:
    ff[B2] = 1       lf[B2] = 2
    n[FU1,B2] = 0    r[FU1,B2] = 0
    n[FU2,B2] = 1    r[FU2,B2] = 3
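
The per-block computation above can be written as a short function. This is a sketch only: the trace format (a list of fetch cycles plus, for each resource, the pair of cycles at which it is first needed and released) is a hypothetical stand-in for what the cycle-level simulator reports, and returning 0 for the "needed" offset of an unused resource is an assumption, since the slide only specifies 0 for the release offset.

```python
def block_requirements(fetch_cycles, usage):
    """Resource requirements of one basic block, relative to its fetch.

    fetch_cycles: cycles in which instructions of the block are fetched
    usage:        resource -> (cycle needed, cycle released), or None if
                  the block does not use the resource (hypothetical format)
    Returns (ff, lf, needed, released) where needed[R] = n[R,B] and
    released[R] = r[R,B] in the notation of the slide.
    """
    ff = min(fetch_cycles)               # first fetch cycle of B
    lf = max(fetch_cycles) + 1           # last fetch cycle of B + 1
    needed, released = {}, {}
    for res, span in usage.items():
        if span is None:                 # resource not used by B
            needed[res], released[res] = 0, 0
        else:
            first_use, release = span
            needed[res] = first_use - ff
            released[res] = release - lf
    return ff, lf, needed, released


# Absolute cycles chosen so that the offsets match the B2 example values;
# the cycles themselves are an assumption, not data from the slides.
ff, lf, needed, released = block_requirements(
    [1], {"FU1": (1, 2), "FU2": (2, 5)}
)
```

Expressing the offsets relative to ff[B] and lf[B] makes them independent of where the block happens to sit in a particular trace, which is what allows values from different traces to be compared.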

Analyzing resource requirements (2)
- Requirements of a sequence

    foreach sequence B1-…-Bx (x < n) do {
        lf[Bx] ← last fetch cycle of Bx + 1;
        foreach resource R do {
            r[R] ← cycle at which R is released;   // 0 if R not used by any Bi
            r[R,B1-…-Bx] ← r[R] - lf[Bx];
        }
    }

[Figure: pipeline diagram with stages FETCH, FU1, FU2 and COMPL]
Example values:
    lf[B2] = 3
    r[FU1,B1-B2] = 2    r[FU2,B1-B2] = 3
    r[FU1,B2] = 0

Computing padding lengths (1)
- Depth-1 strategy
  - objective: r[R,A-B] == r[R,B]
  - algorithm:

        foreach sequence A-B do
            foreach resource R do
                if r[R,A-B] ≠ r[R,B] then {
                    d ← StrictDelay(R,A-B);   // computes the padding length (iterative trials)
                    if d > d[B] then d[B] ← d;
                }

  - example: r[FU1,B1-B2] = 2 > r[FU1,B2] = 0
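
The depth-1 loop can be sketched as follows. The slides say only that StrictDelay finds the padding length by iterative trials, so it is passed in here as a callable; `toy_strict_delay` is a placeholder that returns the difference between the two release offsets and is NOT the authors' procedure. The dictionary layout and function names are likewise hypothetical.

```python
def depth1_padding(blocks, pairs, resources, r_block, r_pair, strict_delay):
    """Depth-1 strategy: keep, for each block B, the largest delay required
    by any predecessor A and any resource R with r[R,A-B] != r[R,B]."""
    d = {b: 0 for b in blocks}           # d[B] <- 0
    for a, b in pairs:                   # foreach sequence A-B
        for res in resources:            # foreach resource R
            if r_pair[(res, a, b)] != r_block[(res, b)]:
                delay = strict_delay(res, a, b)
                if delay > d[b]:
                    d[b] = delay
    return d


# Release offsets taken from the example values on the slides.
r_block = {("FU1", "B2"): 0, ("FU2", "B2"): 3}
r_pair = {("FU1", "B1", "B2"): 2, ("FU2", "B1", "B2"): 3}


def toy_strict_delay(res, a, b):
    # Placeholder only: the real StrictDelay re-simulates with trial
    # padding lengths; this stand-in just returns the release gap.
    return r_pair[(res, a, b)] - r_block[(res, b)]


padding = depth1_padding(
    ["B1", "B2"], [("B1", "B2")], ["FU1", "FU2"],
    r_block, r_pair, toy_strict_delay,
)
```

With these values only FU1 violates the objective, so FU2 never triggers a call to the delay procedure and B1 keeps a padding length of 0.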

Computing padding lengths (2)
- Depth-n strategy
  - analyze (n+1)-block sequences (B0-B1-…-Bn)
  - objectives:
    → for i < n: if r[R,B0-…-Bi] > n[R,B(i+1)], then r[R,B0-…-Bi] == r[R,B1-…-Bi]
    → r[R,B0-…-Bn] == r[R,B1-…-Bn]

Computing padding lengths (3)
- Example: depth-4 algorithm

    foreach sequence A-B-C-D-E do
        foreach resource R do
            if (n[R,C] > 0) && (r[R,A-B] > n[R,C]) && (r[R,A-B] > r[R,B]) then {
                d ← MinimumDelay(R,A-B-C);
                if d > d[B] then d[B] ← d;
            }
            elsif (n[R,D] > 0) && (r[R,A-B-C] > n[R,D]) && (r[R,A-B-C] > r[R,B-C]) then {
                d ← MinimumDelay(R,A-B-C);
                ...

Experimental results (1)
- Code size increase (depth-1)

    Benchmark      2-way      4-way
    matmul        35.24%     76.19%
    ludcmp        16.51%     28.20%
    jfdctint      11.37%    126.97%
    bsort         31.25%     76.25%
    heapsort      25.00%     51.47%
    insertsort    23.81%     59.52%
    MEAN          23.86%     69.77%

Experimental results (2)
- WCET increase

Concluding remarks
- Inter-block long timing effects make WCET analysis complex and pessimistic
- Code padding prevents long timing effects and limits the variability of partial execution times
- The cost of padding can be acceptable
  - code size (≈ 20% for a 2-way pipeline)
  - real WCET increase (≈ 20%)
    → future work: cost on the estimated WCET?

Thank you!
Traces stands for Research group on Architectures and Compilers for Embedded Systems