TRACES
Code Padding to Improve the WCET Calculability
Christine Rochange and Pascal Sainrat
Institut de Recherche en Informatique de Toulouse, Toulouse
WCET evaluation
Static WCET analysis
- IPET: Implicit Path Enumeration Technique
[Figure: flow analysis, low-level analysis, WCET computation]
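Implicit Path Enumeration Technique
[Figure: control-flow graph with nodes A, B, C, D, E]
x_A = 1 + x_{DA} = 1 + x_{AB}
x_B = x_{AB} = x_{BC} + x_{BE}
x_C = x_{BC} = x_{CD}
x_D = x_{CD} + x_{ED} = x_{DA}
x_E = x_{BE} = x_{ED}
x_{BC} = x_{BE}
x_{DA} \le N
T = \max \left( \sum_i x_i \cdot t_i + \sum_{i,j} x_{ij} \cdot \delta_{ij} \right)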
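The flow constraints on this slide can be explored with a small enumeration. The sketch below is only an illustration: a real IPET tool would hand the constraints to an ILP solver, whereas this version simply tries every feasible edge count. The block times in `times` are hypothetical, and the edge terms of the objective are assumed to be zero.

```python
from typing import Dict, Optional, Tuple


def ipet_wcet(block_times: Dict[str, int],
              loop_bound: int) -> Tuple[int, Optional[Dict[str, int]]]:
    """Maximise sum(x_i * t_i) under the slide's flow constraints by enumeration.

    Assumption: the edge terms of the objective are zero.
    """
    best_time, best_counts = -1, None
    for x_da in range(loop_bound + 1):       # x_DA <= N
        x_ab = x_da                          # x_A = 1 + x_DA = 1 + x_AB
        for x_bc in range(x_ab + 1):
            x_be = x_ab - x_bc               # x_B = x_AB = x_BC + x_BE
            if x_bc != x_be:                 # x_BC = x_BE
                continue
            counts = {
                "A": 1 + x_da,
                "B": x_ab,
                "C": x_bc,                   # x_C = x_BC = x_CD
                "D": x_bc + x_be,            # x_D = x_CD + x_ED = x_DA
                "E": x_be,                   # x_E = x_BE = x_ED
            }
            total = sum(counts[b] * block_times[b] for b in counts)
            if total > best_time:
                best_time, best_counts = total, counts
    return best_time, best_counts


# Hypothetical block times in cycles (not taken from the slides).
times = {"A": 2, "B": 3, "C": 5, "D": 1, "E": 4}
wcet, counts = ipet_wcet(times, loop_bound=4)
print(wcet, counts)
```

With these made-up times and N = 4, the maximum is reached when the loop iterates four times, split evenly between the C and E branches as the constraint x_BC = x_BE requires.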
Pipelined execution
[Figure: blocks B1 and B2 in a pipeline with stages FETCH (F), FU1, FU2 and COMPL (C)]
Long Timing Effects (1)
[Figure: pipeline timing diagrams (stages FETCH, FU1, FU2, COMPL) for the block sequence A-B-C; annotation as extracted: T A-B-C = 7 = 8 +1]
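Long Timing Effects (2)
t_{1 \dots n} = \sum_{i=1}^{n} t_i + \sum_{1 \le j \le k \le n} \delta_{j \dots k}
[Figure: block times t_A to t_E and the timing effects of block sequences of two to five blocks, from AB up to ABCDE; shown for t_ABC and t_ABCD]
Source: J. Engblom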
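The timing model on this slide can be illustrated with a few lines of Python. This is a sketch under assumptions: the block times and timing effects below are hypothetical, and single-block terms (j = k) are taken to be zero because a block's own time is already counted in t_i.

```python
from typing import Dict, Sequence, Tuple


def sequence_time(seq: Sequence[str],
                  t: Dict[str, int],
                  delta: Dict[Tuple[str, ...], int]) -> int:
    """Time of a block sequence: block times plus the timing effect of
    every contiguous sub-sequence of at least two blocks."""
    total = sum(t[b] for b in seq)
    n = len(seq)
    for j in range(n):
        for k in range(j + 1, n):
            # Missing entries mean "no timing effect for this sub-sequence".
            total += delta.get(tuple(seq[j:k + 1]), 0)
    return total


# Hypothetical values, in cycles.
t = {"A": 4, "B": 3, "C": 5}
delta = {("A", "B"): -2, ("B", "C"): -1, ("A", "B", "C"): 1}
print(sequence_time(["A", "B", "C"], t, delta))  # 12 - 2 - 1 + 1 = 10
```

The last entry of `delta` is a long timing effect: it only appears when the three blocks run in sequence, which is why the number of terms grows with the length of the sequences considered.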
Motivation
Long timing effects are:
- difficult to quantify: they might span very long sequences
- difficult to integrate into WCET computation
Long timing effects increase the variability of execution times
Our goal: eliminate long timing effects
Outline
Our approach: code padding
Implementation
- software framework
- analysis algorithms
  - to identify resource requirements
  - to compute safe padding lengths
Experimental results
Concluding remarks
Code padding
[Figure: pipeline diagram (stages FETCH, FU1, FU2, COMPL) showing a filler instruction]
Example (1)
block i: inst i_1, inst i_2, …, inst i_ni (requires a 4-cycle delay)
block j: inst j_1, inst j_2, …, inst j_nj (requires a 3-cycle delay)
block k: inst k_1, …, inst k_nk (requires a 1-cycle delay)
Example (2)
block i: inst i_1, inst i_2, …, inst i_ni (4-cycle delay)
block j: inst j_1, inst j_2, …, inst j_nj (3-cycle delay)
block k: inst k_1, …, inst k_nk (1-cycle delay)
Padding instruction: nop
Example (3)
block i: inst i_1, inst i_2, …, inst i_ni; padding: bl delay4 (4-cycle delay)
block j: inst j_1, inst j_2, …, inst j_nj; padding: bl delay3 (3-cycle delay)
block k: inst k_1, …, inst k_nk; padding: nop (1-cycle delay)
filler block:
  delay4: nop
          nop
  delay3: nop
          nop
  delay2: blr
Code padding framework
[Figure: tool flow]
- tools, in order: gcc compiler, gas assembler, CFG extractor, cycle-level simulator, interference analysis, code padding
- data, in order: C source code, assembly code, object code, list of basic blocks, execution traces of block sequences, padding lengths, safe padded assembly code
Analyzing resource requirements (1)
Requirements of a basic block
  foreach block B do {
    ff[B] ← first fetch cycle of B;
    lf[B] ← last fetch cycle of B + 1;
    foreach resource R do {
      n[R] ← cycle at which R is needed;
      r[R] ← cycle at which R is released;
      n[R,B] ← n[R] - ff[B];   // 0 if R not used by B
      r[R,B] ← r[R] - lf[B];   // 0 if R not used by B
    }
    d[B] ← 0;
  }
[Figure: pipeline diagram (stages FETCH, FU1, FU2, COMPL)]
Example: ff[B_2] = 1, lf[B_2] = 2, n[FU1,B_2] = 0, r[FU1,B_2] = 0, n[FU2,B_2] = 1, r[FU2,B_2] = 3
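The per-block analysis above could be sketched as follows. The input format is an assumption made for illustration: a list of fetch cycles and, for each resource, either `None` (unused) or a pair of absolute cycles (needed, released), as might be read from a simulator trace. The absolute cycles in the example are chosen so that the result matches the values shown for B_2 on this slide.

```python
from typing import Dict, List, Optional, Tuple

# (cycle at which the resource is needed, cycle at which it is released),
# or None when the block does not use the resource.
Use = Optional[Tuple[int, int]]


def block_requirements(fetch_cycles: List[int], uses: Dict[str, Use]):
    """Return ff, lf and the relative requirements n[R,B] and r[R,B]."""
    ff = min(fetch_cycles)          # first fetch cycle of the block
    lf = max(fetch_cycles) + 1      # last fetch cycle of the block + 1
    need: Dict[str, int] = {}
    release: Dict[str, int] = {}
    for res, use in uses.items():
        if use is None:             # resource not used by the block
            need[res], release[res] = 0, 0
        else:
            needed_at, released_at = use
            need[res] = needed_at - ff
            release[res] = released_at - lf
    return ff, lf, need, release


# Hypothetical trace data for a single-instruction block fetched at cycle 1.
ff, lf, need, release = block_requirements([1], {"FU1": None, "FU2": (2, 5)})
print(ff, lf, need, release)
```

Expressing the needs relative to ff[B] and the releases relative to lf[B] makes the requirements independent of where the block sits in the trace, so they can be compared across sequences.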
Analyzing resource requirements (2)
Requirements of a sequence
  foreach sequence B_1-…-B_x (x < n) do {
    lf[B_x] ← last fetch cycle of B_x + 1;
    foreach resource R do {
      r[R] ← cycle at which R is released;   // 0 if R not used by any B_i
      r[R,B_1-…-B_x] ← r[R] - lf[B_x];
    }
  }
[Figure: pipeline diagram (stages FETCH, FU1, FU2, COMPL)]
Example: lf[B_2] = 3, r[FU1,B_1-B_2] = 2, r[FU2,B_1-B_2] = 3, r[FU1,B_2] = 0
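For a sequence, only the release cycles matter, measured from the end of the last block's fetch. A minimal sketch, assuming the release cycles are available as absolute cycle numbers (`None` when no block of the sequence uses the resource); the absolute values below are hypothetical and chosen to reproduce the example on this slide:

```python
from typing import Dict, Optional


def sequence_release(last_fetch_cycle: int,
                     released_at: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Release cycle of each resource relative to lf of the last block."""
    lf = last_fetch_cycle + 1
    return {res: 0 if cycle is None else cycle - lf
            for res, cycle in released_at.items()}


# Hypothetical trace of B1-B2: the last fetch of B2 happens at cycle 2.
rel = sequence_release(2, {"FU1": 5, "FU2": 6})
print(rel)

# B2 alone does not use FU1 (r[FU1,B2] = 0), so a non-zero value here
# comes from the preceding block B1.
fu1_held_by_predecessor = rel["FU1"] > 0
print(fu1_held_by_predecessor)
```

Comparing the sequence value with the single-block value is what reveals that an earlier block still holds a resource after the last block has been fetched.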
Computing padding lengths (1)
Depth-1 strategy
- objective: r[R,A-B] == r[R,B]
- algorithm:
    foreach sequence A-B do
      foreach resource R do
        if r[R,A-B] ≠ r[R,B] then {
          d ← StrictDelay(R,A-B);   // computes the padding length (iterative trials)
          if d > d[B] then d[B] ← d;
        }
- example: r[FU1,B_1-B_2] = 2 > r[FU1,B_2] = 0
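The depth-1 loop can be sketched in Python as below. `StrictDelay` is passed in as a callback because the slides only say that it works by iterative trials. The `toy_strict_delay` used in the example is a hypothetical stand-in, not the authors' procedure: it assumes that each padding cycle shortens the predecessor's residual hold on a resource by one cycle. The requirement tables are also made up, except for the FU1 values taken from the slide example.

```python
from typing import Callable, Dict, Tuple

Seq = Tuple[str, str]


def depth1_padding(r_seq: Dict[Seq, Dict[str, int]],
                   r_blk: Dict[str, Dict[str, int]],
                   strict_delay: Callable[[str, Seq], int]) -> Dict[str, int]:
    """Return d[B]: the largest delay needed over all sequences A-B."""
    d = {block: 0 for block in r_blk}
    for (a, b), releases in r_seq.items():
        for res, rel in releases.items():
            if rel != r_blk[b].get(res, 0):      # objective violated
                delay = strict_delay(res, (a, b))
                if delay > d[b]:
                    d[b] = delay
    return d


# r[R,B] per block and r[R,A-B] per two-block sequence (hypothetical data).
r_blk = {"B1": {"FU1": 1, "FU2": 0}, "B2": {"FU1": 0, "FU2": 3}}
r_seq = {("B1", "B2"): {"FU1": 2, "FU2": 3}}


def toy_strict_delay(res: str, seq: Seq) -> int:
    # Hypothetical model standing in for the iterative trials.
    return r_seq[seq][res] - r_blk[seq[1]].get(res, 0)


padding = depth1_padding(r_seq, r_blk, toy_strict_delay)
print(padding)
```

Taking the maximum over all predecessors and all resources gives one padding length per block that satisfies the objective whichever block ran before it.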
Computing padding lengths (2)
Depth-n strategy
- analyze (n+1)-block sequences (B_0-B_1-…-B_n)
- objectives:
    for i < n: if r[R,B_0-…-B_i] > n[R,B_{i+1}] : r[R,B_0-…-B_i] == r[R,B_1-…-B_i]
    r[R,B_0-…-B_n] == r[R,B_1-…-B_n]
Computing padding lengths (3)
Example: depth-4 algorithm
  foreach sequence A-B-C-D-E do
    foreach resource R do
      if (n[R,C] > 0) && (r[R,A-B] > n[R,C]) && (r[R,A-B] > r[R,B]) then {
        d ← MinimumDelay(R,A-B-C);
        if d > d[B] then d[B] ← d;
      }
      elsif (n[R,D] > 0) && (r[R,A-B-C] > n[R,D]) && (r[R,A-B-C] > r[R,B-C]) then {
        d ← MinimumDelay(R,A-B-C);
        ...
Experimental results (1)
Code size increase (depth-1)
benchmark     2-way     4-way
matmul        35.24%    76.19%
ludcmp        16.51%    28.20%
jfdctint      11.37%    126.97%
bsort         31.25%    76.25%
heapsort      25.00%    51.47%
insertsort    23.81%    59.52%
MEAN          23.86%    69.77%
Experimental results (2)
WCET increase
Concluding remarks
Inter-block long timing effects make the WCET analysis complex and pessimistic
Code padding prevents long timing effects and limits the variability of partial execution times
The cost of padding can be acceptable
- code size (about 20% for a 2-way pipeline)
- real WCET increase (about 20%)
Future work: cost on the estimated WCET?
Thank you!
Traces stands for Research group on Architectures and Compilers for Embedded Systems