Optimizations for Faster Simulation of Esterel Programs
Dumitru POTOP-BUTUCARU
Advisers:
  Gérard Berry – Esterel Technologies
  Robert de Simone – INRIA, project TICK
PhD Thesis Defense, 26 November 2002, Agelonde, France
Part I: Why?
–Background
–Motivation
Part II: How?
–Presentation of the work
–Results and conclusion
Two compilation trends
1. Semantic completeness
  Formal semantics (Esterel v5)
  Formal models (automata, circuits)
  Formal analysis and optimization methods
  Efficiency issues (do not scale up well)
2. Efficient simulation
  Custom intermediate formats
  Scale up well
  Semantic issues
Why? Because of Esterel properties
Structural imperative style
–Esterel source = control-flow specification
  well-structured code
  control-flow optimizations
Esterel source:
  loop [ await A; emit B || await B ]; emit O; halt every R
Generated code:
  if(BOOT){A_active=1;B_active=1;}
  else {
    if(R){A_active=1;B_active=1;}
    else if(A_active|B_active) {
      if(A_active) if(A) {A_active=0;B=1;}
      if(B_active) if(B) B_active=0;
      if(!(A_active|B_active)) O=1;
    }
  }
–But…
Why? Because of Esterel properties
Constructive causality
–Correct causality cycles
–Instantaneous reaction to signal absence (analysis of not yet executed code)
–Solution: Translate into a formal mathematical model
–But: Loss of efficiency
  signal S,T in
    emit S;
    present T then
      present S else emit T end
    end;
  end
[Figure annotations on the example: causality cycle; break the cycle]
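The example above can be read as a small three-valued evaluation. The following is a minimal sketch, not the thesis implementation: it assumes a simplified circuit translation with the equations S = GO and T = GO & T & !S, and the names `Status`, `not3`, `and3` and `solve` are hypothetical. It shows how the apparent cycle on T is broken once S is known to be present.

```c
/* Three-valued signal status: UNKNOWN until proven present or absent. */
typedef enum { ABSENT = 0, PRESENT = 1, UNKNOWN = 2 } Status;

/* Ternary NOT: an unknown input gives an unknown output. */
static Status not3(Status a) {
    if (a == UNKNOWN) return UNKNOWN;
    return a == PRESENT ? ABSENT : PRESENT;
}

/* Ternary AND: a single ABSENT input decides the result. */
static Status and3(Status a, Status b) {
    if (a == ABSENT || b == ABSENT) return ABSENT;
    if (a == PRESENT && b == PRESENT) return PRESENT;
    return UNKNOWN;
}

/* Iterate the assumed equations S = GO, T = GO & T & !S until
   nothing changes. Both signals start as UNKNOWN. */
static void solve(Status go, Status *s, Status *t) {
    *s = UNKNOWN;
    *t = UNKNOWN;
    for (int changed = 1; changed; ) {
        Status ns = go;
        Status nt = and3(go, and3(*t, not3(*s)));
        changed = (ns != *s) || (nt != *t);
        *s = ns;
        *t = nt;
    }
}
```

With GO present, S becomes present in the first pass; in the second pass !S is absent, so the AND settles T to absent even though T appears on both sides of its own equation. This per-instant fixpoint is the kind of cost the slide refers to as loss of efficiency.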
Current methods

                                 Semantically complete                                 Efficient code
Intermediate model               Explicit FSM              Circuits                    ?
Generated code (without optim.)  Very large, very fast     Small, slow                 Very small, very fast
Optimization                     Bisimulation (fc2tools)   RTL optimizations (SIS)     Classical control-flow optimizations
Compiling method                 Expensive, slow; general  Cheap, fast; general*       Cheap, fast; only “acyclic” programs
Problems                         Do not scale up well (both methods)                   Semantics (acyclic=?); less powerful optim.

*=sccausal or slow simulation
What we want
Generate efficient code for “good” programs
Generate code for all programs
Understand cyclicity at a higher level
Inexpensive optimizations based on static analysis
Formalize the efficient approach
Means
–New intermediate format/model (GRC)
  Hierarchical state representation
  Control-flow graph
  No specific encoding
Part II - Outline
The GRC format
–Definition (small example)
–Code generation for “acyclic” GRC specifications
  State encoding
  Scheduling
Static analysis for optimizations
Cyclic specifications
–What does “cyclic” mean?
Implementation and benchmarks
Conclusion
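GRC format – a small example
  loop [ await A; emit B || await B ]; emit O; halt every R
[Figure: GRC representation of the program. Selection tree with nodes 0 to 6 (boot, loop-every, #, ||, await A, await B, halt). Control flowgraph with state tests ([1], [2], [3], [6]), state updates (enter 2 to enter 6, exit 1 to exit 5), signal tests on R, A and B, a synchronizer with inputs Inactive[4], K0[4], K1[4], Inactive[5], K0[5], K1[5] and outputs K0, K1, and the actions emit B and emit O.]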
GRC format – a small example
  loop [ await A; emit B || await B ]; emit O; halt every R
selection tree = parallel/exclusive abstraction of the syntax tree
The nodes represent the activation of various statements
[Figure: selection tree (boot, loop-every, #, ||, await A, await B, halt), shown in successive states:]
–Initial state
–After the first reaction – waiting for A and B
–B has been received. Waiting for A
–A has been received. Halted
–Program reset after R has been received
GRC format – a small example
  loop [ await A; emit B || await B ]; emit O; halt every R
[Figure: the same selection tree and flowgraph, stepped through for an instant where R is absent and A is present]
Code generation – acyclic case
“Good programs” => acyclic GRC flowgraphs
Code generation for acyclic specifications
–State encoding
  Software-specific
  Bitwise
  Hierarchic
–Static scheduling
  Respects the causality
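One common way to obtain a schedule that respects causality on an acyclic graph is a topological sort. The sketch below is an illustration under that assumption, not the scheduler of the thesis; `Flowgraph`, `MAX_NODES` and `schedule` are hypothetical names, and the dependency matrix stands in for the control and signal dependencies of a GRC flowgraph.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* Hypothetical flowgraph: dep[i][j] means node i must run before node j. */
typedef struct {
    int n;
    bool dep[MAX_NODES][MAX_NODES];
} Flowgraph;

/* Fills order[] with a schedule that respects every dependency.
   Returns the number of nodes scheduled; a result smaller than g->n
   means the graph contains a cycle and has no static schedule. */
int schedule(const Flowgraph *g, int order[]) {
    int indegree[MAX_NODES] = {0};
    bool done[MAX_NODES] = {false};
    int count = 0;

    /* Count, for every node, how many nodes must run before it. */
    for (int i = 0; i < g->n; i++)
        for (int j = 0; j < g->n; j++)
            if (g->dep[i][j]) indegree[j]++;

    for (;;) {
        /* Pick the lowest-numbered node whose predecessors are all done. */
        int next = -1;
        for (int i = 0; i < g->n; i++)
            if (!done[i] && indegree[i] == 0) { next = i; break; }
        if (next < 0) break;

        done[next] = true;
        order[count++] = next;
        for (int j = 0; j < g->n; j++)
            if (g->dep[next][j]) indegree[j]--;
    }
    return count;
}
```

For the small example, the node that emits B has to be ordered before the node that tests B, and both before the synchronizer. If a dependency is added from the synchronizer back to the emitter, no node has all of its predecessors done and the function reports that no static schedule exists.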
Code generation – acyclic case
State encoding
[Figure: selection tree (boot, loop-every, #, ||, await A, await B, halt) annotated with bit indices]
States:
– boot instant
– « await A » active, « await B » completed
– « await A » active, « await B » active
– « halt » active
– program terminated
[Table: bit pattern of each state, by bit index]
Code generation – acyclic case
State encoding
[Figure: the flowgraph of the small example, rewritten step by step for the bitwise encoding]
–State tests become tests on the state bits S[1], S[2], S[3], S[4]
–enter 2, enter 3, enter 4, enter 5 become S[1]=1, S[2]=0, S[3]=1, S[4]=1
–enter 6 becomes S[2]=1
–exit 4 becomes S[3]=0, exit 5 becomes S[4]=0
–exit 1, exit 2, exit 3 are removed
–The four entry assignments are merged into S[1..4]=1011
Code generation – acyclic case
Static scheduling
[Figure: encoded flowgraph (tests on S[1], S[2], S[3], S[4], R, A, B; synchronizer; emit B; emit O; S[1..4]=1011; S[2]=1), translated node by node into the code below]
  bool aux=0;
  if(S[1]){
    if(R){aux=1;}
    else {
      if(!S[2]){
        if(S[3])if(A){S[3]=0;B=1;}
        if(S[4])if(B)S[4]=0;
        if(S[3]==0&&S[4]==0){
          O=1;S[2]=1;
        }
      }
    }
  } else {aux=1;}
  if(aux){S[1..4]=1011;}  /* shorthand: S[1]=1, S[2]=0, S[3]=1, S[4]=1 */
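The scheduled code from the slides can be turned into a compilable reaction function. This is a minimal sketch under stated assumptions: the `Inputs` and `Outputs` structs and the name `react` are hypothetical, B is treated as a signal that may either arrive from the environment or be emitted by the first branch, and the `S[1..4]=1011` shorthand is expanded into four assignments.

```c
#include <stdbool.h>

/* State bits S[1..4] as on the slides; S[0] is unused. All zero at boot. */
static bool S[5];

typedef struct { bool R, A, B; } Inputs;   /* hypothetical input record  */
typedef struct { bool B, O; } Outputs;     /* hypothetical output record */

/* One reaction of: loop [ await A; emit B || await B ]; emit O; halt every R */
Outputs react(Inputs in) {
    bool B = in.B;      /* B may come from outside or be emitted below */
    bool O = false;
    bool aux = false;   /* set when the loop body must be (re)started  */

    if (S[1]) {
        if (in.R) {
            aux = true;                                    /* reset on R     */
        } else if (!S[2]) {                                /* parallel active */
            if (S[3] && in.A) { S[3] = false; B = true; }  /* await A; emit B */
            if (S[4] && B)    { S[4] = false; }            /* await B         */
            if (!S[3] && !S[4]) { O = true; S[2] = true; } /* emit O; halt    */
        }
    } else {
        aux = true;                                        /* boot instant    */
    }
    if (aux) { S[1] = true; S[2] = false; S[3] = true; S[4] = true; }
    return (Outputs){ B, O };
}
```

Calling `react` once performs the boot instant and leaves the state at 1011. A following reaction with A present emits B, which the second branch sees in the same instant because the test on B is scheduled after the emission, so O is emitted and the halt bit S[2] is set.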
Static analysis and optimizations
GRC specifications => usually very redundant
Optimizations
  Compatible with the code generation scheme
  –GRC code optimizations
  –Software encoding optimizations
  Semantics-preserving
  Fast, efficient
How? Static analysis
  Fast, efficient
  Prepare the optimizations and the encoding
  Not software-specific
Static analysis and optimizations
Static analysis, example
  trap T in
    sustain A
  ||
    await B; await C; exit T
  end
[Figure: selection tree (boot, ||, sustain A, #, await B, await C) with the annotation “Same status at all instants”]
Utility:
–Simplify the state access/update protocol
–Simplify the state encoding
Static analysis and optimizations
Optimized state encoding
  trap T in sustain A || await B; await C; exit T end
[Figure: selection tree (boot, ||, sustain A, #, await B, await C)]
States:
– boot instant
– « sustain A », « await B » active
– « sustain A », « await C » active
– program terminated
[Table: bit pattern of each state under the unoptimized encoding]
[Table: bit pattern of each state under the optimized encoding]
Static analysis and optimizations
Dependency removal (propagation of exclusions)
  pause; present S then emit T end; pause; emit S;
[Figure: selection tree (boot, #, #; nodes 0 to 4) and flowgraph with state tests, state updates (enter 2 to enter 4, exit 0 to exit 4), a test on S, and the actions emit S and emit T]
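One plausible way to detect that two statements can never be active in the same instant is to look at their lowest common ancestor in the selection tree. The sketch below assumes that criterion; it is not taken from the thesis, and `SelNode`, `Kind`, `lca` and `exclusive` are hypothetical names. Parent indices and depths are assumed to describe a single tree rooted at the node whose parent is -1.

```c
#include <stdbool.h>

typedef enum { LEAF, EXCLUSIVE, PARALLEL } Kind;

/* Hypothetical selection-tree node, stored in an array. */
typedef struct {
    Kind kind;
    int parent;   /* index of the parent node, -1 for the root */
    int depth;    /* distance from the root                    */
} SelNode;

/* Lowest common ancestor of a and b: always lift the deeper node. */
static int lca(const SelNode *t, int a, int b) {
    while (a != b) {
        if (t[a].depth >= t[b].depth) a = t[a].parent;
        else b = t[b].parent;
    }
    return a;
}

/* True when a and b sit in different branches of an exclusive node,
   so they cannot both be active in the same instant. */
bool exclusive(const SelNode *t, int a, int b) {
    if (a == b) return false;
    int c = lca(t, a, b);
    return c != a && c != b && t[c].kind == EXCLUSIVE;
}
```

In the program above, the test of S runs when the first pause resumes and emit S runs when the second pause resumes. If the two pause nodes are leaves under the same exclusive node, `exclusive` returns true for them, which would justify dropping the dependency from emit S to the test of S.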
Cyclic specifications
Acyclic
  Correct specification
  Efficient code generation
  Depends on the representation (GRC, circuit, …)
  –Compatibility, correctness, and efficiency issue (algorithmic)
  –Circuits = privileged representation (finer, cleaner)
Unify GRC-level and circuit-level cyclicity
Generate simulation code (future work)
Difference only on synchronizers
Solution: synchronizer splitting
–GRC code refinement
–Inexpensive local analysis
Unify GRC-level, circuit-level cyclicity
  [ present I then present S then pause else pause end || nothing ]; emit S
[Figure: flowgraph of the example with a GO input, tests on I and S, and a synchronizer with inputs K0[0], K1[0], K0[1] and outputs K0, K1]
Comparison with existing compilers
Synopsys (S. Edwards)
–Similar intermediate format, state encoding
–State already encoded in the intermediate form
–Better context switch encoding
FTR&D (Bertin, Closse, Weil, Pulou, Poize)
–Hierarchic state representation flattened into a list of pending threads ordered by a static scheduling
[Figure: G1, G2, …, Gn and P1, P2, …, Pn]
Results
Prototype compiler (acyclic case)
Examples:
1. Turbo channel bus
2. Berry’s wristwatch
3. Video generator
4. Shock absorber
5. Operating system model
6. Avionics fuel controller
7. Avionics cockpit
8. Man-machine interface
Test configuration: PIII/1GHz/128M/Linux, gcc-2.96 -O, 1Mcycle random or given
Conclusion
  Semantics of Esterel with data
  Intermediate model for Esterel programs
  Static analysis and optimization at this level
  Characterization of circuit-level cyclicity at this level
  General code generation scheme
  Prototype compiler, acyclic case
Future …
  Correctness proofs, complete implementation (work in progress)
  Cyclic programs…
Still cyclic…
Hybrid code scheduling technique
–Abstract the SCCs => globally acyclic graph
–Static scheduling for the acyclic graph => efficiency?
–Circuit-level simulation techniques on SCCs
Does not guarantee program correctness
–Verify otherwise
–Simulation (not implementation)