EECS 249, Fall 1999
Task Runtime Response Optimization Using Cost-Based Operation Motion
  Abdallah Tabbara, Bassam Tabbara, Alberto Sangiovanni-Vincentelli
  University of California at Berkeley
© 1999 Tabbara et al. EECS 249, Fall
Embedded System
  Electronic “brain” found in many applications, e.g.
    → Consumer electronics
    → Telecommunications
  Consists of:
    → Software: flexibility
    → Hardware: performance
  Application requirements on the system:
    → Small
    → Efficient
    → Power
    → Other metrics
Hardware/Software Co-design
  [Figure: co-design flow diagram. Block labels: Design Specification, Design Representation, HW/SW Partitioning, Synthesis, Evaluation, Implementation, Microprocessor, ASIC, SW, HW.]
Problem Statement
  Target: heterogeneous control-dominated embedded system applications
    → Functional decomposition captures the design as a network of Finite State Machines extended with data computations (EFSMs) (e.g. Esterel front-end)
  Goal: run-time optimization for the synthesis of each individual task
  No assumptions are made on how tasks are composed in the whole system.
Intermediate Design Representation
  Function Flow Graph (FFG) / C-Like Intermediate Format (CLIF) [Tabbara 99]
    ⇒ Able to represent EFSMs
    ⇒ Suitable for control and data flow analysis
  [Figure: flow diagram. Labels: EFSM, FFG, Optimized FFG, SW/HW Synthesis, Data Flow/Control Optimizations.]
Function Flow Graph (FFG)
  An FFG is a triple G = (V, E, N_0) where
    → V is a finite set of nodes
    → E ⊆ V × V is the set of edges; (x, y) ∈ E is an edge from x to y, where x ∈ Pred(y), the set of predecessor nodes of y
    → N_0 ∈ V is the start node, corresponding to the EFSM initial state
    → Operations are associated with each node N ∈ V:
      ⇒ TESTs performed on the EFSM inputs and internal variables
      ⇒ ASSIGNs of computations on the input alphabet (inputs/internal variables) to the EFSM output alphabet (outputs and internal (state) variables)
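As a rough illustration of the definition above, the triple can be held in a small Python structure. This is a sketch only: the class names `Node` and `FFG`, the `pred` method, and the use of plain strings for TEST and ASSIGN operations are assumptions made here for illustration, not part of the FFG/CLIF format in [Tabbara 99].

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One FFG node with its associated operations (kept as plain strings)."""
    name: str
    tests: list = field(default_factory=list)    # TEST operations
    assigns: list = field(default_factory=list)  # ASSIGN operations


@dataclass
class FFG:
    """Hypothetical container for the triple G = (V, E, N_0)."""
    nodes: dict   # V: node name -> Node
    edges: set    # E: set of (x, y) node-name pairs, a subset of V x V
    start: str    # N_0: name of the start node

    def __post_init__(self):
        # Enforce N_0 in V and E as a subset of V x V.
        if self.start not in self.nodes:
            raise ValueError("start node must be in V")
        for x, y in self.edges:
            if x not in self.nodes or y not in self.nodes:
                raise ValueError(f"edge ({x}, {y}) leaves V")

    def pred(self, y):
        """Pred(y): the set of nodes x such that (x, y) is an edge."""
        return {x for (x, z) in self.edges if z == y}


# Three nodes named after the states in the motivating example.
example = FFG(
    nodes={
        "S8": Node("S8", assigns=["z = a + b", "a = c"]),
        "S9": Node("S9", assigns=["x = a + b"]),
        "S10": Node("S10"),
    },
    edges={("S8", "S9"), ("S9", "S10")},
    start="S8",
)
print(example.pred("S9"))  # {'S8'}
```

Storing edges as a set of pairs mirrors the definition directly; a real implementation would more likely keep adjacency lists so that `pred` is not a scan over all edges.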
EFSM in FFG Form (An Example in State Tree Form)
  [Figure: state tree with labels F0 to F8 and S0 to S2.]
Previous Work (1)
  Code motion (hoisting) from the software (HLS) domain(s)
    → Avoids unnecessary re-computations at runtime
      ⇒ Temporary variables (“registers”) at certain program points
    → Must be safe. Main strategy:
      ⇒ As early as possible [Morel 1979], [Knoop 1992]
    → In practice: register pressure
      ⇒ Temporary lifetime minimization [Knoop 1994]
  Limitations: not cost based, laborious, and involves the addition of “synthetic nodes” to the control structure
Previous Work (2)
  [Hailperin 98]: “cost” extension to [Knoop 94]
    → Metric based on individual operations (+, *, …)
    → No concept of I/O preservation (embedded systems)
  We need a task-level runtime cost
    → [Castellucia 96]: probabilities of inputs/tests guide the ordering/restructuring of EFSM nodes in Esterel single automata
  Cost-guided Relaxed Operation Motion
    → Uses code motion techniques: safe (correct), fast
    → Guidance from runtime (average/worst-case) statistics
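To make the difference between a per-operation metric and a task-level average/worst-case cost concrete, here is a minimal sketch. Everything in it is assumed for illustration: the cycle counts in `OP_COST`, the function names, and the representation of a task as a list of (probability, operations) paths. It is not the cost estimator used in this work.

```python
# Hypothetical per-operation costs in cycles (illustrative values only).
OP_COST = {"+": 1, "*": 4}


def path_cost(ops):
    """Cost of one execution path: the sum of its operation costs."""
    return sum(OP_COST[op] for op in ops)


def task_costs(paths):
    """Return (average, worst_case) cost of a task.

    `paths` is a list of (probability, ops) pairs, one per path through
    the task; the probabilities are assumed to sum to 1.
    """
    costs = [(p, path_cost(ops)) for p, ops in paths]
    average = sum(p * c for p, c in costs)
    worst_case = max(c for _, c in costs)
    return average, worst_case


# A task that usually takes a cheap path and occasionally an expensive one.
average, worst_case = task_costs([(0.75, ["+"]), (0.25, ["+", "*"])])
print(average, worst_case)  # 2.0 5
```

The two numbers can pull an optimizer in different directions: hoisting an operation onto the frequent path raises the average even when the worst case is unchanged.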
(Cost-guided) Relaxed Operation Motion
  Our approach (polynomial complexity in the number of FFG nodes) consists of 4 steps:
    1. Data flow and control optimizations
    2. Reverse sweep (as early as possible / cost guided)
       a) Dead operation addition
       b) Normalization
       c) Available operation elimination
       d) Copy propagation
       e) Dead operation elimination
    3. Forward sweep (register lifetime minimization)
    4. Final optimization pass
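Step 2c can be illustrated on a single straight-line block. The sketch below is a local simplification with assumed names (`eliminate_available` and the tuple encoding of operations are hypothetical); the approach described here works over the whole FFG, which this sketch does not attempt. The input is the S8 block from the dead-addition stage of the deck's running example, assuming S7 is its only predecessor, so that `a + b` and `c + b` are already available in `_T30` and `_T29` on entry.

```python
def eliminate_available(block, available_in):
    """Drop assignments that recompute an already available expression.

    block:        list of (dest, rhs); rhs is a variable name (a copy)
                  or an (op, left, right) tuple (a computation).
    available_in: dict mapping expression tuple -> variable that holds
                  its value on entry to the block.
    """
    available = dict(available_in)
    out = []
    for dest, rhs in block:
        # dest already holds exactly this expression: redundant.
        if isinstance(rhs, tuple) and available.get(rhs) == dest:
            continue
        out.append((dest, rhs))
        # Writing dest kills expressions that read it or were stored in it.
        available = {
            expr: var
            for expr, var in available.items()
            if dest not in expr[1:] and var != dest
        }
        # The new computation is available unless it reads its own target.
        if isinstance(rhs, tuple) and dest not in rhs[1:]:
            available[rhs] = dest
    return out


a_plus_b = ("+", "a", "b")
c_plus_b = ("+", "c", "b")

# S8 after dead operation addition (the trailing goto is omitted).
s8_block = [
    ("_T30", a_plus_b), ("z", "_T30"), ("a", "c"),
    ("_T30", a_plus_b), ("_T29", c_plus_b), ("_T30", a_plus_b),
]
entry = {a_plus_b: "_T30", c_plus_b: "_T29"}

print(eliminate_available(s8_block, entry))
# [('z', '_T30'), ('a', 'c'), ('_T30', ('+', 'a', 'b'))]
```

The result corresponds to `z = _T30; a = c; _T30 = a + b;`, which is the S8 block shown in the deck's "Available Elimination" panel. Note that the assignment `a = c` kills `a + b`, so the second computation of `_T30` has to stay.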
Motivating Example [Knoop 94]
  …
  S8: z = a + b; a = c; goto S9;
  S9: x = a + b; goto S10;
  S10: …
Relaxed Operation Motion
  Optimization Pass:
    S8: _T30 = a + b; z = _T30; a = c; goto S9;
    S9: _T30 = a + b; x = _T30; goto S10;
  Dead addition:
    S7: _T30 = a + b; y = _T30; _T30 = a + b; _T29 = c + b; _T30 = a + b; goto S8;
    S8: _T30 = a + b; z = _T30; a = c; _T30 = a + b; _T29 = c + b; _T30 = a + b; goto S9;
    S9: _T30 = a + b; x = _T30; _T30 = a + b; a = c; _T29 = c + b; _T30 = a + b; a = c; goto S10;
Relaxed Operation Motion
  Available Elimination:
    S8: z = _T30; a = c; _T30 = a + b; goto S9;
    S9: x = _T30; a = c; _T30 = a + b; goto S10;
  Copy Propagation:
    _T30 = a + b; …
    S8: z = _T30; a = c; _T30 = c + b; goto S9;
    S9: x = _T30; a = c; _T30 = c + b; goto S10;
  Optimization Pass:
    S1: _T31 = a + b; H = _T31; _T29 = c + b; …
    S8: z = H; H = _T29; goto S9;
    S9: x = H; goto S10;
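A quick way to sanity-check the final code against the motivating example is to execute both versions and compare the values of `z` and `x`. The sketch below transcribes the two fragments into Python by hand. It assumes that `a`, `b`, and `c` are not written in the elided code between S1 and S8, and that `a` is not read after S9 (the optimized code no longer performs `a = c`). The function names are made up for this illustration.

```python
from itertools import product


def original(a, b, c):
    """S8 and S9 of the motivating example."""
    z = a + b   # S8: z = a + b
    a = c       #     a = c
    x = a + b   # S9: x = a + b
    return z, x


def optimized(a, b, c):
    """S1, S8, and S9 after the final optimization pass."""
    t31 = a + b  # S1: _T31 = a + b
    h = t31      #     H = _T31
    t29 = c + b  #     _T29 = c + b
    z = h        # S8: z = H
    h = t29      #     H = _T29
    x = h        # S9: x = H
    return z, x


def equivalent_on(samples):
    """True if both versions produce the same (z, x) on every sample."""
    return all(original(*s) == optimized(*s) for s in samples)


print(equivalent_on(product(range(-2, 3), repeat=3)))  # True
```

Testing on samples is evidence, not proof. Here the agreement also follows directly from substitution: `z` is `a + b` in both versions, and `x` is `c + b` in both because `a` holds `c` by the time S9 runs.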
Optimization and Synthesis Flow
  [Figure: flow diagram. Labels: CDFG (SHIFT), User Input, Profiling, Inference Engine, Attributed FFG, Relaxed Operation Motion, FFG (back-end), Cost Estimation, Design Optimization, HW/SW Co-Synthesis, Software Compilation, Object Code (.o), Hardware Synthesis, Netlist.]
Work In Progress
  Cost estimation methodology
  Operation motion
    → Guidance
    → Lifetime optimality (forward sweep)
  Results collection on motivating example
    → We already beat [Knoop 94]
    → Evaluate with various cost scenarios
    → Collect synthesis results
Cost Estimation Using Bayesian Belief Networks (1)
Cost Estimation Using Bayesian Belief Networks (2)
Conclusions
  Novel approach for task runtime response optimization:
    → Code motion from the software domain is limited mostly to loop invariants, with no real task runtime cost guidance
  Our approach: Relaxed Code Motion
    → Is “natural” in a control/data flow optimization framework
    → Specializes to embedded domain tasks, e.g. I/O preservation across invocations
    → Applies application/environment-driven costs to optimization