Optimization Techniques for the Development of Embedded Applications on Single-Chip Multiprocessor Platforms
Michela Milano


Optimization Techniques for the Development of Embedded Applications on Single-Chip Multiprocessor Platforms
Michela Milano, DEIS, Università di Bologna

Digital Convergence – Mobile Example
Broadcasting, telematics, imaging, computing, communication, entertainment: one device, multiple functions.
The smart mobile device is the center of the ubiquitous media network and the next driver for the semiconductor industry.

SoC: Enabler for Digital Convergence
From today to the future: more than 100X performance, low power, growing complexity and storage, all integrated on a single SoC.

Design as optimization
- Design space: the set of "all" possible design choices
- Constraints: rule out the solutions we are not willing to accept
- Cost function: a property we are interested in (execution time, power, reliability…)

Embedded system design MOTIVATION & CONTEXT
System design with MPSoCs: exploit application and platform parallelism to achieve real-time performance.
Design flow:
- Given a platform description
- and an application abstraction
- compute an allocation & schedule
- verify results & perform changes
The allocation & scheduling algorithm plays a crucial role.

MPSoC platform PROBLEM DESCRIPTION
Resources:
- Identical processing elements (PEs), with DVS support
- Local storage devices
- Remote on-chip memory
- Shared bus
Constraints:
- PE frequencies
- Local device capacity
- Bus bandwidth (additive resource)
- Architecture-dependent constraints

Application Task Graph PROBLEM DESCRIPTION
- Nodes are tasks/processes
- Arcs are data communication
Each task:
- Reads data for each ingoing arc (RD)
- Performs some computation (EXEC)
- Writes data for each outgoing arc (WR)
Each phase involves a communication buffer (local/remote), a PE running at some frequency, and program data (local/remote).

Application Task Graph PROBLEM DESCRIPTION
Durations depend on:
- Memory allocation: remote memory is slower than the local ones
- Execution frequency
Different phases have different bus requirements.

Problem variants PROBLEM VARIANTS
We focused on problem variants with different:
- Objective function: bus traffic, energy consumption, makespan
- Graph features: pipelined, generic, generic with conditional branches
- DVS / no DVS

Objective function PROBLEM VARIANTS
Bus traffic:
- Tasks produce traffic when they access the bus
- Completely depends on memory allocation (allocation dependent)
Makespan:
- Completely depends on the computed schedule (schedule dependent)
Energy (DVS):
- The higher the frequency, the higher the energy consumption
- Frequency switching has a time & energy cost

Graph structure PROBLEM VARIANTS
- Pipelined: typical of stream processing applications
- Generic
- Generic with conditional branches: a stochastic problem, with a stochastic objective function (expected value)

Application Development Flow
Characterization phase: a simulator produces application profiles from the task graph (CTG).
Optimization phase: an optimizer computes the allocation and scheduling.
The result is an optimal SW application implementation, ready for platform execution.

When & Why Offline Optimization?
- Plenty of design-time knowledge: applications pre-characterized at design time; dynamic transitions between different pre-characterized scenarios
- Aggressive exploitation of system resources: reduces overdesign (lowers cost)
- Strong performance guarantees
Applicable to many embedded applications.

Question: complete or incomplete solver?
Can I solve the instances with a complete solver? Look at the structure of the problem and the average instance size.
If yes, which technique should be used? If no, what is the quality of the heuristic solution proposed?

Optimization in system design
Complete solvers find the optimal solution and prove its optimality. The System Design community uses Integer Programming techniques for every optimization problem, regardless of the structure of the problem itself; scheduling, however, is poorly handled by IP.
Incomplete solvers lack an estimated optimality gap:
- Decomposition of the problem and sequential solution of each subproblem
- Local search/metaheuristic algorithms, which require a lot of tuning

Optimization techniques
We will consider two techniques: Constraint Programming and Integer Programming.
Two aspects of a problem: feasibility and optimality. By merging the two techniques, one can obtain better results.

Constraint Programming
A relatively recent, declarative programming paradigm. It inherits from logic programming, operations research, software engineering, and AI constraint solving.

Constraint Programming
Problem model: variable domains, constraints.
Problem solving: constraint propagation, search.

Mathematical Constraints
Example 1: X::[1..10], Y::[5..15], X > Y
Arc-consistency: for each value v of X, there should be a value in the domain of Y consistent with v.
After propagation: X::[6..10], Y::[5..9]
Example 2: X::[1..10], Y::[5..15], X = Y
After propagation: X::[5..10], Y::[5..10]
Example 3: X::[1..10], Y::[5..15], X ≠ Y
No propagation.
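The arc-consistency pruning of Example 1 can be sketched in a few lines of Python (an illustrative "revise" routine, not part of the original slides):

```python
# Arc-consistency for a binary constraint, reproducing Example 1:
# X::[1..10], Y::[5..15], X > Y.

def revise(dom_a, dom_b, check):
    """Keep only the values of dom_a that have some support in dom_b."""
    return {va for va in dom_a if any(check(va, vb) for vb in dom_b)}

def propagate_gt(dx, dy):
    """Make X > Y arc-consistent in both directions."""
    dx = revise(dx, dy, lambda x, y: x > y)   # prune X against Y
    dy = revise(dy, dx, lambda y, x: x > y)   # prune Y against X
    return dx, dy

dx, dy = set(range(1, 11)), set(range(5, 16))
dx, dy = propagate_gt(dx, dy)
print(sorted(dx), sorted(dy))  # [6..10] and [5..9], as on the slide
```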

Constraint Interaction
Every variable is involved in many constraints: each change in the domain of a variable can affect many constraints (agent perspective).
Example: X::[1..5], Y::[1..5], Z::[1..5] with X = Y + 1, Y = Z + 1, X = Z - 1

Constraint Interaction
X::[1..5], Y::[1..5], Z::[1..5] with X = Y + 1, Y = Z + 1, X = Z - 1
First propagation of X = Y + 1 leads to X::[2..5], Y::[1..4], Z::[1..5]; X = Y + 1 is then suspended.

Constraint Interaction
Second propagation of Y = Z + 1 leads to X::[2..5], Y::[2..4], Z::[1..3]; Y = Z + 1 is then suspended.
The domain of Y has changed, so X = Y + 1 is woken: X::[3..5], Y::[2..4], Z::[1..3]; X = Y + 1 is suspended again.

Constraint Interaction
Third propagation of X = Z - 1 leads to FAIL: X::[], Y::[2..4], Z::[1..3].
The order in which constraints are considered does not affect the result, BUT it can influence the efficiency.
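The whole propagation-to-fixpoint loop on this example can be sketched as follows (a toy AC-style loop over explicit domains, assumed for illustration):

```python
# Propagate X = Y + 1, Y = Z + 1, X = Z - 1 over X,Y,Z :: [1..5]
# until a fixpoint is reached or some domain becomes empty (FAIL).

def propagate(domains, constraints):
    """Re-run every constraint until fixpoint; report FAIL on an empty domain."""
    changed = True
    while changed:
        changed = False
        for a, b, rel in constraints:
            da = {v for v in domains[a] if any(rel(v, w) for w in domains[b])}
            db = {w for w in domains[b] if any(rel(v, w) for v in domains[a])}
            if not da or not db:
                return "FAIL"
            if da != domains[a] or db != domains[b]:
                domains[a], domains[b] = da, db
                changed = True
    return domains

cons = [("X", "Y", lambda x, y: x == y + 1),
        ("Y", "Z", lambda y, z: y == z + 1),
        ("X", "Z", lambda x, z: x == z - 1)]
result = propagate({v: set(range(1, 6)) for v in "XYZ"}, cons)
print(result)  # FAIL: the three constraints together are inconsistent
```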

Global Constraints
cumulative([S1,..,Sn], [D1,..,Dn], [R1,..,Rn], L)
- S1,...,Sn activity starting times (domain variables)
- D1,...,Dn durations (domain variables)
- R1,...,Rn resource consumptions (domain variables)
- L resource capacity
Given the interval [min, max] where min = min_i {S_i} and max = max_i {S_i + D_i} - 1, the cumulative constraint holds iff
max_{t in [min,max]} Σ_{j | S_j ≤ t < S_j + D_j} R_j ≤ L

Global Constraints cumulative([1,2,4],[4,2,3],[1,2,2],3)
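The slide's instance can be checked mechanically by building the resource-usage profile over time (an illustrative checker, not a propagator):

```python
# Check cumulative([1,2,4],[4,2,3],[1,2,2],3): at every time point,
# the summed consumption of the running activities must not exceed capacity.

def cumulative_holds(starts, durs, reqs, capacity):
    horizon = max(s + d for s, d in zip(starts, durs))
    for t in range(min(starts), horizon):
        usage = sum(r for s, d, r in zip(starts, durs, reqs) if s <= t < s + d)
        if usage > capacity:
            return False
    return True

print(cumulative_holds([1, 2, 4], [4, 2, 3], [1, 2, 2], 3))  # True: peak usage is 3
```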

Propagation
One of the propagation algorithms used in resource constraints is based on obligatory parts: given the earliest start S_min and the latest start S_max of an activity, the interval [S_max, S_min + duration) is its obligatory part, during which the activity is certain to execute whatever start time is chosen.
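The obligatory-part computation is tiny; the values below are hypothetical, chosen only to illustrate the rule:

```python
# Obligatory part of an activity with earliest start s_min, latest start
# s_max and duration dur: it surely runs in [s_max, s_min + dur) whenever
# that interval is non-empty.

def obligatory_part(s_min, s_max, dur):
    start, end = s_max, s_min + dur
    return (start, end) if start < end else None

# A task that may start anywhere in [2..6] and lasts 8 time units must be
# running during [6, 10), whatever start time the search finally picks.
print(obligatory_part(2, 6, 8))  # (6, 10)
```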

Another propagation algorithm is based on edge finding [Baptiste, Le Pape, Nuijten, IJCAI95]. Consider a unary resource R and three activities S1, S2, S3 that should be scheduled on R.

Propagation
We can deduce that the earliest start time of S1 is 8. In fact, S1 should be executed after both S2 and S3.
Global reasoning: if S1 is executed before the two, then there is no space for executing both S2 and S3.
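A simplified version of this edge-finding deduction can be sketched in Python. The time windows below are hypothetical (the slide's figure is not reproduced here), chosen so that S1's earliest start is pushed to 8, and the update rule is a simplified relaxation of full edge finding:

```python
# Edge finding on a unary resource: if scheduling `task` before the set
# `others` leaves no room for all of them, `task` must come after the set.

def edge_find(task, others):
    """task/others: dicts with est (earliest start), lct (latest completion), dur."""
    est_all = min([task["est"]] + [o["est"] for o in others])
    dur_all = task["dur"] + sum(o["dur"] for o in others)
    lct_others = max(o["lct"] for o in others)
    if est_all + dur_all > lct_others:          # task cannot fit before the set
        new_est = min(o["est"] for o in others) + sum(o["dur"] for o in others)
        return max(task["est"], new_est)        # task starts after the set completes
    return task["est"]

s1 = {"est": 0, "lct": 16, "dur": 4}
s2 = {"est": 2, "lct": 8, "dur": 3}
s3 = {"est": 2, "lct": 8, "dur": 3}
print(edge_find(s1, [s2, s3]))  # 8: S1's earliest start is pushed to 8
```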

CP solving process
The solution process interleaves propagation and search:
- Constraints propagate as much as possible
- When no more propagation can be performed, either the process fails (some domain is empty) or search is performed
The search heuristic chooses which variable to select and which value to assign to it.
Optimization problems are solved via a sequence of feasibility problems with constraints on the objective function variable.
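The last point, optimization as a sequence of feasibility problems, can be sketched with a toy solver (brute-force search standing in for propagation-based search; the model is hypothetical):

```python
# Each time a feasible solution is found, add the constraint obj < best
# and re-solve, until no better solution exists.

from itertools import product

def solve(domains, constraints):
    """Naive search: return the first assignment satisfying all constraints."""
    names = list(domains)
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(c(assignment) for c in constraints):
            return assignment
    return None

def minimize(domains, constraints, objective):
    best, bound = None, float("inf")
    while True:
        sol = solve(domains, constraints + [lambda a: objective(a) < bound])
        if sol is None:
            return best                       # no solution beats the bound
        best, bound = sol, objective(sol)     # tighten: obj < current best

doms = {"X": range(1, 6), "Y": range(1, 6)}
cons = [lambda a: a["X"] + a["Y"] >= 6]
best = minimize(doms, cons, lambda a: 2 * a["X"] + a["Y"])
print(best)  # X=1, Y=5 minimizes 2X+Y subject to X+Y >= 6
```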

Pros and Cons
Declarative programming: the user states the constraints, and the solver takes care of propagation and search.
Strong on the feasibility side. Constraints are symbolic and mathematical: expressivity. Adding a constraint helps the solution process: flexibility.
- Weak optimality pruning if the link between the problem decision variables and the objective function is loose
- No use of relaxations

Integer Programming
Standard form of a Combinatorial Optimization Problem:
(IP) min z = Σ_{j=1..n} c_j x_j
subject to Σ_{j=1..n} a_ij x_j = b_i, i = 1..m
x_j ≥ 0, j = 1..n
x_j integer (the integrality requirement may make the problem NP-complete)
An inequality y ≥ 0 is recast as y - s = 0 with a slack variable s ≥ 0; maximization is expressed by negating the objective function.

0-1 Integer Programming
Many Combinatorial Optimization Problems can be expressed in terms of 0-1 variables:
(IP) min z = Σ_{j=1..n} c_j x_j
subject to Σ_{j=1..n} a_ij x_j = b_i, i = 1..m
x_j ∈ {0,1} (this restriction may make the problem NP-complete)

Linear Relaxation
min z = Σ_{j=1..n} c_j x_j
subject to Σ_{j=1..n} a_ij x_j = b_i, i = 1..m
x_j ≥ 0, j = 1..n
(the constraint "x_j integer" is removed)
The linear relaxation is solvable in POLYNOMIAL TIME. The SIMPLEX ALGORITHM is the technique of choice even if it is exponential in the worst case.

Geometric Properties
min z = Σ_{j=1..n} c_j x_j
subject to Σ_{j=1..n} a_ij x_j = b_i, i = 1..m
x_j ≥ 0, j = 1..n
The set of constraints defines a polytope, and the optimal solution is located on one of its vertices. The simplex algorithm starts from one vertex and moves to an adjacent one with a better value of the objective function.

IP solving process
The optimal LP solution is in general fractional: it violates the integrality constraint, but it provides a bound on the solution of the overall problem.
The solution process is branch and bound, interleaving relaxation and search.
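Relaxation-based branch and bound can be sketched on a tiny 0-1 knapsack (a hypothetical maximization instance, with the fractional relaxation playing the role of the LP bound):

```python
# Branch and bound for 0-1 knapsack: at each node the fractional
# relaxation gives an upper bound used to prune the subtree.

def frac_bound(items, cap):
    """Relaxation: allow a fractional amount of the first item that no longer fits."""
    total = 0.0
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if weight <= cap:
            total, cap = total + value, cap - weight
        else:
            return total + value * cap / weight   # fractional piece
    return total

def branch_and_bound(items, cap, value=0, best=0):
    if not items or cap == 0:
        return max(best, value)
    if value + frac_bound(items, cap) <= best:    # prune: bound cannot beat best
        return best
    (v, w), rest = items[0], items[1:]
    if w <= cap:                                  # branch: take item 0
        best = branch_and_bound(rest, cap - w, value + v, best)
    return branch_and_bound(rest, cap, value, best)  # branch: skip item 0

items = [(60, 10), (100, 20), (120, 30)]          # (value, weight)
print(branch_and_bound(items, 50))  # 220: take the 100- and 120-value items
```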

Pros and Cons
Declarative programming: the user states the constraints, and the solver takes care of relaxation and search.
Strong on the optimality side. The structure of many real problems has been deeply studied.
- Only linear constraints can be used
- If sophisticated techniques are used, we lose flexibility
- No pruning on feasibility (only some preprocessing)

Resource-Efficient Application Mapping for MPSoCs
Given a platform: 1. achieve a specified throughput; 2. minimize usage of shared resources. Target: MULTIMEDIA APPLICATIONS.

Allocation and scheduling
Given:
- A hardware platform with processors, (local and remote) storage devices, and a communication channel
- A pre-characterized task graph representing a functional abstraction of the application we should run
Find: an allocation and a scheduling of tasks to resources respecting real-time constraints (task deadlines), precedences among tasks, and the capacity of all resources, such that the communication cost is minimized.

Allocation and scheduling
- The platform is a multi-processor system with N nodes; each node includes a processor and a scratchpad memory
- The bus is a shared communication channel
- In addition we have a remote memory of unlimited capacity (a realistic assumption for our application, but easily generalizable)
- The task graph has a pipeline workload, e.g., real-time video graphics processing the pixels of a digital image
- Task dependencies, i.e., arcs between tasks
- Computation, communication, and storage requirements annotate the graph

Embedded system design MOTIVATION & CONTEXT
Design flow:
- Given a platform description
- and an application abstraction
- compute an allocation & schedule

MPSoC platform PROBLEM DESCRIPTION
Resources:
- Identical processing elements (PEs): unary resources
- Local memory devices: limited capacity
- Remote on-chip memory: assumed to be infinite
- Shared bus: limited bandwidth (additive resource)
Constraints: local device capacity, bus bandwidth, architecture-dependent constraints.

Application Task Graph PROBLEM DESCRIPTION
- Nodes are tasks/processes
- Arcs are data communication
Each task reads data for each ingoing arc (RD), performs some computation (EXEC), and writes data for each outgoing arc (WR).

Application Task Graph PROBLEM DESCRIPTION
Durations depend on memory allocation: remote memory is slower than the local ones. Different phases have different bus requirements.

Problem structure
As a whole it is a scheduling problem with alternative resources: a very tough problem. It smoothly decomposes into allocation and scheduling.
- Allocation is better handled with IP techniques; not with CP, due to the complex objective function
- Scheduling is better handled with CP techniques; not with IP, since we would have to model each task's possible starting times with 0/1 variables
THE INTERACTION IS REGULATED VIA CUTTING PLANES

Logic-Based Benders Decomposition
Decomposes the problem into 2 sub-problems:
- Allocation → IP. Objective function: communication cost of tasks, subject to memory constraints. Produces a valid allocation.
- Scheduling → CP. Secondary objective function: makespan, subject to timing constraints.
If the schedule is infeasible, a no-good (a linear constraint) is returned to the allocation problem.
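The loop between the two solvers can be sketched on a toy instance (all data hypothetical; brute-force enumeration stands in for the IP master and a load check stands in for the CP subproblem):

```python
# Toy logic-based Benders loop: the master picks a min-communication
# allocation of 4 tasks to 2 processors, the subproblem checks it against
# the real-time budget RT, and each failure becomes a no-good cut.

from itertools import product

durations = [4, 4, 3, 3]          # WCET of each task
comm = {(0, 1): 5, (2, 3): 5}     # traffic paid when the pair is split
RT = 8                            # per-processor real-time budget

def comm_cost(alloc):
    return sum(c for (i, j), c in comm.items() if alloc[i] != alloc[j])

def master(no_goods):
    """Enumerate allocations (stand-in for the IP solver), cheapest first."""
    feasible = [a for a in product((0, 1), repeat=len(durations))
                if a not in no_goods]
    return min(feasible, key=comm_cost, default=None)

def subproblem(alloc):
    """Schedulable iff the load of every processor fits within RT."""
    loads = [sum(d for d, p in zip(durations, alloc) if p == pe) for pe in (0, 1)]
    return all(load <= RT for load in loads)

no_goods = set()
while True:
    alloc = master(no_goods)
    if alloc is None or subproblem(alloc):
        break
    no_goods.add(alloc)           # Benders cut: never propose this allocation again
print(alloc, comm_cost(alloc))    # (0, 0, 1, 1) with communication cost 0
```

The master first proposes the all-on-one-processor allocation (zero traffic), the subproblem rejects it as unschedulable, and the returned cuts steer the master to a split that keeps both communicating pairs together.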

Master Problem model
Assignment of tasks and memory slots (master problem):
- T_ij = 1 if task i executes on processor j, 0 otherwise
- Y_ij = 1 if task i allocates program data on processor j's memory, 0 otherwise
- Z_ij = 1 if task i allocates the internal state on processor j's memory, 0 otherwise
- X_ij = 1 if task i executes on processor j and task i+1 does not, 0 otherwise
Each process should be allocated to one processor: Σ_j T_ij = 1 for all i.
Link between variables X and T: X_ij = |T_ij - T_{i+1,j}| for all i and j (can be linearized).
If a task is NOT allocated to a processor, neither are its required memories: T_ij = 0 ⇒ Y_ij = 0 and Z_ij = 0.
Objective function: Σ_i Σ_j [mem_i (T_ij - Y_ij) + state_i (T_ij - Z_ij) + data_i X_ij / 2]

Improvement of the model
With the proposed model, the allocation problem solver tends to pack all tasks on a single processor and all required memory on the local memory, so as to have ZERO communication cost: a TRIVIAL SOLUTION.
To improve the model we should add a relaxation of the subproblem to the master problem model: for each set S of consecutive tasks whose sum of durations exceeds the real-time requirement, we impose that they cannot all be on the same processor:
Σ_{i∈S} WCET_i > RT ⇒ Σ_{i∈S} T_ij ≤ |S| - 1 for every processor j
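Generating these cuts can be sketched with a sliding window over the task sequence (hypothetical WCET data; minimal windows only, since longer ones are dominated):

```python
# For every minimal window of consecutive tasks whose WCETs exceed RT,
# emit a cut: those tasks cannot all sit on the same processor, i.e.
# sum_{i in tasks} T[i][j] <= len(tasks) - 1 for every processor j.

def relaxation_cuts(wcet, rt):
    cuts = []
    for start in range(len(wcet)):
        total = 0
        for end in range(start, len(wcet)):
            total += wcet[end]
            if total > rt:
                tasks = list(range(start, end + 1))
                cuts.append((tasks, len(tasks) - 1))
                break   # longer windows are dominated by this minimal one
    return cuts

print(relaxation_cuts([4, 5, 3, 6], rt=8))
# [([0, 1], 1), ([1, 2, 3], 2), ([2, 3], 1)]
```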

Sub-Problem model
Task scheduling with static resource assignment (subproblem): we have to schedule tasks, so we have to decide when they start.
- Activity starting time: Start_i :: [0..Deadline_i]
- Precedence constraints: Start_i + Dur_i ≤ Start_j
- Real-time constraints: every activity must complete within the period, Start_i + Dur_i ≤ RT
- Cumulative constraints on resources: processors are unary resources, cumulative([Start], [Dur], [1], 1); memories are additive resources, cumulative([Start], [Dur], [MR], C)
What about the bus?
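Once the allocation is fixed, a feasibility check of this subproblem can be sketched with a greedy serializer (hypothetical data; a stand-in for the real CP model, respecting precedences and unary processors, then checking the RT bound):

```python
# Given fixed durations and a static task-to-processor assignment,
# place start times respecting precedences and one-task-at-a-time
# processors, and report infeasibility if any completion exceeds rt.

def schedule(durs, alloc, precedences, rt):
    end = {}                      # task -> completion time
    free = {}                     # processor -> time it becomes available
    for t in range(len(durs)):    # tasks assumed topologically ordered
        ready = max([end[p] for p, q in precedences if q == t], default=0)
        start = max(ready, free.get(alloc[t], 0))
        end[t] = start + durs[t]
        free[alloc[t]] = end[t]
        if end[t] > rt:
            return None           # deadline missed: report infeasibility
    return end

durs = [2, 3, 2, 1]
alloc = [0, 1, 0, 1]              # tasks 0,2 on PE0; tasks 1,3 on PE1
prec = [(0, 1), (1, 2), (2, 3)]   # a simple pipeline 0 -> 1 -> 2 -> 3
print(schedule(durs, alloc, prec, rt=10))  # {0: 2, 1: 5, 2: 7, 3: 8}
```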

Bus model
Additive bus model: concurrent accesses share the maximum bandwidth (e.g., a task accessing input data uses BW = MaxBW / NoProc), and state reads, state writes, and program-data transfers each add their contribution to the bandwidth profile over time.
The model does not hold under heavy bus congestion, so bus traffic has to be minimized.

Results
Algorithm search time: the combined approach dominates, and its higher complexity shows only on simple system configurations.

Energy-Efficient Application Mapping for MPSoCs
Given a platform: 1. achieve a specified throughput; 2. minimize power consumption. Target: MULTIMEDIA APPLICATIONS.

Logic-Based Benders Decomposition
Decomposes the problem into 2 sub-problems:
- Allocation & frequency assignment → IP. Objective function: communication cost & energy consumption during execution, subject to memory constraints. Produces a valid allocation.
- Scheduling → CP. Objective function: e.g., minimizing energy consumption during frequency switching, subject to timing constraints.
No-goods and cutting planes are returned to the allocation problem.

Allocation problem model
- X_tfp = 1 if task t executes on processor p at frequency f
- W_ijfp = 1 if tasks i and j run on different cores, and task i on core p writes data to j at frequency f
- R_ijfp = 1 if tasks i and j run on different cores, and task j on core p reads data from i at frequency f
Each task can execute on only one processor at one frequency. Communication between tasks can execute only once per execution, and one write corresponds to one read.
The objective function: minimize the energy consumption associated with task execution and communication.

Allocation problem model
- Computation energy for all tasks in the system
- Communication energy for writes to shared memory, carried out at the same frequency as the task
- Communication energy for reads from shared memory, carried out at the same frequency as the task

Scheduling problem model
The objective function: minimize the energy consumption associated with frequency switching.
Processors are modelled as unary resources; the bus is modelled as an additive resource. The duration of task i is now fixed, since its mode is fixed.
The cost between tasks i and j depends on whether they run on the same processor at the same frequency, on the same processor at different frequencies, or on different processors.

Computational efficiency
Search time for an increasing number of tasks (similar plots for an increasing number of processors). Standalone CP and IP proved not comparable, even on a simpler problem.
We varied the RT constraint: with a tight deadline there are few feasible solutions; with a very loose deadline the solution is trivial. Search times stay within 1 order of magnitude.
Experimental setting: P4 2 GHz, 512 MB RAM, professional solving tools (ILOG).

Conditional Task Graphs
With conditional task graphs the problem becomes stochastic, since the outcomes of the conditions labeling the arcs are known only at execution time: we only know their probability distribution.
We minimize the expected value of the objective function:
- Min communication cost: easier
- Min makespan: much more complicated
Promising results.

Other problems
What if the task durations are not known, and only the worst and best case are available? Scheduling only the worst case is not enough: scheduling anomalies can arise.
Changing platform: we are facing the CELL BE processor architecture, modeling synchronous data flow applications and communication-intensive applications.

Challenge: the Abstraction Gap
The abstraction gap between high-level optimization tools and standard application programming models can introduce unpredictable and undesired behaviours. Programmers must be aware of the simplifying assumptions made in optimization tools.
Flow: platform modelling → optimization & analysis → optimal solution → starting implementation → (abstraction gap) → final implementation → platform execution.

Validation of optimizer solutions: Throughput
The optimizer's optimal allocation & schedule was validated on a virtual platform over 250 instances:
- MAX error lower than 10%
- AVG error equal to 4.51%, with standard deviation of 1.94
- All deadlines are met
(Plot: probability (%) vs. throughput difference (%).)

Validation of optimizer solutions: Power
The optimizer's optimal allocation & schedule was validated on a virtual platform over 250 instances:
- MAX error lower than 10%
- AVG error equal to 4.80%, with standard deviation of 1.71
(Plot: probability (%) vs. energy consumption difference (%).)

GSM Encoder
Throughput required: 1 frame / 10 ms, with 2 processors and 4 possible frequency & voltage settings.
Task graph:
- 10 computational tasks
- 15 communication tasks
Energy: without optimizations 50.9 μJ; with optimizations 17.1 μJ (-66.4%).

Challenge: programming environment
A software development toolkit helps programmers in software implementation:
- a generic customizable application template (OFFLINE SUPPORT)
- a set of high-level APIs in an RT-OS (RTEMS) (ONLINE SUPPORT)
The main goals are predictable application execution after the optimization step, and guarantees on high performance and constraint satisfaction.
Starting from a high-level task and data flow graph, software developers can easily and quickly build their application infrastructure. Programmers can intuitively translate the high-level representation into C code using our facilities and libraries.