- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.

Presentation transcript:

Hardware/Software Codesign

Design productivity gap

© Lauro Rizzatti, Marketing Vice President, Emulation & Verification Engineering (EVE)

Today: people talking about crises!
Previous ITRS editions have documented a design productivity gap: the number of available transistors grows faster than the ability to meaningfully design them. Yet, investment in process technology has by far dominated investment in design technology (DT).
Good news: Enabling progress in DT continues. :-)
Bad news:
- Test cost has grown exponentially relative to manufacturing cost.
- Today, many design technology gaps are crises.
[ITRS, Design Report 2003, ]

Current approach: Improving DT step-by-step
[ITRS, Design Report 2003, ]

Reuse as a way out
Pre-designed standard components to be used:
- Standard software components
- Standard hardware components
→ Platform-based design

Platform-based design
A platform is a family of architectures satisfying a set of constraints imposed to allow the reuse of hardware and software components. However, a hardware platform is not enough. Quick, reliable, derivative design requires using a platform application programming interface (API) to extend the platform toward application software. In general, a platform is an abstraction layer that covers many possible refinements to a lower level. Platform-based design is a meet-in-the-middle approach: In the top-down design flow, designers map an instance of the upper platform to an instance of the lower, and propagate design constraints [Sangiovanni-Vincentelli, 2002].

Platform-based design
[Figure labels: platform instances, platform abstraction levels]
Top-down: Map an instance of the upper platform onto a lower platform, considering appropriate constraints.
Bottom-up: Find the appropriate platform levels. Define platform-level parameters.

Platform-based design [Sangiovanni-Vincentelli, DAC 2004]
Decouples the application development process from the architectural implementation process.
Few design areas suitable for PBD:
- System platform stack: the main application area. The primary notion of PBD originates here.
- Network platforms: equivalent to protocol stacks.
- Analog platform: performance models, behavioral models and interconnection models.

Iterative approach (1)
Guided by performance evaluation

Essentially the same as our flow …
[Figure labels: System architecture, System behavior, Mapping, Performance simulation, Refine, Implementation]

Iterative approach: SpecC model

Overview of design activities
- Task level concurrency management: Which tasks in the final system?
- High level transformations: Transformations that are outside the scope of traditional compilers
- Hardware/software partitioning: Which operation mapped to hardware, which to software?
- Compilation: Hardware-aware compilation
- Scheduling: Performed several times, with varying precision
- Design space exploration: Set of possible designs, not just one.

Task-level concurrency management
Granularity: size of tasks (e.g. in instructions)
Readable specifications and efficient implementations can possibly require different task structures.
→ Granularity changes

Merging of tasks
- Reduced overhead of context switches
- More global optimization of machine code
- Reduced overhead for inter-process/task communication

Splitting of tasks
- No blocking of resources while waiting for input
- More flexibility for scheduling, possibly improved result

Merging and splitting of tasks
The most appropriate task graph granularity depends upon the context → merging and splitting may be required.
Merging and splitting of tasks should be done automatically, depending upon the context.

Automated rewriting of the task system - Example -

Attributes of a system that needs rewriting
Tasks blocking after they have already started running

Work by Cortadella et al.
1. Transform each of the tasks into a Petri net,
2. Generate one global Petri net from the nets of the tasks,
3. Partition the global net into "sequences of transitions",
4. Generate one task from each such sequence.
Mature, commercial approach not yet available

Result, as published by Cortadella
[Figure annotations: Reads only at the beginning; Initialization task; Always true; Never true]

Optimized version of Tin
    Tin () {
        READ (IN, sample, 1);
        sum += sample; i++;
        DATA = sample; d = DATA;
    L0: if (i < N) return;
        DATA = sum/N; d = DATA;
        d = d*c; WRITE(OUT,d,1);
        sum = 0; i = 0;
        return;
    }
[Figure annotations: Always true; j==i-1; j [operator lost in extraction] i; Never true]

Task-level concurrency management (2)
- The dynamic behavior of applications is getting more attention.
- Energy consumption reduction is the main target.
- Some classes of applications (e.g. video processing) have a considerable variation in processing power requirements depending on input data.
- Static design-time methods are becoming insufficient.
- Runtime-only methods are not feasible for embedded systems.
- How about mixed approaches?

Example of a mixed TCM [IMEC, Belgium,
- Static (compile-time) methods can ensure WCET-feasible schedules, but waste energy in the average case.
- Mixed methods use compile-time analysis to define a set of possible execution parameters for each task … or they can define a probability for violating the deadline.
- The runtime scheduler selects the most energy-saving, deadline-preserving combination.
[Figure labels: Task1, Task2, Task3 plotted over time t against the deadline; energy E over t]

Example of a mixed TCM [IMEC, Belgium,
- "Gray-box": Extract only the information needed for scheduling.
- Transformations: Merge and/or split tasks. (Functionality comparable to Cortadella's approach.)
- Find Pareto curves for each task.
- Runtime scheduler: uses a heuristic to combine the Pareto curves.
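The runtime step can be illustrated with a small sketch. It is not the IMEC heuristic, which the slides do not spell out; it assumes each task comes with a Pareto curve of (execution time, energy) points sorted from fastest to slowest, and uses a simple greedy rule. The names pareto_point, task_curve and select_points are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: one (time, energy) trade-off point of a task, and a
   task's Pareto curve sorted by increasing time (so decreasing energy). */
typedef struct { double time; double energy; } pareto_point;
typedef struct { const pareto_point *points; size_t count; } task_curve;

/* Greedy heuristic (an assumption, not the published method): start every
   task at its fastest point, then repeatedly move the task whose next slower
   point saves the most energy per unit of added time, as long as the sum of
   execution times stays within the deadline.
   choice[i] receives the selected point index of task i.
   Returns the total energy, or -1.0 if even the fastest points miss the deadline. */
double select_points(const task_curve *tasks, size_t n, double deadline,
                     size_t *choice)
{
    double total_time = 0.0, total_energy = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(tasks[i].count > 0);
        choice[i] = 0;
        total_time += tasks[i].points[0].time;
        total_energy += tasks[i].points[0].energy;
    }
    if (total_time > deadline)
        return -1.0;

    for (;;) {
        size_t best = n;          /* n means "no candidate found" */
        double best_ratio = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (choice[i] + 1 >= tasks[i].count)
                continue;         /* task already at its slowest point */
            const pareto_point *cur = &tasks[i].points[choice[i]];
            double dt = cur[1].time - cur[0].time;
            double de = cur[0].energy - cur[1].energy;
            if (dt <= 0.0 || de <= 0.0 || total_time + dt > deadline)
                continue;
            if (de / dt > best_ratio) {
                best_ratio = de / dt;
                best = i;
            }
        }
        if (best == n)
            break;
        const pareto_point *cur = &tasks[best].points[choice[best]];
        total_time += cur[1].time - cur[0].time;
        total_energy -= cur[0].energy - cur[1].energy;
        choice[best]++;
    }
    return total_energy;
}
```

With two tasks whose curves are {(1,10), (2,6), (4,5)} and {(1,8), (3,2)} and a deadline of 5, this sketch settles on the second point of each curve.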

Floating-point to fixed-point conversion
Pros:
- Lower cost
- Faster
- Lower power consumption
- Sufficient SQNR, if properly scaled
- Suitable for portable applications
Cons:
- Decreased dynamic range
- Finite word-length effect, unless properly scaled: overflow and excessive quantization noise
- Extra programming effort
© Ki-Il Kum, et al. (Seoul National University): A Floating-point To Fixed-point C Converter For Fixed-point Digital Signal Processors, 2nd SUIF Workshop, 1996

Fixed-Point Data Format
Integer vs. Fixed-Point:
[Figure: (a) Integer word with sign bit S; (b) Fixed-Point word with sign bit S, a hypothetical binary point, integer word length IWL=3 and fractional word length FWL]
Floating-Point vs. Fixed-Point:
- exponent, mantissa
- Floating-Point: automatic computation and update of each exponent at run-time
- Fixed-Point: implicit exponent determined off-line
© Ki-Il Kum, et al
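The data format can be made concrete with a small sketch. It assumes a 16-bit word (1 sign bit + IWL + FWL, so FWL = 15 - IWL); the helper names to_fixed and to_real are hypothetical and not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a real value to a 16-bit fixed-point word with the given IWL.
   Assumption: 16-bit words, so FWL = 15 - IWL. Rounds to nearest and
   saturates instead of overflowing. */
int16_t to_fixed(double value, int iwl)
{
    assert(iwl >= 0 && iwl <= 15);
    double scaled = value * (double)(1 << (15 - iwl));
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;   /* round to nearest */
    if (scaled >= 32767.0)
        return INT16_MAX;
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;
}

/* Interpret a fixed-point word with the given IWL as a real value. */
double to_real(int16_t fixed, int iwl)
{
    assert(iwl >= 0 && iwl <= 15);
    return (double)fixed / (double)(1 << (15 - iwl));
}
```

For example, to_fixed(0.9, 0) gives 29491 (0x7333), and to_fixed(2.5, 3) gives 10240, which to_real(10240, 3) maps back to 2.5. The exponent is implicit: the same bit pattern stands for a different value under a different IWL.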

Assignment and Addition/Subtraction
Assume y = x, with x (IWL=2) and y (IWL=3): x is shifted right by one bit (x>>1) before it is stored in y.
Let result = x + y: the IWLs are equalized first (again x>>1), then the words are added.
[Figure: bit layouts of x, x>>1, y and result, each with sign bit s]
© Ki-Il Kum, et al
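A minimal sketch of this alignment, assuming 16-bit words and an arithmetic right shift for negative values (implementation-defined in C, but the usual behavior). The names rescale and fixed_add are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Re-scale a 16-bit fixed-point word from one IWL to another.
   A larger target IWL drops low-order fraction bits (right shift);
   a smaller target IWL can overflow if the value does not fit. */
int16_t rescale(int16_t v, int from_iwl, int to_iwl)
{
    assert(from_iwl >= 0 && from_iwl <= 15);
    assert(to_iwl >= 0 && to_iwl <= 15);
    if (to_iwl >= from_iwl)
        return (int16_t)(v >> (to_iwl - from_iwl)); /* assumes arithmetic shift */
    return (int16_t)(v * (1 << (from_iwl - to_iwl)));
}

/* result = x + y: bring both operands to the result's IWL, then add. */
int16_t fixed_add(int16_t x, int x_iwl, int16_t y, int y_iwl, int result_iwl)
{
    return (int16_t)(rescale(x, x_iwl, result_iwl) +
                     rescale(y, y_iwl, result_iwl));
}
```

With x = 1.5 at IWL=2 (word 12288) and y = 2.25 at IWL=3 (word 9216), fixed_add(12288, 2, 9216, 3, 3) shifts x right by one bit and returns 15360, which is 3.75 at IWL=3.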

Multiplication
Assume result = x * y, with x (IWL=2) and y (IWL=3) -> result (IWL=2+3)
[Figure: bit layouts of x, y and result; the product carries two sign bits s s]
© Ki-Il Kum, et al
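The following sketch shows where the two sign bits come from, assuming 16-bit operands and a 32-bit product of which only the upper half is kept. The helper names are hypothetical. Read as an ordinary word with a single sign bit, the upper half has IWL = 2+3+1, because the second sign bit takes the place of one integer bit.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 16 bits of the 32-bit product of two 16-bit fixed-point words.
   Assumes an arithmetic right shift for negative products. */
int16_t mulh(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);
}

/* IWL of the upper half when it is read as a word with one sign bit:
   the integer bits of both operands plus the duplicated sign bit. */
int mul_result_iwl(int x_iwl, int y_iwl)
{
    assert(x_iwl >= 0 && y_iwl >= 0 && x_iwl + y_iwl + 1 <= 15);
    return x_iwl + y_iwl + 1;
}
```

For x = 1.5 at IWL=2 (word 12288) and y = 2.25 at IWL=3 (word 9216), mulh returns 1728; at IWL=6 the word has 9 fraction bits, so it reads as 1728 / 512 = 3.375.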

Development Procedure
Floating-Point C Program → Range Estimator → Range Estimation C Program → Execution → IWL information
Floating-Point C Program + IWL information (or manual specification) → Floating-Point to Fixed-Point C Program Converter → Fixed-Point C Program
© Ki-Il Kum, et al

Range Estimator
Floating-Point C Program → C pre-processor → C front-end → ID assignment → Subroutine call insertion → SUIF-to-C converter → Range Estimation C Program → Execution → IWL Information
Range Estimation C Program:
    float iir1(float x)
    {
        static float s = 0;
        float y;
        y = 0.9 * s + x;
        range(y, 0);
        s = y;
        range(s, 1);
        return y;
    }
© Ki-Il Kum, et al
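To make the inserted range() calls concrete, here is a minimal sketch of what such a subroutine could do. It is an assumption, not the implementation from Kum et al.: it only tracks the largest magnitude seen per variable ID and derives the smallest IWL that covers it. MAX_IDS and estimated_iwl are hypothetical names.

```c
#include <assert.h>

#define MAX_IDS 16                 /* hypothetical limit on instrumented variables */

static double max_abs[MAX_IDS];    /* largest |value| seen per variable ID */

/* Called after each assignment in the range estimation program. */
void range(double value, int id)
{
    assert(id >= 0 && id < MAX_IDS);
    double magnitude = value < 0.0 ? -value : value;
    if (magnitude > max_abs[id])
        max_abs[id] = magnitude;
}

/* Smallest IWL such that 2^IWL exceeds the largest magnitude seen. */
int estimated_iwl(int id)
{
    assert(id >= 0 && id < MAX_IDS);
    int iwl = 0;
    double limit = 1.0;
    while (limit <= max_abs[id]) {
        limit *= 2.0;
        iwl++;
    }
    return iwl;
}

/* The instrumented filter from the slide. */
float iir1(float x)
{
    static float s = 0;
    float y;
    y = 0.9 * s + x;
    range(y, 0);
    s = y;
    range(s, 1);
    return y;
}
```

Driving iir1 with a constant input of 1.0 lets y approach 10, so estimated_iwl reports 4 for both IDs, since 2^4 = 16 is the first power of two above the observed range.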

Floating-Point to Fixed-Point Program Converter
Fixed-Point C Program:
    int iir1(int x)
    {
        static int s = 0;
        int y;
        y = sll(mulh(29491, s) + (x >> 5), 1);
        s = y;
        return y;
    }
mulh
- to access the upper half of the multiplied result
- target dependent implementation
sll
- to remove 2nd sign bit
- opt. overflow check
© Ki-Il Kum, et al

Floating-Point to Fixed-Point Program Converter
Fixed-Point C Program:
    int iir1(int x)
    {
        static int s = 0;
        int y;
        y = sll(mulh(29491, s) + (x >> 5), 1);
        s = y;
        return y;
    }
Annotations:
- constant 0.9: IWL = 0, 0.9 × 2^15 ≈ 29491 = 0x7333
- x: IWL = 0
- y: IWL = 4
- s: IWL = 4
- "mulh" result: IWL = 0+4+1 = 5 → x>>5 for add
© Ki-Il Kum, et al
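The slide leaves mulh and sll target dependent. The sketch below fills them in with portable C under stated assumptions: 16-bit fixed-point words held in int, arithmetic right shift for negative values, and an assert standing in for the optional overflow check. The helper y_to_real is hypothetical and only used to read the result.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half of the 32-bit product of two 16-bit words (portable stand-in
   for the target-dependent instruction). Assumes arithmetic right shift. */
static int mulh(int a, int b)
{
    assert(a >= INT16_MIN && a <= INT16_MAX);
    assert(b >= INT16_MIN && b <= INT16_MAX);
    return (int)(((int32_t)a * (int32_t)b) >> 16);
}

/* Shift left to drop the second sign bit, with an overflow check. */
static int sll(int v, int n)
{
    int shifted = v * (1 << n);
    assert(shifted >= INT16_MIN && shifted <= INT16_MAX);
    return shifted;
}

/* Fixed-point filter from the slide: x has IWL=0, s and y have IWL=4. */
int iir1(int x)
{
    static int s = 0;
    int y;
    y = sll(mulh(29491, s) + (x >> 5), 1);
    s = y;
    return y;
}

/* y has IWL=4, so 11 fraction bits: one unit is 1/2048. */
double y_to_real(int y)
{
    return y / 2048.0;
}
```

Feeding x = 0.5 (word 16384 at IWL=0) twice returns 1024 and then 1944, i.e. 0.5 and about 0.949, against 0.5 and 0.95 for the floating-point version; the difference is the quantization noise of the 16-bit words.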

Performance Comparison - Machine Cycles -
© Ki-Il Kum, et al

Performance Comparison - Machine Cycles -
© Ki-Il Kum, et al

Performance Comparison - SNR -
© Ki-Il Kum, et al

Fundamental considerations of tradeoffs by Brodersen (Berkeley)

Fridge Fixed-Point Programming and Design Environment [ISS, Aachen,
- RWTH Aachen, commercialized by Synopsys as part of the CoCentric tool suite.
- Uses type definition features of C++ to define abstract data types (e.g. 'fixed').
- Incorporated into SystemC. (It's used for bit-true simulation.)
- Needs architecture-dependent back-end optimizations.

Fridge Fixed-Point Programming and Design Environment [ISS, Aachen,
Workflow overview:
- Input: floating-point algorithm + designer-supplied annotations.
- Conversion: iterative, feedback through simulation.
- Back-end exploits architectural features (e.g. mulh, sat, round).
- Output: target-optimized integer C code.

Fridge Fixed-Point Programming and Design Environment [ISS, Aachen,
Conversion steps:
- Designer annotates some operands (with WL, IWL, …).
- Hybrid code: Partially converted to fixed-point.
- Interpolation: Automatic annotation of the remaining operands; transfers each operand into a fixed-point type.
- Code Gen.: Generates pure C code.
- Back End: Optimize for target.
- Bit-true simulation.
[Figure label: DSP Back End]

Today's summary
- Design productivity gap: No final remedy available, but step-by-step improvements keep costs in a reasonable range.
- Platform-based design: Reuse is the key. PBD is the systematic approach to it.
- Task concurrency management: Optimize the task set. Goals: non-blocking job execution / increased energy efficiency.
- Floating-point to fixed-point: Fixed-point arithmetic uses integer operations → simpler and faster hardware than for floating-point operations.