Published by Erin Wilkinson, modified over 10 years ago
1
Fakultät für Informatik, Informatik 12, Technische Universität Dortmund
Optimizing Embedded Software for Timing-Predictability and Memory-Awareness
Peter Marwedel, TU Dortmund, Informatik 12, and Informatik Centrum Dortmund (ICD), Dortmund, Germany
2
- 2 - technische universität dortmund fakultät für informatik p. marwedel, informatik 12, 2009
What is an Embedded System?
3
Embedded Systems. Dortmund definition [Peter Marwedel]: information processing systems embedded into a larger product. Berkeley model [Ed Lee]: embedded software is software integrated with physical* processes; the technical problem is managing time and concurrency in computational systems. The main reason for buying such a product is not information processing. (* cyber-physical systems)
4
The Problem with Timing in Current IT. Timing is difficult to express in specifications. PCs are designed for good average-case timing. Sources of unpredictable or difficult-to-predict timing include caches, virtual memory, many communication systems, speculation, multiprocessing, and shared resources. Real timing is abstracted away in O-notation, and computability is ill-defined for embedded software.
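Here is a concrete version of the point about caches. The sketch assumes a hypothetical direct-mapped cache with 4 lines of 16 bytes, 1 cycle per hit and 10 cycles per miss. These parameters are illustrative and not taken from any real processor. Two access sequences of the same length take very different times, and the difference comes only from which addresses conflict in the cache.

```python
def access_cycles(addresses, num_lines=4, line_bytes=16, hit=1, miss=10):
    """Total cycles for a sequence of loads on a hypothetical direct-mapped cache."""
    tags = [None] * num_lines  # tag currently stored in each cache line
    total = 0
    for addr in addresses:
        block = addr // line_bytes   # memory block containing the address
        index = block % num_lines    # the only cache line this block may occupy
        tag = block // num_lines
        if tags[index] == tag:
            total += hit
        else:
            total += miss
            tags[index] = tag        # evict whatever block was there before
    return total

if __name__ == "__main__":
    print(access_cycles([0x00, 0x10] * 4))  # different lines: mostly hits
    print(access_cycles([0x00, 0x40] * 4))  # same line: every access misses
```

With these parameters, alternating between 0x00 and 0x10 costs 26 cycles for 8 accesses. Alternating between 0x00 and 0x40, which map to the same cache line, misses every time and costs 80 cycles.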
5
Problems with Classical CS. "The lack of timing in the core abstraction is a flaw, from the perspective of embedded software, … What is needed is nearly a reinvention of computer science." (Ed Lee: Absolutely Positively on Time, IEEE Computer, July 2005.) Introduction of the term cyber-physical system (CPS): initially another term for ES, emphasizing physics: time, energy, space.
6
Outline: Motivation; Worst-case execution time aware compilation; Memory-architecture aware compilation; Summary
7
Worst-Case Execution Time Analysis [Figure © AbsInt; labels: real-time constraint, WCET_EST (estimated WCET)]
8
Current Trial-and-Error Based Development
1. Specification of ES software.
2. Generation of code (ANSI-C or similar).
3. Compilation for the given target processor.
4. Execution and/or simulation of the machine code, using a (e.g. random) set of input data.
5. Measurement-based computation of an estimated worst-case execution time (WCET_meas).
6. Adding a safety margin (e.g. 20%) on top of WCET_meas and calling the result WCET_hypo.
7. If WCET_hypo > real-time constraint: change some detail and go back to step 1 or 2.
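A toy model shows the weakness of steps 4 to 6. The sketch assumes a hypothetical cycle-count function in which a rarely exercised input (every multiple of 1000) takes a slow path. Random test inputs drawn from 1 to 999 never reach that path. As a result, WCET_hypo stays far below the true worst case even with the 20% margin. All names and numbers are illustrative.

```python
import random

def cycles(x: int) -> int:
    # Hypothetical cycle-count model: multiples of 1000 take a rarely
    # exercised slow path (e.g., an error handler).
    if x % 1000 == 0:
        return 5000
    return 100 + x % 50

def wcet_meas(inputs) -> int:
    # Step 5: the largest execution time observed for the chosen inputs.
    return max(cycles(x) for x in inputs)

def wcet_hypo(inputs, margin: float = 0.20) -> float:
    # Step 6: add a safety margin on top of the measured maximum.
    return wcet_meas(inputs) * (1.0 + margin)

def true_wcet(domain) -> int:
    # Exhaustive maximum over the whole input domain (feasible only for toy models).
    return max(cycles(x) for x in domain)

if __name__ == "__main__":
    rng = random.Random(42)
    # Step 4: random test inputs in 1..999, so they never hit a multiple of 1000.
    tests = [rng.randrange(1, 1000) for _ in range(200)]
    print("WCET_meas:", wcet_meas(tests))
    print("WCET_hypo:", wcet_hypo(tests))
    print("true WCET:", true_wcet(range(10_000)))
```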
9
Problems with this Approach. Dependability: the computed WCET_hypo is not a safe approximation, so the time constraint may be violated. Design time: how can the necessary changes be found, and how many iterations are needed until the design succeeds? "Make the common case fast" is the wrong approach for RT systems. Computer architecture and compiler techniques focus on average speed; circuit designers know this is wrong, but compiler designers (typically) don't. Optimizing compilers are unaware of cost functions other than code size. [Figure labels: period, common case fast]
10
Contributions towards Solving the Problem. Predictable hardware architectures (currently an exception). Predictable operating systems. Modified algorithms with the same execution time for all data (this goes too far). Design of formal WCET_EST analysis tools. Design of a compiler that uses WCET as its cost function. Approach: integration with WCET_EST analysis, and use of a timing model for specific optimizations.
11
A WCET-Aware C-Compiler (WCC) [H. Falk et al.]. Compilation flow: ANSI C → ICD-C parser → ICD-C IR → code selector → LLIR → register allocator → LLIR → CRL2 → aiT WCET analysis → CRL2 + WCET_EST → LLIR → WCET optimizations → ASM (plus further analyses and optimizations). Target processor: Infineon TriCore TC1796.
12
Challenges for WCET_EST Minimization: the Worst-Case Execution Path (WCEP). The WCET_EST of a program equals the length of the longest execution path (the WCEP) in that program. WCET_EST minimization therefore means reducing this longest path; optimizing any other path does not reduce WCET_EST. Optimizations need to know the WCEP.
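The following is a minimal sketch of the idea. It assumes an acyclic control-flow graph whose basic-block costs are already known. Real analyzers such as aiT derive these costs from cache and pipeline analysis and handle loops through loop bounds; this toy model ignores both. The hypothetical helper `wcep` returns WCET_EST together with the WCEP. The usage shows two cases:

- Speeding up a block off the WCEP leaves WCET_EST unchanged.
- Speeding up a block on the WCEP can make a different path the new WCEP.

```python
from functools import lru_cache

def wcep(cfg, cost, entry, exit_node):
    """Return (WCET_EST, WCEP) for an acyclic CFG.

    cfg maps each block to its list of successors, and cost maps each block to
    its worst-case cycle count. Assumes every path from entry reaches exit_node.
    """
    @lru_cache(maxsize=None)
    def longest(block):
        if block == exit_node:
            return cost[block], (block,)
        # (length, path) tuples compare by length first, so max() picks the longest suffix.
        length, path = max(longest(s) for s in cfg[block])
        return cost[block] + length, (block,) + path
    return longest(entry)

if __name__ == "__main__":
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    base = {"A": 10, "B": 50, "C": 30, "D": 10}
    print(wcep(cfg, base, "A", "D"))               # (70, ('A', 'B', 'D'))
    print(wcep(cfg, {**base, "C": 10}, "A", "D"))  # C is off the WCEP: still 70
    print(wcep(cfg, {**base, "B": 20}, "A", "D"))  # (50, ('A', 'C', 'D')): new WCEP
```

The third call is why a WCET-aware compiler has to re-run the analysis after each transformation: once the WCEP switches, further work on the old path no longer pays off.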
13
WCET-oriented optimizations: extended loop analysis (CGO 09); instruction cache locking (CODES/ISSS 07); cache partitioning (WCET Workshop 09); procedure cloning (WCET Workshop 07, CODES/ISSS 07, SCOPES 08); procedure positioning (ECRTS 08); function inlining (SMART 09); loop unswitching/invariant paths (SCOPES 09); loop unrolling (ECRTS 09); register allocation (DAC 09); scratchpad optimization (DAC 09); extension towards multi-objective optimization (RTSS 08).
14
Loop Unrolling as an Example. Unrolling replaces the original loop with several instances of the loop body. Positive effects: reduced loop-control overhead; it enables instruction-level parallelism (ILP) and offers potential for subsequent optimizations, so unrolling should happen early in the optimization chain. Negative effects: aggressive unrolling leads to I-cache overflows and additional spill code, and the extra control code may cancel the positive effects. The consequences of the transformation are hardly known in advance.
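Below is a minimal sketch of unrolling by a factor of 4. Python is used here only to show the shape of the transformation; a compiler applies it to its intermediate representation, and Python itself gains no speed from it.

- The counter `iters` is a hypothetical stand-in for loop-control overhead: it counts passes through a loop.
- The remainder loop is the extra control code mentioned on the slide.

```python
def sum_rolled(a):
    total, iters = 0, 0
    for i in range(len(a)):
        total += a[i]
        iters += 1  # one pass through the loop control per element
    return total, iters

def sum_unrolled4(a):
    n = len(a)
    total, iters, i = 0, 0, 0
    # Unrolled main loop: four copies of the body per pass through the loop control.
    while i + 4 <= n:
        total += a[i]
        total += a[i + 1]
        total += a[i + 2]
        total += a[i + 3]
        i += 4
        iters += 1
    # Remainder loop for the last n % 4 elements: control code added by unrolling.
    while i < n:
        total += a[i]
        i += 1
        iters += 1
    return total, iters

if __name__ == "__main__":
    data = list(range(10))
    print(sum_rolled(data))     # (45, 10)
    print(sum_unrolled4(data))  # (45, 4)
```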
15
WCET-aware Loop Unrolling via Back-Annotation. WCET information is available at the assembly level, but unrolling has to be applied to the internal representation of the source code. Solution: back-annotation. The experimental worst-case execution time aware compiler WCC can feed information from the assembly code back to the source code: WCET data, assembly code size, and the amount of spill code. Information about the memory architecture is also available. [Figure: high-level IR (source code) and low-level IR (assembly code), connected by back-annotation; memory specification]
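Back-annotated data could drive the unrolling decision as follows. Everything in this sketch is an assumption for illustration, not WCC's actual heuristic:

- the `BackAnnotation` record and the per-factor numbers;
- the rule of picking the feasible unrolling factor with the lowest WCET;
- the feasibility test: a factor is feasible if its code fits the instruction memory given by the memory specification and its spill code stays within a limit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BackAnnotation:
    # Hypothetical per-variant data fed back from the assembly level.
    wcet_cycles: int
    code_size_bytes: int
    spill_instructions: int

def choose_unroll_factor(variants, icache_bytes, max_spills):
    """Pick the unrolling factor with the lowest WCET among feasible variants."""
    feasible = {
        factor: data
        for factor, data in variants.items()
        if data.code_size_bytes <= icache_bytes and data.spill_instructions <= max_spills
    }
    if not feasible:
        return 1  # assumption: leaving the loop rolled is always possible
    # Lowest WCET wins; ties go to the smaller factor (less code).
    return min(feasible, key=lambda f: (feasible[f].wcet_cycles, f))

if __name__ == "__main__":
    variants = {  # hypothetical back-annotated values for one loop
        1: BackAnnotation(1000, 200, 0),
        2: BackAnnotation(800, 380, 0),
        4: BackAnnotation(700, 760, 2),
        8: BackAnnotation(650, 1500, 9),
    }
    print(choose_unroll_factor(variants, icache_bytes=1024, max_spills=4))  # 4
```

Factor 8 has the lowest WCET in isolation, but it is rejected because it would overflow the assumed 1 KiB instruction memory. This mirrors the I-cache overflow risk listed on the previous slide.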
16
Results: WCET. 100% = average WCET over all benchmarks with -O3 and no unrolling. WCET reductions range from 10.2% to 15.4%; WCET-driven unrolling outperforms standard unrolling by 13.7%. [Chart: relative WCET (%), series labeled STANDARD]
17
Register Allocation: Results. 100% = WCET_EST using standard graph coloring (highest-degree heuristic). [Chart values: 93%, 24%, 69%] [H. Falk]
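The baseline on this slide, graph coloring with a highest-degree ordering, can be sketched as a simplified greedy variant. The sketch assumes a given interference graph and `k` physical registers. It is not WCC's allocator, and the WCET-aware allocator whose results are shown is not reproduced here. Variables that receive no register are spilled.

```python
def color_highest_degree(interference, k):
    """Greedy register assignment, visiting variables by decreasing degree.

    interference maps each variable to the set of variables live at the same
    time (assumed symmetric); registers are numbered 0..k-1.
    """
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    assignment, spilled = {}, []
    for v in order:
        taken = {assignment[n] for n in interference[v] if n in assignment}
        free = [r for r in range(k) if r not in taken]
        if free:
            assignment[v] = free[0]  # lowest register not used by a neighbor
        else:
            spilled.append(v)        # no register left: v lives in memory
    return assignment, spilled

if __name__ == "__main__":
    graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(color_highest_degree(graph, 2))  # c is spilled with only two registers
```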
18
Relative WCET_EST with I-Cache Locking: 5 benchmarks, ARM920T, post-pass optimization [S. Plazar et al.]
19
Outline: Motivation; Worst-case execution time aware compilation; Memory-architecture aware compilation; Summary
20
Importance of Energy Efficiency. Efficient software design is needed; otherwise, the price for software flexibility cannot be paid. [Chart courtesy of Philips; © Hugo De Man, IMEC, 2007; y-axis: GOPs/J; labels: IPE = inherent power efficiency, AmI = ambient intelligence, poor design techniques]
21
Energy Consumption in Mobile Devices [O. Vargas (Infineon Technologies): Minimum power consumption in mobile-phone memory subsystems; PennWell Portable Design, September 2005]. Thanks to Thorsten Koch (Nokia/Univ. Dortmund) for providing this source.
22
Trends for the Speeds. The speed gap between the processor and main memory (DRAM) increases [P. Machanik: Approaches to Addressing the Memory Wall, TR Nov. 2002, U. Brisbane]. [Chart: speed vs. years; curves: CPU performance (1.5-2× p.a.) and DRAM (1.07× p.a.); further label: 2x every 2 years] Similar problems also exist for embedded systems and MPSoCs. In the future, memory access times will be much greater than processor cycle times: the memory wall problem.
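Both curves grow geometrically, so the gap between them also grows geometrically. After n years it is (r_CPU / r_DRAM)^n times larger. The sketch below uses the growth rates from the chart: 1.5 to 2 per year for the CPU and 1.07 for DRAM. The function name is hypothetical.

```python
def gap_factor(years: float, cpu_rate: float = 1.5, dram_rate: float = 1.07) -> float:
    # Both speeds grow geometrically, so their ratio grows by cpu_rate / dram_rate per year.
    return (cpu_rate / dram_rate) ** years

if __name__ == "__main__":
    for rate in (1.5, 2.0):
        print(f"CPU x{rate}/year: gap after 10 years = {gap_factor(10, rate):.1f}x")
```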
23
Hierarchical Memories Using Scratch-Pad Memories (SPM). An SPM is a small, physically separate memory mapped into the address space. It is fast, energy-efficient, and usually timing-predictable. Hierarchy example: processor → SPM → main memory. Unlike a cache, an SPM needs no tag memory: it is selected by an appropriate address decoder (simple!). [Figure: address space from 0 to FFF.., with one region occupied by the scratch-pad memory and the rest by main memory]
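The address decoder amounts to a range check. In this minimal sketch, the base address, size, and latencies are hypothetical values, and main memory is modeled with a fixed latency and no cache. The decision depends only on the address and never on previous accesses. That is why SPM accesses are easy to bound in a WCET analysis.

```python
SPM_BASE = 0x0000  # hypothetical: SPM mapped at the bottom of the address space
SPM_SIZE = 0x1000  # hypothetical: 4 KiB scratch pad
SPM_CYCLES = 1     # hypothetical fixed SPM latency
MAIN_CYCLES = 10   # hypothetical fixed main-memory latency (no cache modeled)

def select_memory(addr: int) -> str:
    # A simple range comparison replaces the tag lookup a cache would need.
    return "SPM" if SPM_BASE <= addr < SPM_BASE + SPM_SIZE else "main"

def access_latency(addr: int) -> int:
    # In this model the latency depends only on the address, not on access history.
    return SPM_CYCLES if select_memory(addr) == "SPM" else MAIN_CYCLES

if __name__ == "__main__":
    for a in (0x0000, 0x0FFF, 0x1000, 0x8000):
        print(hex(a), select_memory(a), access_latency(a))
```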
24
Migration of Data and Instructions. Global optimization model: which objects (arrays, loops, etc.) should be stored in the SPM? Non-overlaying memory allocation: each object k has a gain g_k and a size s_k. With $x_k \in \{0,1\}$ indicating whether object k is placed in the SPM, maximize the total gain $G = \sum_{k} x_k g_k$ subject to the SPM capacity $\sum_{k} x_k s_k \le S_{SP}$. Solution: knapsack algorithm. Overlaying allocation: objects are moved back and forth between hierarchy levels. [Figure: processor, scratch-pad memory of capacity $S_{SP}$, and main memory; example candidate objects: for i { }, for j { }, while ..., repeat, functions, arrays, int variables]
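The non-overlaying case is a 0/1 knapsack problem, which standard dynamic programming over capacities solves. In the sketch below, the object names, sizes, and gains are hypothetical. The function returns $G$ and the objects with $x_k = 1$; it is a textbook solver, not necessarily the one used in the cited work.

```python
def allocate_spm(objects, capacity):
    """0/1 knapsack: choose objects maximizing total gain within the SPM capacity.

    objects: list of (name, size_bytes, gain); capacity: S_SP in bytes.
    Returns (G, names of objects placed in the SPM).
    """
    # best[c] = (max gain, chosen names) achievable with at most c bytes
    best = [(0, ())] * (capacity + 1)
    for name, size, gain in objects:
        # Iterate capacities downwards so each object is placed at most once.
        for c in range(capacity, size - 1, -1):
            candidate = best[c - size][0] + gain
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - size][1] + (name,))
    return best[capacity]

if __name__ == "__main__":
    objs = [  # hypothetical candidates: (name, size in bytes, energy gain)
        ("array_a", 2048, 900),
        ("loop_i", 512, 600),
        ("func_f", 1536, 700),
        ("array_b", 1024, 300),
    ]
    print(allocate_spm(objs, 4096))  # (2200, ('array_a', 'loop_i', 'func_f'))
```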
25
Scratchpad optimizations: energy model (PATMOS 2001); comparison with cache (TR 762, 2001; CODES 2002); non-overlaying allocation (TR 756, 2001; DATE 2002); overlaying allocation (ISSS 2002; PACS 2003; CODES 2004; GI 2005; TVLSI 2006; SAMOS 2006; Springer 2007); partitioning (ASPDAC 2003; WMPI 2004); predictability (ASPDAC 2004; WCET 2004; DATE 2005; Springer 2006); cooperation with cache (DATE 2004; TCAD 2006); multi-processes (ESTIMEDIA 2005); allocation at run time (SCOPES 2007).
26
Dynamic Set of Multiple Applications. [Figure: CPU, SPM, and main memory (MEM); an SPM manager assigns SPM space in the address space to applications App. 1 … App. n over time t] Compile-time partitioning of the SPM is no longer feasible. Introduction of an SPM manager: decisions are made at run time but supported at compile time. [R. Pyka, Ch. Faßbach, M. Verma, H. Falk, P. Marwedel: Operating system integrated energy aware scratchpad allocation strategies for multi-process applications, SCOPES, 2007]
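A run-time SPM manager can be sketched as a chunk allocator. The class below is hypothetical and is not the strategy from the SCOPES 2007 paper:

- The SPM is split into fixed-size chunks.
- An application requests space when it starts.
- If enough chunks are free, it receives SPM offsets; otherwise it keeps its objects in main memory.

The compile-time support mentioned on the slide, which decides what each application requests, is not modeled.

```python
class SpmManager:
    """Hypothetical run-time SPM manager that hands out fixed-size chunks."""

    def __init__(self, spm_bytes: int, chunk_bytes: int):
        self.chunk_bytes = chunk_bytes
        self.owner = [None] * (spm_bytes // chunk_bytes)  # owning app per chunk

    def request(self, app: str, nbytes: int):
        needed = -(-nbytes // self.chunk_bytes)  # ceiling division
        free = [i for i, o in enumerate(self.owner) if o is None]
        if len(free) < needed:
            return None  # not enough SPM: caller falls back to main memory
        granted = free[:needed]
        for i in granted:
            self.owner[i] = app
        return [i * self.chunk_bytes for i in granted]  # SPM offsets

    def release(self, app: str):
        # Called when an application terminates; its chunks become free again.
        self.owner = [None if o == app else o for o in self.owner]

if __name__ == "__main__":
    mgr = SpmManager(spm_bytes=4096, chunk_bytes=1024)
    print(mgr.request("app1", 1500))  # two chunks
    print(mgr.request("app2", 2048))  # the remaining two chunks
    print(mgr.request("app3", 100))   # None: SPM full
    mgr.release("app1")
    print(mgr.request("app3", 100))   # reuses a chunk freed by app1
```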
27
Comparison of SPMM to Caches for SORT. Baseline: main memory only. SPMM peak energy reduction: 83% with a 4 KB scratchpad; cache peak: 75% with a 2 KB 2-way cache. The SPMM is capable of outperforming caches. OS and libraries are not considered yet.
Chunk allocation results:
SPM size (bytes) | Δ 4-way
1024 | 74.81%
2048 | 65.35%
4096 | 64.39%
8192 | 65.64%
16384 | 63.73%
28
Outline: Motivation; Worst-case execution time aware compilation; Memory-architecture aware compilation; Summary
29
Conclusion. Embedded systems are tightly integrated with physics, yet timing is currently handled poorly. Integration of a WCET cost model into the compiler: WCC allows for a systematic reduction of the WCET in the PREDATOR project, and optimizations can be reconsidered for WCET reduction. Memory-architecture aware compilation is the consequence of non-uniform memory accesses. Scratch pads combine energy efficiency and timing efficiency.
30
WCET Estimation: aiT (AbsInt). Tool flow: executable program → CFG builder → loop transformation → CRL2 file → static analyzer (value analyzer, loop bounds, cache/pipeline analyzer; additional input: AIP file) → PER file → path analysis (ILP generator → LP solver) → evaluation → WCET / visualization.
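The ILP generator and LP solver in the path-analysis stage typically follow the implicit path enumeration technique (IPET). Below is a sketch of the standard formulation, given as general background rather than as aiT's exact model. The symbols are:

- $B$: the set of basic blocks;
- $c_i$: the worst-case cost of block $i$ from the cache/pipeline analysis;
- $x_i$: the execution count of block $i$;
- $f_e$: the execution count of CFG edge $e$;
- $n_h$: the loop bound, i.e. the maximum number of executions of loop header $h$ per entry into the loop;
- $f_{\mathit{enter}(h)}$: the total count of edges entering that loop from outside.

```latex
\mathrm{WCET}_{EST} = \max \sum_{i \in B} c_i \, x_i
\quad \text{subject to} \quad
\begin{aligned}
& x_{\mathit{entry}} = 1, \\
& x_i = \sum_{e \in \mathrm{in}(i)} f_e \quad \forall i \in B \setminus \{\mathit{entry}\}, \\
& x_i = \sum_{e \in \mathrm{out}(i)} f_e \quad \forall i \in B \setminus \{\mathit{exit}\}, \\
& x_h \le n_h \cdot f_{\mathit{enter}(h)} \quad \text{for each loop header } h, \\
& x_i,\ f_e \in \mathbb{Z}_{\ge 0}.
\end{aligned}
```

The block counts $x_i$ of an optimal solution describe the WCEP implicitly, without enumerating any path explicitly.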