1
P. Marwedel, Univ. Dortmund/Informatik 12 + ICD/ES, 2005 Universität Dortmund
Memory-aware compilation enables fast, energy-efficient, timing predictable memory accesses
Peter Marwedel (1,2), Heiko Falk (1), Christian Ferdinand (3), Paul Lokuciejewski (1), Manish Verma (1), Lars Wehmeyer (1,2)
(1) Universität Dortmund, Informatik 12
(2) Informatik Centrum Dortmund (ICD)
(3) AbsInt GmbH, Saarbrücken
2
Key properties of embedded systems
[Figure: diagram relating embedded and real-time systems]
- Strong correlation between embedded and real-time systems.
- "A reactive system is one which is in continual interaction with its environment and executes at a pace determined by that environment" [Bergé, 1995]
- Strong correlation between embedded and reactive systems.
3
Serious mismatch
"Despite considerable progress in software and hardware techniques, when embedded computing systems absolutely must meet tight timing constraints, many of the advances in computing become part of the problem rather than part of the solution. What would it take to achieve concurrent and networked embedded software that was absolutely positively on time …? … What is needed is nearly a reinvention of computer science."
Edward A. Lee: Absolutely Positively On Time: What Would It Take?, Editorial, draft version: May 18, 2005, published in: Embedded Systems Column, IEEE Computer, July 2005
4
Technology "advances" will make the situation worse
[Figure: speed over years; CPU speed grows by a factor of 1.5-2 per year (p.a.), DRAM speed by a factor of 1.07 per year; annotation "2x every 2 years"]
- Increasing gap between processor and memory speeds.
- Future semiconductor technology will be inherently unreliable, e.g. due to quantum effects, and will require fault-tolerance mechanisms.
- Timing "redundancy" used?
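The annotation "2x every 2 years" can be checked against the two growth rates, under the assumption (an interpretation, not stated on the slide) that it refers to the growth of the processor/memory gap. With annual speed-up factors $r_{CPU}$ and $r_{DRAM}$, the gap after $t$ years grows as their ratio raised to the power $t$:

```latex
\text{gap}(t) \propto \left(\frac{r_{CPU}}{r_{DRAM}}\right)^{t},
\qquad
\left(\frac{1.5}{1.07}\right)^{2} \approx 1.40^{2} \approx 1.97 \approx 2,
\qquad
\left(\frac{2}{1.07}\right)^{2} \approx 1.87^{2} \approx 3.49
```

Under this reading, the annotation matches the lower end of the CPU range; at the upper end the gap would grow even faster.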
5
Scratchpad seen to help with timing problems
"Fortunately, there is quite a bit to draw on. To name a few examples, architecture techniques such as software-managed caches (scratchpad memories) promise to deliver much of the benefit of memory hierarchy without the timing unpredictability…" [E. Lee, 2005]
6
Scratch pad memories (SPM): Fast, energy-efficient, timing-predictable
[Figure: address space from 0 to FFF..; the scratch pad memory is mapped into a part of the address space, the rest is main memory]
- Example: ARM7TDMI cores, well known for low power consumption
- Called "tightly coupled memory" by ARM
- Small; no tag memory
7
Worst case timing analysis using aiT
[Figure: tool flow; the C program and the SP size are input to the encc compiler, which generates an executable; simulating the executable with ARMulator yields the actual performance, analysing it with aiT yields the WCET]
8
Results for G.721
[Figure: results using a scratchpad vs. using a unified cache]
- L. Wehmeyer, P. Marwedel: Influence of Onchip Scratchpad Memories on WCET. 4th Intl. Workshop on Worst-Case Execution Time Analysis (WCET), 2004
- L. Wehmeyer, P. Marwedel: Influence of Memory Hierarchies on Predictability for Time Constrained Embedded Software. Design Automation and Test in Europe (DATE), 2005
9
Impact on access time and energy consumption
[Figure: energy and access times of SRAMs, CACTI model]
Small memories also provide faster access times and reduced energy consumption.
10
Energy savings in the memory system
11
Static allocation of memory objects
Which objects (arrays, functions, etc.) should be stored in the SPM?
- Gain $g_k$ and size $s_k$ for each object $k$.
- Maximise the gain $G = \sum_{k \in K} g_k$ over the set $K$ of objects placed in the SPM, respecting the size of the SPM: $\sum_{k \in K} s_k \le S_{SP}$.
- Static memory allocation solution: knapsack algorithm.
[Figure: example with a processor, a scratch pad memory of capacity S_SP and main memory on the board; candidate objects include loops (for i {…}, for j {…}, while …, repeat), functions (call …) and data (arrays, ints)]
12
Dynamic replacement within scratch pad
[Figure: CPU, memory and SPM]
- Effectively results in a kind of compiler-controlled swapping for the SPM
- Address assignment within the SPM required (paging- or segmentation-like)
M. Verma, P. Marwedel (U. Dortmund): Dynamic Overlay of Scratchpad Memory for Energy Minimization, ISSS, 2004
13
Dynamic replacement of data within scratch pad: based on liveness analysis
- SP size = |A| = |T3|, so arrays A and T3 cannot reside in the SPM at the same time.
- Solution: A → SP & T3 → SP, at different times, by inserting spill code where the live ranges change: SPILL_STORE(A); SPILL_LOAD(T3); … SPILL_LOAD(A);
[Figure: control flow graph with basic blocks B1-B10 containing T3 DEF, A USE, A MOD and T3 USE, annotated with the inserted SPILL_STORE(A), SPILL_LOAD(T3) and SPILL_LOAD(A) operations]
14
Dynamic replacement within scratch pad: results for edge detection, relative to static allocation
15
Impact of partitioning scratch pads
[Figure: address space starting at 0 with scratch pad 0 (256 entries), scratch pad 1 (2 k entries), scratch pad 2 (16 k entries) and "main" memory]
16
Results for parts of GSM coder/decoder
[Figure: results, annotated "working set"]
A key advantage of partitioned scratchpads for multiple applications is their ability to adapt to the size of the current working set.
17
Multiple Processes: Non-Saving Context Switch
[Figure: the scratchpad is divided into disjoint regions for processes P1, P2 and P3]
Non-Saving Context Switch (Non-Saving):
- Partitions the SPM into disjoint regions
- Each process is assigned an SPM region
- Copies contents during initialization
- Good for large scratchpads
18
Saving/Restoring Context Switch
[Figure: processes P1, P2 and P3 share the whole scratchpad; contents are saved/restored at each context switch]
Saving Context Switch (Saving):
- Utilizes the SPM as a common region shared by all processes
- Contents of processes are copied on/off the SPM at each context switch
- Good for small scratchpads
19
Hybrid Context Switch
[Figure: the scratchpad contains a region for each of P1, P2 and P3 plus a region shared by P1, P2 and P3]
Hybrid Context Switch (Hybrid):
- Disjoint + shared SPM regions
- Good for all scratchpads
- Analysis is similar to the Non-Saving approach
- Runtime: $O(nM^3)$
20
Multi-process Scratchpad Allocation: Results
[Figure: results for the benchmarks edge detection, adpcm, g721 and mpeg; 27% marked; SPA: Single Process Approach]
- Hybrid is the best for all SPM sizes.
- Energy reduction at a 4 kB SPM is 27% for the Hybrid approach.
- Avoids the poor timing predictability of cache-based systems after a context switch.
21
Multi-processor ARM (MPARM) Framework
- Homogeneous SMP ~ CELL processor
- Processing unit: ARM7T processor
- Shared coherent main memory
- Private memory: scratchpad memory
[Figure: ARM processors with private SPMs, an interrupt device, a semaphore device and shared main memory, connected by an interconnect (AMBA or STBus)]
22
Using optimization in a gcc-based tool flow
The source is split into 2 different files by a specially developed memory optimizer tool*.
[Figure: tool flow; the application source (.c) and profile info (.txt) feed the memory optimizer (ICD-C compiler), which emits a main memory source (.c), an SPM source (.c) and a linker script (.ld); the ARM-GCC compiler turns these into the executable (.exe)]
*Built with the new tool design suite ICD-C, available from ICD (see www.icd.de/es)
23
Results (MOMPARM)
- DES encryption on 4 processors: 2 controllers + 2 compute engines
- Energy values from ST Microelectronics
Result of ongoing cooperation between U. Bologna and U. Dortmund, supported by the ARTIST2 network of excellence.
24
State of the art of SPM algorithms
Feature | Static allocation | Dynamic allocation
Partitioned SPMs | Wehmeyer et al. [WMPI 2004] | -
WCET analysis | Wehmeyer et al. [WS WCET 04, DATE 05] | Wehmeyer et al. [Thesis]
Multiple processes | Verma et al. [ISSS 2004] | Future work
Multiprocessor systems | Verma et al. [Estimedia 2005] | Verma et al. [ongoing work]
Sections from arrays | Not always applicable | IMEC (MHLA), Kandemir
25
Extension: WCET-aware compiler
[Figure: compiler structure. ANSI-C program → ANSI-C frontend → parse tree → IR code generator → medium-level IR → LLIR code generator → low-level IR → code generator → WCET-optimized assembly code; optimization techniques and analyses (e.g. loop bounds analysis) operate on the IRs. LLIR2crl converts the low-level IR into CRL2, the standard input to aiT; aiT's value analysis, cache analysis, pipeline analysis and path analysis produce CRL2 with WCET info, which crl2llir imports back into the low-level IR. ARTIST2]
26
Opportunities
- Precise WCET information for run-time optimizations
  - Single implementation of hardware timing models
  - Accurate information on pipeline influence
  - Accurate information on timing of memory
  - Trade-off cache vs. scratchpad optimization
- Pass additional information (flow facts) to aiT
  - Potential for tighter bounds? (e.g. due to pointer disambiguation)
- Aggressive optimizations for code on the WCET path
- Respecting WCET constraints during compilation
- Reduction of jitter in multimedia applications
- Alternative input to aiT (compare compiler output)
27
Conclusion
- Timeliness and timing predictability are seriously missing in key concepts of current information technology.
- Scratchpads are seen as a potential contribution towards new architectural concepts.
  - A comprehensive set of allocation methods has been developed: static allocation and dynamic allocation.
- Full integration of WCET tools into the compiler tool chain enables further explicit consideration of time.