Department of Electrical & Computer Engineering

1 Department of Electrical & Computer Engineering
Design Space Optimization of Embedded Memory Systems via Data Remapping. Presented by Xiang Mao, Department of Electrical & Computer Engineering, University of Florida

2 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions 20:19:36

3 Introduction Memory in embedded systems: Valuable Resource ~ Power Sink
Compile-time data remapping/reorganization improves spatial locality. Viewed as a tool for design-space exploration, it lowers memory cost and power needs. Viewed as a conventional compiler optimization, it improves execution time and energy consumption. Figure label: 45%, in the context of 2 levels of cache. (Speaker note: explain the figure, then explain its two ends.)

4 Introduction Features:
Fully automated; applicable to pointer-based programming languages; running time linear in the size of the program. All the work is based on hardware models and an instruction set architecture (ISA) for the ARM family of processors, using floating-point and integer benchmarks. The simulation environment models an ARM-like processor but also includes floating-point support. Previous work in this area was (1) semi-automated and (2) restricted to statically allocated memory, whereas C programs, used extensively in the embedded-system domain, are usually pointer-based.

5 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions

6 Design Space Exploration
The goal of design-space exploration, as illustrated in the figure, is to fix the program under consideration and to vary its performance via optimizations, in search of the best hardware configuration. In this paper, the authors focus on the cache subsystem and seek to optimize its energy and cost requirements.
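The search described on this slide can be sketched as picking the cheapest cache configuration that still meets a performance target, once for the unoptimized program and once for the optimized one. This is only an illustrative sketch: the function name `cheapest_config`, the dictionary keys, and all numbers are hypothetical placeholders, not measurements from the paper.

```python
def cheapest_config(configs, cycle_budget):
    """Return the lowest-cost configuration that meets the cycle budget, or None."""
    feasible = [c for c in configs if c["cycles"] <= cycle_budget]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c["cost"])

# Placeholder data points (cache size, relative cost, cycle count).
# These are made-up values for illustration, not results from the paper.
baseline = [
    {"l1_kb": 8, "cost": 1, "cycles": 150},
    {"l1_kb": 16, "cost": 2, "cycles": 100},
]
optimized = [
    {"l1_kb": 8, "cost": 1, "cycles": 95},
    {"l1_kb": 16, "cost": 2, "cycles": 80},
]

budget = 100
best_before = cheapest_config(baseline, budget)
best_after = cheapest_config(optimized, budget)
```

With these placeholder values, the optimized program meets the same budget with the smaller cache, which is the kind of trade-off the exploration looks for.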

7 Design Space Exploration
Negative values in execution-time reduction occur because the cache size is reduced. SA110: Intel StrongARM 110 processor. 179.ART: floating-point benchmark. Perimeter, TreeAdd: integer benchmarks.

8 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions

9 Data Remapping Algorithm
Goal: A new layout that exhibits a better correlation with the application's reference sequence. Target: Record data types ubiquitous in real-world, pointer-heavy applications. Record: A set of diverse data types grouped within a unique declaration. Field: An element of the set. Object: An instance of a record.

10 Record Model: Key Field (K), Datum Field (D), Next Field (N). Layout of successive objects: K D N | K D N | K D N
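To see why regrouping fields can improve spatial locality, the sketch below counts how many cache blocks a traversal of only the key field touches, first with each object stored as K D N and then with same-named fields stored together. The 4-byte field size, 32-byte block size, object count, and function names are assumptions chosen for illustration; they do not come from the slides.

```python
FIELD_SIZE = 4    # bytes per field (assumed)
NUM_FIELDS = 3    # K, D, N
BLOCK_SIZE = 32   # cache block size in bytes (assumed)

def original_address(obj, field):
    """Byte offset of a field when each object stores K, D, N contiguously."""
    return obj * NUM_FIELDS * FIELD_SIZE + field * FIELD_SIZE

def remapped_address(obj, field, num_objects):
    """Byte offset of a field when all K's, then all D's, then all N's are stored together."""
    return field * num_objects * FIELD_SIZE + obj * FIELD_SIZE

def blocks_touched(addresses):
    """Number of distinct cache blocks covered by the given addresses."""
    return len({addr // BLOCK_SIZE for addr in addresses})

num_objects = 64
key_field = 0

# Visit only the key field of every object under each layout.
original = [original_address(i, key_field) for i in range(num_objects)]
remapped = [remapped_address(i, key_field, num_objects) for i in range(num_objects)]

original_blocks = blocks_touched(original)
remapped_blocks = blocks_touched(remapped)
```

Under these assumptions the key-only traversal touches 24 blocks in the original layout and 8 in the grouped layout, because each fetched block now holds only keys.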

11 Data Remapping Algorithm
The remapping optimization consists of 3 phases: (1) Gathering Phase; (2) Remapping of Global Data Objects; (3) Remapping of Dynamic Data Objects. Based on the above and some other thoughts…

12 Gathering Phase NAP – Neighbor Affinity Probability
Only data types with a NAP below some threshold are marked for remapping.
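The slide does not define how NAP is computed, so the sketch below uses a stand-in proxy: for each data type, the fraction of references that land in the same cache block as the previous reference to that type. The proxy, the 32-byte block size, the threshold, the trace format, and all names are assumptions for illustration and should not be read as the paper's definition.

```python
from collections import defaultdict

BLOCK_SIZE = 32  # cache block size in bytes (assumed)

def estimate_nap(trace):
    """Proxy NAP per type: share of references that hit the same block
    as the previous reference to the same type."""
    last_block = {}
    same_block = defaultdict(int)
    pairs = defaultdict(int)
    for type_name, address in trace:
        block = address // BLOCK_SIZE
        if type_name in last_block:
            pairs[type_name] += 1
            if block == last_block[type_name]:
                same_block[type_name] += 1
        last_block[type_name] = block
    return {t: same_block[t] / pairs[t] for t in pairs}

def mark_for_remapping(nap, threshold):
    """Types whose proxy NAP falls below the threshold."""
    return sorted(t for t, p in nap.items() if p < threshold)

# Hypothetical trace: one field read per 48-byte "node" record,
# followed by dense 4-byte reads of a "buffer" array.
trace = [("node", a) for a in range(0, 64 * 48, 48)]
trace += [("buffer", a) for a in range(0, 64 * 4, 4)]

nap = estimate_nap(trace)
marked = mark_for_remapping(nap, threshold=0.5)
```

In this made-up trace the sparse record accesses score low and are marked, while the dense array accesses score high and are left alone.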

13 Remapping of Global Data Objects

14 Remapping of Dynamic Data Objects
Lightweight wrappers are automatically generated around traditional memory allocation requests in the program. A large memory pool is allocated, and smaller portions within the pool are assigned to successive allocation requests. Cache-conscious data placement is even more important for dynamically allocated objects: traditional allocation strategies ignore the underlying memory hierarchy in favor of low run-time overhead, which results in poor interaction between data layout and the program's access pattern. The goal is to produce a field-wise allocation layout like the one in the figure.
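The pool idea can be sketched as follows: one large region is reserved up front, each allocation request receives the next slot, and the fields of an object live in separate per-field regions of the pool. The class name `FieldPool`, its parameters, and the addresses are hypothetical; this models the layout arithmetic only and is not the generated wrapper code from the paper.

```python
class FieldPool:
    """Hands out object slots from one large pool, laid out field by field."""

    def __init__(self, base, capacity, num_fields, field_size):
        self.base = base
        self.capacity = capacity
        self.field_size = field_size
        # Distance between the region holding field f and the region holding field f + 1.
        self.stride = capacity * field_size
        self.size = num_fields * self.stride  # total bytes reserved for the pool
        self.next_slot = 0

    def alloc(self):
        """Stand-in for a wrapped allocation call: returns the address of the
        new object's first field."""
        if self.next_slot >= self.capacity:
            raise MemoryError("pool exhausted")
        address = self.base + self.next_slot * self.field_size
        self.next_slot += 1
        return address

    def field_address(self, obj_address, field):
        """Address of the given field for an object returned by alloc()."""
        return obj_address + field * self.stride

pool = FieldPool(base=0x1000, capacity=4, num_fields=3, field_size=4)
a = pool.alloc()
b = pool.alloc()
```

Successive allocations end up with their first fields adjacent (`a` and `b` are 4 bytes apart), and the same holds for every other field, which is what makes a traversal of one field cache-friendly.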

15 Remapping of Dynamic Data Objects
The scheme relies on a run-time comparison of the pointer value against the stack pointer register to determine the proper offset.
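One way to read this check is that stack-resident objects keep the original contiguous layout while pool-resident objects use the remapped stride, so the field offset depends on where the pointer points. The sketch below assumes a downward-growing stack located above the pool, so that any pointer at or above the stack pointer refers to a stack object. That memory map, the constants, and the function name are assumptions, not details given on the slide.

```python
FIELD_SIZE = 4      # bytes per field (assumed)
POOL_STRIDE = 1024  # bytes between per-field regions in the pool (assumed)

def field_offset(pointer, field, stack_pointer):
    """Choose the byte offset of a field based on where the object lives."""
    if pointer >= stack_pointer:
        # Stack object: fields are contiguous, as in the original record.
        return field * FIELD_SIZE
    # Pool object: each field lives in its own region of the pool.
    return field * POOL_STRIDE

# Hypothetical addresses for illustration.
sp = 0x7FFF0000
stack_obj = 0x7FFF0010
heap_obj = 0x00020000

stack_next = stack_obj + field_offset(stack_obj, 2, sp)
heap_next = heap_obj + field_offset(heap_obj, 2, sp)
```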

16 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions

17 The Target Processor Verilog model of ARM-like processor
The core was synthesized using Synopsys Design Compiler, targeting a TSMC 0.25μ library from LEDA System, Inc. System clock: 100 MHz; 5-stage RISC; the processor core is about 250,000 NAND gates.

18 The Target Processor The power consumption is constant. This is likely due to the fact that in a simple RISC processor with one ALU, the datapath is always busy, and thus the power variation is minimal.

19 Model of Cache Power Consumption
Assume the L1 and L2 caches are SRAM and use the approach of Kamble and Ghose. Drawbacks: runtime statistics such as hit/miss counts and the ratio of read/write requests must be collected; the model accounts only for dynamic power dissipation. (For 0.25μ technology, dynamic power ≈ 10^2 × static power.)
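The sketch below shows why the runtime statistics are needed: a count-based estimate multiplies each event count by a per-event energy. It is a much-simplified stand-in, not the Kamble and Ghose model itself, and the per-event energies and counts are placeholder values in arbitrary units, not figures from the paper.

```python
def cache_dynamic_energy(read_hits, write_hits, misses, e_read, e_write, e_miss):
    """Count-based dynamic energy estimate: events times energy per event."""
    return read_hits * e_read + write_hits * e_write + misses * e_miss

# Placeholder per-event energies (arbitrary units, assumed).
E_READ, E_WRITE, E_MISS = 2, 3, 40

# Placeholder event counts for the same program with two data layouts.
before = cache_dynamic_energy(read_hits=900, write_hits=50, misses=50,
                              e_read=E_READ, e_write=E_WRITE, e_miss=E_MISS)
after = cache_dynamic_energy(read_hits=930, write_hits=50, misses=20,
                             e_read=E_READ, e_write=E_WRITE, e_miss=E_MISS)

saving = (before - after) / before
```

Because misses dominate the per-event cost in this toy setup, lowering the miss count is what drives the estimate down. Static (leakage) power is left out here, just as in the model on the slide.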

20 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions

21 Result Analysis The benchmarks used here include floating-point and integer applications, such as neural network simulation, large database management, image matching, and scientific computation, drawn from the Data Intensive System (DIS), OLDEN, and SPEC2000 suites.

22 Result Analysis Two benchmarks show no energy reduction; the others average 20-30%. Floating-point: 71%. Almost all benchmarks show a reduction in execution time.

23 Result Analysis A Half

24 Result Analysis A Half

25 Result Analysis Energy for ARM-like core vs. L1+L2 cache

26 Outline Introduction Data Remapping and Design Space Exploration
Data Remapping Algorithm Experimental Methodology Result Analysis Conclusions

27 Conclusion The paper proposes a novel compile-time data remapping algorithm that is applicable to pointer-intensive dynamic applications and leads to a 50% reduction in both L1 and L2 cache size, yielding an energy savings of 57%. It also improves the energy savings of an ARM-like core. Further work, such as accounting for static (leakage) power, is still needed.

28 Thanks and Questions?

