1 Improving the Efficiency of Memory Partitioning by Address Clustering. Alberto Macii, Enrico Macii, Massimo Poncino. Proceedings of the Design, Automation and Test in Europe Conference and Exhibition. Presenter: Hung-Yu Chen

2 Hung-Yu Chen 2/21 2015/6/21 Abstract
• Memory partitioning is an effective approach to memory energy optimization in embedded systems. Spatial locality of the memory address profile is the key property that partitioning exploits to determine an efficient multi-bank memory architecture. This paper presents an approach, called address clustering, for increasing the locality of a given memory access profile, and thus improving the efficiency of partitioning. Results obtained on several embedded applications running on an ARM7 core show average energy reductions of 25% (maximum 57%) w.r.t. a partitioned memory architecture synthesized without resorting to address clustering.

3 Outline
• What's the problem?
• Memory Energy
• Memory Partitioning
• Address Clustering
• Experimental Result
• Conclusions

4 What's the problem?
• Modern SoC platforms usually contain one or more processors.
• The gap between processor and memory speed keeps increasing.
• Various types of on-chip embedded memories provide shorter latencies and wider interfaces.
• Problem:
  • The ubiquity of embedded memories makes them the largest contributor to the overall energy budget of a chip.

5 Memory Energy
• Model: $E_{mem} = \sum_{i=1}^{N} Cost(i)$
  • $N$: number of accesses during the computation.
  • $Cost(i)$: cost of the $i$-th access, determined by the memory organization and by the technology-dependent cost of the physical access.
• Memory energy optimization:
  1. Reducing $Cost(i)$: build a low-energy memory architecture.
  2. Reducing $N$: modify the memory access pattern.
  3. Both.
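The energy model above can be illustrated with a minimal Python sketch. The two-bank layout, the bank names, and the per-access costs are hypothetical values chosen for illustration; they do not come from the paper.

```python
def memory_energy(trace, bank_of, cost_per_access):
    """Sum Cost(i) over every access in the trace (the E_mem model)."""
    return sum(cost_per_access[bank_of(addr)] for addr in trace)


# Hypothetical layout: addresses below 4 live in a small, cheap bank.
def bank_of(addr):
    return "small" if addr < 4 else "large"


# Assumed energy units per access; larger arrays cost more per access.
cost_per_access = {"small": 1.0, "large": 5.0}

# Illustrative access sequence (one entry per access, so N = len(trace)).
trace = [0, 1, 0, 2, 9, 0, 1, 3]

partitioned = memory_energy(trace, bank_of, cost_per_access)
monolithic = memory_energy(trace, lambda addr: "large", cost_per_access)
print(partitioned, monolithic)
```

In this toy setting the partitioned layout lowers Cost(i) for the frequent accesses while N stays the same, which corresponds to option 1 in the list above.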

6 Memory Partitioning
• Memory partitioning technique.

7 Memory Partitioning (cont.)
• Figure 1-a:
  • The whole address space of the application is mapped to a single SRAM memory array.
• Figure 1-b:
  • A dynamic access profile.
• Figure 1-c:
  • The partitioned memory.
• Notice that we need to account for the power consumed in the entire partitioned memory system.

8 Address Clustering-Example
• MPEG decoding application for an ARM7 core
• Instruction stream

9 Address Clustering-Example (cont.)
• Figure 2 shows:
  • Total number of addresses: 31,233 (ranging from 0 to 124,892). The memory cut has 1,952 rows * 512 columns.
  • Energy consumption: 170 mJ (44.4 million total reads).
• Memory partitioning:
  • Three memory blocks of sizes 736*256, 696*512, and 892*512.
  • Energy consumption: 96 mJ (inclusive of the overhead).
• 43.5% energy reduction:
  • The 696*512 block holds the majority (82%) of the memory accesses (36 million out of 44.4 million).

10 Address Clustering-Example (cont.)
• Figure 3: Clustered address profile of an MPEG decoder
• Two memory blocks of sizes 212*128 and 1900*512.
• Energy consumption: 42 mJ (an additional 56% of energy saved).
• 99% of the memory accesses (43.99 million out of 44.4 million).
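The savings quoted for the MPEG example follow from the reported energies of 170 mJ (monolithic), 96 mJ (partitioned), and 42 mJ (partitioned after clustering). As a worked check, where the overall figure in the last line is derived here and is not stated on the slides:

```latex
\begin{align*}
\text{partitioning vs. monolithic:} \quad & \frac{170 - 96}{170} \approx 43.5\% \\
\text{clustering vs. partitioning:} \quad & \frac{96 - 42}{96} \approx 56\% \\
\text{clustering vs. monolithic:} \quad & \frac{170 - 42}{170} \approx 75\%
\end{align*}
```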

11 Address Clustering-Problem
• Find a relocation of a proper subset of the address space:
  • Maximize the locality of the dynamic trace.
  • Minimize the energy consumption of the memory architecture.
• Cost metrics
  • Dynamic access profile: $C = \{c_0, c_1, \ldots, c_{N-1}\}$
  • $D(C,W) = \max_i S_i$, $i = 0, 1, \ldots, N-W$, where $S_i = \sum_{j=0}^{W-1} c_{i+j}$ and $W$ is the size of a sliding window.
  • $d(C,W) = D(C,W) / Tot$, where $Tot = \sum_{i=0}^{N-1} c_i$.
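The density metric d(C,W) can be computed in a single pass with a running window sum. The following is a minimal Python sketch of the definition on this slide; the function name and the sample profile are illustrative assumptions, not taken from the paper.

```python
def max_window_density(profile, window):
    """Return d(C, W): the largest share of all accesses that falls
    inside any run of `window` consecutive addresses."""
    n = len(profile)
    if not 1 <= window <= n:
        raise ValueError("window must be between 1 and len(profile)")
    total = sum(profile)  # Tot
    if total == 0:
        return 0.0
    current = sum(profile[:window])  # S_0
    best = current
    for i in range(1, n - window + 1):
        # Slide the window one address to the right: S_i from S_{i-1}.
        current += profile[i + window - 1] - profile[i - 1]
        best = max(best, current)  # D(C, W) so far
    return best / total


# Illustrative profile: access counts for six consecutive addresses.
profile = [1, 0, 8, 9, 0, 2]
density = max_window_density(profile, 2)
print(density)
```

A value close to 1 means that a single window of W addresses captures almost all accesses, which is the situation in which partitioning pays off most.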

12 Address Clustering-Problem (cont.)
• Figure 4 shows the values of d(C,W) for W = 32, 64, 128, 256, 512, computed on the profile of Figure 2. (Figure annotation: 80%.)

13 Address Clustering-Exploration
• High-level pseudo-code:
  • Explore: find a good value of W.

14 Address Clustering-Clustering Algorithm
• Cluster: returns a modified trace whose first M locations contain the M most visited addresses.
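The slide gives only the contract of Cluster, not its body. One way to meet that contract is sketched below in Python, under the assumption that clustering is done by pairwise swaps between hot addresses and cold slots in the first M locations; the function signature, the tie-breaking rule, and the sample profile are hypothetical.

```python
def cluster(profile, m):
    """Return (clustered_profile, remap) such that the first m locations
    of clustered_profile hold the m most visited addresses."""
    n = len(profile)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and len(profile)")
    # The m most visited addresses; ties go to the lower address (assumption).
    hot = sorted(range(n), key=lambda a: (-profile[a], a))[:m]
    hot_set = set(hot)
    # Hot addresses already inside [0, m) stay where they are.
    movers = [a for a in hot if a >= m]
    free_slots = [s for s in range(m) if s not in hot_set]
    remap = {}
    for src, dst in zip(movers, free_slots):
        remap[src] = dst  # hot address moves into the cluster
        remap[dst] = src  # the displaced cold address takes its place
    clustered = list(profile)
    for src, dst in remap.items():
        clustered[dst] = profile[src]
    return clustered, remap


# Illustrative access counts for eight addresses.
profile = [5, 0, 1, 40, 0, 30, 2, 25]
clustered, remap = cluster(profile, 3)
print(clustered)
print(remap)
```

Because every relocation is a swap, at most 2M addresses change location, and all other addresses keep their original mapping.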

15 Address Clustering-Encoder
• Hardware encoder:
  • Swapping address pairs yields 2M clustered addresses.
  • f(X) is a function that indicates whether X belongs to the set of 2M clustered addresses.
  • Clustered address: X' = R(X).
  • 32-input combinational network.
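In software terms, the encoder behaves like a lookup that remaps the swapped addresses and passes every other address through unchanged. The Python sketch below models that behaviour only; it is not the combinational network from the paper, and the helper name and sample addresses are hypothetical.

```python
def make_encoder(swaps):
    """Build an address encoder from a list of swapped address pairs.

    The returned function models X' = R(X): addresses in the swap set
    (the case f(X) = true) are remapped, all others pass through.
    """
    table = {}
    for a, b in swaps:
        table[a] = b
        table[b] = a

    def encode(x):
        # table.get falls back to x itself when x is not a clustered address.
        return table.get(x, x)

    return encode


# Illustrative swap list: two hot addresses moved to the bottom of memory.
encode = make_encoder([(0x2000, 0x0000), (0x2004, 0x0004)])
print(hex(encode(0x2000)), hex(encode(0x0000)), hex(encode(0x1234)))
```

Since each pair is swapped symmetrically, applying the encoder twice returns the original address, so no address is lost or duplicated by the relocation.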

16 Experimental Result
• Some benchmarks are taken from the Ptolemy distribution; others come from the MediaBench suite.
• Platform: ARM software development kit.
• Table 1:
  • #Addr: total number of distinct addresses.
  • $E_{mono}$: the energy of the monolithic memory that contains all the data/instructions.
  • $E_{partitioned}$: total memory energy of a partitioned memory architecture.
  • M = 256, 512, 1024: memory partitioning combined with address clustering.

17 Experimental Result (cont.)

18 Experimental Result (cont.)
• Original vs. Clustering (Energy)

19 Encoder Overhead Analysis
• Encoders have been synthesized with Synopsys Design Compiler on a 0.25 µm technology by STMicroelectronics.
• Power figures (Figure 8) are obtained with Synopsys Power Compiler.
• The variation of the energy figures over the various applications is relatively small:
  1. The complexity of the decoder is basically independent of the set of addresses that are clustered.
  2. The switching activity of the address lines is very similar for all benchmarks.

20 Encoder Overhead Analysis (cont.)
• Reference: a 16K memory, which dissipates about 375 mW at a frequency of 150 MHz.
• Power = 7.5 mW for M = 1024.
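Assuming the 7.5 mW figure is the encoder power for M = 1024 and that it is meant to be compared against the 375 mW of the 16K memory (the slide does not state this explicitly), the relative overhead works out as:

```latex
\[
\frac{P_{encoder}}{P_{memory}} = \frac{7.5\ \text{mW}}{375\ \text{mW}} = 0.02 = 2\%
\]
```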

21 Conclusions
• The energy reduction achievable by memory partitioning can be improved significantly by increasing the locality of the trace.
• The paper proposes an architectural solution, called Address Clustering.
• Experimental results are reported on a set of typical embedded applications running on an ARM-based system.
• Address Clustering reduces the energy consumption of a partitioned memory architecture by 25% on average (maximum 57%) with respect to partitioning driven by the original trace.

