The Effect of Data-Reuse Transformations on Multimedia Applications for Application Specific Processors
N. Vassiliadis, A. Chormoviti, N. Kavvadias, S. Nikolaidis
Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
nivas@physics.auth.gr
Aristotle University of Thessaloniki

2 Motivation
- Strong requirement for energy consumption reduction (e.g. in portable multimedia applications)
  => Great need for power optimization strategies, especially at the higher design levels
- Memory accesses are the main source of energy consumption in digital systems
  => Code transformations aiming at an improved memory organization provide significant power savings
- ASICs lack flexibility, and GPPs are prohibitively expensive in terms of energy and performance
  => The embedded systems industry has an increasing interest in ASIPs

3 Objectives
- Explore the effect of data-reuse transformations on the energy consumption and performance of a multimedia application executed on an ASIP
- Design an ASIP for multimedia applications based on low-cost enhancements of an existing processor
- Follow a simple methodology for the implementation of the ASIP

4 Data Reuse Transformations – Memory organization
- Applying data-reuse transformations to data-dominated applications leads to a custom memory organization
- The transformations exploit the temporal locality of the memory accesses
- Most of the accesses are redirected to smaller memories
- Benchmark: the two-dimensional Three-Step Search (TSS) algorithm for motion estimation
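The benchmark named above can be illustrated with a minimal C sketch of a three-step search for one block. It is not the authors' code: the function names, the row-major 8-bit frame layout, the sum-of-absolute-differences cost, and the `first_step` parameter are assumptions made for illustration. Out-of-frame pixels of the previous frame are read as 0, as in the example code of the presentation.

```c
#include <assert.h>
#include <stdlib.h>

/* Pixel of an H x W frame, or 0 when (r, c) lies outside the frame. */
static int pixel_or_zero(const unsigned char *f, int H, int W, int r, int c)
{
    if (r < 0 || r >= H || c < 0 || c >= W)
        return 0;
    return f[r * W + c];
}

/* Sum of absolute differences between the B x B block at (br, bc) in cur
   and the block displaced by (dr, dc) in prev. */
static long block_sad(const unsigned char *cur, const unsigned char *prev,
                      int H, int W, int B, int br, int bc, int dr, int dc)
{
    long sad = 0;
    for (int k = 0; k < B; k++)
        for (int l = 0; l < B; l++)
            sad += abs(cur[(br + k) * W + (bc + l)]
                       - pixel_or_zero(prev, H, W, br + k + dr, bc + l + dc));
    return sad;
}

/* Three-step search (hypothetical interface): test the 8 neighbours of the
   current centre at the current step size, move to the best one, halve the
   step, and repeat until the step size 1 has been processed. */
void tss_search(const unsigned char *cur, const unsigned char *prev,
                int H, int W, int B, int br, int bc, int first_step,
                int *best_dr, int *best_dc)
{
    assert(first_step >= 1 && B >= 1);
    assert(br >= 0 && bc >= 0 && br + B <= H && bc + B <= W);

    int cr = 0, cc = 0;                       /* current search centre */
    long best = block_sad(cur, prev, H, W, B, br, bc, 0, 0);

    for (int step = first_step; step >= 1; step /= 2) {
        int nr = cr, nc = cc;
        for (int i = -1; i <= 1; i++)
            for (int j = -1; j <= 1; j++) {
                if (i == 0 && j == 0)
                    continue;                 /* centre already evaluated */
                long s = block_sad(cur, prev, H, W, B, br, bc,
                                   cr + i * step, cc + j * step);
                if (s < best) {
                    best = s;
                    nr = cr + i * step;
                    nc = cc + j * step;
                }
            }
        cr = nr;
        cc = nc;
    }
    *best_dr = cr;
    *best_dc = cc;
}
```

With `first_step = 4` the step sizes are 4, 2 and 1, so the largest reachable displacement is 7 in each direction, which matches a search window of [-7,7].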

5 Data Reuse Transformations – Example
Introduction of a line buffer of reference windows for the previous frame (previous_line in the transformed code)

Original Code

    for(x=0;x<N/B;x++)              /* For all blocks in the current frame */
      for(y=0;y<M/B;y++)
        for(i=-p;i<p+1;i++)         /* For all candidate blocks */
          for(j=-p;j<p+1;j++)
            for(k=0;k<B;k++)        /* For all pixels in the block */
              for(l=0;l<B;l++) {
                read pixel in current frame;
                if (current pixel displaced by i, j) lies outside frame
                  previous pixel = 0;
                else
                  read pixel from previous frame;
              }

Transformed Code

    for(x=0;x<N/B;x++) {            /* For all blocks in a line of blocks */
      for(i=0;i<B+2p;i++)           /* For a line of ref. windows */
        for(j=0;j<M;j++) {
          if (current pixel displaced by i) lies outside frame
            previous_line[i][j] = 0;
          else
            read previous_line from previous frame;
        }
      for(y=0;y<M/B;y++)
        for(i=-p;i<p+1;i++)         /* For all candidate blocks */
          for(j=-p;j<p+1;j++)
            for(k=0;k<B;k++)        /* For all pixels in the block */
              for(l=0;l<B;l++) {
                read pixel in current frame;
                if (current pixel displaced by j) lies outside frame
                  previous pixel = 0;
                else
                  read pixel from previous_line;
              }
    }
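To make the effect of the line buffer concrete, the following C sketch counts how many reads reach the large previous-frame memory in the two loop nests shown above. It is only a counting model under stated assumptions: the function names are hypothetical, x is taken to index the N dimension and y the M dimension (as the loop bounds suggest), and pixels outside the frame cost no frame read because they are set to 0.

```c
#include <assert.h>

/* Reads of the previous-frame memory in the original loop nest:
   every in-frame candidate pixel is fetched from the frame memory. */
long prev_frame_reads_original(int N, int M, int B, int p)
{
    assert(B > 0 && p >= 0 && N % B == 0 && M % B == 0);
    long reads = 0;
    for (int x = 0; x < N / B; x++)
        for (int y = 0; y < M / B; y++)
            for (int i = -p; i < p + 1; i++)
                for (int j = -p; j < p + 1; j++)
                    for (int k = 0; k < B; k++)
                        for (int l = 0; l < B; l++) {
                            int r = x * B + k + i;   /* displaced row */
                            int c = y * B + l + j;   /* displaced column */
                            if (r >= 0 && r < N && c >= 0 && c < M)
                                reads++;
                        }
    return reads;
}

/* Reads of the previous-frame memory in the transformed loop nest:
   the frame is touched only while filling previous_line, once per
   line of blocks; all candidate pixels then come from the buffer. */
long prev_frame_reads_line_buffer(int N, int M, int B, int p)
{
    assert(B > 0 && p >= 0 && N % B == 0 && M % B == 0);
    long reads = 0;
    for (int x = 0; x < N / B; x++)
        for (int i = 0; i < B + 2 * p; i++) {
            int r = x * B - p + i;                   /* frame row copied */
            if (r >= 0 && r < N)
                reads += M;                          /* one line of M pixels */
        }
    return reads;
}
```

For N = 176, M = 144, B = 16 and p = 7 this model gives 5,436,736 frame reads for the original nest and 45,504 for the transformed one. These are loop counts for the exhaustive search shown in the example, not measurements reported in the presentation.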

6 ASIP Design Flow
- A RISC, MIPS-like machine is used as the base processor
- The instruction set is extended with special instructions in order to reduce the number of execution cycles
- These special instructions correspond to simple instruction patterns that appear frequently

7 ASIP Design Flow - Dynamic Profiling
- Dynamic profiling was performed with the GNU tools configured for the MIPS processor
- Heavily executed portions of the code were identified
  => Control flow statement overhead is 24% of the total execution cycles
  => Address generation instructions and memory accesses are 62% of the total execution cycles
  => Only 14% of the execution time is spent on pure computational operations

8 ASIP Design Flow - Instruction Set Extensions

Description                 Additional Hardware Requirements     Penalty
Inc+Branch_Rs_Rd_Target     Control Logic + Incrementer Unit     Area+Delay
Add+SW_L#_Rs_Rt_Rd          Control Logic                        Area
Add+LW_L#_Rs_Rt_Rd          Control Logic                        Area

L# is the desired level of the custom memory hierarchy
- “Increment and Branch” instruction to reduce loop iteration overhead
- Store/Load Word with addition for address calculation
- Direct support of the custom memory hierarchy
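The slide names the three extensions but does not spell out their exact semantics, so the following C model is a guess for illustration only. It assumes that Inc+Branch increments Rs and branches to Target while Rs is still below Rd, and that Add+LW / Add+SW form the address as Rs + Rt and access memory level L#. The `Machine` type, the word-addressed memories, and the sizes are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_REGS = 32, NUM_LEVELS = 3, LEVEL_WORDS = 1024 };

/* Hypothetical machine state: registers, one word-addressed memory per
   level of the custom hierarchy, and a byte-addressed program counter. */
typedef struct {
    uint32_t reg[NUM_REGS];
    uint32_t mem[NUM_LEVELS][LEVEL_WORDS];
    uint32_t pc;
} Machine;

/* Inc+Branch (assumed semantics): Rs += 1, then jump to target while
   Rs < Rd, otherwise fall through to the next instruction. */
void inc_branch(Machine *m, int rs, int rd, uint32_t target)
{
    m->reg[rs] += 1;
    m->pc = (m->reg[rs] < m->reg[rd]) ? target : m->pc + 4;
}

/* Add+LW (assumed semantics): Rd = level[Rs + Rt]. */
void add_lw(Machine *m, int level, int rs, int rt, int rd)
{
    uint32_t addr = m->reg[rs] + m->reg[rt];
    assert(level >= 0 && level < NUM_LEVELS && addr < LEVEL_WORDS);
    m->reg[rd] = m->mem[level][addr];
    m->pc += 4;
}

/* Add+SW (assumed semantics): level[Rs + Rt] = Rd. */
void add_sw(Machine *m, int level, int rs, int rt, int rd)
{
    uint32_t addr = m->reg[rs] + m->reg[rt];
    assert(level >= 0 && level < NUM_LEVELS && addr < LEVEL_WORDS);
    m->mem[level][addr] = m->reg[rd];
    m->pc += 4;
}
```

Under these assumptions each extension replaces a short sequence of base instructions (increment, compare and branch; or add followed by load/store), which is where the reduction in loop and addressing overhead would come from.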

9 ASIP Design Flow - Code Re-Generation
- The original code is parsed and the simple instructions are reordered so that they form the patterns of the instruction extensions
- The patterns are replaced by the newly defined instructions
- The simple instructions are reordered to keep the pipeline as full as possible

10 ASIP Design Flow - Cycle Accurate and Hardware Models
- The code for the different transformations applied to the TSS application was created
- A cycle-accurate simulator for the ASIP was constructed in SystemC to run these codes
- For each transformation we measured
  => the number of executed instructions (accesses to the instruction memory)
  => the execution cycles
  => the accesses to the data memories
- A hardware model was designed in VHDL using a 0.18 µm STM technology

11 Experimental Results
- The TSS was executed on digital pictures of MxN = 144x176 pixels
- The block size B was set to 16, and the search window [-p,p] was set to [-7,7]
- SRAMs of appropriate size were assumed for the memories
- Energy models for the memories were obtained from the Embedded Memory Generator of Dolphin Integration
- The best cases of ASIP and MIPS were compared:
  Performance gain                      54%
  Energy reduction of memory system     42%
  => No degradation in clock period (250 MHz)
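The parameters above fix a few quantities that are easy to check by hand. The C helpers below are a sketch with hypothetical names. The step schedule in `tss_positions` (first step (p+1)/2, halved down to 1, 8 new positions per step) is the usual textbook form of the three-step search and is an assumption, since the slides do not state it; the line-buffer size follows the (B+2p) x M buffer of the earlier example.

```c
#include <assert.h>

/* Number of B x B blocks in an M x N picture. */
int blocks_per_frame(int M, int N, int B)
{
    assert(B > 0 && M % B == 0 && N % B == 0);
    return (M / B) * (N / B);
}

/* Candidate positions per block for an exhaustive search over
   [-p, p] x [-p, p]. */
int full_search_positions(int p)
{
    assert(p >= 0);
    return (2 * p + 1) * (2 * p + 1);
}

/* Candidate positions per block for a three-step-style search: the
   centre plus 8 neighbours per step (assumed step schedule). */
int tss_positions(int p)
{
    assert(p >= 1);
    int positions = 1;
    for (int step = (p + 1) / 2; step >= 1; step /= 2)
        positions += 8;
    return positions;
}

/* Pixels held by a line buffer of reference windows, (B + 2p) x M. */
int line_buffer_pixels(int M, int B, int p)
{
    assert(M > 0 && B > 0 && p >= 0);
    return (B + 2 * p) * M;
}
```

For M = 144, N = 176, B = 16 and p = 7 these give 99 blocks per picture, 225 positions per block for an exhaustive search against 25 for the three-step schedule, and a line buffer of 4,320 pixels.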

12 Performance Results
- Simplified address equations => Performance gain
- Performance gain of 55% for P4 compared to the original TSS

13 Energy Results
- Data memory savings are due to the smaller memories
- Instruction memory savings are due to fewer accesses
- Energy consumption is dominated by the accesses to the instruction memory
- P4 delivers 61% energy savings compared to the original TSS

14 Energy Results - Instruction Buffering
- The TSS code consists of nested loops, of which the inner loop is the most heavily executed
- Instruction buffering: the code of the inner loop can be moved to a local storage structure
- This structure was modeled as an 8-register file with negligible energy consumption
- 33% average and 68% maximum (P4) energy reduction compared to the case with no instruction buffering
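A first-order view of why buffering the inner loop pays off can be sketched as follows. This is a simplified model, not the one used for the reported figures: the function name and all parameters are hypothetical, fetches served by the buffer are taken to cost nothing (in line with the negligible-energy assumption above), and the cost of filling the buffer is ignored.

```c
#include <assert.h>

/* Fraction of memory-system energy saved when n_inner of the n_total
   instruction fetches are served by a zero-energy loop buffer.
   e_data  : total data-memory energy (unchanged by buffering)
   e_fetch : energy of one instruction-memory access */
double buffering_saving(double e_data, double e_fetch,
                        long n_total, long n_inner)
{
    assert(n_total > 0 && n_inner >= 0 && n_inner <= n_total);
    assert(e_data >= 0.0 && e_fetch > 0.0);
    double before = e_data + e_fetch * (double)n_total;
    double after  = e_data + e_fetch * (double)(n_total - n_inner);
    return (before - after) / before;
}
```

The saving grows with the share of fetches that fall inside the buffered loop and with the weight of the instruction memory in the total energy, which is consistent with the largest reduction being reported for the variant where instruction fetches dominate.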

15 Conclusions
- The effect of data-reuse transformations on ASIP performance and energy consumption was explored
- An ASIP for multimedia applications was designed following a simple design flow
- Selecting appropriate instruction set extensions for an ASIP significantly increases the benefits that data-reuse transformations already bring in a GPP environment
- Preliminary results on instruction buffering indicate that a significant energy reduction is feasible
